GRPO

Found 1 article with this tag

From GRPO to BCR: The Battle to Cut LLM Reasoning Costs

New training methods like Sample Routing and BCR target wasteful chain-of-thought token usage, cutting inference costs significantly.