LLM Benchmarks

Found 3 articles with this tag

MiniMax-M1, Gemma 4

NYT Connections Benchmark: MiniMax-M1 Leads Local LLMs at 34.4

Community benchmark ranks MiniMax-M1 at 34.4, Gemma 4 31B at 30.1, Arcee Trinity Large Thinking at 29.5 on NYT Connections puzzles.

DeepSeek-R1, Qwen2.5-Math

Mathematical Methods and Human Thought in the Age of AI

An arXiv paper examines how AI reshapes mathematical reasoning and what that means for human cognitive processes.

Why Programmers Should Ignore the AI Replacement Anxiety Hype

AI lowers the floor for output but not the ceiling for quality. Here is why effort still compounds.