Code Generation
4 articles tagged with this topic
Xiaomi, MiMo
Xiaomi MiMo Wastes 6x Compute on Junk Code; LLMs Shift to Delivery Efficiency
Xiaomi MiMo burned 6x compute for junk code while DeepSeek excelled. Benchmarks no longer reflect true dev capability; focus on delivery and costs.
May 6 · 2 min read
Mistral, Devstral
Devstral Small 2 Breaks 80% Code Benchmark — Mistral May Be Seriously Underrated
Developer's custom benchmark: Mistral's Devstral Small 2 scores 80%+ on 8 code tasks—first local model to beat multiple closed-source rivals.
Apr 30 · 2 min read
GLM, Open-Source Models
GLM 5.1 Dominates Open-Source Code Arena: China's Programming AI Inflection Point
Zhipu's GLM 5.1 topping open-source code rankings signals that low-cost programming AI is within reach, with software outsourcing and IT service prici
Apr 10 · 2 min read
Qwen3.5, Gemma4
Qwen3.5 vs Gemma4 vs Cloud LLMs: Python Turtle Drawing Benchmark
A Reddit user benchmarks local and cloud LLMs on Python turtle graphics, revealing Gemma4 and Gemini share visual style.
Apr 6 · 2 min read