Code Generation

4 articles tagged with this topic

Xiaomi MiMo Wastes 6x Compute on Junk Code; LLMs Shift to Delivery Efficiency

Xiaomi MiMo burned 6x compute for junk code while DeepSeek excelled. Benchmarks no longer reflect true dev capability; focus on delivery and costs.

May 6 · 2 min read

Mistral, Devstral

Devstral Small 2 Breaks 80% Code Benchmark — Mistral May Be Seriously Underrated

Developer's custom benchmark: Mistral's Devstral Small 2 scores 80%+ on 8 code tasks—first local model to beat multiple closed-source rivals.

Apr 30 · 2 min read

GLM, Open-Source Models

GLM 5.1 Dominates Open-Source Code Arena: China's Programming AI Inflection Point

Zhipu's GLM 5.1 topping open-source code rankings signals that low-cost programming AI is within reach, with software outsourcing and IT service prici

Apr 10 · 2 min read

Qwen3.5, Gemma4

Qwen3.5 vs Gemma4 vs Cloud LLMs: Python Turtle Drawing Benchmark

A Reddit user benchmarks local and cloud LLMs on Python turtle graphics, revealing that Gemma4 and Gemini share a visual style.

Apr 6 · 2 min read