Gemma 4
16 articles tagged with this topic
Gemma 4 Audio with MLX
Google's Gemma 4 E2B model can transcribe audio locally on macOS using MLX and a single uv run command.
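The article's one-command workflow can be sketched as a uv script with inline dependency metadata, so a single `uv run transcribe.py` resolves everything. A minimal sketch: the model repo id and the `audio=` keyword on `generate()` are assumptions modeled on mlx-vlm's image API, so check your mlx-vlm version for the exact signature.

```python
# /// script
# requires-python = ">=3.11"
# dependencies = ["mlx-vlm"]
# ///
# Run with: uv run transcribe.py
from mlx_vlm import load, generate

MODEL = "mlx-community/gemma-4-e2b"  # hypothetical repo id, for illustration

model, processor = load(MODEL)
text = generate(
    model,
    processor,
    prompt="Transcribe this recording verbatim.",
    audio=["meeting.wav"],  # assumed keyword, mirroring mlx-vlm's image=[...]
    max_tokens=512,
)
print(text)
```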
Fixing Gemma 4 Tool Calls in llama.cpp: Root Causes Explained
Four bugs in llama.cpp's Gemma 4 chat template handling caused crashes or loops when processing tool call results.
Controlling Gemma 4 Thinking Tokens via System Prompts
Users struggle to reliably toggle Gemma 4's reasoning mode via system prompts, unlike Qwen-30B-A3B.
Gemma 4 Has Hidden MTP Heads Disabled by Google at Launch
A developer found multi-token prediction weights inside Gemma 4's LiteRT files; Google confirmed MTP exists but was intentionally disabled.
Gemma 4 31B Ranks Top-3 in Four European Languages on EuroEval
Gemma 4 31B scores 1st in Finnish, 2nd in Danish/French/Italian on EuroEval multilingual leaderboard.
Gemma 4 Local CUDA Setup: Precision Traps and Real Benchmarks
Running Gemma 4 locally on CUDA requires strict dtype matching at KV cache boundaries or output degenerates silently.
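A generic PyTorch sketch of the trap (not Gemma 4's actual inference code): a cache allocated in a different precision than the attention compute dtype gets silently cast on every append, and output quality decays with no error raised. Asserting at the boundary turns the silent degradation into a loud failure.

```python
import torch

n_layers, n_kv_heads, head_dim, max_len = 2, 4, 64, 128
compute_dtype = torch.bfloat16  # dtype the attention kernels run in

# Trap: a float16 cache here forces a silent bf16 -> fp16 -> bf16 round trip.
# Fix: allocate the cache in the compute dtype and check at the boundary.
k_cache = torch.zeros(n_layers, max_len, n_kv_heads, head_dim, dtype=compute_dtype)
v_cache = torch.zeros_like(k_cache)

def append_kv(layer: int, pos: int, k: torch.Tensor, v: torch.Tensor) -> None:
    # Fail loudly instead of letting indexing assignment cast silently.
    assert k.dtype == k_cache.dtype, f"KV dtype mismatch: {k.dtype} vs {k_cache.dtype}"
    k_cache[layer, pos] = k
    v_cache[layer, pos] = v

k = torch.randn(n_kv_heads, head_dim, dtype=compute_dtype)
append_kv(0, 0, k, k)
```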
Inside Google DeepMind's Gemma 4 Launch: What It Actually Took
A Reddit thread breaks down the engineering and logistics behind launching Gemma 4, Google DeepMind's open model.
Running Gemma 4 26B-A4B on vLLM: Community Troubleshooting Notes
Developers report mixed results deploying Gemma 4 26B-A4B on vLLM, with INT4 quants too slow on DGX Spark GB10.
Run a Private AI Phone Agent On-Device with Gemma 4 and PokeClaw
PokeClaw runs Gemma 4 locally on Android to control any app—no cloud, no data leakage, no subscription.
Gemma 4 26B: Q8 mmproj Unlocks 60K+ Context With Vision
Switching from F16 to Q8_0 mmproj on Gemma 4 26B adds ~30K context tokens with no vision quality loss.
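The mechanism is simple VRAM arithmetic: a Q8_0 projector weighs roughly half of an F16 one, and the freed memory becomes KV cache headroom. A back-of-envelope sketch with placeholder dimensions, not Gemma 4 26B's real ones; Gemma's sliding-window layers make the true per-token cost lower, which is how the ~30K figure arises.

```python
GiB = 1024**3

mmproj_f16 = int(1.6 * GiB)     # assumed F16 projector size, illustrative
mmproj_q8 = mmproj_f16 // 2     # Q8_0 is roughly half the size of F16
freed = mmproj_f16 - mmproj_q8

# Per-token KV cost: K and V, per layer, per KV head (fp16 cache = 2 bytes).
n_layers, n_kv_heads, head_dim, kv_bytes = 32, 4, 128, 2
per_token = 2 * n_layers * n_kv_heads * head_dim * kv_bytes  # 64 KiB here

print(f"~{freed // per_token:,} extra context tokens")  # ~13,107 with these numbers
```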
Gemma 4 31B Matches Gemini 2.5 Pro on Local Hardware Benchmarks
Community benchmarks show Gemma 4 31B achieving Gemini 2.5 Pro-level scores when run locally via llama.cpp harness.
Per-Layer Embeddings: How Gemma 4's Small Models Work
Gemma 4's E2B and E4B models use per-layer embeddings, not MoE, enabling new inference performance tradeoffs.
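A simplified sketch of the per-layer-embedding idea (illustrative only, not Gemma 4's actual architecture; attention is elided): besides the usual input embedding, each layer looks up its own small per-token embedding and mixes it into that layer's input. Since only the current token's rows are touched, those tables can live in cheaper memory, which is the inference tradeoff the article describes.

```python
import torch
import torch.nn as nn

class PLEBlock(nn.Module):
    """One layer with its own small embedding table (attention omitted)."""
    def __init__(self, vocab: int, d_model: int, d_ple: int):
        super().__init__()
        self.ple = nn.Embedding(vocab, d_ple)   # small per-layer table
        self.proj = nn.Linear(d_ple, d_model)   # lift into model width
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, h: torch.Tensor, ids: torch.Tensor) -> torch.Tensor:
        h = h + self.proj(self.ple(ids))        # inject per-layer signal
        return h + self.ffn(h)

vocab, d_model, d_ple = 1000, 256, 64
tok = nn.Embedding(vocab, d_model)
layers = nn.ModuleList(PLEBlock(vocab, d_model, d_ple) for _ in range(4))

ids = torch.randint(0, vocab, (1, 8))
h = tok(ids)
for layer in layers:
    h = layer(h, ids)
print(h.shape)  # torch.Size([1, 8, 256])
```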
TurboQuant KV Cache Quantization Beats Baselines on Gemma 4 and Qwen
Community benchmarks show TurboQuant KV quantization achieves near-zero accuracy loss at 3.1 bits on Gemma 4 with 34% long-context speedup.
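For intuition, here is a generic per-channel quantization round trip of the kind KV-cache quantizers perform. This is not TurboQuant's algorithm (the article doesn't spell it out); fractional rates like 3.1 bits typically come from mixing bit widths or amortizing small per-group scales.

```python
import torch

def quantize(x: torch.Tensor, bits: int):
    # Asymmetric min/max quantization along the last (channel) dimension.
    lo = x.amin(dim=-1, keepdim=True)
    hi = x.amax(dim=-1, keepdim=True)
    scale = (hi - lo).clamp(min=1e-8) / (2**bits - 1)
    return torch.round((x - lo) / scale), scale, lo

def dequantize(q, scale, lo):
    return q * scale + lo

keys = torch.randn(8, 128)        # fake keys: [tokens, head_dim]
q, s, z = quantize(keys, bits=3)
err = (dequantize(q, s, z) - keys).abs().mean()
print(f"mean abs reconstruction error at 3 bits: {err:.4f}")
```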
Gemma 4 31B vs Qwen 3.5 27B: SVG Generation Benchmark
Reddit users compare Gemma 4 31B and Qwen 3.5 27B Q4 quants on SVG creation, coding, and function calling tasks.
NYT Connections Benchmark: MiniMax-M1 Leads Local LLMs at 34.4
Community benchmark ranks MiniMax-M1 at 34.4, Gemma 4 31B at 30.1, Arcee Trinity Large Thinking at 29.5 on NYT Connections puzzles.
Gemma 4 llama.cpp Issues Resolved With Recent Fixes
Google Gemma 4 models now run correctly in llama.cpp after critical fixes for output quality and crashes.