Gemma 4
16 articles tagged with this topic
Gemma 4 Audio with MLX
Google's Gemma 4 E2B model can transcribe audio locally on macOS using MLX and a single uv run command.
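The article's one-command workflow can be sketched as a uv script with inline dependency metadata, so a single `uv run transcribe.py` resolves everything. A minimal sketch: the model repo id and the `audio=` keyword on `generate()` are assumptions modeled on mlx-vlm's image API, so check your mlx-vlm version for the exact signature.

```python
# /// script
# requires-python = ">=3.11"
# dependencies = ["mlx-vlm"]
# ///
# Run with: uv run transcribe.py
from mlx_vlm import load, generate

MODEL = "mlx-community/gemma-4-e2b"  # hypothetical repo id, for illustration

model, processor = load(MODEL)
text = generate(
    model,
    processor,
    prompt="Transcribe this recording verbatim.",
    audio=["meeting.wav"],  # assumed keyword, mirroring mlx-vlm's image=[...]
    max_tokens=512,
)
print(text)
```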
Fixing Gemma 4 Tool Calls in llama.cpp: Root Causes Explained
Four bugs in llama.cpp's Gemma 4 chat template handling caused crashes or loops when processing tool call results.
Controlling Gemma 4 Thinking Tokens via System Prompts
Users struggle to reliably toggle Gemma 4's reasoning mode via system prompts, unlike Qwen-30B-A3B.
Gemma 4 Has Hidden MTP Heads Disabled by Google at Launch
A developer found multi-token prediction weights inside Gemma 4's LiteRT files; Google confirmed MTP exists but was intentionally disabled.
Gemma 4 31B Ranks Top-3 in Four European Languages on EuroEval
Gemma 4 31B scores 1st in Finnish, 2nd in Danish/French/Italian on EuroEval multilingual leaderboard.
Gemma 4 Local CUDA Setup: Precision Traps and Real Benchmarks
Running Gemma 4 locally on CUDA requires strict dtype matching at KV cache boundaries or output degenerates silently.
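A generic PyTorch sketch of the trap (not Gemma 4's actual inference code): a cache allocated in a different precision than the attention compute dtype gets silently cast on every append, and output quality decays with no error raised. Asserting at the boundary turns the silent degradation into a loud failure.

```python
import torch

n_layers, n_kv_heads, head_dim, max_len = 2, 4, 64, 128
compute_dtype = torch.bfloat16  # dtype the attention kernels run in

# Trap: a float16 cache here forces a silent bf16 -> fp16 -> bf16 round trip.
# Fix: allocate the cache in the compute dtype and check at the boundary.
k_cache = torch.zeros(n_layers, max_len, n_kv_heads, head_dim, dtype=compute_dtype)
v_cache = torch.zeros_like(k_cache)

def append_kv(layer: int, pos: int, k: torch.Tensor, v: torch.Tensor) -> None:
    # Fail loudly instead of letting indexing assignment cast silently.
    assert k.dtype == k_cache.dtype, f"KV dtype mismatch: {k.dtype} vs {k_cache.dtype}"
    k_cache[layer, pos] = k
    v_cache[layer, pos] = v

k = torch.randn(n_kv_heads, head_dim, dtype=compute_dtype)
append_kv(0, 0, k, k)
```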
Inside Google DeepMind's Gemma 4 Launch: What It Actually Took
A Reddit thread breaks down the engineering and logistics behind launching Gemma 4, Google DeepMind's open model.
Running Gemma 4 26B-A4B on vLLM: Community Troubleshooting Notes
Developers report mixed results deploying Gemma 4 26B-A4B on vLLM, with INT4 quants too slow on DGX Spark GB10.
Run a Private AI Phone Agent On-Device with Gemma 4 and PokeClaw
PokeClaw runs Gemma 4 locally on Android to control any app—no cloud, no data leakage, no subscription.
Gemma 4 26B: Q8 mmproj Unlocks 60K+ Context With Vision
Switching from F16 to Q8_0 mmproj on Gemma 4 26B adds ~30K context tokens with no vision quality loss.
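The mechanism is simple VRAM arithmetic: a Q8_0 projector weighs roughly half of an F16 one, and the freed memory becomes KV cache headroom. A back-of-envelope sketch with placeholder dimensions, not Gemma 4 26B's real ones; Gemma's sliding-window layers make the true per-token cost lower, which is how the ~30K figure arises.

```python
GiB = 1024**3

mmproj_f16 = int(1.6 * GiB)     # assumed F16 projector size, illustrative
mmproj_q8 = mmproj_f16 // 2     # Q8_0 is roughly half the size of F16
freed = mmproj_f16 - mmproj_q8

# Per-token KV cost: K and V, per layer, per KV head (fp16 cache = 2 bytes).
n_layers, n_kv_heads, head_dim, kv_bytes = 32, 4, 128, 2
per_token = 2 * n_layers * n_kv_heads * head_dim * kv_bytes  # 64 KiB here

print(f"~{freed // per_token:,} extra context tokens")  # ~13,107 with these numbers
```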
Gemma 4 31B Matches Gemini 2.5 Pro on Local Hardware Benchmarks
Community benchmarks show Gemma 4 31B achieving Gemini 2.5 Pro-level scores when run locally via llama.cpp harness.
Per-Layer Embeddings: How Gemma 4's Small Models Work
Gemma 4's E2B and E4B models use per-layer embeddings, not MoE, enabling new inference performance tradeoffs.
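A simplified sketch of the per-layer-embedding idea (illustrative only, not Gemma 4's actual architecture; attention is elided): besides the usual input embedding, each layer looks up its own small per-token embedding and mixes it into that layer's input. Since only the current token's rows are touched, those tables can live in cheaper memory, which is the inference tradeoff the article describes.

```python
import torch
import torch.nn as nn

class PLEBlock(nn.Module):
    """One layer with its own small embedding table (attention omitted)."""
    def __init__(self, vocab: int, d_model: int, d_ple: int):
        super().__init__()
        self.ple = nn.Embedding(vocab, d_ple)   # small per-layer table
        self.proj = nn.Linear(d_ple, d_model)   # lift into model width
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, h: torch.Tensor, ids: torch.Tensor) -> torch.Tensor:
        h = h + self.proj(self.ple(ids))        # inject per-layer signal
        return h + self.ffn(h)

vocab, d_model, d_ple = 1000, 256, 64
tok = nn.Embedding(vocab, d_model)
layers = nn.ModuleList(PLEBlock(vocab, d_model, d_ple) for _ in range(4))

ids = torch.randint(0, vocab, (1, 8))
h = tok(ids)
for layer in layers:
    h = layer(h, ids)
print(h.shape)  # torch.Size([1, 8, 256])
```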
TurboQuant KV Cache Quantization Beats Baselines on Gemma 4 and Qwen
Community benchmarks show TurboQuant KV quantization achieves near-zero accuracy loss at 3.1 bits on Gemma 4 with 34% long-context speedup.
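For intuition, here is a generic per-channel quantization round trip of the kind KV-cache quantizers perform. This is not TurboQuant's algorithm (the article doesn't spell it out); fractional rates like 3.1 bits typically come from mixing bit widths or amortizing small per-group scales.

```python
import torch

def quantize(x: torch.Tensor, bits: int):
    # Asymmetric min/max quantization along the last (channel) dimension.
    lo = x.amin(dim=-1, keepdim=True)
    hi = x.amax(dim=-1, keepdim=True)
    scale = (hi - lo).clamp(min=1e-8) / (2**bits - 1)
    return torch.round((x - lo) / scale), scale, lo

def dequantize(q, scale, lo):
    return q * scale + lo

keys = torch.randn(8, 128)        # fake keys: [tokens, head_dim]
q, s, z = quantize(keys, bits=3)
err = (dequantize(q, s, z) - keys).abs().mean()
print(f"mean abs reconstruction error at 3 bits: {err:.4f}")
```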
Gemma 4 31B vs Qwen 3.5 27B: SVG Generation Benchmark
Reddit users compare Gemma 4 31B and Qwen 3.5 27B Q4 quants on SVG creation, coding, and function calling tasks.
NYT Connections Benchmark: MiniMax-M1 Leads Local LLMs at 34.4
Community benchmark ranks MiniMax-M1 at 34.4, Gemma 4 31B at 30.1, Arcee Trinity Large Thinking at 29.5 on NYT Connections puzzles.
Gemma 4 llama.cpp Issues Resolved With Recent Fixes
Google Gemma 4 models now run correctly in llama.cpp after critical fixes for output quality and crashes.