local-inference
2 articles tagged with this topic
llama.cpp · Gemma-4
llama.cpp Adds Audio Processing Support via Gemma-4 E2A/E4A Models
llama-server now supports speech-to-text via Google's Gemma-4 E2A and E4A multimodal models.
Apr 12 · 3 min read
Gemma 4 · llama.cpp
Gemma 4 Local CUDA Setup: Precision Traps and Real Benchmarks
Running Gemma 4 locally on CUDA requires strict dtype matching at KV cache boundaries, or output quality degrades silently.
Apr 7 · 2 min read