Local Inference
4 articles tagged with this topic
Minimax · LocalLLaMA
Minimax 2.7 Update Anticipated by Local LLM Community
Reddit's LocalLLaMA community signals anticipation for Minimax 2.7, but details remain sparse.
Apr 6 · 1 min read
llama.cpp · GLM-4.7
Best Local LLM for Agentic Coding on a Single RTX 4090
A 4090 owner benchmarks GLM-4.7, Nemotron-30B, and Qwen3-Coder for local agentic coding via llama.cpp.
Apr 6 · 1 min read
HunyuanOCR · GGUF
HunyuanOCR 1B Runs at 90 t/s on GTX 1060 via GGUF
Tencent's HunyuanOCR 1B model runs at 90 tokens/sec on a GTX 1060 via GGUF, enabling local OCR on budget hardware.
Apr 6 · 1 min read
Gemma 4 · Per-Layer Embeddings
Per-Layer Embeddings: How Gemma 4's Small Models Work
Gemma 4's E2B and E4B models use per-layer embeddings, not MoE, enabling new inference performance tradeoffs.
Apr 5 · 2 min read