Local Inference
Found 4 articles with this tag
Minimax · LocalLLaMA
Minimax 2.7 Update Anticipated by Local LLM Community
Reddit's LocalLLaMA community is eagerly anticipating Minimax 2.7, though details remain sparse.
Apr 6 · 1 min read
llama.cpp · GLM-4.7
Best Local LLM for Agentic Coding on a Single RTX 4090
A 4090 owner benchmarks GLM-4.7, Nemotron-30B, and Qwen3-Coder for local agentic coding via llama.cpp.
Apr 6 · 1 min read
HunyuanOCR · GGUF
HunyuanOCR 1B Runs at 90 t/s on GTX 1060 via GGUF
Tencent's HunyuanOCR 1B model runs at 90 tokens/sec on a GTX 1060 via GGUF, enabling local OCR on budget hardware.
Apr 6 · 1 min read
Gemma 4 · Per-Layer Embeddings
Per-Layer Embeddings: How Gemma 4's Small Models Work
Gemma 4's E2B and E4B models use per-layer embeddings, not MoE, enabling new inference performance tradeoffs.
Apr 5 · 2 min read