Local LLM
13 articles found for this tag
llama.cpp llama-bench Adds -fitc and -fitt Benchmark Flags
llama-bench gains -fitc and -fitt flags from build b4679, enabling finer control over benchmark timing output.
MiniMax-M2.7 Open-Source Release Delayed to This Weekend
MiniMax delays M2.7 open-source release due to infrastructure work, now targeting this weekend.
37 LLMs Benchmarked on MacBook Air M5 32GB: Full Speed Results
Community benchmark of 37 local LLMs on M5 Air 32GB using llama-bench reveals MoE models as clear winners for speed-to-quality ratio.
Qwen3.5 vs Gemma4 vs Cloud LLMs: Python Turtle Drawing Benchmark
A Reddit user benchmarks local and cloud LLMs on Python turtle graphics, revealing Gemma4 and Gemini share visual style.
Local AI Goes Mainstream When the Tooling Becomes Boring Infrastructure
A Reddit argument: local LLM adoption hinges on reliable tooling stacks, not benchmark gains, mirroring Docker's container revolution.
Real-Time Multimodal AI Runs Locally on M3 Pro with Gemma E2B
Developer runs Gemma 4 E2B locally on Apple M3 Pro for real-time audio/video input with voice output using the Parlor repo.
Dev Trains LLM on Pre-1900 Text to Rediscover Relativity
A developer trained a small LLM from scratch on pre-1900 texts; the model partially rediscovered quantum and relativity concepts.
Local Inference vs Distributed Training: Where the Real Gap Is
Indie devs run models locally, but training still requires datacenter scale. Can distributed training ever close that gap?
Gemma 4 27B vs Qwen 3.5 27B: SVG Generation Benchmark
Reddit users compare Gemma 4 27B and Qwen 3.5 27B Q4 quants on SVG creation, coding, and function calling tasks.
Run Claude Code Fully Offline Using Qwen3.5 27B and llama.cpp
A developer runs Claude Code CLI against a local llama.cpp server using Qwen3.5 27B, achieving 9+ t/s on Strix Halo hardware.
OpenClaw Runs Local AI Agents on MacBook Air 16GB via TurboQuant
OpenClaw uses llama.cpp TurboQuant cache compression to run agentic AI models on 16GB MacBook Air at 10-15 tokens/sec.
TurboQuant and Vector Quantization: A Beginner's Breakdown
A Reddit user unpacks Google's TurboQuant blog from first principles, making LLM quantization accessible without heavy prerequisites.
How to Enable Reasoning Mode in Gemma 3 via LM Studio
A Reddit user found the correct tokens to activate Gemma's chain-of-thought reasoning in LM Studio using /think in the system prompt.