Local LLM

15 articles tagged with this topic

AMD · LM Studio

What One Reddit Thread Reveals: Running Large AI Models Locally Takes Far More Hardware Than Vendors Claim

A user's 24GB AMD mini PC could only allocate 8GB VRAM to AI. The fix isn't simple—and that gap exposes a wider industry problem.

Apr 20 · 3 min read
Local LLM · Compute Cost

llama.cpp Tensor Parallelism Breakthrough: Local AI Compute Barrier Drops Another Level

Multi-GPU local inference enables enterprises to run LLMs without cloud dependency, as private-deployment compute costs and technical barriers decline significantly.

Apr 9 · 2 min read
llama.cpp · llama-bench

llama.cpp llama-bench Adds -fitc and -fitt Benchmark Flags

llama-bench gains -fitc and -fitt flags from build b4679, enabling finer control over benchmark timing output.

Apr 6 · 1 min read
MiniMax · MiniMax-M2.7

MiniMax-M2.7 Open-Source Release Delayed to This Weekend

MiniMax delays M2.7 open-source release due to infrastructure work, now targeting this weekend.

Apr 6 · 1 min read
llama.cpp · Qwen

37 LLMs Benchmarked on MacBook Air M5 32GB: Full Speed Results

A community benchmark of 37 local LLMs on the M5 Air 32GB using llama-bench shows MoE models as the clear winners on speed-to-quality ratio.

Apr 6 · 2 min read
Qwen3.5 · Gemma4

Qwen3.5 vs Gemma4 vs Cloud LLMs: Python Turtle Drawing Benchmark

A Reddit user benchmarks local and cloud LLMs on Python turtle graphics, revealing that Gemma4 and Gemini share a visual style.

Apr 6 · 2 min read
llama.cpp · Ollama

Local AI Goes Mainstream When the Tooling Becomes Boring Infrastructure

A Reddit argument: local LLM adoption hinges on reliable tooling stacks, not benchmark gains, mirroring Docker's container revolution.

Apr 6 · 2 min read
Gemma · llama.cpp

Real-Time Multimodal AI Runs Locally on M3 Pro with Gemma E2B

A developer runs Gemma 4 E2B locally on an Apple M3 Pro for real-time audio/video input with voice output, using the Parlor repo.

Apr 5 · 1 min read
Local LLM · Training from Scratch

Dev Trains LLM on Pre-1900 Text to Rediscover Relativity

A developer trained a small LLM from scratch on pre-1900 texts; the model partially rediscovered quantum and relativity concepts.

Apr 5 · 2 min read
llama.cpp · Distributed Training

Local Inference vs Distributed Training: Where the Real Gap Is

Indie devs run models locally, but training still requires datacenter scale. Can distributed training ever close that gap?

Apr 5 · 2 min read
Gemma 4 · Qwen3.5

Gemma 4 27B vs Qwen 3.5 27B: SVG Generation Benchmark

Reddit users compare Gemma 4 31B and Qwen 3.5 27B Q4 quants on SVG creation, coding, and function calling tasks.

Apr 5 · 2 min read
llama.cpp · Qwen3.5

Run Claude Code Fully Offline Using Qwen3.5 27B and llama.cpp

A developer runs the Claude Code CLI against a local llama.cpp server using Qwen3.5 27B, achieving over 9 t/s on Strix Halo hardware.

Apr 5 · 2 min read
llama.cpp · QWEN

OpenClaw Runs Local AI Agents on MacBook Air 16GB via TurboQuant

OpenClaw uses llama.cpp's TurboQuant cache compression to run agentic AI models on a 16GB MacBook Air at 10–15 tokens/sec.

Apr 5 · 2 min read
TurboQuant · Vector Quantization

TurboQuant and Vector Quantization: A Beginner's Breakdown

A Reddit user unpacks Google's TurboQuant blog from first principles, making LLM quantization accessible without heavy prerequisites.

Apr 5 · 1 min read
LM Studio · Gemma 3

How to Enable Reasoning Mode in Gemma 3 via LM Studio

A Reddit user found the tokens that activate Gemma's chain-of-thought reasoning in LM Studio: adding /think to the system prompt.

Apr 4 · 2 min read