Local LLM
13 articles found for this tag
llama.cpp llama-bench Adds -fitc and -fitt Benchmark Flags
llama-bench gains -fitc and -fitt flags from build b4679, enabling finer control over benchmark timing output.
MiniMax-M2.7 Open-Source Release Delayed to This Weekend
MiniMax delays M2.7 open-source release due to infrastructure work, now targeting this weekend.
37 LLMs Benchmarked on MacBook Air M5 32GB: Full Speed Results
Community benchmark of 37 local LLMs on M5 Air 32GB using llama-bench reveals MoE models as clear winners for speed-to-quality ratio.
Qwen3.5 vs Gemma4 vs Cloud LLMs: Python Turtle Drawing Benchmark
A Reddit user benchmarks local and cloud LLMs on Python turtle graphics, revealing Gemma4 and Gemini share visual style.
Local AI Goes Mainstream When the Tooling Becomes Boring Infrastructure
A Reddit argument: local LLM adoption hinges on reliable tooling stacks, not benchmark gains, mirroring Docker's container revolution.
Real-Time Multimodal AI Runs Locally on M3 Pro with Gemma E2B
Developer runs Gemma 4 E2B locally on Apple M3 Pro for real-time audio/video input with voice output using the Parlor repo.
Dev Trains LLM on Pre-1900 Text to Rediscover Relativity
A developer trained a small LLM from scratch on pre-1900 texts; the model partially rediscovered quantum and relativity concepts.
Local Inference vs Distributed Training: Where the Real Gap Is
Indie devs run models locally, but training still requires datacenter scale. Can distributed training ever close that gap?
Gemma 4 27B vs Qwen 3.5 27B: SVG Generation Benchmark
Reddit users compare Gemma 4 27B and Qwen 3.5 27B Q4 quants on SVG creation, coding, and function calling tasks.
Run Claude Code Fully Offline Using Qwen3.5 27B and llama.cpp
A developer runs Claude Code CLI against a local llama.cpp server using Qwen3.5 27B, achieving 9+ t/s on Strix Halo hardware.
OpenClaw Runs Local AI Agents on MacBook Air 16GB via TurboQuant
OpenClaw uses llama.cpp TurboQuant cache compression to run agentic AI models on 16GB MacBook Air at 10-15 tokens/sec.
TurboQuant and Vector Quantization: A Beginner's Breakdown
A Reddit user unpacks Google's TurboQuant blog from first principles, making LLM quantization accessible without heavy prerequisites.
How to Enable Reasoning Mode in Gemma 3 via LM Studio
A Reddit user found the correct tokens to activate Gemma's chain-of-thought reasoning in LM Studio using /think in the system prompt.