MoE
6 articles tagged with this topic
MiniMax-M2.7 · llama.cpp
MiniMax-M2.7 229B MoE Gets First GGUF Quants for Apple Silicon
MiniMax-M2.7 (229B MoE) quantized to Q3_K_L (110GB) and Q8_0 (243GB) GGUF formats, now on HuggingFace.
Apr 12 · 3 min read
llama.cpp · Qwen
37 LLMs Benchmarked on MacBook Air M5 32GB: Full Speed Results
Community benchmark of 37 local LLMs on M5 Air 32GB using llama-bench reveals MoE models as clear winners for speed-to-quality ratio.
Apr 6 · 2 min read
Gemma 4 · vLLM
Running Gemma 4 26B-A4B on vLLM: Community Troubleshooting Notes
Developers report mixed results deploying Gemma 4 26B-A4B on vLLM, with INT4 quants too slow on DGX Spark GB10.
Apr 6 · 1 min read
llama.cpp · Qwen Coder
APEX Quantization vs K-Quants: Why MoE Coding Models Need Different Compression
APEX quantization targets MoE architecture coherence layers at Q8, outperforming generic K-quants for multi-file coding agents.
Apr 6 · 2 min read
Triton · MoE
Pure Triton MoE Kernel Beats Megablocks on Mixtral at Batch Sizes Under 512
A fused Triton kernel cuts MoE forward pass from 24+ launches to 5, beating Megablocks by 31% at batch size 128.
Apr 5 · 2 min read
Qwen3 · Alibaba Cloud
Qwen3.6-397B-A17B: First Open Model to Match Claude Sonnet in Real Use
Community testing finds Qwen3.6-397B-A17B matches Claude Sonnet reliability in real tasks, beating GLM-5.1 and Kimi-k2.5.
Apr 4 · 2 min read