MoE
6 articles tagged with this topic
MiniMax-M2.7 · llama.cpp
MiniMax-M2.7 229B MoE Gets First GGUF Quants for Apple Silicon
MiniMax-M2.7 (229B MoE) quantized to Q3_K_L (110GB) and Q8_0 (243GB) GGUF formats, now on HuggingFace.
Apr 12 · 3 min read
llama.cpp · Qwen
37 LLMs Benchmarked on MacBook Air M5 32GB: Full Speed Results
Community benchmark of 37 local LLMs on M5 Air 32GB using llama-bench reveals MoE models as clear winners for speed-to-quality ratio.
Apr 6 · 2 min read
Gemma 4 · vLLM
Running Gemma 4 26B-A4B on vLLM: Community Troubleshooting Notes
Developers report mixed results deploying Gemma 4 26B-A4B on vLLM, with INT4 quants too slow on DGX Spark GB10.
Apr 6 · 1 min read
llama.cpp · Qwen Coder
APEX Quantization vs K-Quants: Why MoE Coding Models Need Different Compression
APEX quantization targets MoE architecture coherence layers at Q8, outperforming generic K-quants for multi-file coding agents.
Apr 6 · 2 min read
Triton · MoE
Pure Triton MoE Kernel Beats Megablocks on Mixtral at Batch Sizes Under 512
A fused Triton kernel cuts MoE forward pass from 24+ launches to 5, beating Megablocks by 31% at batch size 128.
Apr 5 · 2 min read
Qwen3 · Alibaba Cloud
Qwen3.6-397B-A17B: First Open Model to Match Claude Sonnet in Real Use
Community testing finds Qwen3.6-397B-A17B matches Claude Sonnet reliability in real tasks, beating GLM-5.1 and Kimi-k2.5.
Apr 4 · 2 min read