Apple-Silicon
3 articles tagged with this topic
MLX · Qwen3.5
DFlash speculative decoding on Apple Silicon: 4.1x on Qwen3.5-9B, now open source (MLX, M5 Max)
Open-source DFlash achieves a 4.13x speedup on Qwen3.5-9B using MLX on an M5 Max, with an 89.4% token acceptance rate.
Apr 13 · 4 min read
Ollama · Gemma4
Deploy Gemma 4 Locally on Mac with Public Remote Access
Full-stack guide: Ollama + OrbStack + frp + Nginx expose local Gemma 4 inference to the public internet over HTTPS.
Apr 13 · 3 min read
MiniMax-M2.7 · llama.cpp
MiniMax-M2.7 229B MoE Gets First GGUF Quants for Apple Silicon
MiniMax-M2.7 (229B MoE) quantized to GGUF in Q3_K_L (110 GB) and Q8_0 (243 GB), now on Hugging Face.
Apr 12 · 3 min read