Apple-Silicon

3 articles tagged with this topic

DFlash speculative decoding on Apple Silicon: 4.1x on Qwen3.5-9B, now open source (MLX, M5 Max)

Open-source DFlash achieves a 4.13x speedup on Qwen3.5-9B using MLX on an M5 Max, with an 89.4% token acceptance rate.

Apr 13 · 4 min read

Deploy Gemma 4 Locally on Mac with Public Remote Access

Full-stack guide: use Ollama, OrbStack, frp, and Nginx to expose local Gemma 4 inference to the public internet over HTTPS.

Apr 13 · 3 min read

MiniMax-M2.7 · llama.cpp

MiniMax-M2.7 229B MoE Gets First GGUF Quants for Apple Silicon

MiniMax-M2.7 (229B MoE) quantized to Q3_K_L (110GB) and Q8_0 (243GB) GGUF formats, now on HuggingFace.

Apr 12 · 3 min read