MoE Inference

1 article found with this tag

Qwen 3.6 is the first local model that actually feels worth the effort for me

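Alibaba's Qwen 3.6 35B-A3B runs at Q8 quantization on two consumer GPUs, reaching 170 tokens/second with the full 260K context. The community calls it the first local model that can truly replace cloud-based coding assistants.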