GGML now supports Q1_0 1-bit quantization, shrinking Bonsai 8B models to 1.15 GB for CPU-only inference.
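A back-of-envelope check makes the quoted size plausible: 8B parameters at 1 bit per weight pack into roughly 1 GB, and block-wise scale factors add some overhead on top. The sketch below is illustrative only; the `block_size` and `scale_bytes` values are assumptions for the estimate, not the actual Q1_0 storage layout.

```python
def quantized_size_gb(n_params: float, bits_per_weight: float = 1.0,
                      block_size: int = 256, scale_bytes: int = 2) -> float:
    """Estimate on-disk size (GB) of a block-wise quantized model.

    Assumes one scale value (e.g. fp16, 2 bytes) per block of weights.
    These are illustrative defaults, not the real Q1_0 format.
    """
    weight_bytes = n_params * bits_per_weight / 8       # packed weight payload
    scale_overhead = (n_params / block_size) * scale_bytes  # per-block scales
    return (weight_bytes + scale_overhead) / 1e9

# An 8B-parameter model at 1 bit/weight: ~1 GB of weights plus scale
# overhead, in the same ballpark as the reported 1.15 GB file.
print(quantized_size_gb(8e9))
```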