GGML now supports Q1_0 1-bit quantization, shrinking Bonsai 8B models to 1.15GB for CPU-only inference.
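The quoted 1.15GB figure can be sanity-checked with back-of-envelope arithmetic. The sketch below assumes an effective rate of roughly 1.15 bits per weight (the 1-bit payload plus per-block scale and metadata overhead; the exact overhead is an assumption, not stated in the article):

```python
# Back-of-envelope size check for a 1-bit-quantized 8B-parameter model.
params = 8e9                      # Bonsai 8B parameter count
effective_bits_per_weight = 1.15  # assumed: 1-bit weights + block-scale overhead
size_gb = params * effective_bits_per_weight / 8 / 1e9  # bits -> bytes -> GB
print(f"{size_gb:.2f} GB")        # ~1.15 GB, consistent with the quoted figure
```

At a flat 1 bit per weight the file would be about 1.0GB, so the gap between that and 1.15GB is plausibly the quantization metadata.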