GGML now supports Q1_0 1-bit quantization, shrinking Bonsai 8B models to 1.15GB for CPU-only inference.
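The quoted 1.15GB figure can be sanity-checked with back-of-envelope arithmetic. The sketch below assumes an effective rate of roughly 1.15 bits per weight (the 1-bit payload plus per-block scale and metadata overhead; the exact overhead is an assumption, not stated in the article):

```python
# Back-of-envelope size check for a 1-bit-quantized 8B-parameter model.
params = 8e9                      # Bonsai 8B parameter count
effective_bits_per_weight = 1.15  # assumed: 1-bit weights + block-scale overhead
size_gb = params * effective_bits_per_weight / 8 / 1e9  # bits -> bytes -> GB
print(f"{size_gb:.2f} GB")        # ~1.15 GB, consistent with the quoted figure
```

At a flat 1 bit per weight the file would be about 1.0GB, so the gap between that and 1.15GB is plausibly the quantization metadata.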