This week, IBM open-sourced the Granite 4.1 series, offering three sizes from 3B to 30B under the Apache 2.0 license—but a pelican experiment suggests that small models' problems lie in base capability, not quantization precision.

What this is

IBM released the Granite 4.1 series of large language models, available in three parameter sizes: 3B, 8B, and 30B, all under the Apache 2.0 open-source license. Apache 2.0 means any enterprise can freely use, modify, and commercialize them without additional restrictions.

The community project Unsloth subsequently published 21 GGUF quantized versions of the 3B model (GGUF is a model file format that, combined with quantization, lets large models run on ordinary computers), with file sizes ranging from 1.2GB to 6.34GB.
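That size spread follows almost directly from bits per weight: on-disk size is roughly parameters times bits divided by eight. A back-of-the-envelope sketch (the quantization levels below are common GGUF choices, not the exact Unsloth lineup, which mixes precisions per layer):

```python
def estimated_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough on-disk size of a quantized model: params * bits / 8, in GB."""
    return n_params * bits_per_weight / 8 / 1e9

# ~3B parameters at common quantization levels. This ignores metadata and
# mixed-precision layers, so real GGUF files run somewhat larger.
for bits in (2, 4, 8, 16):
    print(f"{bits:>2}-bit: ~{estimated_size_gb(3e9, bits):.1f} GB")
```

The 16-bit estimate (~6.0GB) lines up with the largest 6.34GB file, and the low-bit estimates with the smallest ones, which is why a 2-bit and a 16-bit copy of the same 3B model can differ in size by roughly 5x.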

Developer Simon Willison ran a simple test: he had all 21 versions generate an SVG vector graphic of "a pelican riding a bicycle." His expectation was that versions retaining more information would draw better. The result: from 1.2GB to 6.34GB, the drawings were uniformly poor, with no discernible quality gradient.
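Willison judged the pelicans by eye. If you wanted to automate even the weakest possible signal — does the output parse as SVG at all — a minimal sketch looks like this (the sample markup is invented for illustration, not actual model output):

```python
import xml.etree.ElementTree as ET

def is_well_formed_svg(text: str) -> bool:
    """True if the text parses as XML with an <svg> root element."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return False
    # ElementTree expands namespaces into the tag, so compare the local name
    return root.tag.split("}")[-1] == "svg"

# Hypothetical model output: a circle "pelican" above a line "bicycle frame"
sample = (
    '<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">'
    '<circle cx="50" cy="40" r="20"/>'
    '<line x1="20" y1="80" x2="80" y2="80"/></svg>'
)
print(is_well_formed_svg(sample))         # True
print(is_well_formed_svg("<svg><oops>"))  # unclosed tag -> False
```

A check like this only measures syntactic validity, not whether the drawing resembles a pelican — which is exactly the gap that made visual inspection the deciding judge in this experiment.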

Industry view

We believe the value of this experiment lies not in the pelican, but in the judgment it reveals: when a model's base capability is insufficient, the impact of quantization precision (i.e., how much original information is retained after compression) on output quality is nearly zero. In other words, giving someone who can't draw a thicker pencil still results in an equally bad drawing.

We see Apache 2.0 open-sourcing as a tangible boon for small and medium-sized enterprises. Many open-weight models ship under restrictive licenses that require payment or adherence to additional terms for commercial use; Granite 4.1 has none of these barriers.

But we note a valid opposing voice: this experiment only tested the single task of SVG generation, and the 3B model itself was not designed for complex creative generation. In tasks more suited to small models' capabilities, like text classification and summarization, the impact of quantization precision could be entirely different. Using the pelican test to dismiss quantization strategies carries the risk of over-extrapolation.

Impact on regular people

For enterprise IT: The Apache 2.0 license means companies can confidently embed it into internal products, and the 3B size is suitable for server or even edge device deployment without needing GPU clusters.

For individual careers: It is increasingly viable to run small models locally for simple text tasks, but creative and generative output still requires larger models; expectations for a 3B model should be calibrated accordingly.

For the consumer market: The open-source small model track is already crowded (Llama, Qwen, Mistral). IBM's entry adds choice, but clear differentiation has yet to emerge; we believe users should focus more on performance at their specific tasks than on parameter counts.