Gemma 3n's E2B and E4B models use per-layer embeddings, not MoE, enabling new tradeoffs between memory footprint and inference performance.