1 article tagged with this topic
Gemma 4's per-layer embeddings spark debate: Can knowledge and reasoning scale separately? If so, 2B models could hold 20B knowledge, redefining local