The Transformer architecture, proposed by Google in 2017, still underpins every mainstream large language model (LLM) from GPT to DeepSeek; with innovation in this foundational infrastructure stagnating, competition is shifting to the application layer.
What this is
The Transformer is an underlying architecture for processing text. Before it, AI read long texts with RNNs (recurrent neural networks, an older approach that reads word by word), which was slow and prone to forgetting the beginning of a text by the end. The Transformer abandons sequential reading in favor of a "self-attention mechanism" that lets the AI compute the relevance between all words simultaneously, enabling parallel computation. It typically consists of an encoder (responsible for understanding) and a decoder (responsible for generation): BERT uses only the encoder, for reading comprehension, while GPT uses only the decoder, for text generation. Because word order is crucial to meaning ("I hit you" differs from "you hit me") yet self-attention itself has no notion of sequence, the Transformer must add "positional encoding" to tell the AI the order of words.
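To make the mechanism concrete, here is a minimal NumPy sketch of single-head self-attention together with the sinusoidal positional encoding described above. It is an illustration under simplifying assumptions, not a real Transformer layer: the learned query/key/value projection weights are omitted, and the token embeddings are random placeholders.

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding:
    PE[pos, 2i]   = sin(pos / 10000^(2i/d_model))
    PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model))"""
    pos = np.arange(seq_len)[:, None]                # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]             # (1, d_model/2)
    angles = pos / np.power(10000, 2 * i / d_model)  # (seq_len, d_model/2)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                     # even dimensions
    pe[:, 1::2] = np.cos(angles)                     # odd dimensions
    return pe

def self_attention(x):
    """Single-head scaled dot-product self-attention, with queries, keys,
    and values all equal to x (learned projections omitted for brevity)."""
    d_k = x.shape[-1]
    scores = x @ x.T / np.sqrt(d_k)  # relevance of every word to every word, at once
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over each row
    return weights @ x               # each position becomes a weighted mix of all positions

# 4 "words" with embedding dimension 8; order information must be added explicitly
seq_len, d_model = 4, 8
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(seq_len, d_model))     # placeholder token embeddings
x = embeddings + positional_encoding(seq_len, d_model)
out = self_attention(x)
print(out.shape)  # (4, 8): all pairwise relationships computed in one parallel step
```

Note how position enters only through the added encoding: without it, shuffling the input rows would merely shuffle the output rows, so "I hit you" and "you hit me" would be indistinguishable to the attention step.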
Industry view
We note that while the Transformer has become the industry standard, its bottlenecks can no longer be ignored. The computational load of its core attention mechanism grows quadratically with text length, which is why compute costs skyrocket when LLMs process ultra-long contexts. Academia is already rethinking it: Meta's Chief AI Scientist Yann LeCun and others have repeatedly argued that the Transformer is not the endgame for intelligence, since it is essentially performing large-scale pattern matching. Research teams are also exploring alternatives such as Mamba (a state-space architecture), attempting to sidestep the Transformer's fundamental flaw of exorbitant compute costs on long texts. We judge that this underlying architecture's dividend period is nearing its end, and that the next reshuffle will come from an architectural breakthrough.
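The quadratic scaling claim is easy to verify with back-of-envelope arithmetic: the attention score matrix has one entry per pair of tokens, so doubling the context quadruples the work in that step. A tiny sketch (illustrative counting only; real deployments add constant factors for layers, heads, and embedding width):

```python
# Self-attention compares every token with every other token, so the score
# matrix alone has n * n entries per layer.
def attention_pairs(n_tokens):
    """Pairwise comparisons in one attention layer for a context of n tokens."""
    return n_tokens * n_tokens

for n in (1_000, 8_000, 128_000):
    print(f"{n:,} tokens -> {attention_pairs(n):,} pairwise scores")

# 1,000 tokens already means 1,000,000 scores; a 128,000-token context means
# 16,384,000,000 scores per layer, which is why ultra-long contexts get
# expensive so fast, and why linear-scaling designs like Mamba are attractive.
```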
Impact on regular people
For enterprise IT: the Transformer's compute pain points explain why long-document processing costs stay stubbornly high when deploying private LLMs. During vendor selection, don't blindly chase ultra-long context windows.
For the workplace: Knowing "self-attention" exists explains why key information should be concentrated when writing AI prompts—because the AI looks at the relationships between all words simultaneously, rather than deliberating word-by-word like a human.
For the consumer market: The unification of underlying architectures means model capabilities are trending towards homogenization. Consumers will ultimately pay for product experience and data moats, not the models themselves.