A consensus has formed in AI engineering circles: roughly 90% of RAG (Retrieval-Augmented Generation, the technique where an LLM queries enterprise data before answering) projects deliver poor results, and the bottleneck is fundamentally not the LLM but the retrieval layer failing to feed it the right data.

What This Is

When most developers see an AI knowledge base returning irrelevant answers, their first instinct is to swap in a larger model or tweak the prompt. That misses the root cause entirely: in a typical RAG architecture, the component actually failing is the retrieval phase.

First, vector similarity (a metric of the semantic closeness of two texts) does not equal relevance. When a user asks "how to handle API timeouts," the system recalls piles of passages about "what is a timeout": semantically similar, but useless for solving the problem. Second, stuffing the LLM with masses of loosely related content is not providing context; it is forcing the model to guess the answer out of noise, which is both costly and slow. Finally, the vast majority of projects chunk documents by a fixed character count, ignoring paragraph structure entirely and shipping broken context to the model.
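To make the chunking failure concrete, here is a minimal sketch contrasting fixed-character splitting with paragraph-aware splitting. The 500-character budget and the double-newline paragraph delimiter are illustrative assumptions, not settings from any particular framework.

```python
def fixed_size_chunks(text: str, size: int = 500) -> list[str]:
    # Naive splitting: cuts every `size` characters, even mid-sentence,
    # so a procedure can be severed from its preconditions.
    return [text[i:i + size] for i in range(0, len(text), size)]


def paragraph_chunks(text: str, max_size: int = 500) -> list[str]:
    # Paragraph-aware splitting: accumulate whole paragraphs until the
    # budget is reached, so each chunk stays a self-contained unit.
    # (A single paragraph longer than max_size still becomes one chunk.)
    chunks: list[str] = []
    current = ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) > max_size:
            chunks.append(current.strip())
            current = ""
        current += para + "\n\n"
    if current.strip():
        chunks.append(current.strip())
    return chunks
```

The first function will happily cut "restart the gateway, then" away from "re-issue the API key," leaving neither chunk answerable on its own; the second at least keeps instructions intact within their paragraphs.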

The industry has already worked out the fixes: Hybrid Search (combining semantic matching with BM25 keyword matching) to solve exact terms going unfound; Parent-Child Chunking (retrieving with small text chunks, then returning the matched chunk's larger parent to restore surrounding context) to solve information fragmentation; and HyDE (Hypothetical Document Embedding: having the AI generate a hypothetical answer first, then using it to search for real documents) to handle colloquial user queries. Minimal sketches of all three follow.
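First, a sketch of the hybrid-search idea, fusing a keyword ranking with a semantic ranking via Reciprocal Rank Fusion (RRF). The commented-out `bm25_search` and `vector_search` calls are hypothetical stand-ins for whatever keyword index and vector store you actually run; only the fusion logic is shown, and k=60 is the constant commonly cited in the RRF literature.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Each ranking is a list of document IDs ordered best-first.
    # A document's fused score is the sum of 1 / (k + rank) across all
    # rankings, so a document that ranks well in either list rises.
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical usage: the keyword pass catches exact tokens such as
# "ERR_TIMEOUT_504"; the semantic pass catches paraphrases of the query.
# bm25_ids = bm25_search("how to handle API timeouts", top_k=20)
# vector_ids = vector_search("how to handle API timeouts", top_k=20)
# merged = reciprocal_rank_fusion([bm25_ids, vector_ids])[:5]
```

RRF is a convenient fusion rule because it looks only at ranks, so a BM25 score and a cosine similarity never need to be forced onto the same scale.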
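Next, a sketch of parent-child chunking under the simplest possible assumptions: paragraphs serve as child chunks and plain dicts serve as the index. A production system would instead store the parent pointer in each chunk's metadata inside the vector store.

```python
def index_parent_child(sections: list[str]) -> tuple[list[str], dict[int, int]]:
    # Split each large parent section into small child chunks (here,
    # paragraphs) and remember which parent every child came from.
    children: list[str] = []
    child_to_parent: dict[int, int] = {}
    for parent_id, section in enumerate(sections):
        for para in section.split("\n\n"):
            child_to_parent[len(children)] = parent_id
            children.append(para.strip())
    return children, child_to_parent


def expand_hits_to_parents(hit_ids: list[int],
                           child_to_parent: dict[int, int],
                           sections: list[str]) -> list[str]:
    # hit_ids are indices of matched child chunks from any retriever.
    # Return each hit's full parent section, de-duplicated, so the LLM
    # receives coherent context instead of isolated fragments.
    seen: set[int] = set()
    parents: list[str] = []
    for cid in hit_ids:
        pid = child_to_parent[cid]
        if pid not in seen:
            seen.add(pid)
            parents.append(sections[pid])
    return parents
```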
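Finally, a sketch of the HyDE flow. Here `llm_complete`, `embed`, and `vector_index.search` are hypothetical wrappers for whatever completion endpoint, embedding model, and vector index you use; the point is only the order of operations, namely generating a fake answer and then searching with its embedding instead of the raw query's.

```python
def hyde_retrieve(query: str, vector_index, top_k: int = 5) -> list[str]:
    # Step 1: have the LLM hallucinate a plausible answer. It may be
    # factually wrong, but it is phrased like the documents we want,
    # which lands it closer to them in embedding space than the
    # colloquial query would.
    hypothetical = llm_complete(
        f"Write a short documentation passage that answers: {query}"
    )
    # Step 2: embed the hypothetical answer and search the real corpus
    # with that vector; the retrieved documents, not the fake answer,
    # are what gets handed to the LLM for the final response.
    return vector_index.search(embed(hypothetical), top_k=top_k)
```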

Industry View

We believe the industry is shifting from "LLM worship" to "data engineering worship." Rather than obsessing over model parameters, teams are better off hardening the retrieval pipeline; that is the real leverage point for cutting costs and improving results.

This approach has vocal dissenters, however: complicating the retrieval layer drives up engineering costs significantly. Introducing hybrid search, reranking, and query rewriting adds system latency, each step may require extra model calls, and maintenance becomes far harder than anticipated. Some architects point out that for clearly structured internal documents, a simple keyword search plus a bit of fine-tuning can perform no worse than fancy retrieval strategies; over-engineering easily becomes a new trap that drags down project progress.

Impact on Regular People

For enterprise IT: stop fixating on procuring the most expensive LLM. Whether an AI deployment succeeds is 80% determined by how well your data-cleaning and retrieval pipelines are built.

For individual careers: the easy dividend of merely knowing how to prompt LLMs is fading; the people who can organize messy enterprise data into AI-friendly structures will become more valuable.

For the consumer market: enterprise knowledge-base products will shift from "sounds smart but keeps failing" to "rigid but accurate," and more B2B AI software will actually help users solve concrete problems.