What Happened
A developer on r/LocalLLaMA asked a question many builders are quietly confused about: what exactly counts as RAG, and what is merely enhanced retrieval? The poster correctly identified the canonical pipeline: an embedding model retrieves chunks, an optional reranker scores them, the chunks are injected into the LLM prompt, and the LLM generates the answer. They wanted to know whether query rewriting, summarization, context compression, and web search tools also fall under the RAG label.
Why It Matters
The term RAG is overloaded in 2025. Vendors attach it to everything from simple vector search to multi-agent orchestration, which creates real confusion when scoping a project or evaluating a library. The practical baseline the community uses is the four-step pipeline below (sketched in code after the list):
- Retrieve: embed the query, run similarity search against a vector store (FAISS, Chroma, Qdrant, Milvus)
- Rerank: optional cross-encoder model (e.g., BGE-reranker, Cohere Rerank) to score top-k chunks
- Inject: stuff ranked chunks into the LLM context window
- Generate: LLM produces grounded answer
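A minimal sketch of that baseline, assuming a local sentence-transformers embedder and cross-encoder, a FAISS index built in-process, and an OpenAI-compatible chat endpoint; the `chunks` list and model names are placeholders you would swap for your own:

```python
# Minimal retrieve -> rerank -> inject -> generate baseline.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer, CrossEncoder
from openai import OpenAI

chunks = ["...your document chunks...", "...more chunks..."]  # placeholder corpus

embedder = SentenceTransformer("BAAI/bge-small-en-v1.5")
reranker = CrossEncoder("BAAI/bge-reranker-base")
client = OpenAI()  # reads OPENAI_API_KEY; set base_url to point at a local server

# Retrieve: embed the chunks once, then cosine search via FAISS inner product.
chunk_vecs = embedder.encode(chunks, normalize_embeddings=True)
index = faiss.IndexFlatIP(chunk_vecs.shape[1])
index.add(np.asarray(chunk_vecs, dtype="float32"))

def answer(query: str, k: int = 20, top_n: int = 5) -> str:
    k = min(k, len(chunks))
    q_vec = np.asarray(embedder.encode([query], normalize_embeddings=True), dtype="float32")
    _, ids = index.search(q_vec, k)
    candidates = [chunks[i] for i in ids[0]]

    # Rerank: cross-encoder scores each (query, chunk) pair; keep the top_n.
    scores = reranker.predict([(query, c) for c in candidates])
    ranked = [c for _, c in sorted(zip(scores, candidates), reverse=True)][:top_n]

    # Inject + generate: stuff the ranked chunks into the prompt and ask the LLM.
    context = "\n\n".join(ranked)
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any chat model works
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
        ],
    )
    return resp.choices[0].message.content
```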
Everything else — HyDE, query rewriting, context compression with LLMLingua, iterative retrieval, agentic tool calls — is an enhancement on top of this baseline. Web search via tools like Tavily or Brave Search API is functionally similar but typically called "tool-augmented generation" or "agentic RAG" rather than classic RAG, because retrieval is dynamic rather than from a fixed index.
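To make the distinction concrete, query rewriting is just one extra LLM call in front of the retrieve step; everything downstream stays the same. A rough sketch, reusing the `answer()` baseline above, with the model name and prompt as placeholder assumptions:

```python
# Query rewriting: turn a conversational question into a retrieval-friendly
# query before embedding it. The baseline pipeline underneath is unchanged,
# which is why this counts as an enhancement rather than a new paradigm.
from openai import OpenAI

client = OpenAI()

def rewrite_query(raw_query: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": "Rewrite the question as a concise, "
             "keyword-rich search query. Return only the rewritten query."},
            {"role": "user", "content": raw_query},
        ],
    )
    return resp.choices[0].message.content.strip()

# Usage: feed the rewritten query into the same retrieve-rerank-generate pipeline.
# result = answer(rewrite_query("how do I get the thing to restart itself when it crashes?"))
```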
Asia-Pacific Angle
Chinese and Southeast Asian developers building RAG systems for local-language documents face an additional layer: most embedding models are English-optimized. For Chinese-language RAG, BGE-M3 (BAAI) and text2vec-large-chinese outperform OpenAI embeddings on Chinese corpora, and Qwen2.5 as the generator handles Chinese context injection significantly better than English-first models. For multilingual Southeast Asian deployments (Thai, Vietnamese, Bahasa Indonesia), multilingual-e5-large (often shortened to mE5-large) is a tested baseline. Rerankers trained on English data degrade on non-Latin scripts, so budget an evaluation pass on your target language before shipping.
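One cheap version of that evaluation pass: embed a handful of known query-passage pairs in your target language and check whether the correct passage actually ranks first. A rough sketch, assuming multilingual-e5-large via sentence-transformers (which expects "query: " and "passage: " prefixes); the Thai pairs below are placeholders you would replace with real data from your corpus:

```python
# Spot-check retrieval quality in the target language before shipping:
# does the correct passage rank first for each query?
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("intfloat/multilingual-e5-large")

# Placeholder Thai query/passage pairs; swap in ~50 real pairs from your documents.
pairs = [
    ("query: วิธีคืนสินค้า", "passage: ลูกค้าสามารถคืนสินค้าได้ภายใน 30 วัน"),
    ("query: ค่าจัดส่ง", "passage: จัดส่งฟรีสำหรับคำสั่งซื้อเกิน 500 บาท"),
]
queries = [q for q, _ in pairs]
passages = [p for _, p in pairs]

q_emb = model.encode(queries, normalize_embeddings=True)
p_emb = model.encode(passages, normalize_embeddings=True)
sims = util.cos_sim(q_emb, p_emb)  # queries x passages similarity matrix

hits = sum(int(sims[i].argmax() == i) for i in range(len(pairs)))
print(f"top-1 accuracy: {hits}/{len(pairs)}")
```

The same loop works for a reranker: score each (query, passage) pair with the cross-encoder and check that the true passage wins.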
Action Item This Week
Build a minimal RAG eval: take 20 questions from your actual use case, run retrieve-rerank-generate, and manually score answer correctness. Establish this baseline before adding any enhancements — query rewriting or compression only helps if you can measure the delta.
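A minimal sketch of that eval loop, assuming the `answer()` function from the baseline sketch above; the two questions shown are placeholders for your own twenty, and scoring is manual by design:

```python
# Minimal RAG eval: run real questions through the pipeline, score answers by
# hand, and save the results. The CSV becomes the baseline any enhancement
# (query rewriting, compression, HyDE) must measurably beat.
import csv

# Placeholder questions; replace with ~20 from your actual use case.
eval_set = [
    {"question": "What is the refund window?", "expected": "30 days"},
    {"question": "Which plan includes SSO?", "expected": "Enterprise"},
]

rows = []
for item in eval_set:
    generated = answer(item["question"])  # baseline pipeline sketched earlier
    print("Q:", item["question"])
    print("Expected:", item["expected"])
    print("Got:", generated)
    correct = input("Correct? [y/n] ").strip().lower() == "y"  # manual scoring
    rows.append({**item, "answer": generated, "correct": correct})

with open("rag_baseline.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)

print(f"Baseline accuracy: {sum(r['correct'] for r in rows)}/{len(rows)}")
```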