What Happened
A developer on r/LocalLLaMA asked a question many builders are quietly confused about: what exactly counts as RAG, and what is merely enhanced retrieval? The poster correctly identified the canonical pipeline: an embedding model retrieves chunks, an optional reranker scores them, the chunks are injected into the LLM prompt, and the LLM generates the answer. They wanted to know whether query rewriting, summarization, context compression, and web search tools also fall under the RAG label.
Why It Matters
The term RAG is overloaded in 2025. Vendors attach it to everything from simple vector search to multi-agent orchestration, which creates real confusion when scoping a project or evaluating a library. The practical baseline the community uses is the four-step pipeline below (sketched in code after the list):
- Retrieve: embed the query, run similarity search against a vector store (FAISS, Chroma, Qdrant, Milvus)
- Rerank: optional cross-encoder model (e.g., BGE-reranker, Cohere Rerank) to score top-k chunks
- Inject: stuff ranked chunks into the LLM context window
- Generate: LLM produces grounded answer
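A minimal sketch of that baseline, assuming a local sentence-transformers embedder and cross-encoder, a FAISS index built in-process, and an OpenAI-compatible chat endpoint; the `chunks` list and model names are placeholders you would swap for your own:

```python
# Minimal retrieve -> rerank -> inject -> generate baseline.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer, CrossEncoder
from openai import OpenAI

chunks = ["...your document chunks...", "...more chunks..."]  # placeholder corpus

embedder = SentenceTransformer("BAAI/bge-small-en-v1.5")
reranker = CrossEncoder("BAAI/bge-reranker-base")
client = OpenAI()  # reads OPENAI_API_KEY; set base_url to point at a local server

# Retrieve: embed the chunks once, then cosine search via FAISS inner product.
chunk_vecs = embedder.encode(chunks, normalize_embeddings=True)
index = faiss.IndexFlatIP(chunk_vecs.shape[1])
index.add(np.asarray(chunk_vecs, dtype="float32"))

def answer(query: str, k: int = 20, top_n: int = 5) -> str:
    k = min(k, len(chunks))
    q_vec = np.asarray(embedder.encode([query], normalize_embeddings=True), dtype="float32")
    _, ids = index.search(q_vec, k)
    candidates = [chunks[i] for i in ids[0]]

    # Rerank: cross-encoder scores each (query, chunk) pair; keep the top_n.
    scores = reranker.predict([(query, c) for c in candidates])
    ranked = [c for _, c in sorted(zip(scores, candidates), reverse=True)][:top_n]

    # Inject + generate: stuff the ranked chunks into the prompt and ask the LLM.
    context = "\n\n".join(ranked)
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any chat model works
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
        ],
    )
    return resp.choices[0].message.content
```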
Everything else — HyDE, query rewriting, context compression with LLMLingua, iterative retrieval, agentic tool calls — is an enhancement on top of this baseline. Web search via tools like Tavily or Brave Search API is functionally similar but typically called "tool-augmented generation" or "agentic RAG" rather than classic RAG, because retrieval is dynamic rather than from a fixed index.
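To make the distinction concrete, query rewriting is just one extra LLM call in front of the retrieve step; everything downstream stays the same. A rough sketch, reusing the `answer()` baseline above, with the model name and prompt as placeholder assumptions:

```python
# Query rewriting: turn a conversational question into a retrieval-friendly
# query before embedding it. The baseline pipeline underneath is unchanged,
# which is why this counts as an enhancement rather than a new paradigm.
from openai import OpenAI

client = OpenAI()

def rewrite_query(raw_query: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": "Rewrite the question as a concise, "
             "keyword-rich search query. Return only the rewritten query."},
            {"role": "user", "content": raw_query},
        ],
    )
    return resp.choices[0].message.content.strip()

# Usage: feed the rewritten query into the same retrieve-rerank-generate pipeline.
# result = answer(rewrite_query("how do I get the thing to restart itself when it crashes?"))
```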
Asia-Pacific Angle
Chinese and Southeast Asian developers building RAG systems for local-language documents face an additional layer: most embedding models are English-optimized. For Chinese-language RAG, BGE-M3 (BAAI) and text2vec-large-chinese outperform OpenAI embeddings on Chinese corpora, and Qwen2.5 as the generator handles Chinese context injection significantly better than English-first models. For multilingual Southeast Asian deployments (Thai, Vietnamese, Bahasa Indonesia), multilingual-e5-large (often shortened to mE5-large) is a tested baseline. Rerankers trained on English data degrade on non-Latin scripts, so budget an evaluation pass on your target language before shipping.
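One cheap version of that evaluation pass: embed a handful of known query-passage pairs in your target language and check whether the correct passage actually ranks first. A rough sketch, assuming multilingual-e5-large via sentence-transformers (which expects "query: " and "passage: " prefixes); the Thai pairs below are placeholders you would replace with real data from your corpus:

```python
# Spot-check retrieval quality in the target language before shipping:
# does the correct passage rank first for each query?
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("intfloat/multilingual-e5-large")

# Placeholder Thai query/passage pairs; swap in ~50 real pairs from your documents.
pairs = [
    ("query: วิธีคืนสินค้า", "passage: ลูกค้าสามารถคืนสินค้าได้ภายใน 30 วัน"),
    ("query: ค่าจัดส่ง", "passage: จัดส่งฟรีสำหรับคำสั่งซื้อเกิน 500 บาท"),
]
queries = [q for q, _ in pairs]
passages = [p for _, p in pairs]

q_emb = model.encode(queries, normalize_embeddings=True)
p_emb = model.encode(passages, normalize_embeddings=True)
sims = util.cos_sim(q_emb, p_emb)  # queries x passages similarity matrix

hits = sum(int(sims[i].argmax() == i) for i in range(len(pairs)))
print(f"top-1 accuracy: {hits}/{len(pairs)}")
```

The same loop works for a reranker: score each (query, passage) pair with the cross-encoder and check that the true passage wins.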
Action Item This Week
Build a minimal RAG eval: take 20 questions from your actual use case, run retrieve-rerank-generate, and manually score answer correctness. Establish this baseline before adding any enhancements — query rewriting or compression only helps if you can measure the delta.
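A minimal sketch of that eval loop, assuming the `answer()` function from the baseline sketch above; the two questions shown are placeholders for your own twenty, and scoring is manual by design:

```python
# Minimal RAG eval: run real questions through the pipeline, score answers by
# hand, and save the results. The CSV becomes the baseline any enhancement
# (query rewriting, compression, HyDE) must measurably beat.
import csv

# Placeholder questions; replace with ~20 from your actual use case.
eval_set = [
    {"question": "What is the refund window?", "expected": "30 days"},
    {"question": "Which plan includes SSO?", "expected": "Enterprise"},
]

rows = []
for item in eval_set:
    generated = answer(item["question"])  # baseline pipeline sketched earlier
    print("Q:", item["question"])
    print("Expected:", item["expected"])
    print("Got:", generated)
    correct = input("Correct? [y/n] ").strip().lower() == "y"  # manual scoring
    rows.append({**item, "answer": generated, "correct": correct})

with open("rag_baseline.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)

print(f"Baseline accuracy: {sum(r['correct'] for r in rows)}/{len(rows)}")
```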