RAG

12 articles tagged with this topic

Volcengine Launches Milvus Serverless With 3-Second Instance Creation

ByteDance's Volcengine ships Milvus Serverless with sub-3-second provisioning and scale-to-zero billing for AI Agent workloads.

A Chinese SaaS firm cut monthly AI infra costs from ¥80,000 to under ¥2,000 by ditching 4x A100s for the DeepSeek API.

How to fix write blocking, query latency spikes, and OOM errors when scaling Chroma from prototype to production.

Perplexity AI has published several embedding models under the MIT license, enabling free commercial use in local deployments.

A technical breakdown of four-layer LLM agent architecture with Python code using LangChain and LangGraph.

How to structure LangChain LCEL chains for linear, routing, RAG, and agent workflows in production.

Practical guide to LangChain embeddings using OpenAI, HuggingFace, and local models for semantic search and RAG pipelines.

Community clarifies RAG's true baseline: retrieve, rerank, inject chunks, generate — extras are enhancements.

A LocalLLaMA thread debates whether RAG is a precise term or marketing hype for any retrieval-based LLM system.

Replace string-concatenated prompts with LangChain's ChatPromptTemplate and PipelinePromptTemplate for scalable enterprise knowledge bases.

How to load PDFs, Word, HTML, and Markdown files in LangChain and split them for RAG applications.

Why LangChain's Runnable protocol transforms scattered model calls into composable, maintainable AI workflows.