Large language models (LLMs) are powerful, but they have two key limitations:
  • Finite context — they cannot ingest an entire corpus at once.
  • Static knowledge — their training data is frozen at a point in time.
Retrieval addresses both problems by fetching relevant external knowledge at query time. This is the foundation of **retrieval-augmented generation (RAG)**: augmenting an LLM's answers with context-specific information.

Building a knowledge base

A knowledge base is a repository of documents or structured data that is consulted during retrieval. If you need a custom knowledge base, you can build one from your own data using LangChain's document loaders and vector stores.
If you already have a knowledge base (e.g., a SQL database, CRM, or internal documentation system), you do not need to rebuild it. You can:
  • Connect it as a tool in agentic RAG.
  • Query it and supply the retrieved content as context to the LLM (2-step RAG).
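The second option can be sketched in a few lines. This is a toy illustration, not a prescribed API: SQLite stands in here for an existing SQL database or CRM, and `call_llm` is a hypothetical stub for a real chat-model call.

```python
import sqlite3

# Sketch of 2-step RAG over an existing knowledge base.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE faq (topic TEXT, answer TEXT)")
conn.execute("INSERT INTO faq VALUES ('pricing', 'The Pro plan costs $20/month.')")

def retrieve(topic: str) -> str:
    # Step 1: query the existing knowledge base directly.
    row = conn.execute(
        "SELECT answer FROM faq WHERE topic = ?", (topic,)
    ).fetchone()
    return row[0] if row else ""

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real chat-model call.
    return f"[grounded answer] {prompt}"

def answer(question: str, topic: str) -> str:
    # Step 2: supply the retrieved content as context to the LLM.
    context = retrieve(topic)
    return call_llm(f"Context: {context}\nQuestion: {question}")

print(answer("How much does the Pro plan cost?", "pricing"))
```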
See the following tutorial to build a searchable knowledge base and a minimal RAG workflow:

Tutorial: Semantic Search

Learn how to create a searchable knowledge base from your own data using LangChain's document loaders, embeddings, and vector stores. In this tutorial, you will build a search engine over a PDF, enabling retrieval of passages relevant to a query. You will also implement a minimal RAG workflow on top of this engine to see how external knowledge can be integrated into LLM reasoning.

From retrieval to RAG

Retrieval gives an LLM access to relevant context at runtime. But most real applications go a step further: they integrate retrieval with generation to produce grounded, context-aware answers. This is the core idea behind **retrieval-augmented generation (RAG)**. The retrieval pipeline becomes the foundation of a broader system that combines search with generation.

The retrieval pipeline

A typical retrieval workflow looks like this: documents are loaded, split into chunks, embedded, and indexed in a vector store, which is then queried at runtime. Each component is modular: you can swap the loader, splitter, embeddings, or vector store without rewriting your application's logic.
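To make that modularity concrete, here is a toy pipeline (not LangChain's actual API) in which every stage — loader, splitter, embedding model, vector store — is a plain function or class that can be replaced independently; `embed` is a bag-of-words stand-in for a real embedding model.

```python
import math
from collections import Counter

def load(texts):                      # "document loader"
    return list(texts)

def split(doc, size=50):              # "text splitter": fixed-size chunks
    return [doc[i:i + size] for i in range(0, len(doc), size)]

def embed(text):                      # "embedding model": bag-of-words vector
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class VectorStore:                    # "vector store": brute-force search
    def __init__(self):
        self.items = []
    def add(self, chunks):
        self.items += [(c, embed(c)) for c in chunks]
    def search(self, query, k=1):
        qv = embed(query)
        ranked = sorted(self.items, key=lambda it: cosine(qv, it[1]), reverse=True)
        return [c for c, _ in ranked[:k]]

store = VectorStore()
for doc in load(["Retrieval fetches relevant context at query time."]):
    store.add(split(doc))
print(store.search("relevant context"))
```

Swapping in a different splitter or embedding changes one function, not the surrounding application, which is the property the real library components preserve.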

Building blocks

RAG Architectures

RAG can be implemented in multiple ways, depending on your system’s needs. We outline each type in the sections below.
| Architecture | Description | Control | Flexibility | Latency | Example Use Case |
| --- | --- | --- | --- | --- | --- |
| 2-Step RAG | Retrieval always happens before generation. Simple and predictable. | ✅ High | ❌ Low | ⚡ Fast | FAQs, documentation bots |
| Agentic RAG | An LLM-powered agent decides when and how to retrieve during reasoning. | ❌ Low | ✅ High | ⏳ Variable | Research assistants with access to multiple tools |
| Hybrid | Combines characteristics of both approaches with validation steps. | ⚖️ Medium | ⚖️ Medium | ⏳ Variable | Domain-specific Q&A with quality validation |
Latency: In 2-Step RAG, latency is generally more predictable, as the maximum number of LLM calls is known and capped. This predictability assumes that LLM inference time is the dominant factor. Real-world latency may also be affected by the performance of the retrieval steps themselves, such as API response times, network delays, or database queries, which vary with the tools and infrastructure in use.

2-step RAG

In 2-Step RAG, the retrieval step is always executed before the generation step. This architecture is straightforward and predictable, making it suitable for many applications where the retrieval of relevant documents is a clear prerequisite for generating an answer.
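That predictability can be seen by counting model calls: a 2-step chain makes exactly one generation call per query, regardless of the question. The retriever and LLM below are hypothetical stubs used only to count calls.

```python
# Sketch showing why 2-Step RAG latency is predictable: the chain makes
# exactly one LLM call per query. Both helpers are hypothetical stubs.
llm_calls = 0

def fake_retriever(question: str) -> list[str]:
    return ["doc snippet about " + question]

def fake_llm(prompt: str) -> str:
    global llm_calls
    llm_calls += 1
    return "answer"

def two_step_rag(question: str) -> str:
    docs = fake_retriever(question)           # step 1: always retrieve first
    return fake_llm(f"{docs}\n{question}")    # step 2: exactly one generation

for q in ["refunds", "shipping", "warranty"]:
    two_step_rag(q)
print(llm_calls)  # one call per query
```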

Tutorial: Retrieval-Augmented Generation (RAG)

See how to build a Q&A chatbot that can answer questions grounded in your data using Retrieval-Augmented Generation. This tutorial walks through two approaches:
  • A RAG agent that runs searches with a flexible tool—great for general-purpose use.
  • A 2-step RAG chain that requires just one LLM call per query—fast and efficient for simpler tasks.

Agentic RAG

Agentic retrieval-augmented generation (RAG) combines the strengths of RAG with agent-based reasoning. Rather than retrieving documents once before answering, an LLM-powered agent reasons step by step and decides when and how to retrieve information over the course of the interaction.
The only thing an agent needs to enable RAG behavior is access to one or more tools that can fetch external knowledge — such as documentation loaders, web APIs, or database queries.
```python
import requests
from langchain.tools import tool
from langchain.agents import create_agent


@tool
def fetch_url(url: str) -> str:
    """Fetch text content from a URL."""
    response = requests.get(url, timeout=10.0)
    response.raise_for_status()
    return response.text


system_prompt = """\
Use fetch_url when you need to fetch information from a web page; quote relevant snippets.
"""

agent = create_agent(
    model="claude-sonnet-4-5-20250929",
    tools=[fetch_url],  # A tool for retrieval
    system_prompt=system_prompt,
)
```


Hybrid RAG

Hybrid RAG combines characteristics of 2-step RAG and agentic RAG. It introduces intermediate steps such as query preprocessing, retrieval validation, and post-generation checks. These systems offer more flexibility than a fixed pipeline while retaining some control over execution. Typical components include:
  • Query enhancement: Modify the input question to improve retrieval quality. This can involve rewriting unclear queries, generating multiple variations, or expanding queries with additional context.
  • Retrieval validation: Evaluate whether retrieved documents are relevant and sufficient. If not, the system may refine the query and retrieve again.
  • Answer validation: Check the generated answer for accuracy, completeness, and alignment with source content. If needed, the system can regenerate or revise the answer.
The architecture often supports multiple iterations between these steps. It is well suited to:
  • Applications with ambiguous or underspecified queries
  • Systems that require validation or quality control steps
  • Workflows involving multiple sources or iterative refinement
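The iteration loop can be sketched as follows. All helpers here are hypothetical stubs; a real system would back query enhancement and both validation steps with an LLM and a proper retriever.

```python
# Minimal sketch of a Hybrid RAG loop: query enhancement, retrieval
# validation, generation, and answer validation, with bounded retries.
KB = {"llm context window": "An LLM's context window caps how much text it can ingest at once."}

def enhance_query(query: str) -> str:
    # Query enhancement (stub): normalize the query.
    return query.lower().strip()

def retrieve(query: str) -> list[str]:
    return [text for key, text in KB.items() if key in query]

def docs_are_relevant(docs: list[str]) -> bool:
    # Retrieval validation (stub): did we get anything back?
    return len(docs) > 0

def generate(query: str, docs: list[str]) -> str:
    return f"Based on {len(docs)} source(s): {docs[0]}"

def answer_is_grounded(answer: str, docs: list[str]) -> bool:
    # Answer validation (stub): does the answer cite source content?
    return any(d in answer for d in docs)

def hybrid_rag(query: str, max_iters: int = 3) -> str:
    q = query
    for _ in range(max_iters):
        q = enhance_query(q)
        docs = retrieve(q)
        if not docs_are_relevant(docs):
            q = q + " llm context window"  # stub refinement: broaden the query
            continue
        ans = generate(q, docs)
        if answer_is_grounded(ans, docs):
            return ans
    return "Could not produce a validated answer."

print(hybrid_rag("  What limits an LLM Context Window?  "))
```

The `max_iters` bound is what keeps latency from growing without limit, which is why latency for this architecture is listed as variable rather than unbounded.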

Tutorial: Agentic RAG with Self-Correction

An example of Hybrid RAG that combines agentic reasoning with retrieval and self-correction.
