Middleware - Docs by LangChain

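Middleware provides a way to more tightly control what happens inside the agent. The core agent loop involves calling a model, letting it choose tools to execute, and then finishing when it calls no more tools.

Middleware exposes hooks before and after each of those steps.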
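The loop and its hook points can be sketched without LangChain at all. The following is a minimal, library-free illustration and not how create_agent is implemented: run_agent, fake_model, and the dict-based message shape are hypothetical names invented for this sketch.

```python
def run_agent(model, tools, messages, before_model=(), after_model=()):
    """Call the model, run any tools it requests, stop when it requests none."""
    while True:
        for hook in before_model:  # hooks that run before each model call
            hook(messages)
        reply = model(messages)
        messages.append(reply)
        for hook in after_model:  # hooks that run after each model response
            hook(messages)
        calls = reply.get("tool_calls", [])
        if not calls:  # no more tool calls: the agent is finished
            return messages
        for call in calls:
            result = tools[call["name"]](**call["args"])
            messages.append({"role": "tool", "name": call["name"], "content": str(result)})


def fake_model(messages):
    # Scripted stand-in for a real model: request a tool first, then answer
    if messages[-1]["role"] == "tool":
        return {"role": "ai", "content": f"The answer is {messages[-1]['content']}."}
    return {"role": "ai", "content": "", "tool_calls": [{"name": "add", "args": {"a": 2, "b": 3}}]}


log = []
history = run_agent(
    fake_model,
    {"add": lambda a, b: a + b},
    [{"role": "user", "content": "What is 2 + 3?"}],
    before_model=[lambda msgs: log.append(f"before_model: {len(msgs)} messages")],
    after_model=[lambda msgs: log.append("after_model")],
)
print(history[-1]["content"])
print(log)
```

The hooks here only observe the message list; real middleware hooks can also return state updates or redirect the loop.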

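What can middleware do?

Monitor

Track agent behavior with logging, analytics, and debugging

Modify

Transform prompts, tool selection, and output formatting

Control

Add retries, fallbacks, and early termination logic

Enforce

Apply rate limits, guardrails, and PII detection

Add middleware by passing it to create_agent: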

from langchain.agents import create_agent
from langchain.agents.middleware import SummarizationMiddleware, HumanInTheLoopMiddleware


agent = create_agent(
    model="gpt-4o",
    tools=[...],
    middleware=[SummarizationMiddleware(), HumanInTheLoopMiddleware()],
)

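Built-in middleware

LangChain provides prebuilt middleware for common use cases:

Summarization

Automatically summarize conversation history when approaching token limits.

Perfect for:

Long-running conversations that exceed context windows
Multi-turn dialogues with extensive history
Applications where preserving full conversation context matters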

from langchain.agents import create_agent
from langchain.agents.middleware import SummarizationMiddleware


agent = create_agent(
    model="gpt-4o",
    tools=[weather_tool, calculator_tool],
    middleware=[
        SummarizationMiddleware(
            model="gpt-4o-mini",
            max_tokens_before_summary=4000,  # Trigger summarization at 4000 tokens
            messages_to_keep=20,  # Keep last 20 messages after summary
            summary_prompt="Custom prompt for summarization...",  # Optional
        ),
    ],
)

Configuration options

model

string

required

Model for generating summaries

max_tokens_before_summary

number

Token threshold for triggering summarization

messages_to_keep

number

default:"20"

Recent messages to preserve

token_counter

function

Custom token counting function. Defaults to character-based counting.

summary_prompt

string

Custom prompt template. Uses built-in template if not specified.

summary_prefix

string

default:"## Previous conversation summary:"

Prefix for summary messages
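To make the interaction between the threshold, the kept messages, and the summary prefix concrete, here is a minimal library-free sketch. It is not SummarizationMiddleware's implementation: maybe_summarize, count_tokens, the 4-characters-per-token estimate, and the strict "greater than" trigger are all assumptions made for illustration.

```python
def count_tokens(messages):
    # Assumption: roughly 4 characters per token
    return sum(len(m["content"]) for m in messages) // 4


def maybe_summarize(
    messages,
    summarize,
    max_tokens_before_summary,
    messages_to_keep=20,
    summary_prefix="## Previous conversation summary:",
):
    """Replace older messages with one summary message once the threshold is crossed."""
    if count_tokens(messages) <= max_tokens_before_summary:
        return messages  # under the threshold: leave history untouched
    split = len(messages) - messages_to_keep
    if split <= 0:
        return messages  # nothing older than the kept window
    older, recent = messages[:split], messages[split:]
    summary = {"role": "system", "content": f"{summary_prefix}\n{summarize(older)}"}
    return [summary] + recent


history = [{"role": "user", "content": "x" * 40} for _ in range(10)]  # about 100 tokens
compacted = maybe_summarize(
    history,
    summarize=lambda older: f"{len(older)} earlier messages",  # stand-in for an LLM call
    max_tokens_before_summary=50,
    messages_to_keep=3,
)
print(len(compacted))
print(compacted[0]["content"])
```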

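Human-in-the-loop

Pause agent execution for human approval, editing, or rejection of tool calls before they execute.

Perfect for:

High-stakes operations requiring human approval (database writes, financial transactions)
Compliance workflows where human oversight is mandatory
Long-running conversations where human feedback is used to guide the agent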

from langchain.agents import create_agent
from langchain.agents.middleware import HumanInTheLoopMiddleware
from langgraph.checkpoint.memory import InMemorySaver


agent = create_agent(
    model="gpt-4o",
    tools=[read_email_tool, send_email_tool],
    checkpointer=InMemorySaver(),
    middleware=[
        HumanInTheLoopMiddleware(
            interrupt_on={
                # Require approval, editing, or rejection for sending emails
                "send_email_tool": {
                    "allowed_decisions": ["approve", "edit", "reject"],
                },
                # Auto-approve reading emails
                "read_email_tool": False,
            }
        ),
    ],
)

Configuration options

interrupt_on

dict

required

Mapping of tool names to approval configs. Values can be True (interrupt with default config), False (auto-approve), or an InterruptOnConfig object.

description_prefix

string

default:"Tool execution requires approval"

Prefix for action request descriptions

InterruptOnConfig options:

allowed_decisions

list[string]

List of allowed decisions: "approve", "edit", or "reject"

description

string | callable

Static string or callable function for custom description

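Important: Human-in-the-loop middleware requires a checkpointer to maintain state across interruptions. See the human-in-the-loop documentation for complete examples and integration patterns.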

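Anthropic prompt caching

Reduce costs by caching repetitive prompt prefixes with Anthropic models.

Perfect for:

Applications with long, repeated system prompts
Agents that reuse the same context across invocations
Reducing API costs for high-volume deployments

Learn more about Anthropic prompt caching strategies and limitations.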

from langchain_anthropic import ChatAnthropic
from langchain_anthropic.middleware import AnthropicPromptCachingMiddleware
from langchain.agents import create_agent
from langchain.messages import HumanMessage  # used by the invoke calls below


LONG_PROMPT = """
Please be a helpful assistant.

<Lots more context ...>
"""

agent = create_agent(
    model=ChatAnthropic(model="claude-sonnet-4-5-20250929"),
    system_prompt=LONG_PROMPT,
    middleware=[AnthropicPromptCachingMiddleware(ttl="5m")],
)

# cache store
agent.invoke({"messages": [HumanMessage("Hi, my name is Bob")]})

# cache hit, system prompt is cached
agent.invoke({"messages": [HumanMessage("What's my name?")]})

Configuration options

type

string

default:"ephemeral"

Cache type. Only "ephemeral" is currently supported.

ttl

string

default:"5m"

Time to live for cached content. Valid values: "5m" or "1h"

min_messages_to_cache

number

default:"0"

Minimum number of messages before caching starts

unsupported_model_behavior

string

default:"warn"

Behavior when using non-Anthropic models. Options: "ignore", "warn", or "raise"

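Model call limit

Limit the number of model calls to prevent infinite loops or excessive costs.

Perfect for:

Preventing runaway agents from making too many API calls
Enforcing cost controls on production deployments
Testing agent behavior within specific call budgets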

from langchain.agents import create_agent
from langchain.agents.middleware import ModelCallLimitMiddleware


agent = create_agent(
    model="gpt-4o",
    tools=[...],
    middleware=[
        ModelCallLimitMiddleware(
            thread_limit=10,  # Max 10 calls per thread (across runs)
            run_limit=5,  # Max 5 calls per run (single invocation)
            exit_behavior="end",  # Or "error" to raise exception
        ),
    ],
)

Configuration options

thread_limit

number

Maximum model calls across all runs in a thread. Defaults to no limit.

run_limit

number

Maximum model calls per single invocation. Defaults to no limit.

exit_behavior

string

default:"end"

Behavior when limit is reached. Options: "end" (graceful termination) or "error" (raise exception)

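Tool call limit

Control agent execution by limiting the number of tool calls, either globally across all tools or for specific tools.

Perfect for:

Preventing excessive calls to expensive external APIs
Limiting web searches or database queries
Enforcing rate limits on specific tool usage
Protecting against runaway agent loops

To limit a specific tool, set tool_name; omit it to apply the limit to all tools globally. For each limit, specify one or both of the following:

Thread limit (thread_limit) - Maximum calls across all runs in a conversation. Persists across invocations. Requires a checkpointer.
Run limit (run_limit) - Maximum calls per single invocation. Resets on each turn.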

Exit behaviors:

Behavior	Effect	Best For
`"continue"` (default)	Blocks exceeded calls with error messages, agent continues	Most use cases - agent handles limits gracefully
`"error"`	Raises exception immediately	Complex workflows where you want to handle the limit error manually
`"end"`	Stops with ToolMessage + AI message	Single-tool scenarios (errors if other tools pending)

from langchain.agents import create_agent
from langchain.agents.middleware import ToolCallLimitMiddleware

# Global limit: max 20 calls per thread, 10 per run
global_limiter = ToolCallLimitMiddleware(
    thread_limit=20,
    run_limit=10,
)

# Tool-specific limit with default "continue" behavior
search_limiter = ToolCallLimitMiddleware(
    tool_name="search",
    thread_limit=5,
    run_limit=3,
)

# Thread limit only (no per-run limit)
database_limiter = ToolCallLimitMiddleware(
    tool_name="query_database",
    thread_limit=10,
)

# Strict enforcement with "error" behavior
web_scraper_limiter = ToolCallLimitMiddleware(
    tool_name="scrape_webpage",
    run_limit=2,
    exit_behavior="error",
)

# Immediate termination with "end" behavior
critical_tool_limiter = ToolCallLimitMiddleware(
    tool_name="delete_records",
    run_limit=1,
    exit_behavior="end",
)

# Use multiple limiters together
agent = create_agent(
    model="gpt-4o",
    tools=[search_tool, database_tool, scraper_tool],
    middleware=[
        global_limiter,
        search_limiter,
        database_limiter,
        web_scraper_limiter
    ],
)
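How the two counters interact is easiest to see in a small simulation. This is a library-free sketch under stated assumptions, not ToolCallLimitMiddleware itself: CallLimiter, LimitExceeded, start_run, and the blocked-call message are hypothetical, and only the "continue" and "error" behaviors are modeled.

```python
BLOCKED = "Tool call limit exceeded; do not call this tool again."


class LimitExceeded(Exception):
    """Raised when exit_behavior is "error" and a limit is hit."""


class CallLimiter:
    def __init__(self, thread_limit=None, run_limit=None, exit_behavior="continue"):
        if thread_limit is None and run_limit is None:
            raise ValueError("At least one of thread_limit or run_limit must be specified")
        self.thread_limit = thread_limit
        self.run_limit = run_limit
        self.exit_behavior = exit_behavior
        self.thread_count = 0  # persists across runs (would live in checkpointed state)
        self.run_count = 0  # reset at the start of every run

    def start_run(self):
        self.run_count = 0

    def call(self, tool, *args):
        over_thread = self.thread_limit is not None and self.thread_count >= self.thread_limit
        over_run = self.run_limit is not None and self.run_count >= self.run_limit
        if over_thread or over_run:
            if self.exit_behavior == "error":
                raise LimitExceeded("tool call limit reached")
            return BLOCKED  # "continue": block this call, let the agent carry on
        self.thread_count += 1
        self.run_count += 1
        return tool(*args)


def search(query):
    return f"results for {query}"


limiter = CallLimiter(thread_limit=3, run_limit=2)
run1 = [limiter.call(search, q) for q in ("a", "b", "c")]  # third call hits run_limit
limiter.start_run()
run2 = [limiter.call(search, q) for q in ("d", "e")]  # second call hits thread_limit
print(run1)
print(run2)
```

In this sketch blocked calls do not count toward either limit; whether the real middleware counts them is not stated here.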

Configuration options

tool_name

string

Name of specific tool to limit. If not provided, limits apply to all tools globally.

thread_limit

number

Maximum tool calls across all runs in a thread (conversation). Persists across multiple invocations with the same thread ID. Requires a checkpointer to maintain state. None means no thread limit.

run_limit

number

Maximum tool calls per single invocation (one user message → response cycle). Resets with each new user message. None means no run limit.

Note: At least one of thread_limit or run_limit must be specified.

exit_behavior

string

default:"continue"

Behavior when limit is reached:

"continue" (default) - Block exceeded tool calls with error messages, let other tools and the model continue. The model decides when to end based on the error messages.
"error" - Raise a ToolCallLimitExceededError exception, stopping execution immediately
"end" - Stop execution immediately with a ToolMessage and AI message for the exceeded tool call. Only works when limiting a single tool; raises NotImplementedError if other tools have pending calls.

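Model fallback

Automatically fall back to alternative models when the primary model fails.

Perfect for:

Building resilient agents that handle model outages
Cost optimization by falling back to cheaper models
Provider redundancy across OpenAI, Anthropic, etc.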

from langchain.agents import create_agent
from langchain.agents.middleware import ModelFallbackMiddleware


agent = create_agent(
    model="gpt-4o",  # Primary model
    tools=[...],
    middleware=[
        ModelFallbackMiddleware(
            "gpt-4o-mini",  # Try first on error
            "claude-3-5-sonnet-20241022",  # Then this
        ),
    ],
)

Configuration options

first_model

string | BaseChatModel

required

First fallback model to try when the primary model fails. Can be a model string (e.g., "openai:gpt-4o-mini") or a BaseChatModel instance.

*additional_models

string | BaseChatModel

Additional fallback models to try in order if previous models fail
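The "try in order" behavior described above amounts to a loop over candidate models. A minimal sketch, assuming models are plain callables; call_with_fallbacks, flaky, and backup are hypothetical stand-ins, and re-raising the last error when every model fails is an assumption of this sketch.

```python
def call_with_fallbacks(prompt, primary, *fallbacks):
    """Try the primary model, then each fallback in order; re-raise if all fail."""
    last_error = None
    for model in (primary, *fallbacks):
        try:
            return model(prompt)
        except Exception as exc:  # any failure moves on to the next model
            last_error = exc
    raise last_error


def flaky(prompt):
    raise TimeoutError("primary model unavailable")


def backup(prompt):
    return f"backup: {prompt}"


answer = call_with_fallbacks("hi", flaky, backup)
print(answer)
```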

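PII detection

Detect and handle personally identifiable information in conversations.

Perfect for:

Healthcare and financial applications with compliance requirements
Customer service agents that need to sanitize logs
Any application handling sensitive user data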

from langchain.agents import create_agent
from langchain.agents.middleware import PIIMiddleware


agent = create_agent(
    model="gpt-4o",
    tools=[...],
    middleware=[
        # Redact emails in user input
        PIIMiddleware("email", strategy="redact", apply_to_input=True),
        # Mask credit cards (show last 4 digits)
        PIIMiddleware("credit_card", strategy="mask", apply_to_input=True),
        # Custom PII type with regex
        PIIMiddleware(
            "api_key",
            detector=r"sk-[a-zA-Z0-9]{32}",
            strategy="block",  # Raise error if detected
        ),
    ],
)

Configuration options

pii_type

string

required

Type of PII to detect. Can be a built-in type (email, credit_card, ip, mac_address, url) or a custom type name.

strategy

string

default:"redact"

How to handle detected PII. Options:

"block" - Raise exception when detected
"redact" - Replace with [REDACTED_TYPE]
"mask" - Partially mask (e.g., ****-****-****-1234)
"hash" - Replace with deterministic hash

detector

function | regex

Custom detector function or regex pattern. If not provided, uses built-in detector for the PII type.

apply_to_input

boolean

default:"True"

Check user messages before model call

apply_to_output

boolean

default:"False"

Check AI messages after model call

apply_to_tool_results

boolean

default:"False"

Check tool result messages after execution
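The four strategies differ only in what replaces a match. The sketch below illustrates them with plain regular expressions. It is an illustration rather than PIIMiddleware's code: apply_strategy, PIIDetected, both regex patterns, and the truncated SHA-256 used for "hash" are assumptions.

```python
import hashlib
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")  # simplified pattern for illustration
CARD = re.compile(r"\b(?:\d{4}[- ]?){3}\d{4}\b")  # 16 digits, optional separators


class PIIDetected(Exception):
    """Raised by the "block" strategy."""


def mask(value):
    # Hide every letter or digit except the last four, keeping separators
    to_hide = sum(c.isalnum() for c in value) - 4
    out = []
    for c in value:
        if c.isalnum() and to_hide > 0:
            out.append("*")
            to_hide -= 1
        else:
            out.append(c)
    return "".join(out)


def apply_strategy(text, pii_type, pattern, strategy="redact"):
    if strategy not in {"block", "redact", "mask", "hash"}:
        raise ValueError(f"unknown strategy: {strategy}")
    if strategy == "block":
        if pattern.search(text):
            raise PIIDetected(f"{pii_type} detected")
        return text

    def replace(match):
        value = match.group(0)
        if strategy == "redact":
            return f"[REDACTED_{pii_type.upper()}]"
        if strategy == "mask":
            return mask(value)
        return hashlib.sha256(value.encode()).hexdigest()[:8]  # deterministic

    return pattern.sub(replace, text)


print(apply_strategy("Contact bob@example.com", "email", EMAIL, "redact"))
print(apply_strategy("Card 4111-1111-1111-1234", "credit_card", CARD, "mask"))
```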

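Planning

Add todo list management capabilities for complex multi-step tasks.

This middleware automatically provides agents with a write_todos tool and system prompts to guide effective task planning.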

from langchain.agents import create_agent
from langchain.agents.middleware import TodoListMiddleware
from langchain.messages import HumanMessage


agent = create_agent(
    model="gpt-4o",
    tools=[...],
    middleware=[TodoListMiddleware()],
)

result = agent.invoke({"messages": [HumanMessage("Help me refactor my codebase")]})
print(result["todos"])  # Array of todo items with status tracking

Configuration options

system_prompt

string

Custom system prompt for guiding todo usage. Uses built-in prompt if not specified.

tool_description

string

Custom description for the write_todos tool. Uses built-in description if not specified.

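LLM tool selector

Use an LLM to intelligently select relevant tools before calling the main model.

Perfect for:

Agents with many tools (10+) where most are not relevant to any given query
Reducing token usage by filtering out irrelevant tools
Improving model focus and accuracy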

from langchain.agents import create_agent
from langchain.agents.middleware import LLMToolSelectorMiddleware


agent = create_agent(
    model="gpt-4o",
    tools=[tool1, tool2, tool3, tool4, tool5, ...],  # Many tools
    middleware=[
        LLMToolSelectorMiddleware(
            model="gpt-4o-mini",  # Use cheaper model for selection
            max_tools=3,  # Limit to 3 most relevant tools
            always_include=["search"],  # Always include certain tools
        ),
    ],
)

Configuration options

model

string | BaseChatModel

Model for tool selection. Can be a model string or BaseChatModel instance. Defaults to the agent’s main model.

system_prompt

string

Instructions for the selection model. Uses built-in prompt if not specified.

max_tools

number

Maximum number of tools to select. Defaults to no limit.

always_include

list[string]

List of tool names to always include in the selection

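Tool retry

Automatically retry failed tool calls with configurable exponential backoff.

Perfect for:

Handling transient failures in external API calls
Improving reliability of network-dependent tools
Building resilient agents that gracefully handle temporary errors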

from langchain.agents import create_agent
from langchain.agents.middleware import ToolRetryMiddleware


agent = create_agent(
    model="gpt-4o",
    tools=[search_tool, database_tool],
    middleware=[
        ToolRetryMiddleware(
            max_retries=3,  # Retry up to 3 times
            backoff_factor=2.0,  # Exponential backoff multiplier
            initial_delay=1.0,  # Start with 1 second delay
            max_delay=60.0,  # Cap delays at 60 seconds
            jitter=True,  # Add random jitter to avoid thundering herd
        ),
    ],
)

Configuration options

max_retries

number

default:"2"

Maximum number of retry attempts after the initial call (3 total attempts with default)

tools

list[BaseTool | str]

Optional list of tools or tool names to apply retry logic to. If None, applies to all tools.

retry_on

tuple[type[Exception], ...] | callable

default:"(Exception,)"

Either a tuple of exception types to retry on, or a callable that takes an exception and returns True if it should be retried.

on_failure

string | callable

default:"return_message"

Behavior when all retries are exhausted. Options:

"return_message" - Return a ToolMessage with error details (allows LLM to handle failure)
"raise" - Re-raise the exception (stops agent execution)
Custom callable - Function that takes the exception and returns a string for the ToolMessage content

backoff_factor

number

default:"2.0"

Multiplier for exponential backoff. Each retry waits initial_delay * (backoff_factor ** retry_number) seconds. Set to 0.0 for constant delay.

initial_delay

number

default:"1.0"

Initial delay in seconds before first retry

max_delay

number

default:"60.0"

Maximum delay in seconds between retries (caps exponential backoff growth)

jitter

boolean

default:"true"

Whether to add random jitter (±25%) to delay to avoid thundering herd
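The delay options above combine into a simple schedule. The function below is a sketch of the documented formula, not the middleware's source: retry_delay is a hypothetical helper, retry_number is assumed to start at 0, jitter is assumed to be applied after the max_delay cap, and jitter defaults to False here only so the printed schedule is deterministic.

```python
import random


def retry_delay(retry_number, initial_delay=1.0, backoff_factor=2.0,
                max_delay=60.0, jitter=False):
    """Seconds to wait before the given retry (assumed 0-indexed)."""
    if backoff_factor == 0.0:
        delay = initial_delay  # constant delay, as documented for 0.0
    else:
        delay = initial_delay * (backoff_factor ** retry_number)
    delay = min(delay, max_delay)  # cap exponential growth
    if jitter:
        delay *= 1 + random.uniform(-0.25, 0.25)  # +/-25% spread
    return delay


schedule = [retry_delay(n) for n in range(3)]
print(schedule)  # [1.0, 2.0, 4.0]
print(retry_delay(10))  # 60.0, capped by max_delay
```

With the defaults, delays double until they reach max_delay, after which every retry waits the capped value.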

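LLM tool emulator

Emulate tool execution using an LLM for testing purposes, replacing actual tool calls with AI-generated responses.

Perfect for:

Testing agent behavior without executing real tools
Developing agents when external tools are unavailable or expensive
Prototyping agent workflows before implementing actual tools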

from langchain.agents import create_agent
from langchain.agents.middleware import LLMToolEmulator


agent = create_agent(
    model="gpt-4o",
    tools=[get_weather, search_database, send_email],
    middleware=[
        # Emulate all tools by default
        LLMToolEmulator(),

        # Or emulate specific tools
        # LLMToolEmulator(tools=["get_weather", "search_database"]),

        # Or use a custom model for emulation
        # LLMToolEmulator(model="claude-sonnet-4-5-20250929"),
    ],
)

Configuration options

tools

list[str | BaseTool]

List of tool names (str) or BaseTool instances to emulate. If None (default), ALL tools will be emulated. If empty list, no tools will be emulated.

model

string | BaseChatModel

default:"anthropic:claude-3-5-sonnet-latest"

Model to use for generating emulated tool responses. Can be a model identifier string or BaseChatModel instance.

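Context editing

Manage conversation context by trimming, summarizing, or clearing tool uses.

Perfect for:

Long conversations that need periodic context cleanup
Removing failed tool attempts from context
Custom context management strategies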

from langchain.agents import create_agent
from langchain.agents.middleware import ContextEditingMiddleware, ClearToolUsesEdit


agent = create_agent(
    model="gpt-4o",
    tools=[...],
    middleware=[
        ContextEditingMiddleware(
            edits=[
                ClearToolUsesEdit(trigger=1000),  # Clear old tool uses
            ],
        ),
    ],
)

Configuration options

edits

list[ContextEdit]

default:"[ClearToolUsesEdit()]"

List of ContextEdit strategies to apply

token_count_method

string

default:"approximate"

Token counting method. Options: "approximate" or "model"

ClearToolUsesEdit options:

trigger

number

default:"100000"

Token count that triggers the edit

clear_at_least

number

default:"0"

Minimum tokens to reclaim

keep

number

default:"3"

Number of recent tool results to preserve

clear_tool_inputs

boolean

default:"False"

Whether to clear tool call parameters

exclude_tools

list[string]

default:"()"

List of tool names to exclude from clearing

placeholder

string

default:"[cleared]"

Placeholder text for cleared outputs
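A rough picture of what clearing tool uses means in practice: once the context passes the trigger, older tool results are swapped for the placeholder while the most recent ones are kept. This sketch is a simplified assumption of that behavior, not ClearToolUsesEdit's implementation: clear_tool_uses, approx_tokens, and the dict message shape are hypothetical, and clear_at_least and clear_tool_inputs are not modeled.

```python
def approx_tokens(messages):
    # Assumption: roughly 4 characters per token
    return sum(len(m["content"]) for m in messages) // 4


def clear_tool_uses(messages, trigger=100_000, keep=3, exclude_tools=(),
                    placeholder="[cleared]"):
    """Replace older tool results with a placeholder once the trigger is exceeded."""
    if approx_tokens(messages) <= trigger:
        return messages
    tool_idx = [i for i, m in enumerate(messages) if m["role"] == "tool"]
    keep_idx = set(tool_idx[-keep:]) if keep else set()  # most recent results survive
    edited = []
    for i, m in enumerate(messages):
        if m["role"] == "tool" and i not in keep_idx and m["name"] not in exclude_tools:
            m = {**m, "content": placeholder}  # copy, do not mutate the original
        edited.append(m)
    return edited


messages = [{"role": "user", "content": "Research this topic"}] + [
    {"role": "tool", "name": "search", "content": f"result {n} " * 50} for n in range(5)
]
edited = clear_tool_uses(messages, trigger=100)
print([m["content"][:9] for m in edited if m["role"] == "tool"])
```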

Custom middleware

Build custom middleware by implementing hooks that run at specific points in the agent execution flow. You can create middleware in two ways:

Decorator-based - Quick and simple for single-hook middleware
Class-based - More powerful for complex middleware with multiple hooks

Decorator-based middleware

For simple middleware that only needs a single hook, decorators provide the quickest way to add functionality:

from langchain.agents.middleware import before_model, after_model, wrap_model_call
from langchain.agents.middleware import AgentState, ModelRequest, ModelResponse, dynamic_prompt
from langchain.messages import AIMessage
from langchain.agents import create_agent
from langgraph.runtime import Runtime
from typing import Any, Callable


# Node-style: logging before model calls
@before_model
def log_before_model(state: AgentState, runtime: Runtime) -> dict[str, Any] | None:
    print(f"About to call model with {len(state['messages'])} messages")
    return None

# Node-style: validation after model calls
@after_model(can_jump_to=["end"])
def validate_output(state: AgentState, runtime: Runtime) -> dict[str, Any] | None:
    last_message = state["messages"][-1]
    if "BLOCKED" in last_message.content:
        return {
            "messages": [AIMessage("I cannot respond to that request.")],
            "jump_to": "end"
        }
    return None

# Wrap-style: retry logic
@wrap_model_call
def retry_model(
    request: ModelRequest,
    handler: Callable[[ModelRequest], ModelResponse],
) -> ModelResponse:
    for attempt in range(3):
        try:
            return handler(request)
        except Exception as e:
            if attempt == 2:
                raise
            print(f"Retry {attempt + 1}/3 after error: {e}")

# Wrap-style: dynamic prompts
@dynamic_prompt
def personalized_prompt(request: ModelRequest) -> str:
    user_id = request.runtime.context.get("user_id", "guest")
    return f"You are a helpful assistant for user {user_id}. Be concise and friendly."

# Use decorators in agent
agent = create_agent(
    model="gpt-4o",
    middleware=[log_before_model, validate_output, retry_model, personalized_prompt],
    tools=[...],
)

Available decorators

Node-style (run at specific execution points):

@before_agent - Before agent starts (once per invocation)
@before_model - Before each model call
@after_model - After each model response
@after_agent - After agent completes (once per invocation)

Wrap-style (intercept and control execution):

@wrap_model_call - Around each model call
@wrap_tool_call - Around each tool call

Convenience decorators:

@dynamic_prompt - Generates dynamic system prompts (equivalent to @wrap_model_call that modifies the prompt)

When to use decorators

Use decorators when

• You need a single hook
• No complex configuration

Use classes when

• Multiple hooks needed
• Complex configuration
• Reuse across projects (config on init)

Class-based middleware

Two hook styles

Node-style hooks

Run sequentially at specific execution points. Use for logging, validation, and state updates.

Wrap-style hooks

Intercept execution with full control over handler calls. Use for retries, caching, and transformation.

Node-style hooks

Run at specific points in the execution flow:

before_agent - Before agent starts (once per invocation)
before_model - Before each model call
after_model - After each model response
after_agent - After agent completes (up to once per invocation)

Example: Logging middleware

from langchain.agents.middleware import AgentMiddleware, AgentState
from langgraph.runtime import Runtime
from typing import Any

class LoggingMiddleware(AgentMiddleware):
    def before_model(self, state: AgentState, runtime: Runtime) -> dict[str, Any] | None:
        print(f"About to call model with {len(state['messages'])} messages")
        return None

    def after_model(self, state: AgentState, runtime: Runtime) -> dict[str, Any] | None:
        print(f"Model returned: {state['messages'][-1].content}")
        return None

Example: Conversation length limit

from langchain.agents.middleware import AgentMiddleware, AgentState, hook_config
from langchain.messages import AIMessage
from langgraph.runtime import Runtime
from typing import Any

class MessageLimitMiddleware(AgentMiddleware):
    def __init__(self, max_messages: int = 50):
        super().__init__()
        self.max_messages = max_messages

    # Declare the jump target so the returned "jump_to" is honored
    @hook_config(can_jump_to=["end"])
    def before_model(self, state: AgentState, runtime: Runtime) -> dict[str, Any] | None:
        # Use >= so the limit still applies if the count skips past max_messages
        if len(state["messages"]) >= self.max_messages:
            return {
                "messages": [AIMessage("Conversation limit reached.")],
                "jump_to": "end"
            }
        return None

Wrap-style hooks

Intercept execution and control when the handler is called:

wrap_model_call - Around each model call
wrap_tool_call - Around each tool call

You decide if the handler is called zero times (short-circuit), once (normal flow), or multiple times (retry logic).

Example: Model retry middleware

from langchain.agents.middleware import AgentMiddleware, ModelRequest, ModelResponse
from typing import Callable

class RetryMiddleware(AgentMiddleware):
    def __init__(self, max_retries: int = 3):
        super().__init__()
        self.max_retries = max_retries

    def wrap_model_call(
        self,
        request: ModelRequest,
        handler: Callable[[ModelRequest], ModelResponse],
    ) -> ModelResponse:
        for attempt in range(self.max_retries):
            try:
                return handler(request)
            except Exception as e:
                if attempt == self.max_retries - 1:
                    raise
                print(f"Retry {attempt + 1}/{self.max_retries} after error: {e}")
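The retry example covers the "multiple times" case. The "zero times" case (short-circuiting) follows the same wrapper shape, shown here as a library-free analogy: with_cache and slow_model are hypothetical, and requests are plain strings so they can serve as dictionary keys. A real wrap_model_call hook would need to derive its own cache key from the request.

```python
def with_cache(handler):
    """Wrap a handler so repeated requests skip the underlying call."""
    cache = {}

    def wrapped(request):
        if request in cache:
            return cache[request]  # short-circuit: handler called zero times
        cache[request] = handler(request)  # normal flow: handler called once
        return cache[request]

    return wrapped


calls = []


def slow_model(request):
    calls.append(request)  # record each real invocation
    return f"response to {request}"


cached_model = with_cache(slow_model)
first = cached_model("hello")
second = cached_model("hello")
print(first == second, len(calls))
```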

Example: Dynamic model selection

from langchain.agents.middleware import AgentMiddleware, ModelRequest, ModelResponse
from langchain.chat_models import init_chat_model
from typing import Callable

class DynamicModelMiddleware(AgentMiddleware):
    def wrap_model_call(
        self,
        request: ModelRequest,
        handler: Callable[[ModelRequest], ModelResponse],
    ) -> ModelResponse:
        # Use different model based on conversation length
        if len(request.messages) > 10:
            request.model = init_chat_model("gpt-4o")
        else:
            request.model = init_chat_model("gpt-4o-mini")

        return handler(request)

Example: Tool call monitoring

from langchain.tools.tool_node import ToolCallRequest
from langchain.agents.middleware import AgentMiddleware
from langchain_core.messages import ToolMessage
from langgraph.types import Command
from typing import Callable

class ToolMonitoringMiddleware(AgentMiddleware):
    def wrap_tool_call(
        self,
        request: ToolCallRequest,
        handler: Callable[[ToolCallRequest], ToolMessage | Command],
    ) -> ToolMessage | Command:
        print(f"Executing tool: {request.tool_call['name']}")
        print(f"Arguments: {request.tool_call['args']}")

        try:
            result = handler(request)
            print(f"Tool completed successfully")
            return result
        except Exception as e:
            print(f"Tool failed: {e}")
            raise

Custom state schema

Middleware can extend the agent’s state with custom properties. Define a custom state type and set it as the state_schema:

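from langchain.agents import create_agent
from langchain.agents.middleware import AgentState, AgentMiddleware, hook_config
from langchain.messages import HumanMessage
from typing_extensions import NotRequired
from typing import Any

class CustomState(AgentState):
    model_call_count: NotRequired[int]
    user_id: NotRequired[str]

class CallCounterMiddleware(AgentMiddleware[CustomState]):
    state_schema = CustomState

    # Declare the jump target so the returned "jump_to" is honored
    @hook_config(can_jump_to=["end"])
    def before_model(self, state: CustomState, runtime) -> dict[str, Any] | None: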
        # Access custom state properties
        count = state.get("model_call_count", 0)

        if count > 10:
            return {"jump_to": "end"}

        return None

    def after_model(self, state: CustomState, runtime) -> dict[str, Any] | None:
        # Update custom state
        return {"model_call_count": state.get("model_call_count", 0) + 1}

agent = create_agent(
    model="gpt-4o",
    middleware=[CallCounterMiddleware()],
    tools=[...],
)

# Invoke with custom state
result = agent.invoke({
    "messages": [HumanMessage("Hello")],
    "model_call_count": 0,
    "user_id": "user-123",
})

Execution order

When using multiple middleware, understanding execution order is important:

agent = create_agent(
    model="gpt-4o",
    middleware=[middleware1, middleware2, middleware3],
    tools=[...],
)

Execution flow

Before hooks run in order:

middleware1.before_agent()
middleware2.before_agent()
middleware3.before_agent()

Agent loop starts

middleware1.before_model()
middleware2.before_model()
middleware3.before_model()

Wrap hooks nest like function calls:

middleware1.wrap_model_call() → middleware2.wrap_model_call() → middleware3.wrap_model_call() → model

After hooks run in reverse order:

middleware3.after_model()
middleware2.after_model()
middleware1.after_model()

Agent loop ends

middleware3.after_agent()
middleware2.after_agent()
middleware1.after_agent()

Key rules:

before_* hooks: First to last
after_* hooks: Last to first (reverse)
wrap_* hooks: Nested (first middleware wraps all others)
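These ordering rules can be checked with a small simulation. The sketch below is library-free and only mimics the ordering described above: Tracer and run_model_step are hypothetical names, and the hooks take simplified arguments rather than the real state and runtime objects.

```python
class Tracer:
    """Toy middleware that records when each of its hooks runs."""

    def __init__(self, name, trace):
        self.name, self.trace = name, trace

    def before_model(self):
        self.trace.append(f"{self.name}.before_model")

    def after_model(self):
        self.trace.append(f"{self.name}.after_model")

    def wrap_model_call(self, handler):
        def wrapped(request):
            self.trace.append(f"{self.name}.wrap enter")
            response = handler(request)
            self.trace.append(f"{self.name}.wrap exit")
            return response
        return wrapped


def run_model_step(middleware, model, request):
    for m in middleware:  # before_* hooks: first to last
        m.before_model()
    handler = model
    for m in reversed(middleware):  # wrap from the inside out so the first is outermost
        handler = m.wrap_model_call(handler)
    response = handler(request)
    for m in reversed(middleware):  # after_* hooks: last to first
        m.after_model()
    return response


trace = []


def model(request):
    trace.append("model")
    return "ok"


run_model_step([Tracer("m1", trace), Tracer("m2", trace)], model, "hi")
print("\n".join(trace))
```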

Agent jumps

To exit early from middleware, return a dictionary with jump_to:

class EarlyExitMiddleware(AgentMiddleware):
    def before_model(self, state: AgentState, runtime) -> dict[str, Any] | None:
        # Check some condition
        if should_exit(state):
            return {
                "messages": [AIMessage("Exiting early due to condition.")],
                "jump_to": "end"
            }
        return None

Available jump targets:

"end": Jump to the end of the agent execution
"tools": Jump to the tools node
"model": Jump to the model node (or the first before_model hook)

Important: When jumping from before_model or after_model, jumping to "model" will cause all before_model middleware to run again. To enable jumping, decorate your hook with @hook_config(can_jump_to=[...]):

from langchain.agents.middleware import AgentMiddleware, AgentState, hook_config
from typing import Any

class ConditionalMiddleware(AgentMiddleware):
    @hook_config(can_jump_to=["end", "tools"])
    def after_model(self, state: AgentState, runtime) -> dict[str, Any] | None:
        if some_condition(state):
            return {"jump_to": "end"}
        return None

Best practices

Keep middleware focused - each should do one thing well
Handle errors gracefully - don’t let middleware errors crash the agent
Use appropriate hook types:
- Node-style for sequential logic (logging, validation)
- Wrap-style for control flow (retry, fallback, caching)
Clearly document any custom state properties
Unit test middleware independently before integrating
Consider execution order - place critical middleware first in the list
Use built-in middleware when possible, don’t reinvent the wheel :)

Examples

Dynamically selecting tools

Select relevant tools at runtime to improve performance and accuracy.

Benefits:

Shorter prompts - Reduce complexity by exposing only relevant tools
Better accuracy - Models choose correctly from fewer options
Permission control - Dynamically filter tools based on user access

from langchain.agents import create_agent
from langchain.agents.middleware import AgentMiddleware, ModelRequest, ModelResponse
from typing import Callable


class ToolSelectorMiddleware(AgentMiddleware):
    def wrap_model_call(
        self,
        request: ModelRequest,
        handler: Callable[[ModelRequest], ModelResponse],
    ) -> ModelResponse:
        """Middleware to select relevant tools based on state/context."""
        # Select a small, relevant subset of tools based on state/context
        relevant_tools = select_relevant_tools(request.state, request.runtime)
        request.tools = relevant_tools
        return handler(request)

agent = create_agent(
    model="gpt-4o",
    tools=all_tools,  # All available tools need to be registered upfront
    # Middleware can be used to select a smaller subset that's relevant for the given run.
    middleware=[ToolSelectorMiddleware()],
)

Extended example: GitHub vs GitLab tool selection

from dataclasses import dataclass
from typing import Literal, Callable

from langchain.agents import create_agent
from langchain.agents.middleware import AgentMiddleware, ModelRequest, ModelResponse
from langchain_core.tools import tool


@tool
def github_create_issue(repo: str, title: str) -> dict:
    """Create an issue in a GitHub repository."""
    return {"url": f"https://github.com/{repo}/issues/1", "title": title}

@tool
def gitlab_create_issue(project: str, title: str) -> dict:
    """Create an issue in a GitLab project."""
    return {"url": f"https://gitlab.com/{project}/-/issues/1", "title": title}

all_tools = [github_create_issue, gitlab_create_issue]

@dataclass
class Context:
    provider: Literal["github", "gitlab"]

class ToolSelectorMiddleware(AgentMiddleware):
    def wrap_model_call(
        self,
        request: ModelRequest,
        handler: Callable[[ModelRequest], ModelResponse],
    ) -> ModelResponse:
        """Select tools based on the VCS provider."""
        provider = request.runtime.context.provider

        if provider == "gitlab":
            selected_tools = [t for t in request.tools if t.name == "gitlab_create_issue"]
        else:
            selected_tools = [t for t in request.tools if t.name == "github_create_issue"]

        request.tools = selected_tools
        return handler(request)

agent = create_agent(
    model="gpt-4o",
    tools=all_tools,
    middleware=[ToolSelectorMiddleware()],
    context_schema=Context,
)

# Invoke with GitHub context
agent.invoke(
    {
        "messages": [{"role": "user", "content": "Open an issue titled 'Bug: where are the cats' in the repository `its-a-cats-game`"}]
    },
    context=Context(provider="github"),
)

Key points:

Register all tools upfront
Middleware selects the relevant subset per request
Use context_schema for configuration requirements

Additional resources

Middleware API reference - Complete guide to custom middleware
Human-in-the-loop - Add human review for sensitive operations
Testing agents - Strategies for testing safety mechanisms
