Middleware gives you tighter control over what happens inside an agent. The core agent loop calls the model, lets it choose tools to execute, and finishes once it stops calling tools:
Core agent loop diagram
Middleware exposes hooks before and after each of these steps:
Middleware flow diagram
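The loop described above can be sketched in plain TypeScript. This is a simplified stand-in, not langchain internals: `callModel` and the tool results are placeholders.

```typescript
// Minimal sketch of the core agent loop: call the model, run any requested
// tools, repeat until the model stops asking for tools.
type Msg = { role: string; content: string; toolCalls?: string[] };

function agentLoop(callModel: (msgs: Msg[]) => Msg, messages: Msg[]): Msg[] {
  for (;;) {
    const reply = callModel(messages);
    messages.push(reply);
    if (!reply.toolCalls?.length) return messages; // done: no tool calls left
    for (const call of reply.toolCalls) {
      messages.push({ role: "tool", content: `result of ${call}` }); // placeholder result
    }
  }
}
```

Middleware hooks slot in before and after each model call and tool execution in this loop.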

What can middleware do?

Monitor

Trace agent behavior with logging, analytics, and debugging

Modify

Transform prompts, tool selection, and output formats

Control

Add retry, fallback, and early-termination logic

Enforce

Apply rate limits, guardrails, and PII detection
Add middleware by passing it to @[create_agent]:
import {
  createAgent,
  summarizationMiddleware,
  humanInTheLoopMiddleware,
} from "langchain";

const agent = createAgent({
  model: "gpt-4o",
  tools: [...],
  middleware: [
    summarizationMiddleware({ model: "gpt-4o-mini" }),
    humanInTheLoopMiddleware({ interruptOn: { /* ... */ } }),
  ],
});

Built-in middleware

LangChain provides prebuilt middleware for common use cases:

Summarization

Automatically summarizes conversation history as it approaches the token limit.
Perfect for:
  • Long-running conversations that exceed the context window
  • Multi-turn conversations with extensive history
  • Applications where preserving full conversation context matters
import { createAgent, summarizationMiddleware } from "langchain";

const agent = createAgent({
  model: "gpt-4o",
  tools: [weatherTool, calculatorTool],
  middleware: [
    summarizationMiddleware({
      model: "gpt-4o-mini",
      maxTokensBeforeSummary: 4000, // Trigger summarization at 4000 tokens
      messagesToKeep: 20, // Keep last 20 messages after summary
      summaryPrompt: "Custom prompt for summarization...", // Optional
    }),
  ],
});
model
string
required
Model for generating summaries
maxTokensBeforeSummary
number
Token threshold for triggering summarization
messagesToKeep
number
default:"20"
Recent messages to preserve
tokenCounter
function
Custom token counting function. Defaults to character-based counting.
summaryPrompt
string
Custom prompt template. Uses built-in template if not specified.
summaryPrefix
string
default:"## Previous conversation summary:"
Prefix for summary messages
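The tokenCounter option accepts any counting function. A hypothetical counter is sketched below; the exact signature summarizationMiddleware expects is an assumption here, and the 4-characters-per-token heuristic is only a rough approximation.

```typescript
// Hypothetical custom tokenCounter: rough heuristic of ~4 characters per token
// instead of the default character-based counting.
const approxTokenCounter = (messages: { content: string }[]): number =>
  Math.ceil(messages.map((m) => m.content).join("").length / 4);

console.log(approxTokenCounter([{ content: "hello world" }])); // 3
```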

Human-in-the-loop

Pauses agent execution before tool calls run so a human can approve, edit, or reject them.
Perfect for:
  • High-stakes operations that need human approval (database writes, financial transactions)
  • Compliance workflows that require human oversight
  • Long-running conversations where human feedback steers the agent
import { createAgent, humanInTheLoopMiddleware } from "langchain";

const agent = createAgent({
  model: "gpt-4o",
  tools: [readEmailTool, sendEmailTool],
  middleware: [
    humanInTheLoopMiddleware({
      interruptOn: {
        // Require approval, editing, or rejection for sending emails
        send_email: {
          allowAccept: true,
          allowEdit: true,
          allowRespond: true,
        },
        // Auto-approve reading emails
        read_email: false,
      }
    })
  ]
});
interruptOn
object
required
Mapping of tool names to approval configs
Tool approval config options:
allowAccept
boolean
default:"false"
Whether approval is allowed
allowEdit
boolean
default:"false"
Whether editing is allowed
allowRespond
boolean
default:"false"
Whether responding/rejection is allowed
Important: the human-in-the-loop middleware requires a checkpointer to maintain state across interrupts. See the human-in-the-loop documentation for complete examples and integration patterns.

Anthropic prompt caching

Reduces costs by caching repeated prompt prefixes for Anthropic models.
Perfect for:
  • Applications with long, repetitive system prompts
  • Agents that reuse the same context across calls
  • Cutting API costs in high-traffic deployments
Learn more about Anthropic prompt caching strategies and limitations.
import { createAgent, HumanMessage, anthropicPromptCachingMiddleware } from "langchain";

const LONG_PROMPT = `
Please be a helpful assistant.

<Lots more context ...>
`;

const agent = createAgent({
  model: "claude-sonnet-4-5-20250929",
  prompt: LONG_PROMPT,
  middleware: [anthropicPromptCachingMiddleware({ ttl: "5m" })],
});

// cache store
await agent.invoke({
  messages: [new HumanMessage("Hi, my name is Bob")]
});

// cache hit, system prompt is cached
const result = await agent.invoke({
  messages: [new HumanMessage("What's my name?")]
});
ttl
string
default:"5m"
Time to live for cached content. Valid values: "5m" or "1h"

Model call limits

Caps the number of model calls to prevent infinite loops or runaway costs.
Perfect for:
  • Keeping runaway agents from making excessive API calls
  • Enforcing cost controls in production deployments
  • Testing agent behavior within a fixed call budget
import { createAgent, modelCallLimitMiddleware } from "langchain";

const agent = createAgent({
  model: "gpt-4o",
  tools: [...],
  middleware: [
    modelCallLimitMiddleware({
      threadLimit: 10, // Max 10 calls per thread (across runs)
      runLimit: 5, // Max 5 calls per run (single invocation)
      exitBehavior: "end", // Or "error" to throw exception
    }),
  ],
});
threadLimit
number
Maximum model calls across all runs in a thread. Defaults to no limit.
runLimit
number
Maximum model calls per single invocation. Defaults to no limit.
exitBehavior
string
default:"end"
Behavior when limit is reached. Options: "end" (graceful termination) or "error" (throw exception)

Tool call limits

Controls agent execution by capping tool calls, either globally across all tools or for a specific tool.
Perfect for:
  • Preventing excessive calls to expensive external APIs
  • Limiting web searches or database queries
  • Enforcing rate limits on specific tool usage
  • Preventing runaway agent loops
To limit a specific tool, set toolName; omit it to limit all tools globally. For each limit, specify one or both of:
  • Thread limit (threadLimit) - Maximum calls across all runs in a conversation. Persists between invocations. Requires a checkpointer.
  • Run limit (runLimit) - Maximum calls per invocation. Resets each turn.
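The thread/run distinction can be illustrated with a toy counter. This is a simulation of the behavior, not the middleware's actual implementation.

```typescript
// Toy model of thread vs run limits: the run counter resets on each new
// invocation, while the thread counter persists for the whole conversation.
class CallCounter {
  private thread = 0;
  private run = 0;
  constructor(private threadLimit: number, private runLimit: number) {}
  startRun(): void {
    this.run = 0; // run limit resets with each new user message
  }
  allow(): boolean {
    if (this.thread >= this.threadLimit || this.run >= this.runLimit) return false;
    this.thread++;
    this.run++; // thread count persists across runs
    return true;
  }
}

const counter = new CallCounter(3, 2); // threadLimit: 3, runLimit: 2
counter.startRun();
console.log(counter.allow(), counter.allow(), counter.allow()); // true true false (run limit hit)
counter.startRun();
console.log(counter.allow(), counter.allow()); // true false (thread limit of 3 hit)
```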
Exit behaviors:
  • "continue" (default) - Blocks exceeded calls with error messages; the agent continues. Best for most use cases, where the agent handles limits gracefully.
  • "error" - Raises an exception immediately. Best for complex workflows where you want to handle the limit error manually.
  • "end" - Stops with a ToolMessage + AI message. Best for single-tool scenarios (errors if other tools are pending).
import { createAgent, toolCallLimitMiddleware } from "langchain";

// Global limit: max 20 calls per thread, 10 per run
const globalLimiter = toolCallLimitMiddleware({
  threadLimit: 20,
  runLimit: 10,
});

// Tool-specific limit with default "continue" behavior
const searchLimiter = toolCallLimitMiddleware({
  toolName: "search",
  threadLimit: 5,
  runLimit: 3,
});

// Thread limit only (no per-run limit)
const databaseLimiter = toolCallLimitMiddleware({
  toolName: "query_database",
  threadLimit: 10,
});

// Strict enforcement with "error" behavior
const webScraperLimiter = toolCallLimitMiddleware({
  toolName: "scrape_webpage",
  runLimit: 2,
  exitBehavior: "error",
});

// Immediate termination with "end" behavior
const criticalToolLimiter = toolCallLimitMiddleware({
  toolName: "delete_records",
  runLimit: 1,
  exitBehavior: "end",
});

// Use multiple limiters together
const agent = createAgent({
  model: "gpt-4o",
  tools: [searchTool, databaseTool, scraperTool],
  middleware: [globalLimiter, searchLimiter, databaseLimiter, webScraperLimiter],
});
toolName
string
Name of specific tool to limit. If not provided, limits apply to all tools globally.
threadLimit
number
Maximum tool calls across all runs in a thread (conversation). Persists across multiple invocations with the same thread ID. Requires a checkpointer to maintain state. undefined means no thread limit.
runLimit
number
Maximum tool calls per single invocation (one user message → response cycle). Resets with each new user message. undefined means no run limit. Note: at least one of threadLimit or runLimit must be specified.
exitBehavior
string
default:"continue"
Behavior when limit is reached:
  • "continue" (default) - Block exceeded tool calls with error messages, let other tools and the model continue. The model decides when to end based on the error messages.
  • "error" - Throw a ToolCallLimitExceededError exception, stopping execution immediately
  • "end" - Stop execution immediately with a ToolMessage and AI message for the exceeded tool call. Only works when limiting a single tool; throws error if other tools have pending calls.

Model fallback

Automatically falls back to alternative models when the primary model fails.
Perfect for:
  • Building resilient agents that tolerate model outages
  • Optimizing costs by falling back to cheaper models
  • Provider redundancy across OpenAI, Anthropic, and others
import { createAgent, modelFallbackMiddleware } from "langchain";

const agent = createAgent({
  model: "gpt-4o", // Primary model
  tools: [...],
  middleware: [
    modelFallbackMiddleware(
      "gpt-4o-mini", // Try first on error
      "claude-3-5-sonnet-20241022" // Then this
    ),
  ],
});
The middleware accepts a variable number of string arguments representing fallback models, tried in order:
...models
string[]
required
One or more fallback model strings, tried in order when the primary model fails
modelFallbackMiddleware(
  "first-fallback-model",
  "second-fallback-model",
  // ... more models
)

PII detection

Detects and handles personally identifiable information in conversations.
Perfect for:
  • Healthcare and finance applications with compliance requirements
  • Customer-service agents that need sanitized logs
  • Any application handling sensitive user data
import { createAgent, piiRedactionMiddleware } from "langchain";

const agent = createAgent({
  model: "gpt-4o",
  tools: [...],
  middleware: [
    // Redact emails in user input
    piiRedactionMiddleware({
      piiType: "email",
      strategy: "redact",
      applyToInput: true,
    }),
    // Mask credit cards (show last 4 digits)
    piiRedactionMiddleware({
      piiType: "credit_card",
      strategy: "mask",
      applyToInput: true,
    }),
    // Custom PII type with regex
    piiRedactionMiddleware({
      piiType: "api_key",
      detector: /sk-[a-zA-Z0-9]{32}/,
      strategy: "block", // Throw error if detected
    }),
  ],
});
piiType
string
required
Type of PII to detect. Can be a built-in type (email, credit_card, ip, mac_address, url) or a custom type name.
strategy
string
default:"redact"
How to handle detected PII. Options:
  • "block" - Throw error when detected
  • "redact" - Replace with [REDACTED_TYPE]
  • "mask" - Partially mask (e.g., ****-****-****-1234)
  • "hash" - Replace with deterministic hash
detector
RegExp
Custom detector regex pattern. If not provided, uses built-in detector for the PII type.
applyToInput
boolean
default:"true"
Check user messages before model call
applyToOutput
boolean
default:"false"
Check AI messages after model call
applyToToolResults
boolean
default:"false"
Check tool result messages after execution
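The difference between the "redact" and "mask" strategies can be shown on a credit-card-like string. This is an illustration only; the middleware's real detection patterns and output format may differ.

```typescript
// Hedged illustration of two PII strategies; not the middleware's internals.
const card = "4111-1111-1111-1234";

// "redact": replace the whole match with a typed placeholder
const redact = (s: string) =>
  s.replace(/\d{4}(-\d{4}){3}/g, "[REDACTED_CREDIT_CARD]");

// "mask": keep only the last four digits visible
const mask = (s: string) =>
  s.replace(/\d{4}(-\d{4}){2}-(?=\d{4})/g, "****-****-****-");

console.log(redact(`card: ${card}`)); // card: [REDACTED_CREDIT_CARD]
console.log(mask(`card: ${card}`)); // card: ****-****-****-1234
```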

Planning

Adds to-do list management for complex, multi-step tasks.
This middleware automatically equips the agent with a write_todos tool and a system prompt that guides effective task planning.
import { createAgent, HumanMessage, todoListMiddleware } from "langchain";

const agent = createAgent({
  model: "gpt-4o",
  tools: [
    /* ... */
  ],
  middleware: [todoListMiddleware()] as const,
});

const result = await agent.invoke({
  messages: [new HumanMessage("Help me refactor my codebase")],
});
console.log(result.todos); // Array of todo items with status tracking
No configuration options are available (defaults are used).

LLM tool selector

Uses an LLM to intelligently select relevant tools before calling the main model.
Perfect for:
  • Agents with many tools (10+) where most are irrelevant to any given query
  • Reducing token usage by filtering out irrelevant tools
  • Improving the model's focus and accuracy
import { createAgent, llmToolSelectorMiddleware } from "langchain";

const agent = createAgent({
  model: "gpt-4o",
  tools: [tool1, tool2, tool3, tool4, tool5, ...], // Many tools
  middleware: [
    llmToolSelectorMiddleware({
      model: "gpt-4o-mini", // Use cheaper model for selection
      maxTools: 3, // Limit to 3 most relevant tools
      alwaysInclude: ["search"], // Always include certain tools
    }),
  ],
});
model
string
Model for tool selection. Defaults to the agent’s main model.
maxTools
number
Maximum number of tools to select. Defaults to no limit.
alwaysInclude
string[]
Array of tool names to always include in the selection

Context editing

Manages conversation context by trimming, summarizing, or clearing tool usage.
Perfect for:
  • Long conversations that need periodic context cleanup
  • Removing failed tool attempts from context
  • Custom context-management strategies
import { createAgent, contextEditingMiddleware, ClearToolUsesEdit } from "langchain";

const agent = createAgent({
  model: "gpt-4o",
  tools: [...],
  middleware: [
    contextEditingMiddleware({
      edits: [
        new ClearToolUsesEdit({ maxTokens: 1000 }), // Clear old tool uses
      ],
    }),
  ],
});
edits
ContextEdit[]
default:"[new ClearToolUsesEdit()]"
Array of ContextEdit strategies to apply
@[ClearToolUsesEdit] options:
maxTokens
number
default:"1000"
Token count that triggers the edit

Custom middleware

Build custom middleware by implementing hooks that run at specific points in the agent execution flow.

Two hook styles

Node-style hooks

Run sequentially at specific execution points. Use for logging, validation, and state updates.

Wrap-style hooks

Intercept execution with full control over handler calls. Use for retries, caching, and transformation.

Node-style hooks

Run at specific points in the execution flow:
  • beforeAgent - Before agent starts (once per invocation)
  • beforeModel - Before each model call
  • afterModel - After each model response
  • afterAgent - After agent completes (up to once per invocation)
Example: Logging middleware
import { createMiddleware } from "langchain";

const loggingMiddleware = createMiddleware({
  name: "LoggingMiddleware",
  beforeModel: (state) => {
    console.log(`About to call model with ${state.messages.length} messages`);
    return;
  },
  afterModel: (state) => {
    const lastMessage = state.messages[state.messages.length - 1];
    console.log(`Model returned: ${lastMessage.content}`);
    return;
  },
});
Example: Conversation length limit
import { createMiddleware, AIMessage } from "langchain";

const createMessageLimitMiddleware = (maxMessages: number = 50) => {
  return createMiddleware({
    name: "MessageLimitMiddleware",
    beforeModel: (state) => {
      if (state.messages.length === maxMessages) {
        return {
          messages: [new AIMessage("Conversation limit reached.")],
          jumpTo: "end",
        };
      }
      return;
    },
  });
};

Wrap-style hooks

Intercept execution and control when the handler is called:
  • wrapModelCall - Around each model call
  • wrapToolCall - Around each tool call
You decide if the handler is called zero times (short-circuit), once (normal flow), or multiple times (retry logic).
Example: Model retry middleware
import { createMiddleware } from "langchain";

const createRetryMiddleware = (maxRetries: number = 3) => {
  return createMiddleware({
    name: "RetryMiddleware",
    wrapModelCall: async (request, handler) => {
      for (let attempt = 0; attempt < maxRetries; attempt++) {
        try {
          // Await so rejected promises are caught by this try/catch
          return await handler(request);
        } catch (e) {
          if (attempt === maxRetries - 1) {
            throw e;
          }
          console.log(`Retry ${attempt + 1}/${maxRetries} after error: ${e}`);
        }
      }
      throw new Error("Unreachable");
    },
  });
};
Example: Dynamic model selection
import { createMiddleware, initChatModel } from "langchain";

const dynamicModelMiddleware = createMiddleware({
  name: "DynamicModelMiddleware",
  wrapModelCall: async (request, handler) => {
    // Use a larger model once the conversation grows long
    const modifiedRequest = { ...request };
    if (request.messages.length > 10) {
      modifiedRequest.model = await initChatModel("gpt-4o");
    } else {
      modifiedRequest.model = await initChatModel("gpt-4o-mini");
    }
    return handler(modifiedRequest);
  },
});
Example: Tool call monitoring
import { createMiddleware } from "langchain";

const toolMonitoringMiddleware = createMiddleware({
  name: "ToolMonitoringMiddleware",
  wrapToolCall: async (request, handler) => {
    console.log(`Executing tool: ${request.toolCall.name}`);
    console.log(`Arguments: ${JSON.stringify(request.toolCall.args)}`);

    try {
      const result = await handler(request);
      console.log("Tool completed successfully");
      return result;
    } catch (e) {
      console.log(`Tool failed: ${e}`);
      throw e;
    }
  },
});

Custom state schema

Middleware can extend the agent’s state with custom properties. Define a custom state schema and set it as the stateSchema:
import { createMiddleware, createAgent, HumanMessage } from "langchain";
import * as z from "zod";

// Middleware with custom state requirements
const callCounterMiddleware = createMiddleware({
  name: "CallCounterMiddleware",
  stateSchema: z.object({
    modelCallCount: z.number().default(0),
    userId: z.string().optional(),
  }),
  beforeModel: (state) => {
    // Access custom state properties
    if (state.modelCallCount > 10) {
      return { jumpTo: "end" };
    }
    return;
  },
  afterModel: (state) => {
    // Update custom state
    return { modelCallCount: state.modelCallCount + 1 };
  },
});
const agent = createAgent({
  model: "gpt-4o",
  tools: [...],
  middleware: [callCounterMiddleware] as const,
});

// TypeScript enforces required state properties
const result = await agent.invoke({
  messages: [new HumanMessage("Hello")],
  modelCallCount: 0, // Optional due to default value
  userId: "user-123", // Optional
});

Context extension

Context properties are configuration values passed through the runnable config. Unlike state, context is read-only and typically used for configuration that doesn’t change during execution. Middleware can define context requirements that must be satisfied through the agent’s configuration:
import * as z from "zod";
import { createMiddleware, HumanMessage } from "langchain";

const rateLimitMiddleware = createMiddleware({
  name: "RateLimitMiddleware",
  contextSchema: z.object({
    maxRequestsPerMinute: z.number(),
    apiKey: z.string(),
  }),
  beforeModel: async (state, runtime) => {
    // Access context through runtime
    const { maxRequestsPerMinute, apiKey } = runtime.context;

    // checkRateLimit is a placeholder for your own rate-limiting logic
    const allowed = await checkRateLimit(apiKey, maxRequestsPerMinute);
    if (!allowed) {
      return { jumpTo: "end" };
    }

    return;
  },
});

// Context is provided through config
await agent.invoke(
  { messages: [new HumanMessage("Process data")] },
  {
    context: {
      maxRequestsPerMinute: 60,
      apiKey: "api-key-123",
    },
  }
);

Execution order

When using multiple middleware, understanding execution order is important:
const agent = createAgent({
  model: "gpt-4o",
  middleware: [middleware1, middleware2, middleware3],
  tools: [...],
});
Before hooks run in order:
  1. middleware1.beforeAgent()
  2. middleware2.beforeAgent()
  3. middleware3.beforeAgent()
Agent loop starts
  1. middleware1.beforeModel()
  2. middleware2.beforeModel()
  3. middleware3.beforeModel()
Wrap hooks nest like function calls:
  middleware1.wrapModelCall() → middleware2.wrapModelCall() → middleware3.wrapModelCall() → model
After hooks run in reverse order:
  1. middleware3.afterModel()
  2. middleware2.afterModel()
  3. middleware1.afterModel()
Agent loop ends
  1. middleware3.afterAgent()
  2. middleware2.afterAgent()
  3. middleware1.afterAgent()
Key rules:
  • before* hooks (beforeAgent, beforeModel): first to last
  • after* hooks (afterModel, afterAgent): last to first (reverse)
  • wrap* hooks: nested (the first middleware wraps all the others)
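The nesting of wrap hooks can be demonstrated with plain function composition. This is a simulation of the ordering only, not the agent's internals.

```typescript
// Each wrap hook receives the "next" handler; the first middleware in the
// list ends up as the outermost layer around the model call.
type Hook = (next: () => string) => string;
const wrap = (name: string): Hook => (next) => `${name}(${next()})`;

const hooks = [wrap("m1"), wrap("m2"), wrap("m3")];
const composed = hooks.reduceRight<() => string>(
  (next, hook) => () => hook(next),
  () => "model"
);

console.log(composed()); // m1(m2(m3(model)))
```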

Agent jumps

To exit early from middleware, return an object with jumpTo:
import { createMiddleware, AIMessage } from "langchain";

const earlyExitMiddleware = createMiddleware({
  name: "EarlyExitMiddleware",
  beforeModel: (state) => {
    // Check some condition
    if (shouldExit(state)) {
      return {
        messages: [new AIMessage("Exiting early due to condition.")],
        jumpTo: "end",
      };
    }
    return;
  },
});
Available jump targets:
  • "end": Jump to the end of the agent execution
  • "tools": Jump to the tools node
  • "model": Jump to the model node (or the first beforeModel hook)
Important: when jumping to "model" from beforeModel or afterModel, all beforeModel middleware will run again.
import { createMiddleware } from "langchain";

const conditionalMiddleware = createMiddleware({
  name: "ConditionalMiddleware",
  afterModel: (state) => {
    if (someCondition(state)) {
      return { jumpTo: "end" };
    }
    return;
  },
});

Best practices

  1. Keep middleware focused - each should do one thing well
  2. Handle errors gracefully - don’t let middleware errors crash the agent
  3. Use appropriate hook types:
    • Node-style for sequential logic (logging, validation)
    • Wrap-style for control flow (retry, fallback, caching)
  4. Clearly document any custom state properties
  5. Unit test middleware independently before integrating
  6. Consider execution order - place critical middleware first in the list
  7. Use built-in middleware when possible, don’t reinvent the wheel :)
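Practice 2 (graceful error handling) can be sketched as a wrap-style hook that degrades instead of crashing. The request/response shapes below are simplified stand-ins for the real middleware types.

```typescript
// A wrap-style hook that catches handler errors and returns a fallback
// message, so one failing call doesn't take down the whole agent.
type Req = { prompt: string };
type Res = { content: string };

const withErrorFallback =
  (handler: (req: Req) => Res) =>
  (req: Req): Res => {
    try {
      return handler(req);
    } catch (e) {
      // Degrade gracefully instead of propagating the error
      return { content: `Sorry, something went wrong: ${String(e)}` };
    }
  };

const flakyModel = (_req: Req): Res => {
  throw new Error("model unavailable");
};

console.log(withErrorFallback(flakyModel)({ prompt: "hi" }).content);
```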

Examples

Dynamically selecting tools

Select relevant tools at runtime to improve performance and accuracy.
Benefits:
  • Shorter prompts - Reduce complexity by exposing only relevant tools
  • Better accuracy - Models choose correctly from fewer options
  • Permission control - Dynamically filter tools based on user access
import { createAgent, createMiddleware } from "langchain";

const toolSelectorMiddleware = createMiddleware({
  name: "ToolSelector",
  wrapModelCall: (request, handler) => {
    // Select a small, relevant subset of tools based on state/context
    const relevantTools = selectRelevantTools(request.state, request.runtime);
    const modifiedRequest = { ...request, tools: relevantTools };
    return handler(modifiedRequest);
  },
});

const agent = createAgent({
  model: "gpt-4o",
  tools: allTools, // All available tools need to be registered upfront
  // Middleware can be used to select a smaller subset that's relevant for the given run.
  middleware: [toolSelectorMiddleware],
});
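selectRelevantTools in the snippet above is left undefined; one plausible implementation filters tools by user permissions. The Tool shape and requiredRole field here are illustrative assumptions, not langchain types.

```typescript
// Hypothetical permission-based tool filter for a tool-selector middleware.
type Tool = { name: string; requiredRole?: string };

const selectRelevantTools = (tools: Tool[], role: string): Tool[] =>
  tools.filter((t) => !t.requiredRole || t.requiredRole === role);

const allTools: Tool[] = [
  { name: "search" },
  { name: "delete_records", requiredRole: "admin" },
];

console.log(selectRelevantTools(allTools, "viewer").map((t) => t.name)); // ["search"]
```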
