LLMs are powerful AI tools that can interpret and generate text like humans. They are general-purpose enough to write content, translate languages, summarize, and answer questions without needing specialized training for each task. Beyond text generation, many models also support:
  • Tool calling - calling external tools (such as database queries or API calls) and using the results in their responses.
  • Structured output - where the model's response is constrained to follow a defined format.
  • Multimodality - processing and returning data other than text, such as images, audio, and video.
  • Reasoning - where the model performs multi-step reasoning to arrive at a conclusion.
Models are the reasoning engine of agents. They drive the agent's decision-making process, determining which tools to call, how to interpret results, and when to provide a final answer. The quality and capabilities of the model you choose directly affect the reliability and performance of your agent. Different models excel at different tasks: some are better at following complex instructions, others at structured reasoning, and some support larger context windows for handling more information. LangChain's standard model interface gives you access to many provider integrations, making it easy to experiment and swap models to find the best fit for your use case.
For provider-specific integration details and capabilities, see the provider's chat model page.

Basic usage

Models can be used in two ways:
  1. With an agent - a model can be specified dynamically when creating an agent.
  2. Standalone - a model can be called directly (outside an agent loop) for tasks like text generation, classification, or extraction, without needing an agent framework.
The same model interface works in both contexts, giving you the flexibility to start simple and scale up to more complex agent-based workflows as needed.
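As a rough sketch of the two modes (assuming the create_agent helper from LangChain's agent package; the model name and the empty tool list are purely illustrative):
from langchain.chat_models import init_chat_model
from langchain.agents import create_agent

model = init_chat_model("gpt-4.1")

# Standalone: call the model directly
summary = model.invoke("Summarize why a standard model interface is useful.")

# With an agent: the same model object drives the agent's reasoning loop
agent = create_agent(model, tools=[])  # tools omitted for brevity
result = agent.invoke({"messages": [{"role": "user", "content": "Hello!"}]})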

Initialize a model

The easiest way to get started with a standalone model in LangChain is to use init_chat_model to initialize one from the chat model provider of your choice (examples below):
  • OpenAI
  • Anthropic
  • Azure
  • Google Gemini
  • AWS Bedrock
👉 Read the OpenAI chat model integration docs
pip install -U "langchain[openai]"
import os
from langchain.chat_models import init_chat_model

os.environ["OPENAI_API_KEY"] = "sk-..."

model = init_chat_model("gpt-4.1")
response = model.invoke("Why do parrots talk?")
For more details, including how to pass model parameters, see init_chat_model.

Key methods

Invoke

The model takes messages as input and outputs a message once it has finished generating a full response.

Stream

Invoke the model, but stream the output as it is generated in real time.

Batch

Send multiple requests to the model in a batch for more efficient processing.
In addition to chat models, LangChain supports other related technologies, such as embedding models and vector stores. See the integrations page for details.

Parameters

Chat models take parameters that can be used to configure their behavior. The full set of supported parameters varies by model and provider, but standard ones include:
model (string, required)
The name or identifier of the specific model you want to use.
api_key (string)
The key required to authenticate with the model provider. This is usually issued when you sign up for access to the model and is often provided by setting an environment variable.
temperature (number)
Controls the randomness of the model's output. Higher numbers make responses more creative; lower numbers make them more deterministic.
timeout (number)
The maximum time (in seconds) to wait for a response from the model before cancelling the request.
max_tokens (number)
Limits the total number of tokens in the response, effectively controlling the length of the output.
max_retries (number)
The maximum number of attempts the system will make to resend a request if it fails due to issues like network timeouts or rate limiting.
With init_chat_model, pass these parameters inline when initializing the model:
Initialize using model parameters
model = init_chat_model(
    "claude-sonnet-4-5-20250929",
    # Kwargs passed to the model:
    temperature=0.7,
    timeout=30,
    max_tokens=1000,
)
Each chat model integration may have additional parameters for controlling provider-specific features. For example, ChatOpenAI has use_responses_api to dictate whether to use the OpenAI Responses or Completions API. To find all the parameters supported by a given chat model, head to the chat model integration pages.
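For instance, a sketch of setting that provider-specific flag directly on ChatOpenAI (assumes the langchain-openai package is installed):
from langchain_openai import ChatOpenAI

# use_responses_api is specific to the OpenAI integration; other providers expose different options
model = ChatOpenAI(model="gpt-4.1", use_responses_api=True)
response = model.invoke("Why do parrots talk?")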

Invocation

A chat model must be invoked to generate an output. There are three main invocation methods, each suited to different use cases.

Invoke

The most straightforward way to call a model is to use invoke() with a single message or a list of messages.
Single message
response = model.invoke("Why do parrots have colorful feathers?")
print(response)
A list of messages can be provided to a model to represent conversation history. Each message has a role, which the model uses to indicate who sent the message in the conversation. See the messages guide for more details on roles, types, and content.
Dictionary format

conversation = [
    {"role": "system", "content": "You are a helpful assistant that translates English to French."},
    {"role": "user", "content": "Translate: I love programming."},
    {"role": "assistant", "content": "J'adore la programmation."},
    {"role": "user", "content": "Translate: I love building applications."}
]

response = model.invoke(conversation)
print(response)  # AIMessage("J'adore créer des applications.")
Message objects
from langchain_core.messages import HumanMessage, AIMessage, SystemMessage

conversation = [
    SystemMessage("You are a helpful assistant that translates English to French."),
    HumanMessage("Translate: I love programming."),
    AIMessage("J'adore la programmation."),
    HumanMessage("Translate: I love building applications.")
]

response = model.invoke(conversation)
print(response)  # AIMessage("J'adore créer des applications.")

Stream

Most models can stream their output content as it is generated. By displaying output progressively, streaming significantly improves the user experience, particularly for longer responses. Calling stream() returns an iterator that yields output chunks as they are produced. You can process each chunk in real time with a loop:
for chunk in model.stream("Why do parrots have colorful feathers?"):
    print(chunk.text, end="|", flush=True)
Unlike invoke(), which returns a single AIMessage after the model has finished generating the full response, stream() returns multiple AIMessageChunk objects, each containing a portion of the output text. Importantly, each chunk in the stream is designed so that the chunks can be aggregated into a full message by summing them:
Construct an AIMessage
full = None  # None | AIMessageChunk
for chunk in model.stream("What color is the sky?"):
    full = chunk if full is None else full + chunk
    print(full.text)

# The
# The sky
# The sky is
# The sky is typically
# The sky is typically blue
# ...

print(full.content_blocks)
# [{"type": "text", "text": "The sky is typically blue..."}]
The resulting message can be handled in the same way as a message generated with invoke(): for example, it can be aggregated into a message history and passed back to the model as conversation context.
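For example, a minimal sketch of folding a streamed response back into the conversation (reusing the accumulation pattern shown above):
# Accumulate the streamed chunks into a single message, then reuse it as context
history = [{"role": "user", "content": "What color is the sky?"}]

full = None
for chunk in model.stream(history):
    full = chunk if full is None else full + chunk

history.append(full)  # the aggregated chunk behaves like a regular AIMessage
history.append({"role": "user", "content": "And at sunset?"})
followup = model.invoke(history)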
Streaming only works if every step in the program knows how to process a stream of chunks. For example, an application that is not streaming-capable would need to hold the entire output in memory before processing it.
LangChain simplifies streaming from chat models by automatically enabling streaming mode in certain cases, even when you don't explicitly call a streaming method. This is particularly useful when you use the non-streaming invoke method but still want to stream the overall application, including intermediate results from the chat model. For example, in a LangGraph agent you can call model.invoke() inside a node, and LangChain will automatically delegate to streaming if the graph is run in streaming mode.

How it works

When you invoke() a chat model, LangChain will automatically switch to an internal streaming mode if it detects that you are trying to stream the overall application. The result of the invocation will be the same as far as the code using invoke is concerned; however, while streaming the chat model, LangChain will take care of invoking on_llm_new_token events in LangChain's callback system. These callback events allow LangGraph stream() and astream_events() to surface the chat model's output in real time.
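As a sketch of what this looks like in practice (assuming an agent built with create_agent; stream_mode="messages" is LangGraph's token-level streaming mode):
from langchain.agents import create_agent

agent = create_agent(model, tools=[])

# The agent node calls model.invoke() internally, but tokens still stream out
for token, metadata in agent.stream(
    {"messages": [{"role": "user", "content": "Why do parrots have colorful feathers?"}]},
    stream_mode="messages",
):
    print(token.text, end="|", flush=True)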
LangChain chat models can also stream semantic events with astream_events(). This simplifies filtering based on event types and other metadata, and will aggregate the full message in the background. See the example below.
async for event in model.astream_events("Hello"):

    if event["event"] == "on_chat_model_start":
        print(f"Input: {event['data']['input']}")

    elif event["event"] == "on_chat_model_stream":
        print(f"Token: {event['data']['chunk'].text}")

    elif event["event"] == "on_chat_model_end":
        print(f"Full message: {event['data']['output'].text}")

    else:
        pass
Input: Hello
Token: Hi
Token:  there
Token: !
Token:  How
Token:  can
Token:  I
...
Full message: Hi there! How can I help today?
See the astream_events() reference for event types and other details.

Batch

Batching a collection of independent requests to a model can significantly improve performance and reduce costs, since the processing can be done in parallel:
Batch
responses = model.batch([
    "Why do parrots have colorful feathers?",
    "How do airplanes fly?",
    "What is quantum computing?"
])
for response in responses:
    print(response)
This section describes the chat model batch() method, which parallelizes model calls client-side. It is distinct from the batch APIs supported by inference providers such as OpenAI and Anthropic.
By default, batch() only returns the final output for the whole batch. If you want to receive the output for each individual input as it finishes generating, you can stream results with batch_as_completed():
Yield batch responses upon completion
for response in model.batch_as_completed([
    "Why do parrots have colorful feathers?",
    "How do airplanes fly?",
    "What is quantum computing?"
]):
    print(response)
When using batch_as_completed(), results may arrive out of order. Each result includes the input index, which you can use to match results and reconstruct the original order if needed.
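For example, a small sketch of restoring the original order (assuming each yielded item is an (index, output) pair, as described above):
inputs = [
    "Why do parrots have colorful feathers?",
    "How do airplanes fly?",
    "What is quantum computing?",
]

ordered = [None] * len(inputs)
for index, output in model.batch_as_completed(inputs):
    ordered[index] = output  # slot each result back into its original position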
When processing a large number of inputs with batch() or batch_as_completed(), you may want to control the maximum number of parallel calls. This can be done by setting the max_concurrency attribute in the RunnableConfig dictionary:
Batch with max concurrency
model.batch(
    list_of_inputs,
    config={
        'max_concurrency': 5,  # Limit to 5 parallel calls
    }
)
See the RunnableConfig reference for a full list of supported attributes.
For more details on batching, see the reference.

Tool calling

Models can request to call tools that perform tasks such as fetching data from a database, searching the web, or running code. Tools are pairings of:
  1. A schema, including the name of the tool, a description, and/or argument definitions (often a JSON schema)
  2. A function or coroutine to execute.
You may hear the term “function calling”. We use this interchangeably with “tool calling”.
To make tools that you have defined available for use by a model, you must bind them using bind_tools(). In subsequent invocations, the model can choose to call any of the bound tools as needed. Some model providers offer built-in tools that can be enabled via model or invocation parameters (e.g. ChatOpenAI, ChatAnthropic). Check the respective provider reference for details.
See the tools guide for details and other options for creating tools.
Binding user tools
from langchain.tools import tool

@tool
def get_weather(location: str) -> str:
    """Get the weather at a location."""
    return f"It's sunny in {location}."


model_with_tools = model.bind_tools([get_weather])  

response = model_with_tools.invoke("What's the weather like in Boston?")
for tool_call in response.tool_calls:
    # View tool calls made by the model
    print(f"Tool: {tool_call['name']}")
    print(f"Args: {tool_call['args']}")
When binding user-defined tools, the model’s response includes a request to execute a tool. When using a model separately from an agent, it is up to you to perform the requested action and return the result back to the model for use in subsequent reasoning. Note that when using an agent, the agent loop will handle the tool execution loop for you. Below, we show some common ways you can use tool calling.
When a model returns tool calls, you need to execute the tools and pass the results back to the model. This creates a conversation loop where the model can use the tool results to generate its final response. LangChain includes agent abstractions that handle this orchestration for you. Here's a simple example of how to do it yourself:
Tool execution loop
# Bind (potentially multiple) tools to the model
model_with_tools = model.bind_tools([get_weather])

# Step 1: Model generates tool calls
messages = [{"role": "user", "content": "What's the weather in Boston?"}]
ai_msg = model_with_tools.invoke(messages)
messages.append(ai_msg)

# Step 2: Execute tools and collect results
for tool_call in ai_msg.tool_calls:
    # Execute the tool with the generated arguments
    tool_result = get_weather.invoke(tool_call)
    messages.append(tool_result)

# Step 3: Pass results back to model for final response
final_response = model_with_tools.invoke(messages)
print(final_response.text)
# "The current weather in Boston is 72°F and sunny."
Each ToolMessage returned by the tool includes a tool_call_id that matches the original tool call, helping the model correlate results with requests.
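If you run the tool function yourself instead of invoking it with the full tool call, you can construct the ToolMessage manually so the IDs still line up. A sketch, reusing the get_weather tool and messages list from the loop above:
from langchain.messages import ToolMessage

for tool_call in ai_msg.tool_calls:
    observation = get_weather.invoke(tool_call["args"])  # run the tool on the generated args
    messages.append(
        ToolMessage(content=observation, tool_call_id=tool_call["id"])
    )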
By default, the model has the freedom to choose which bound tool to use based on the user’s input. However, you might want to force choosing a tool, ensuring the model uses either a particular tool or any tool from a given list:
model_with_tools = model.bind_tools([tool_1], tool_choice="any")
Many models support calling multiple tools in parallel when appropriate. This allows the model to gather information from different sources simultaneously.
Parallel tool calls
model_with_tools = model.bind_tools([get_weather])

response = model_with_tools.invoke(
    "What's the weather in Boston and Tokyo?"
)


# The model may generate multiple tool calls
print(response.tool_calls)
# [
#   {'name': 'get_weather', 'args': {'location': 'Boston'}, 'id': 'call_1'},
#   {'name': 'get_weather', 'args': {'location': 'Tokyo'}, 'id': 'call_2'},
# ]


# Execute all tools (can be done in parallel with async)
results = []
for tool_call in response.tool_calls:
    if tool_call['name'] == 'get_weather':
        result = get_weather.invoke(tool_call)
    ...
    results.append(result)
The model intelligently determines when parallel execution is appropriate based on the independence of the requested operations.
Most models supporting tool calling enable parallel tool calls by default. Some (including OpenAI and Anthropic) allow you to disable this feature. To do this, set parallel_tool_calls=False:
model.bind_tools([get_weather], parallel_tool_calls=False)
When streaming responses, tool calls are progressively built through ToolCallChunk. This allows you to see tool calls as they’re being generated rather than waiting for the complete response.
Streaming tool calls
for chunk in model_with_tools.stream(
    "What's the weather in Boston and Tokyo?"
):
    # Tool call chunks arrive progressively
    for tool_chunk in chunk.tool_call_chunks:
        if name := tool_chunk.get("name"):
            print(f"Tool: {name}")
        if id_ := tool_chunk.get("id"):
            print(f"ID: {id_}")
        if args := tool_chunk.get("args"):
            print(f"Args: {args}")

# Output:
# Tool: get_weather
# ID: call_SvMlU1TVIZugrFLckFE2ceRE
# Args: {"lo
# Args: catio
# Args: n": "B
# Args: osto
# Args: n"}
# Tool: get_weather
# ID: call_QMZdy6qInx13oWKE7KhuhOLR
# Args: {"lo
# Args: catio
# Args: n": "T
# Args: okyo
# Args: "}
You can accumulate chunks to build complete tool calls:
Accumulate tool calls
gathered = None
for chunk in model_with_tools.stream("What's the weather in Boston?"):
    gathered = chunk if gathered is None else gathered + chunk
    print(gathered.tool_calls)

Structured outputs

Models can be requested to provide their response in a format matching a given schema. This is useful for ensuring the output can be easily parsed and used in subsequent processing. LangChain supports multiple schema types and methods for enforcing structured outputs.
  • Pydantic
  • TypedDict
  • JSON Schema
Pydantic models provide the richest feature set with field validation, descriptions, and nested structures.
from pydantic import BaseModel, Field

class Movie(BaseModel):
    """A movie with details."""
    title: str = Field(..., description="The title of the movie")
    year: int = Field(..., description="The year the movie was released")
    director: str = Field(..., description="The director of the movie")
    rating: float = Field(..., description="The movie's rating out of 10")

model_with_structure = model.with_structured_output(Movie)
response = model_with_structure.invoke("Provide details about the movie Inception")
print(response)  # Movie(title="Inception", year=2010, director="Christopher Nolan", rating=8.8)
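TypedDict and JSON Schema are also accepted. A minimal TypedDict sketch for comparison (the result is a plain dict and is not validated):
from typing_extensions import Annotated, TypedDict

class MovieDict(TypedDict):
    """A movie with details."""
    title: Annotated[str, ..., "The title of the movie"]
    year: Annotated[int, ..., "The year the movie was released"]

model_with_structure = model.with_structured_output(MovieDict)
response = model_with_structure.invoke("Provide details about the movie Inception")
# e.g. {"title": "Inception", "year": 2010}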
Key considerations for structured outputs:
  • Method parameter: Some providers support different methods ('json_schema', 'function_calling', 'json_mode')
    • 'json_schema' typically refers to dedicated structured output features offered by a provider
    • 'function_calling' derives structured output by forcing a tool call following the given schema
    • 'json_mode' is a precursor to 'json_schema' offered by some providers; it generates valid JSON, but the schema must be described in the prompt
  • Include raw: Use include_raw=True to get both the parsed output and the raw AI message
  • Validation: Pydantic models provide automatic validation, while TypedDict and JSON Schema require manual validation
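For example, a sketch of pinning the enforcement method described in the first bullet above (which method values are available varies by provider):
# method="json_schema" targets the provider's dedicated structured output feature, where supported
model_with_structure = model.with_structured_output(Movie, method="json_schema")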
It can be useful to return the raw AIMessage object alongside the parsed representation to access response metadata such as token counts. To do this, set include_raw=True when calling with_structured_output:
from pydantic import BaseModel, Field

class Movie(BaseModel):
    """A movie with details."""
    title: str = Field(..., description="The title of the movie")
    year: int = Field(..., description="The year the movie was released")
    director: str = Field(..., description="The director of the movie")
    rating: float = Field(..., description="The movie's rating out of 10")

model_with_structure = model.with_structured_output(Movie, include_raw=True)  
response = model_with_structure.invoke("Provide details about the movie Inception")
response
# {
#     "raw": AIMessage(...),
#     "parsed": Movie(title=..., year=..., ...),
#     "parsing_error": None,
# }
Schemas can be nested:
from pydantic import BaseModel, Field

class Actor(BaseModel):
    name: str
    role: str

class MovieDetails(BaseModel):
    title: str
    year: int
    cast: list[Actor]
    genres: list[str]
    budget: float | None = Field(None, description="Budget in millions USD")

model_with_structure = model.with_structured_output(MovieDetails)
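Invoking it works the same way; nested fields come back as nested model instances (the output shown is illustrative):
response = model_with_structure.invoke("Provide details about the movie Inception")
print(response.cast[0])  # e.g. Actor(name='Leonardo DiCaprio', role='Dom Cobb')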

Supported models

LangChain supports all major model providers, including OpenAI, Anthropic, Google, Azure, AWS Bedrock, and more. Each provider offers a variety of models with different capabilities. For a full list of supported models in LangChain, see the integrations page.

Advanced topics

Multimodal

Certain models can process and return non-textual data such as images, audio, and video. You can pass non-textual data to a model by providing content blocks.
All LangChain chat models with underlying multimodal capabilities support:
  1. Data in the cross-provider standard format (see our messages guide)
  2. OpenAI chat completions format
  3. Any format that is native to that specific provider (e.g., Anthropic models accept Anthropic native format)
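For example, a sketch of passing an image using the cross-provider content block format (the URL is a placeholder):
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe this image."},
        {"type": "image", "url": "https://example.com/photo.jpg"},  # placeholder URL
    ],
}
response = model.invoke([message])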
See the multimodal section of the messages guide for details. Some models can return multimodal data as part of their response. When prompted to do so, the resulting AIMessage will have content blocks with multimodal types.
Multimodal output
response = model.invoke("Create a picture of a cat")
print(response.content_blocks)
# [
#     {"type": "text", "text": "Here's a picture of a cat"},
#     {"type": "image", "base64": "...", "mime_type": "image/jpeg"},
# ]
See the integrations page for details on specific providers.

Reasoning

Newer models are capable of performing multi-step reasoning to arrive at a conclusion. This involves breaking down complex problems into smaller, more manageable steps. If supported by the underlying model, you can surface this reasoning process to better understand how the model arrived at its final answer.
for chunk in model.stream("Why do parrots have colorful feathers?"):
    reasoning_steps = [r for r in chunk.content_blocks if r["type"] == "reasoning"]
    print(reasoning_steps if reasoning_steps else chunk.text)
Depending on the model, you can sometimes specify the level of effort it should put into reasoning. Similarly, you can request that the model turn off reasoning entirely. This may take the form of categorical “tiers” of reasoning (e.g., 'low' or 'high') or integer token budgets. For details, see the integrations page or reference for your respective chat model.
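For example, a sketch of requesting a lower effort tier (this assumes an OpenAI reasoning model that accepts a reasoning_effort parameter; other providers use token budgets or different parameter names):
from langchain.chat_models import init_chat_model

# reasoning_effort is specific to OpenAI reasoning models
model = init_chat_model("gpt-5-nano", reasoning_effort="low")
response = model.invoke("Why do parrots have colorful feathers?")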

Local models

LangChain supports running models locally on your own hardware. This is useful for scenarios where either data privacy is critical, you want to invoke a custom model, or when you want to avoid the costs incurred when using a cloud-based model. Ollama is one of the easiest ways to run models locally. See the full list of local integrations on the integrations page.
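For example, a sketch of pointing init_chat_model at a local Ollama model (assumes Ollama is running locally and the langchain-ollama package is installed; the model name is illustrative):
from langchain.chat_models import init_chat_model

model = init_chat_model("ollama:llama3.1")  # any locally pulled Ollama model works here
response = model.invoke("Why do parrots talk?")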

Prompt caching

Many providers offer prompt caching features to reduce latency and cost on repeat processing of the same tokens. These features can be implicit (applied automatically by the provider) or explicit (requiring you to opt in or mark which parts of the prompt should be cached).
Prompt caching is often only engaged above a minimum input token threshold. See provider pages for details.
Cache usage will be reflected in the usage metadata of the model response.
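For example, a sketch of checking for cache hits after a repeated call (the field names follow the usage metadata shown in the token usage section below; exact keys vary by provider):
# A long, repeated prompt prefix is what typically gets cached
long_system_prompt = "..."  # placeholder for a lengthy system prompt
messages = [
    {"role": "system", "content": long_system_prompt},
    {"role": "user", "content": "Summarize the policy above."},
]

response = model.invoke(messages)
print(response.usage_metadata["input_token_details"])
# e.g. {"cache_read": 1024, "cache_creation": 0}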

Server-side tool use

Some providers support server-side tool-calling loops: models can interact with web search, code interpreters, and other tools and analyze the results in a single conversational turn. If a model invokes a tool server-side, the content of the response message will include content representing the invocation and result of the tool. Accessing the content blocks of the response will return the server-side tool calls and results in a provider-agnostic format:
Invoke with server-side tool use
from langchain.chat_models import init_chat_model

model = init_chat_model("gpt-4.1-mini")

tool = {"type": "web_search"}
model_with_tools = model.bind_tools([tool])

response = model_with_tools.invoke("What was a positive news story from today?")
response.content_blocks
Result
[
    {
        "type": "server_tool_call",
        "name": "web_search",
        "args": {
            "query": "positive news stories today",
            "type": "search"
        },
        "id": "ws_abc123"
    },
    {
        "type": "server_tool_result",
        "tool_call_id": "ws_abc123",
        "status": "success"
    },
    {
        "type": "text",
        "text": "Here are some positive news stories from today...",
        "annotations": [
            {
                "end_index": 410,
                "start_index": 337,
                "title": "article title",
                "type": "citation",
                "url": "..."
            }
        ]
    }
]
This represents a single conversational turn; there are no associated ToolMessage objects that need to be passed in as in client-side tool-calling. See the integration page for your given provider for available tools and usage details.

Rate limiting

Many chat model providers impose a limit on the number of invocations that can be made in a given time period. If you hit a rate limit, you will typically receive a rate limit error response from the provider, and will need to wait before making more requests. To help manage rate limits, chat model integrations accept a rate_limiter parameter that can be provided during initialization to control the rate at which requests are made.
LangChain comes with an optional built-in InMemoryRateLimiter. This limiter is thread safe and can be shared by multiple threads in the same process.
Define a rate limiter
from langchain_core.rate_limiters import InMemoryRateLimiter

rate_limiter = InMemoryRateLimiter(
    requests_per_second=0.1,  # 1 request every 10s
    check_every_n_seconds=0.1,  # Check every 100ms whether allowed to make a request
    max_bucket_size=10,  # Controls the maximum burst size.
)

model = init_chat_model(
    model="gpt-5",
    model_provider="openai",
    rate_limiter=rate_limiter  
)
The provided rate limiter can only limit the number of requests per unit time. It will not help if you need to also limit based on the size of the requests.

Base URL or proxy

For many chat model integrations, you can configure the base URL for API requests, which allows you to use model providers that have OpenAI-compatible APIs or to use a proxy server.
Many model providers offer OpenAI-compatible APIs (e.g., Together AI, vLLM). You can use init_chat_model with these providers by specifying the appropriate base_url parameter:
model = init_chat_model(
    model="MODEL_NAME",
    model_provider="openai",
    base_url="BASE_URL",
    api_key="YOUR_API_KEY",
)
When using direct chat model class instantiation, the parameter name may vary by provider. Check the respective reference for details.
For deployments requiring HTTP proxies, some model integrations support proxy configuration:
from langchain_openai import ChatOpenAI

model = ChatOpenAI(
    model="gpt-4o",
    openai_proxy="http://proxy.example.com:8080"
)
Proxy support varies by integration. Check the specific model provider’s reference for proxy configuration options.

Log probabilities

Certain models can be configured to return token-level log probabilities representing the likelihood of a given token by setting the logprobs parameter when initializing the model:
model = init_chat_model(
    model="gpt-4o",
    model_provider="openai"
).bind(logprobs=True)

response = model.invoke("Why do parrots talk?")
print(response.response_metadata["logprobs"])

Token usage

A number of model providers return token usage information as part of the invocation response. When available, this information will be included on the AIMessage objects produced by the corresponding model. For more details, see the messages guide.
Some provider APIs, notably OpenAI and Azure OpenAI chat completions, require users opt-in to receiving token usage data in streaming contexts. See the streaming usage metadata section of the integration guide for details.
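For example, a sketch of opting in with ChatOpenAI's stream_usage flag (assumes the langchain-openai package is installed):
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4o-mini", stream_usage=True)

full = None
for chunk in model.stream("Hello"):
    full = chunk if full is None else full + chunk
print(full.usage_metadata)  # populated because usage data was requested during streaming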
You can track aggregate token counts across models in an application using either a callback or context manager, as shown below:
  • Callback handler
  • Context manager
from langchain.chat_models import init_chat_model
from langchain_core.callbacks import UsageMetadataCallbackHandler

model_1 = init_chat_model(model="gpt-4o-mini")
model_2 = init_chat_model(model="claude-haiku-4-5-20251001")

callback = UsageMetadataCallbackHandler()
result_1 = model_1.invoke("Hello", config={"callbacks": [callback]})
result_2 = model_2.invoke("Hello", config={"callbacks": [callback]})
callback.usage_metadata
{
    'gpt-4o-mini-2024-07-18': {
        'input_tokens': 8,
        'output_tokens': 10,
        'total_tokens': 18,
        'input_token_details': {'audio': 0, 'cache_read': 0},
        'output_token_details': {'audio': 0, 'reasoning': 0}
    },
    'claude-haiku-4-5-20251001': {
        'input_tokens': 8,
        'output_tokens': 21,
        'total_tokens': 29,
        'input_token_details': {'cache_read': 0, 'cache_creation': 0}
    }
}
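The context manager variant from the tabs above looks similar. A sketch, assuming the get_usage_metadata_callback helper from langchain_core:
from langchain.chat_models import init_chat_model
from langchain_core.callbacks import get_usage_metadata_callback

model_1 = init_chat_model(model="gpt-4o-mini")
model_2 = init_chat_model(model="claude-haiku-4-5-20251001")

with get_usage_metadata_callback() as cb:
    model_1.invoke("Hello")
    model_2.invoke("Hello")
    print(cb.usage_metadata)  # aggregated per-model token counts, as in the output above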

Invocation config

When invoking a model, you can pass additional configuration through the config parameter using a RunnableConfig dictionary. This provides run-time control over execution behavior, callbacks, and metadata tracking. Common configuration options include:
Invocation with config
response = model.invoke(
    "Tell me a joke",
    config={
        "run_name": "joke_generation",      # Custom name for this run
        "tags": ["humor", "demo"],          # Tags for categorization
        "metadata": {"user_id": "123"},     # Custom metadata
        "callbacks": [my_callback_handler], # Callback handlers
    }
)
These configuration values are particularly useful when:
  • Debugging with LangSmith tracing
  • Implementing custom logging or monitoring
  • Controlling resource usage in production
  • Tracking invocations across complex pipelines
run_name (string)
Identifies this specific invocation in logs and traces. Not inherited by sub-calls.
tags (string[])
Labels inherited by all sub-calls for filtering and organization in debugging tools.
metadata (object)
Custom key-value pairs for tracking additional context, inherited by all sub-calls.
max_concurrency (number)
Controls the maximum number of parallel calls when using batch() or batch_as_completed().
callbacks (array)
Handlers for monitoring and responding to events during execution.
recursion_limit (number)
Maximum recursion depth for chains to prevent infinite loops in complex pipelines.
See full RunnableConfig reference for all supported attributes.

Configurable models

You can also create a runtime-configurable model by specifying configurable_fields. If you don’t specify a model value, then 'model' and 'model_provider' will be configurable by default.
from langchain.chat_models import init_chat_model

configurable_model = init_chat_model(temperature=0)

configurable_model.invoke(
    "what's your name",
    config={"configurable": {"model": "gpt-5-nano"}},  # Run with GPT-5-Nano
)
configurable_model.invoke(
    "what's your name",
    config={"configurable": {"model": "claude-sonnet-4-5-20250929"}},  # Run with Claude
)
We can create a configurable model with default model values, specify which parameters are configurable, and add prefixes to configurable params:
first_model = init_chat_model(
    model="gpt-4.1-mini",
    temperature=0,
    configurable_fields=("model", "model_provider", "temperature", "max_tokens"),
    config_prefix="first",  # Useful when you have a chain with multiple models
)

first_model.invoke("what's your name")
first_model.invoke(
    "what's your name",
    config={
        "configurable": {
            "first_model": "claude-sonnet-4-5-20250929",
            "first_temperature": 0.5,
            "first_max_tokens": 100,
        }
    },
)
We can call declarative operations like bind_tools, with_structured_output, with_config, etc. on a configurable model and chain a configurable model in the same way that we would a regularly instantiated chat model object.
from pydantic import BaseModel, Field


class GetWeather(BaseModel):
    """Get the current weather in a given location"""

    location: str = Field(..., description="The city and state, e.g. San Francisco, CA")


class GetPopulation(BaseModel):
    """Get the current population in a given location"""

    location: str = Field(..., description="The city and state, e.g. San Francisco, CA")


model = init_chat_model(temperature=0)
model_with_tools = model.bind_tools([GetWeather, GetPopulation])

model_with_tools.invoke(
    "what's bigger in 2024 LA or NYC", config={"configurable": {"model": "gpt-4.1-mini"}}
).tool_calls
[
    {
        'name': 'GetPopulation',
        'args': {'location': 'Los Angeles, CA'},
        'id': 'call_Ga9m8FAArIyEjItHmztPYA22',
        'type': 'tool_call'
    },
    {
        'name': 'GetPopulation',
        'args': {'location': 'New York, NY'},
        'id': 'call_jh2dEvBaAHRaw5JUDthOs7rt',
        'type': 'tool_call'
    }
]
model_with_tools.invoke(
    "what's bigger in 2024 LA or NYC",
    config={"configurable": {"model": "claude-sonnet-4-5-20250929"}},
).tool_calls
[
    {
        'name': 'GetPopulation',
        'args': {'location': 'Los Angeles, CA'},
        'id': 'toolu_01JMufPf4F4t2zLj7miFeqXp',
        'type': 'tool_call'
    },
    {
        'name': 'GetPopulation',
        'args': {'location': 'New York City, NY'},
        'id': 'toolu_01RQBHcE8kEEbYTuuS8WqY1u',
        'type': 'tool_call'
    }
]
