Most LLMs have a maximum supported context window (denominated in tokens). One way to decide when to truncate messages is to count the tokens in the message history and truncate whenever it approaches that limit. If you’re using LangChain, you can use the `trim_messages` utility, specifying the number of tokens to keep from the list as well as the strategy to use for handling the boundary (e.g., keep the last `max_tokens`).

To trim message history in an agent, use the `@before_model` middleware decorator:
```python
from typing import Any

from langchain.agents import create_agent, AgentState
from langchain.agents.middleware import before_model
from langchain.messages import RemoveMessage
from langchain_core.runnables import RunnableConfig
from langgraph.checkpoint.memory import InMemorySaver
from langgraph.graph.message import REMOVE_ALL_MESSAGES
from langgraph.runtime import Runtime

# Placeholder model and tools; substitute your own.
model = "gpt-5-nano"
tools = []


@before_model
def trim_messages(state: AgentState, runtime: Runtime) -> dict[str, Any] | None:
    """Keep only the last few messages to fit context window."""
    messages = state["messages"]

    if len(messages) <= 3:
        return None  # No changes needed

    first_msg = messages[0]
    recent_messages = messages[-3:] if len(messages) % 2 == 0 else messages[-4:]
    new_messages = [first_msg] + recent_messages

    return {
        "messages": [
            RemoveMessage(id=REMOVE_ALL_MESSAGES),  # clear the stored history...
            *new_messages,  # ...then re-add only the messages we keep
        ]
    }


agent = create_agent(
    model,
    tools=tools,
    middleware=[trim_messages],
    checkpointer=InMemorySaver(),
)

config: RunnableConfig = {"configurable": {"thread_id": "1"}}

agent.invoke({"messages": "hi, my name is bob"}, config)
agent.invoke({"messages": "write a short poem about cats"}, config)
agent.invoke({"messages": "now do the same but for dogs"}, config)
final_response = agent.invoke({"messages": "what's my name?"}, config)

final_response["messages"][-1].pretty_print()
"""
================================== Ai Message ==================================

Your name is Bob. You told me that earlier.
If you'd like me to call you a nickname or use a different name, just say the word.
"""
```
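The token-counting idea from the start of this section (count tokens, keep the most recent messages that fit) can be sketched without any framework. This is a minimal sketch, not LangChain’s `trim_messages` implementation: messages are plain `(role, content)` tuples, and the hypothetical `count_tokens` helper approximates tokens by whitespace-separated words, whereas a real application would count with the model’s tokenizer.

```python
from typing import Callable

# Hypothetical stand-in for message objects: (role, content)
Message = tuple[str, str]


def count_tokens(message: Message) -> int:
    """Hypothetical token counter: approximates tokens as whitespace-separated words."""
    return len(message[1].split())


def trim_last(
    messages: list[Message],
    max_tokens: int,
    token_counter: Callable[[Message], int] = count_tokens,
) -> list[Message]:
    """Keep the most recent messages whose combined token count fits in max_tokens."""
    kept: list[Message] = []
    total = 0
    # Walk backwards from the newest message and stop at the first one that doesn't fit.
    for message in reversed(messages):
        cost = token_counter(message)
        if total + cost > max_tokens:
            break
        kept.append(message)
        total += cost
    kept.reverse()  # restore chronological order
    return kept


history = [
    ("user", "hi, my name is bob"),
    ("assistant", "Hi Bob! How can I help?"),
    ("user", "write a short poem about cats"),
    ("assistant", "Soft paws, bright eyes, a nap in the sun"),
]
print(trim_last(history, max_tokens=15))  # keeps only the last two messages
```

With a 15-token budget, the last two messages (9 + 6 words) fit and the earlier ones are dropped, including the message with the user’s name. A naive cut like this can also leave the history starting with an assistant message; the real utility exposes boundary options (for example `start_on`) for that case.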
You can delete messages from the graph state to manage the message history. This is useful when you want to remove specific messages or clear the entire message history.

To delete messages from the graph state, use `RemoveMessage`. For `RemoveMessage` to work, the state key must use the `add_messages` reducer; the default `AgentState` provides this.

To remove specific messages:
```python
from langchain.messages import RemoveMessage


def delete_messages(state):
    messages = state["messages"]
    if len(messages) > 2:
        # remove the earliest two messages
        return {"messages": [RemoveMessage(id=m.id) for m in messages[:2]]}
```
To remove all messages:
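```python
from langchain.messages import RemoveMessage
from langgraph.graph.message import REMOVE_ALL_MESSAGES


def delete_messages(state):
    # A single RemoveMessage with this special ID clears the whole history
    return {"messages": [RemoveMessage(id=REMOVE_ALL_MESSAGES)]}
```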
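To see why `RemoveMessage` depends on a reducer, here is a simplified, hypothetical reducer that mimics the behaviour described above: an update is merged into the existing list by message ID, a removal marker deletes the message with that ID, and a remove-all marker discards the old history. It is a sketch under these assumptions, not LangGraph’s `add_messages` code (which does more, such as assigning IDs to messages that lack them); messages are plain dicts, and `Remove`, `REMOVE_ALL`, and `merge_messages` are illustrative stand-ins.

```python
from dataclasses import dataclass

REMOVE_ALL = "__remove_all__"  # hypothetical sentinel standing in for REMOVE_ALL_MESSAGES


@dataclass
class Remove:
    """Hypothetical removal marker standing in for RemoveMessage."""
    id: str


def merge_messages(existing: list[dict], update: list) -> list[dict]:
    """Merge an update into the message list, add_messages-style (simplified)."""
    # A remove-all marker discards the old history; only what follows it survives.
    for i in range(len(update) - 1, -1, -1):
        if isinstance(update[i], Remove) and update[i].id == REMOVE_ALL:
            return list(update[i + 1:])
    merged = list(existing)  # copy, so the caller's list is not mutated
    for item in update:
        ids = [m["id"] for m in merged]
        if isinstance(item, Remove):
            if item.id not in ids:
                raise ValueError(f"cannot remove unknown message id {item.id!r}")
            merged.pop(ids.index(item.id))
        elif item["id"] in ids:
            merged[ids.index(item["id"])] = item  # same ID: replace in place
        else:
            merged.append(item)  # new ID: append
    return merged


history = [
    {"id": "1", "role": "user", "content": "hi"},
    {"id": "2", "role": "assistant", "content": "hello"},
    {"id": "3", "role": "user", "content": "bye"},
]
print([m["id"] for m in merge_messages(history, [Remove("1"), Remove("2")])])  # ['3']
print(merge_messages(history, [Remove(REMOVE_ALL)]))  # []
```

Without a reducer like this, a returned list would simply overwrite the state key, so a removal marker would end up stored as if it were a message.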
When deleting messages, make sure that the resulting message history is valid. Check the limitations of the LLM provider you’re using. For example:
- Some providers expect message history to start with a user message.
- Most providers require assistant messages with tool calls to be followed by corresponding tool result messages.
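A pruned history can be checked against the two rules above before it is sent to a provider. This is a minimal sketch assuming OpenAI-style message dicts (`role`, an optional `tool_calls` list whose entries carry an `id`, and `tool_call_id` on tool results); `validate_history` is a hypothetical helper, and actual provider rules vary, so consult your provider’s documentation.

```python
def validate_history(messages: list[dict]) -> list[str]:
    """Return a list of problems found in the message history (empty if none)."""
    problems: list[str] = []

    # Rule 1: the first non-system message should come from the user.
    first = next((m for m in messages if m["role"] != "system"), None)
    if first is not None and first["role"] != "user":
        problems.append(f"first non-system message has role {first['role']!r}, expected 'user'")

    # Rule 2: every tool call must be answered by a tool message right after it.
    for i, msg in enumerate(messages):
        if msg["role"] != "assistant" or not msg.get("tool_calls"):
            continue
        answered = set()
        j = i + 1
        while j < len(messages) and messages[j]["role"] == "tool":
            answered.add(messages[j]["tool_call_id"])
            j += 1
        missing = [c["id"] for c in msg["tool_calls"] if c["id"] not in answered]
        if missing:
            problems.append(f"message {i} has tool calls without results: {missing}")
    return problems


ok = [
    {"role": "user", "content": "What is the weather in SF?"},
    {"role": "assistant", "content": "", "tool_calls": [{"id": "call_1", "name": "get_weather"}]},
    {"role": "tool", "tool_call_id": "call_1", "content": "It's always sunny!"},
    {"role": "assistant", "content": "It's sunny in San Francisco."},
]
print(validate_history(ok))      # []
print(validate_history(ok[1:]))  # history now starts with an assistant message
print(validate_history(ok[:2]))  # tool call left without its result
```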
To delete messages in an agent, use the `@after_model` middleware decorator. The following example removes the earliest two messages after a model call whenever the history holds more than two messages:

```python
from langchain.agents import create_agent, AgentState
from langchain.agents.middleware import after_model
from langchain.messages import RemoveMessage
from langchain_core.runnables import RunnableConfig
from langgraph.checkpoint.memory import InMemorySaver
from langgraph.runtime import Runtime


@after_model
def delete_old_messages(state: AgentState, runtime: Runtime) -> dict | None:
    """Remove old messages to keep conversation manageable."""
    messages = state["messages"]
    if len(messages) > 2:
        # remove the earliest two messages
        return {"messages": [RemoveMessage(id=m.id) for m in messages[:2]]}
    return None


agent = create_agent(
    "gpt-5-nano",
    tools=[],
    system_prompt="Please be concise and to the point.",
    middleware=[delete_old_messages],
    checkpointer=InMemorySaver(),
)

config: RunnableConfig = {"configurable": {"thread_id": "1"}}

for event in agent.stream(
    {"messages": [{"role": "user", "content": "hi! I'm bob"}]},
    config,
    stream_mode="values",
):
    print([(message.type, message.content) for message in event["messages"]])

for event in agent.stream(
    {"messages": [{"role": "user", "content": "what's my name?"}]},
    config,
    stream_mode="values",
):
    print([(message.type, message.content) for message in event["messages"]])
```
```
[('human', "hi! I'm bob")]
[('human', "hi! I'm bob"), ('ai', 'Hi Bob! Nice to meet you. How can I help you today? I can answer questions, brainstorm ideas, draft text, explain things, or help with code.')]
[('human', "hi! I'm bob"), ('ai', 'Hi Bob! Nice to meet you. How can I help you today? I can answer questions, brainstorm ideas, draft text, explain things, or help with code.'), ('human', "what's my name?")]
[('human', "hi! I'm bob"), ('ai', 'Hi Bob! Nice to meet you. How can I help you today? I can answer questions, brainstorm ideas, draft text, explain things, or help with code.'), ('human', "what's my name?"), ('ai', 'Your name is Bob. How can I help you today, Bob?')]
[('human', "what's my name?"), ('ai', 'Your name is Bob. How can I help you today, Bob?')]
```
The problem with trimming or removing messages, as shown above, is that any information in the discarded messages is lost.
Because of this, some applications benefit from a more sophisticated approach of summarizing the message history using a chat model.

To summarize message history in an agent, use the built-in `SummarizationMiddleware`:
```python
from langchain.agents import create_agent
from langchain.agents.middleware import SummarizationMiddleware
from langchain_core.runnables import RunnableConfig
from langgraph.checkpoint.memory import InMemorySaver

checkpointer = InMemorySaver()

agent = create_agent(
    model="gpt-4o",
    tools=[],
    middleware=[
        SummarizationMiddleware(
            model="gpt-4o-mini",
            max_tokens_before_summary=4000,  # Trigger summarization at 4000 tokens
            messages_to_keep=20,  # Keep last 20 messages after summary
        )
    ],
    checkpointer=checkpointer,
)

config: RunnableConfig = {"configurable": {"thread_id": "1"}}

agent.invoke({"messages": "hi, my name is bob"}, config)
agent.invoke({"messages": "write a short poem about cats"}, config)
agent.invoke({"messages": "now do the same but for dogs"}, config)
final_response = agent.invoke({"messages": "what's my name?"}, config)

final_response["messages"][-1].pretty_print()
"""
================================== Ai Message ==================================

Your name is Bob!
"""
```
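The idea suggested by the two parameters above can be sketched without a framework: once the history exceeds a token budget, older messages are replaced by a single summary message and only the most recent ones are kept verbatim. This is a hypothetical sketch, not `SummarizationMiddleware`’s implementation; the `summarize` callable stands in for a call to the summarization model, tokens are approximated by word counts, and putting the summary in a `system` message is an arbitrary choice.

```python
from typing import Callable

Message = dict[str, str]


def approx_tokens(messages: list[Message]) -> int:
    """Hypothetical token estimate: total whitespace-separated words."""
    return sum(len(m["content"].split()) for m in messages)


def maybe_summarize(
    messages: list[Message],
    summarize: Callable[[list[Message]], str],
    max_tokens_before_summary: int,
    messages_to_keep: int,
) -> list[Message]:
    """Replace older messages with one summary once the history is too long."""
    if approx_tokens(messages) <= max_tokens_before_summary:
        return messages  # under budget: leave the history alone
    if len(messages) <= messages_to_keep:
        return messages  # nothing old enough to summarize
    cut = len(messages) - messages_to_keep
    older, recent = messages[:cut], messages[cut:]
    # Assumption: the summary is stored as a system message at the start.
    summary = {"role": "system", "content": "Summary of earlier conversation: " + summarize(older)}
    return [summary, *recent]


def fake_summarize(msgs: list[Message]) -> str:
    """Hypothetical stand-in for an LLM call: joins the old contents."""
    return " / ".join(m["content"] for m in msgs)


history = [
    {"role": "user", "content": "hi, my name is bob"},
    {"role": "assistant", "content": "Hi Bob!"},
    {"role": "user", "content": "what's my name?"},
]
for m in maybe_summarize(history, fake_summarize, max_tokens_before_summary=5, messages_to_keep=1):
    print(m["role"], "|", m["content"])
```

Unlike trimming, the user’s name survives here because it is carried into the summary instead of being discarded.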
Access short-term memory (state) in a tool using the `ToolRuntime` parameter. The `runtime` parameter is hidden from the tool signature (so the model doesn’t see it), but the tool can access the state through it.
```python
from langchain.agents import create_agent, AgentState
from langchain.tools import tool, ToolRuntime


class CustomState(AgentState):
    user_id: str


@tool
def get_user_info(
    runtime: ToolRuntime
) -> str:
    """Look up user info."""
    user_id = runtime.state["user_id"]
    return "User is John Smith" if user_id == "user_123" else "Unknown user"


agent = create_agent(
    model="gpt-5-nano",
    tools=[get_user_info],
    state_schema=CustomState,
)

result = agent.invoke({
    "messages": "look up user information",
    "user_id": "user_123"
})
print(result["messages"][-1].content)
# > User is John Smith.
```
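The general pattern of hiding a runtime parameter from the model can be sketched with the standard library: derive the model-facing argument list from the function signature while skipping any parameter annotated with the runtime type, then inject a runtime object when the tool is executed. The `Runtime`, `lookup_user_detail`, `model_visible_params`, and `call_tool` names are hypothetical; this illustrates the pattern and is not how LangChain implements `ToolRuntime` injection.

```python
import inspect
from dataclasses import dataclass, field


@dataclass
class Runtime:
    """Hypothetical stand-in for ToolRuntime: carries the agent state."""
    state: dict = field(default_factory=dict)


def lookup_user_detail(detail: str, runtime: Runtime) -> str:
    """Hypothetical tool: the model supplies `detail`, the runtime supplies state."""
    return f"{detail} for {runtime.state['user_id']}"


def model_visible_params(fn) -> list[str]:
    """Names of parameters the model may fill in (Runtime-typed ones are hidden)."""
    return [
        name
        for name, param in inspect.signature(fn).parameters.items()
        if param.annotation is not Runtime
    ]


def call_tool(fn, model_args: dict, state: dict):
    """Call fn with the model's arguments plus an injected Runtime."""
    kwargs = dict(model_args)
    for name, param in inspect.signature(fn).parameters.items():
        if param.annotation is Runtime:
            kwargs[name] = Runtime(state=state)
    return fn(**kwargs)


print(model_visible_params(lookup_user_detail))  # ['detail']
print(call_tool(lookup_user_detail, {"detail": "email"}, {"user_id": "user_123"}))  # email for user_123
```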
To modify the agent’s short-term memory (state) during execution, you can return state updates directly from the tools. This is useful for persisting intermediate results or making information accessible to subsequent tools or prompts.
```python
from langchain.agents import create_agent, AgentState
from langchain.messages import ToolMessage
from langchain.tools import tool, ToolRuntime
from langgraph.types import Command
from pydantic import BaseModel


class CustomState(AgentState):
    user_name: str


class CustomContext(BaseModel):
    user_id: str


@tool
def update_user_info(
    runtime: ToolRuntime[CustomContext, CustomState],
) -> Command:
    """Look up and update user info."""
    user_id = runtime.context.user_id
    name = "John Smith" if user_id == "user_123" else "Unknown user"
    return Command(update={
        "user_name": name,
        # update the message history with a result for this tool call
        "messages": [
            ToolMessage(
                "Successfully looked up user information",
                tool_call_id=runtime.tool_call_id
            )
        ]
    })


@tool
def greet(
    runtime: ToolRuntime[CustomContext, CustomState]
) -> str:
    """Use this to greet the user once you found their info."""
    user_name = runtime.state["user_name"]
    return f"Hello {user_name}!"


agent = create_agent(
    model="gpt-5-nano",
    tools=[update_user_info, greet],
    state_schema=CustomState,
    context_schema=CustomContext,
)

agent.invoke(
    {"messages": [{"role": "user", "content": "greet the user"}]},
    context=CustomContext(user_id="user_123"),
)
```
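Returning `Command(update=...)` hands the agent a partial state update that is merged key by key. A minimal sketch of that merge, assuming (as in LangGraph) that `messages` has a reducer that adds new messages while keys without a reducer, such as `user_name`, are simply overwritten; `apply_update` and `REDUCERS` are hypothetical, and plain list concatenation stands in for the real ID-aware `add_messages` reducer.

```python
from typing import Any, Callable

# Hypothetical reducer table: "messages" appends; every other key is overwritten.
REDUCERS: dict[str, Callable[[Any, Any], Any]] = {
    "messages": lambda old, new: old + new,
}


def apply_update(state: dict[str, Any], update: dict[str, Any]) -> dict[str, Any]:
    """Return a new state with the update merged in, key by key."""
    new_state = dict(state)
    for key, value in update.items():
        reducer = REDUCERS.get(key)
        if reducer is not None and key in new_state:
            new_state[key] = reducer(new_state[key], value)
        else:
            new_state[key] = value  # no reducer: last write wins
    return new_state


state = {"messages": [{"role": "user", "content": "greet the user"}]}
update = {
    "user_name": "John Smith",
    "messages": [{"role": "tool", "content": "Successfully looked up user information"}],
}
state = apply_update(state, update)
print(state["user_name"])      # John Smith
print(len(state["messages"]))  # 2
```

This is why the tool above can set `user_name` for `greet` to read later, while its `ToolMessage` is added to the history instead of replacing it.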
Access short-term memory (state) in middleware to create dynamic prompts based on conversation history or custom state fields.
```python
from typing import TypedDict

from langchain.agents import create_agent
from langchain.agents.middleware import dynamic_prompt, ModelRequest


class CustomContext(TypedDict):
    user_name: str


def get_weather(city: str) -> str:
    """Get the weather in a city."""
    return f"The weather in {city} is always sunny!"


@dynamic_prompt
def dynamic_system_prompt(request: ModelRequest) -> str:
    user_name = request.runtime.context["user_name"]
    system_prompt = f"You are a helpful assistant. Address the user as {user_name}."
    return system_prompt


agent = create_agent(
    model="gpt-5-nano",
    tools=[get_weather],
    middleware=[dynamic_system_prompt],
    context_schema=CustomContext,
)

result = agent.invoke(
    {"messages": [{"role": "user", "content": "What is the weather in SF?"}]},
    context=CustomContext(user_name="John Smith"),
)
for msg in result["messages"]:
    msg.pretty_print()
```
Output
```
================================ Human Message =================================

What is the weather in SF?
================================== Ai Message ==================================
Tool Calls:
  get_weather (call_WFQlOGn4b2yoJrv7cih342FG)
 Call ID: call_WFQlOGn4b2yoJrv7cih342FG
  Args:
    city: San Francisco
================================= Tool Message =================================
Name: get_weather

The weather in San Francisco is always sunny!
================================== Ai Message ==================================

Hi John Smith, the weather in San Francisco is always sunny!
```
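The example above builds its prompt only from the runtime context. A prompt function can also look at the conversation so far; the framework-free sketch below takes the message list and a context dict. `build_system_prompt`, its `long_after` parameter, and the rule of asking for brevity in long conversations are all hypothetical illustrations, not part of the LangChain API.

```python
def build_system_prompt(messages: list[dict], context: dict, long_after: int = 10) -> str:
    """Build a system prompt from custom context and the conversation so far."""
    prompt = f"You are a helpful assistant. Address the user as {context['user_name']}."
    user_turns = sum(1 for m in messages if m["role"] == "user")
    if user_turns > long_after:
        # Illustrative rule: long conversations get a nudge toward brevity.
        prompt += " The conversation is getting long, so keep answers brief."
    return prompt


messages = [{"role": "user", "content": "What is the weather in SF?"}]
print(build_system_prompt(messages, {"user_name": "John Smith"}))
print(build_system_prompt(messages * 11, {"user_name": "John Smith"}))
```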
Access short-term memory (state) in `@before_model` middleware to process messages before model calls.
```python
from typing import Any

from langchain.agents import create_agent, AgentState
from langchain.agents.middleware import before_model
from langchain.messages import RemoveMessage
from langchain_core.runnables import RunnableConfig
from langgraph.checkpoint.memory import InMemorySaver
from langgraph.graph.message import REMOVE_ALL_MESSAGES
from langgraph.runtime import Runtime

# Placeholder model and tools; substitute your own.
model = "gpt-5-nano"
tools = []


@before_model
def trim_messages(state: AgentState, runtime: Runtime) -> dict[str, Any] | None:
    """Keep only the last few messages to fit context window."""
    messages = state["messages"]

    if len(messages) <= 3:
        return None  # No changes needed

    first_msg = messages[0]
    recent_messages = messages[-3:] if len(messages) % 2 == 0 else messages[-4:]
    new_messages = [first_msg] + recent_messages

    return {
        "messages": [
            RemoveMessage(id=REMOVE_ALL_MESSAGES),
            *new_messages
        ]
    }


agent = create_agent(
    model,
    tools=tools,
    middleware=[trim_messages],
    # A checkpointer is required for the thread's history to persist between invocations
    checkpointer=InMemorySaver(),
)

config: RunnableConfig = {"configurable": {"thread_id": "1"}}

agent.invoke({"messages": "hi, my name is bob"}, config)
agent.invoke({"messages": "write a short poem about cats"}, config)
agent.invoke({"messages": "now do the same but for dogs"}, config)
final_response = agent.invoke({"messages": "what's my name?"}, config)

final_response["messages"][-1].pretty_print()
"""
================================== Ai Message ==================================

Your name is Bob. You told me that earlier.
If you'd like me to call you a nickname or use a different name, just say the word.
"""
```
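To see which messages this middleware keeps, here is the same selection rule applied to plain strings (`h` for human, `a` for AI) instead of message objects. `select_messages` is a hypothetical helper that mirrors the slicing in `trim_messages` above.

```python
def select_messages(messages: list[str]) -> list[str]:
    """Mirror the middleware's rule: first message plus the last 3 or 4."""
    if len(messages) <= 3:
        return messages  # the middleware returns None here: no change
    recent = messages[-3:] if len(messages) % 2 == 0 else messages[-4:]
    return [messages[0]] + recent


history = ["h1", "a1", "h2", "a2", "h3", "a3", "h4"]
print(select_messages(history))      # ['h1', 'a2', 'h3', 'a3', 'h4']
print(select_messages(history[:6]))  # ['h1', 'a2', 'h3', 'a3']
```

For a history that alternates human and AI turns, the parity check makes the kept tail start at an odd index, i.e. on an AI message, so the trimmed history still alternates after the first human message. It also means nothing is actually dropped until the history reaches six messages.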