In most LLM applications, you want to stream output to minimize the time until the user sees the first token. LangSmith's tracing natively supports streamed outputs via generator functions. Here is an example.
from langsmith import traceable

@traceable
def my_generator():
    for chunk in ["Hello", "World", "!"]:
        yield chunk

# Stream to the user
for output in my_generator():
    print(output)

# It also works with async functions
import asyncio

@traceable
async def my_async_generator():
    for chunk in ["Hello", "World", "!"]:
        yield chunk

# Stream to the user
async def main():
    async for output in my_async_generator():
        print(output)

asyncio.run(main())
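The same pattern applies when the chunks come from a real model stream. Below is a minimal sketch, assuming the openai package is installed and OPENAI_API_KEY is set in the environment; the model name and prompt are illustrative, not prescribed by LangSmith.

from langsmith import traceable
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is configured

@traceable
def stream_completion(prompt: str):
    # Each yielded token is captured by the trace as it is produced.
    stream = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content

for token in stream_completion("Say hello"):
    print(token, end="")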

Aggregate Results

By default, the outputs of the traced function are aggregated into a single array in LangSmith. If you want to customize how they are stored (for instance, concatenating the outputs into a single string), you can use the aggregate option (reduce_fn in Python). This is especially useful for aggregating streamed LLM outputs.
Aggregating outputs only affects the traced representation of the outputs. It does not alter the values returned by your function.
from langsmith import traceable

def concatenate_strings(outputs: list):
    return "".join(outputs)

@traceable(reduce_fn=concatenate_strings)
def my_generator():
    for chunk in ["Hello", "World", "!"]:
        yield chunk

# Stream to the user
for output in my_generator():
    print(output)

# It also works with async functions
import asyncio

@traceable(reduce_fn=concatenate_strings)
async def my_async_generator():
    for chunk in ["Hello", "World", "!"]:
        yield chunk

# Stream to the user
async def main():
    async for output in my_async_generator():
        print(output)

asyncio.run(main())
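Because reduce_fn only shapes the traced output, a common pattern for streamed LLM chunks is to fold the text deltas back into a single assistant message. Below is a minimal self-contained sketch; the dict-shaped chunks and the reduce_chunks helper are illustrative assumptions, not a LangSmith API.

from langsmith import traceable

def reduce_chunks(chunks: list) -> dict:
    # Store the streamed deltas as one assistant message in the trace.
    return {
        "role": "assistant",
        "content": "".join(chunk["content"] for chunk in chunks),
    }

@traceable(reduce_fn=reduce_chunks)
def stream_answer():
    # Stand-in for a model stream; each chunk carries a text delta.
    for delta in ["The", " answer", " is", " 42."]:
        yield {"content": delta}

for chunk in stream_answer():
    print(chunk["content"], end="")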
