数据集转换

LangSmith 允许您将转换附加到数据集架构中的字段，这些转换在将数据添加到数据集之前应用于数据，无论是从 UI、API 还是运行规则。与 LangSmith 的预构建 JSON 架构类型结合使用，这些允许您在将数据保存到数据集之前轻松预处理数据。

转换类型

转换类型	目标类型	功能
remove_system_messages	Array[Message]	过滤消息列表以删除任何系统消息。
convert_to_openai_message	Message Array[Message]	使用 langchain 的 convert_to_openai_messages 将任何传入数据从 LangChain 的内部序列化格式转换为 OpenAI 的标准消息格式。如果目标字段标记为必需，并且在输入时未找到匹配的消息，它将尝试从几个众所周知的 LangSmith 跟踪格式中提取消息（或消息列表）（例如，任何跟踪的 LangChain BaseChatModel 运行或来自 LangSmith OpenAI wrapper 的跟踪运行），并删除包含消息的原始键。
convert_to_openai_tool	Array[Tool] 仅在 inputs 字典的顶级字段中可用。	使用 langchain 的 convert_to_openai_tool 将任何传入数据转换为 OpenAI 标准工具格式。如果存在/在指定键处未找到工具，将从运行的调用参数中提取工具定义。这很有用，因为 LangChain 聊天模型将工具定义跟踪到运行的 `extra.invocation_params` 字段而不是输入。
remove_extra_fields	Object	删除此目标对象的架构中未定义的任何字段。

聊天模型预构建架构

转换的主要用例是简化将生产跟踪收集到数据集中的过程，格式可以跨模型提供商标准化，用于评估/少样本提示/等下游用途。为了简化最终用户的转换设置，LangSmith 提供了一个预定义的架构，它将执行以下操作：

从您收集的运行中提取消息并将其转换为 openai 标准格式，这使得它们与所有 LangChain ChatModels 和大多数模型提供商的 SDK 兼容，用于下游评估和实验
提取您的 LLM 使用的任何工具，并将其添加到您的示例输入中，用于下游评估的可重现性

想要迭代其系统提示的用户在使用我们的聊天模型架构时，通常还会在其输入消息上添加”删除系统消息”转换，这将防止您将系统提示保存到数据集中。

兼容性

LLM 运行收集架构旨在从 LangChain BaseChatModel 运行或来自 LangSmith OpenAI wrapper 的跟踪运行中收集数据。如果您正在跟踪的 LLM 运行不兼容，请联系 support@langchain.dev，我们可以扩展支持。如果您想将转换应用于其他类型的运行（例如，使用消息历史记录表示 LangGraph 状态），请直接定义您的架构并手动添加相关转换。

启用

当从跟踪项目或注释队列将运行添加到数据集时，如果它具有 LLM 运行类型，我们将默认应用聊天模型架构。有关在新数据集上启用，请参阅我们的数据集管理操作指南。

规范

有关预构建架构的完整 API 规范，请参阅以下部分：

Input schema

{
  "type": "object",
  "properties": {
    "messages": {
      "type": "array",
      "items": {
        "$ref": "https://api.smith.langchain.com/public/schemas/v1/message.json"
      }
    },
    "tools": {
      "type": "array",
      "items": {
        "$ref": "https://api.smith.langchain.com/public/schemas/v1/tooldef.json"
      }
    }
  },
  "required": ["messages"]
}

Output schema

{
  "type": "object",
  "properties": {
    "message": {
      "$ref": "https://api.smith.langchain.com/public/schemas/v1/message.json"
    }
  },
  "required": ["message"]
}

Transformations

And the transformations look as follows:

[
  {
    "path": ["inputs"],
    "transformation_type": "remove_extra_fields"
  },
  {
    "path": ["inputs", "messages"],
    "transformation_type": "convert_to_openai_message"
  },
  {
    "path": ["inputs", "tools"],
    "transformation_type": "convert_to_openai_tool"
  },
  {
    "path": ["outputs"],
    "transformation_type": "remove_extra_fields"
  },
  {
    "path": ["outputs", "message"],
    "transformation_type": "convert_to_openai_message"
  }
]

Edit the source of this page on GitHub.

Connect these docs programmatically to Claude, VSCode, and more via MCP for real-time answers.

Datasets

Set up evaluations

Analyze experiment results

Annotation & human feedback

Common data types

转换类型

聊天模型预构建架构

兼容性

启用

规范

Input schema

Output schema

Transformations

Datasets

Set up evaluations

Analyze experiment results

Annotation & human feedback

Common data types

​转换类型

​聊天模型预构建架构

​兼容性

​启用

​规范

​Input schema

​Output schema

​Transformations

转换类型

聊天模型预构建架构

兼容性

启用

规范

Input schema

Output schema

Transformations