Datasets enable you to perform repeatable evaluations over time using consistent data. A dataset consists of examples, which store inputs, outputs, and optional reference outputs. This page provides an overview of the various ways to create and manage datasets in the LangSmith UI.

Create a dataset and add examples

The following sections explain the different ways to create a dataset and add examples to it in LangSmith. Depending on your workflow, you can manually curate examples, automatically capture them from traces, import files, or even generate synthetic data:

Manually from a tracing project

A common pattern for building datasets is converting notable traces from your application into dataset examples. This approach requires that you have already configured tracing to LangSmith.
One technique for building a dataset is to filter for the most interesting traces, such as those tagged with negative user feedback, and add them to the dataset. For tips on how to filter traces, refer to the filter traces guide.
There are two ways to manually add data to a dataset from a tracing project. Navigate to Tracing Projects and select a project.
  1. Multi-select runs from the runs table. On the Runs tab, multi-select runs, then click Add to Dataset at the bottom of the page. The runs table with one run selected and the Add to Dataset button shown at the bottom of the page.
  2. From the Runs tab, select a run from the table. On the individual run details page, select Add to -> Dataset in the top right corner. When you select a dataset from the run details page, a modal pops up letting you know whether any transformations were applied or whether schema validation failed. For example, the screenshot below shows a dataset using transformations optimized for collecting LLM runs. After confirming, you can optionally edit the run before adding it to the dataset.

Automatically from a tracing project

You can use run rules to automatically add traces to a dataset based on certain conditions. For example, you could add all traces that are tagged with a specific use case or have a low feedback score.

From examples in an Annotation Queue

If you rely on subject matter experts to build meaningful datasets, use annotation queues to provide a streamlined view for reviewers. Human reviewers can optionally modify the inputs/outputs/reference outputs from a trace before it is added to the dataset.
Annotation queues can be optionally configured with a default dataset, though you can add runs to any dataset by using the dataset switcher on the bottom of the screen. Once you select the right dataset, click Add to Dataset or hit the hot key D to add the run to it. Any modifications you make to the run in your annotation queue will carry over to the dataset, and all metadata associated with the run will also be copied. Add to dataset from annotation queue Note you can also set up rules to add runs that meet specific criteria to an annotation queue using automation rules.

From the Prompt Playground

On the Prompt Playground page, select Set up Evaluation, then click +New if you’re starting a new dataset, or select an existing dataset.
Creating datasets inline in the playground is not supported for datasets that have nested keys. In order to add/edit examples with nested keys, you must edit from the datasets page.
To edit the examples:
  • Use +Row to add a new example to the dataset
  • Delete an example using the dropdown on the right-hand side of the table
  • If you’re creating a reference-free dataset, remove the “Reference Output” column using the x button in the column. Note: this action is not reversible.
Create a dataset in the playground

Import a dataset from a CSV or JSONL file

On the Datasets & Experiments page, click +New Dataset, then Import an existing dataset from CSV or JSONL file.
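If you are preparing a JSONL file for import, each line should be a single JSON object holding one example. A minimal sketch of generating such a file in Python (the `question`/`answer` field names here are illustrative, not required; use whichever keys match your dataset):

```python
import json
from pathlib import Path

# Illustrative examples; the "question"/"answer" field names are arbitrary
# and should match the input/output keys you want in your dataset.
examples = [
    {"inputs": {"question": "What is a dataset split?"},
     "outputs": {"answer": "A named partition of a dataset."}},
    {"inputs": {"question": "What does an example store?"},
     "outputs": {"answer": "Inputs, outputs, and optional reference outputs."}},
]

# Write one JSON object per line (the JSONL format).
path = Path("examples.jsonl")
with path.open("w", encoding="utf-8") as f:
    for ex in examples:
        f.write(json.dumps(ex, ensure_ascii=False) + "\n")
```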

Create a new dataset from the Datasets & Experiments page

  1. Navigate to the Datasets & Experiments page from the left-hand menu.
  2. Click + New Dataset.
  3. On the New Dataset page, select the Create from scratch tab.
  4. Add a name and description for the dataset.
  5. (Optional) Create a dataset schema to validate your dataset.
  6. Click Create, which will create an empty dataset.
  7. To add examples inline, on the dataset’s page, go to the Examples tab. Click + Example.
  8. Define examples in JSON and click Submit. For more details on dataset splits, refer to Create and manage dataset splits.
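The steps above can also be done programmatically. A rough sketch using the LangSmith Python SDK, assuming the `langsmith` client; the client calls are commented out because they require network access and a LANGSMITH_API_KEY, while the payload shaping is plain Python:

```python
# Shape examples locally before uploading with the LangSmith SDK.
examples = [
    {"inputs": {"question": "What is LangSmith?"},
     "outputs": {"answer": "A platform for tracing and evaluating LLM apps."}},
    {"inputs": {"question": "What is a reference output?"},
     "outputs": {"answer": "The expected output stored alongside an example."}},
]
inputs = [ex["inputs"] for ex in examples]
outputs = [ex["outputs"] for ex in examples]

# Rough SDK equivalent (requires a LANGSMITH_API_KEY):
# from langsmith import Client
# client = Client()
# dataset = client.create_dataset(dataset_name="my-dataset",
#                                 description="Curated QA examples")
# client.create_examples(inputs=inputs, outputs=outputs, dataset_id=dataset.id)
```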

Add synthetic examples created by an LLM

If you have existing examples and a schema defined on your dataset, when you click + Example, there is an option to Add AI-Generated Examples. This will use an LLM to create synthetic examples. In Generate examples, do the following:
  1. Click API Key in the top right of the pane to set your OpenAI API key as a workspace secret. If your workspace already has an OpenAI API key set, you can skip this step.
  2. Toggle between Automatic and Manual reference examples. You can select these examples manually from your dataset or use the automatic selection option.
  3. Enter the number of synthetic examples you want to generate.
  4. Click Generate.
    The AI-Generated Examples configuration window. Selections for manual and automatic and number of examples to generate.
  5. The examples will appear on the Select generated examples page. Choose which examples to add to your dataset, with the option to edit them before finalizing. Click Save Examples.
  6. Each example will be validated against your specified dataset schema and tagged as synthetic in the source metadata.
    Select generated examples page with generated examples selected and Save examples button.

Manage a dataset

Create a dataset schema

LangSmith datasets store arbitrary JSON objects. We recommend (but do not require) defining a schema for your dataset to ensure that its examples conform to a specific JSON schema. Dataset schemas are defined with standard JSON Schema, with the addition of a few prebuilt types that make it easier to type common primitives like messages and tools. Certain fields in your schema have a + Transformations option. Transformations are preprocessing steps that, if enabled, update your examples when you add them to the dataset. For example, the convert to OpenAI messages transformation will convert message-like objects, such as LangChain messages, to OpenAI message format. For the full list of available transformations, see our reference.
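For illustration, here is what input and output schemas might look like in standard JSON Schema for a simple question/answer dataset; the `question`/`answer` field names are hypothetical, and the UI lets you define the input and output schemas separately (shown side by side here):

```json
{
  "inputs": {
    "type": "object",
    "properties": {"question": {"type": "string"}},
    "required": ["question"]
  },
  "outputs": {
    "type": "object",
    "properties": {"answer": {"type": "string"}},
    "required": ["answer"]
  }
}
```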
If you plan to collect production traces in your dataset from LangChain ChatModels or from OpenAI calls made with the LangSmith OpenAI wrapper, we offer a prebuilt Chat Model schema that converts messages and tools into industry-standard OpenAI formats that can be used downstream with any model for testing. You can also customize the template settings to match your use case. Please see the dataset transformations reference for more information.

Create and manage dataset splits

Dataset splits are divisions of your dataset into multiple segments, letting you partition it as needed. For example, in machine learning workflows it is common to split a dataset into training, validation, and test sets, which helps prevent overfitting, where a model performs well on its training data but poorly on unseen data. In evaluation workflows, splits are also useful when your dataset covers multiple categories you want to evaluate separately, or when you are testing a new use case that may eventually be incorporated into the dataset but should be kept separate for now. Note that you could achieve a similar effect manually with metadata; however, we still recommend splits for higher-level organization of your dataset into distinct groups for evaluation, while metadata is better suited to storing tags or information about an example's source.
In machine learning, it is best practice to keep your splits separate (each example belongs to exactly one split). However, we allow you to select multiple splits for the same example in LangSmith because it can make sense for some evaluation workflows - for example, if an example falls into multiple categories on which you may want to evaluate your application.
To create and manage splits in the app, select some examples in your dataset and click “Add to Split”. From the resulting popup menu, you can select and unselect splits for the selected examples, or create a new split. Add to Split
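The best practice of keeping splits disjoint can be sketched as a small helper. `assign_splits` below is a hypothetical local function, not part of the LangSmith SDK; the resulting split assignments could then be applied in the UI or programmatically:

```python
import random

def assign_splits(example_ids, train_frac=0.8, seed=42):
    """Partition example IDs into disjoint train/test splits,
    so each example lands in exactly one split."""
    ids = list(example_ids)
    random.Random(seed).shuffle(ids)  # deterministic shuffle for repeatability
    cut = int(len(ids) * train_frac)
    return {"train": ids[:cut], "test": ids[cut:]}

splits = assign_splits([f"example-{i}" for i in range(10)])
```

Because the shuffle is seeded, re-running the helper reproduces the same partition, which keeps evaluations comparable over time.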

Edit example metadata

You can add metadata to your examples by clicking on an example and then clicking “Edit” on the top right-hand side of the popover. From this page, you can update or delete existing metadata, or add new metadata. You may use this to store information about your examples, such as tags or version info, which you can then group by when analyzing experiment results or filter by when you call list_examples in the SDK. Add Metadata

Filter examples

You can filter examples by split, by metadata key/value, or by full-text search over examples. These filtering options are available at the top left of the examples table.
  • Filter by split: Select split > Select a split to filter by
  • Filter by metadata: Filters > Select “Metadata” from the dropdown > Select the metadata key and value to filter on
  • Full-text search: Filters > Select “Full Text” from the dropdown > Enter your search criteria
You may add multiple filters, and only examples that satisfy all of the filters will be displayed in the table. Filters Applied to Examples
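The same AND semantics can be sketched locally. `filter_examples` below is a hypothetical helper that mirrors the UI behavior on plain dictionaries, and the commented call shows the rough SDK equivalent, assuming the Python `langsmith` client (it requires a LANGSMITH_API_KEY, so it is not executed here):

```python
def filter_examples(examples, split=None, metadata=None, text=None):
    """Return examples satisfying ALL given filters: split membership,
    exact metadata key/value matches, and case-insensitive full-text search."""
    results = []
    for ex in examples:
        if split is not None and split not in ex.get("splits", []):
            continue
        md = ex.get("metadata", {})
        if metadata and any(md.get(k) != v for k, v in metadata.items()):
            continue
        if text is not None and text.lower() not in str(ex).lower():
            continue
        results.append(ex)
    return results

# Rough SDK equivalent (requires a LANGSMITH_API_KEY):
# from langsmith import Client
# client = Client()
# examples = client.list_examples(dataset_name="my-dataset",
#                                 splits=["test"], metadata={"tag": "v1"})
```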