Plan limitation: Note that the data export feature is only available on the LangSmith Plus or Enterprise tiers.
LangSmith's bulk data export functionality allows you to export your traces to an external destination. This is useful if you want to analyze the data offline in tools such as BigQuery, Snowflake, RedShift, Jupyter Notebooks, etc. An export can be launched for a specific LangSmith project and date range. Once a bulk export has been started, our system handles the orchestration and resilience of the export process. Note that exporting data can take some time depending on its size, and we limit the number of exports that can run concurrently. Bulk exports also have a runtime timeout of 24 hours.

Destinations

We currently support exporting to an S3 bucket, or an S3 API-compatible bucket, that you provide. Data will be exported in Parquet columnar format, which allows you to easily import it into other systems. The data export will contain the same data fields as the run data format.

Exporting data

Destination - Provide an S3 bucket

To export LangSmith data, you will need to provide an S3 bucket for the data to be exported to. The following information is required for the export:
  • Bucket Name: the name of the S3 bucket to export data to.
  • Prefix: the root prefix inside the bucket to export data to.
  • S3 Region: the region of the bucket - required for AWS S3 buckets.
  • Endpoint URL: the endpoint URL for the S3 bucket - required for S3 API-compatible buckets.
  • Access Key: the access key for the S3 bucket.
  • Secret Key: the secret key for the S3 bucket.
We support any S3-compatible bucket; for non-AWS buckets such as GCS or MinIO, you will need to provide the endpoint URL.

Prepare the destination

For self-hosted and EU region deployments: For self-hosted installations or organizations in the EU region, update the LangSmith URL in the requests below accordingly. For the EU region, use eu.api.smith.langchain.com.
Required permissions: Both the backend and queue services need write access to the destination bucket (a minimal example policy is sketched after this list):
  • When an export destination is created, the backend service attempts to write a test file to the destination bucket. If it has permission to do so, it then deletes the test file (delete access is optional).
  • The queue service performs the bulk export itself and uploads the files to the bucket.
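As an illustration only, a minimal IAM policy for the AWS access key you plan to supply might look like the following sketch. The user name, policy name, file name, and resource ARN are hypothetical placeholders, and s3:DeleteObject is included only so the backend's test file can be cleaned up (it can be omitted):
# Hypothetical sketch: grant write access on the export prefix to the IAM user
# whose access key and secret key you will supply to LangSmith.
cat > langsmith-export-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::your-s3-bucket-name/root_folder_prefix/*"
    }
  ]
}
EOF
aws iam put-user-policy \
  --user-name your-langsmith-export-user \
  --policy-name langsmith-bulk-export-write \
  --policy-document file://langsmith-export-policy.json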
The following example shows how to create a destination using cURL. Replace the placeholder values with your actual configuration details. Note that the credentials will be stored securely in encrypted form in our system.
curl --request POST \
  --url 'https://api.smith.langchain.com/api/v1/bulk-exports/destinations' \
  --header 'Content-Type: application/json' \
  --header 'X-API-Key: YOUR_API_KEY' \
  --header 'X-Tenant-Id: YOUR_WORKSPACE_ID' \
  --data '{
    "destination_type": "s3",
    "display_name": "My S3 Destination",
    "config": {
      "bucket_name": "your-s3-bucket-name",
      "prefix": "root_folder_prefix",
      "region": "your aws s3 region",
      "endpoint_url": "your endpoint url for s3 compatible buckets"
    },
    "credentials": {
      "access_key_id": "YOUR_S3_ACCESS_KEY_ID",
      "secret_access_key": "YOUR_S3_SECRET_ACCESS_KEY"
    }
  }'
Use the returned id to reference this destination in subsequent bulk export operations. If you receive an error when creating the destination, see Debugging Destination Errors below for details on how to debug it.
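For convenience, you can capture the returned id directly in a shell variable. This sketch assumes the jq CLI is available and that the JSON payload shown above has been saved to a local file named destination.json (a hypothetical name):
# Create the destination and keep only the returned id for later steps.
DESTINATION_ID=$(curl --silent --request POST \
  --url 'https://api.smith.langchain.com/api/v1/bulk-exports/destinations' \
  --header 'Content-Type: application/json' \
  --header 'X-API-Key: YOUR_API_KEY' \
  --header 'X-Tenant-Id: YOUR_WORKSPACE_ID' \
  --data @destination.json | jq -r '.id')
echo "Destination id: ${DESTINATION_ID}"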

Credentials configuration

Requires LangSmith Helm version >= 0.10.34 (application version >= 0.10.91)
In addition to static access_key_id and secret_access_key credentials, we also support the following credential formats:
  • To use temporary credentials that include an AWS session token, additionally provide the credentials.session_token key when creating the bulk export destination (see the sketch after this list).
  • (Self-hosted only): To use environment-based credentials, such as AWS IAM Roles for Service Accounts (IRSA), omit the credentials key from the request when creating the bulk export destination. In that case, the standard Boto3 credential locations are checked in the order defined by the library.
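For example, a destination backed by temporary credentials might be created as follows. The request is the same as the destination request above; only the credentials block differs, and the placeholder values are illustrative:
curl --request POST \
  --url 'https://api.smith.langchain.com/api/v1/bulk-exports/destinations' \
  --header 'Content-Type: application/json' \
  --header 'X-API-Key: YOUR_API_KEY' \
  --header 'X-Tenant-Id: YOUR_WORKSPACE_ID' \
  --data '{
    "destination_type": "s3",
    "display_name": "My Temporary-Credentials Destination",
    "config": {
      "bucket_name": "your-s3-bucket-name",
      "prefix": "root_folder_prefix",
      "region": "your aws s3 region"
    },
    "credentials": {
      "access_key_id": "YOUR_TEMPORARY_ACCESS_KEY_ID",
      "secret_access_key": "YOUR_TEMPORARY_SECRET_ACCESS_KEY",
      "session_token": "YOUR_AWS_SESSION_TOKEN"
    }
  }'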

AWS S3 bucket

For AWS S3, you can omit the endpoint_url and provide the region that matches your bucket's region.
curl --request POST \
  --url 'https://api.smith.langchain.com/api/v1/bulk-exports/destinations' \
  --header 'Content-Type: application/json' \
  --header 'X-API-Key: YOUR_API_KEY' \
  --header 'X-Tenant-Id: YOUR_WORKSPACE_ID' \
  --data '{
    "destination_type": "s3",
    "display_name": "My AWS S3 Destination",
    "config": {
      "bucket_name": "my_bucket",
      "prefix": "data_exports",
      "region": "us-east-1"
    },
    "credentials": {
      "access_key_id": "YOUR_S3_ACCESS_KEY_ID",
      "secret_access_key": "YOUR_S3_SECRET_ACCESS_KEY"
    }
  }'

Google GCS XML S3 compatible bucket

When using Google’s GCS bucket, you need to use the XML S3 compatible API, and supply the endpoint_url which is typically https://storage.googleapis.com. Here is an example of the API request when using the GCS XML API which is compatible with S3:
curl --request POST \
  --url 'https://api.smith.langchain.com/api/v1/bulk-exports/destinations' \
  --header 'Content-Type: application/json' \
  --header 'X-API-Key: YOUR_API_KEY' \
  --header 'X-Tenant-Id: YOUR_WORKSPACE_ID' \
  --data '{
    "destination_type": "s3",
    "display_name": "My GCS Destination",
    "config": {
      "bucket_name": "my_bucket",
      "prefix": "data_exports",
      "endpoint_url": "https://storage.googleapis.com"
    },
    "credentials": {
      "access_key_id": "YOUR_S3_ACCESS_KEY_ID",
      "secret_access_key": "YOUR_S3_SECRET_ACCESS_KEY"
    }
  }'
See the Google documentation for more info.
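If you do not already have HMAC credentials for GCS, one way to generate them is with the gsutil CLI for a service account that has write access to the bucket (the service account email below is a hypothetical placeholder). The returned access ID and secret are then used as the access_key_id and secret_access_key above:
# Generate an HMAC key for an existing service account (hypothetical email).
gsutil hmac create my-export-writer@my-project.iam.gserviceaccount.com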

Create an export job

To export data, you will need to create an export job. This job will specify the destination, the project, the date range, and filter expression of the data to export. The filter expression is used to narrow down the set of runs exported and is optional. Not setting the filter field will export all runs. Refer to our filter query language and examples to determine the correct filter expression for your export. You can use the following cURL command to create the job:
curl --request POST \
  --url 'https://api.smith.langchain.com/api/v1/bulk-exports' \
  --header 'Content-Type: application/json' \
  --header 'X-API-Key: YOUR_API_KEY' \
  --header 'X-Tenant-Id: YOUR_WORKSPACE_ID' \
  --data '{
    "bulk_export_destination_id": "your_destination_id",
    "session_id": "project_uuid",
    "start_time": "2024-01-01T00:00:00Z",
    "end_time": "2024-01-02T23:59:59Z",
    "filter": "and(eq(run_type, \"llm\"), eq(name, \"ChatOpenAI\"), eq(input_key, \"messages.content\"), like(input_value, \"%messages.content%\"))"
  }'
The session_id is also known as the Tracing Project ID, which can be copied from the individual project view by clicking into the project in the Tracing Projects list.
Use the returned id to reference this export in subsequent bulk export operations.
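If you prefer to look up the project UUID via the API rather than the UI, something like the following may work. This sketch assumes the tracing projects endpoint (/api/v1/sessions) accepts a name filter and returns a JSON list, and that jq is installed; verify against your API version:
# Look up the Tracing Project ID (session_id) by project name (name filter assumed).
curl --silent -G \
  --url 'https://api.smith.langchain.com/api/v1/sessions' \
  --data-urlencode 'name=YOUR_PROJECT_NAME' \
  --header 'X-API-Key: YOUR_API_KEY' \
  --header 'X-Tenant-Id: YOUR_WORKSPACE_ID' | jq -r '.[0].id'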

Scheduled exports

Requires LangSmith Helm version >= 0.10.42 (application version >= 0.10.109)
Scheduled exports collect runs periodically and export to the configured destination. To create a scheduled export, include interval_hours and remove end_time:
curl --request POST \
  --url 'https://api.smith.langchain.com/api/v1/bulk-exports' \
  --header 'Content-Type: application/json' \
  --header 'X-API-Key: YOUR_API_KEY' \
  --header 'X-Tenant-Id: YOUR_WORKSPACE_ID' \
  --data '{
    "bulk_export_destination_id": "your_destination_id",
    "session_id": "project_uuid",
    "start_time": "2024-01-01T00:00:00Z",
    "filter": "and(eq(run_type, \"llm\"), eq(name, \"ChatOpenAI\"), eq(input_key, \"messages.content\"), like(input_value, \"%messages.content%\"))",
    "interval_hours": 1
  }'
Details
  • interval_hours must be between 1 hour and 168 hours (1 week) inclusive.
  • For spawned exports, the first time range exported is start_time=(scheduled_export_start_time), end_time=(start_time + interval_hours). Then start_time=(previous_export_end_time), end_time=(this_export_start_time + interval_hours), and so on.
  • end_time must be omitted for scheduled exports. end_time is still required for non-scheduled exports.
  • Scheduled exports can be stopped by cancelling the export.
    • Exports that have been spawned by a scheduled export have the source_bulk_export_id attribute filled.
    • These spawned bulk exports must be cancelled separately from the source scheduled bulk export if desired - cancelling the source bulk export does not cancel the spawned bulk exports.
  • Spawned exports run at end_time + 10 minutes to account for any runs that are submitted with end_time in the recent past.
Example: If a scheduled bulk export is created with start_time=2025-07-16T00:00:00Z and interval_hours=6:
Export  Start Time            End Time              Runs At
1       2025-07-16T00:00:00Z  2025-07-16T06:00:00Z  2025-07-16T06:10:00Z
2       2025-07-16T06:00:00Z  2025-07-16T12:00:00Z  2025-07-16T12:10:00Z
3       2025-07-16T12:00:00Z  2025-07-16T18:00:00Z  2025-07-16T18:10:00Z

Monitoring the Export Job

Monitor Export Status

To monitor the status of an export job, use the following cURL command:
curl --request GET \
  --url 'https://api.smith.langchain.com/api/v1/bulk-exports/{export_id}' \
  --header 'Content-Type: application/json' \
  --header 'X-API-Key: YOUR_API_KEY' \
  --header 'X-Tenant-Id: YOUR_WORKSPACE_ID'
Replace {export_id} with the ID of the export you want to monitor. This command retrieves the current status of the specified export job.
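For example, a simple polling loop that waits for the export to finish might look like the sketch below. It assumes jq is installed and that the response exposes a top-level status field (Cancelled appears in the API above; the other terminal status names are assumptions, so adjust them to the values you observe):
# Poll the export status every 30 seconds until it reaches a terminal state.
EXPORT_ID="your_export_id"
while true; do
  STATUS=$(curl --silent --request GET \
    --url "https://api.smith.langchain.com/api/v1/bulk-exports/${EXPORT_ID}" \
    --header 'X-API-Key: YOUR_API_KEY' \
    --header 'X-Tenant-Id: YOUR_WORKSPACE_ID' | jq -r '.status')
  echo "Export status: ${STATUS}"
  case "${STATUS}" in
    Completed|Failed|Cancelled) break ;;   # assumed terminal status names
  esac
  sleep 30
done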

List Runs for an Export

An export is typically broken up into multiple runs which correspond to a specific date partition to export. To list all runs associated with a specific export, use the following cURL command:
curl --request GET \
  --url 'https://api.smith.langchain.com/api/v1/bulk-exports/{export_id}/runs' \
  --header 'Content-Type: application/json' \
  --header 'X-API-Key: YOUR_API_KEY' \
  --header 'X-Tenant-Id: YOUR_WORKSPACE_ID'
This command fetches all runs related to the specified export, providing details such as run ID, status, creation time, rows exported, etc.

List All Exports

To retrieve a list of all export jobs, use the following cURL command:
curl --request GET \
  --url 'https://api.smith.langchain.com/api/v1/bulk-exports' \
  --header 'Content-Type: application/json' \
  --header 'X-API-Key: YOUR_API_KEY' \
  --header 'X-Tenant-Id: YOUR_WORKSPACE_ID'
This command returns a list of all export jobs along with their current statuses and creation timestamps.

Stop an Export

To stop an existing export, use the following cURL command:
curl --request PATCH \
  --url 'https://api.smith.langchain.com/api/v1/bulk-exports/{export_id}' \
  --header 'Content-Type: application/json' \
  --header 'X-API-Key: YOUR_API_KEY' \
  --header 'X-Tenant-Id: YOUR_WORKSPACE_ID' \
  --data '{
    "status": "Cancelled"
}'
{export_id} 替换为要取消的导出任务 ID。请注意,任务一旦被取消就无法重新启动, 需要重新创建新的导出任务。

Partitioning Scheme

Data will be exported into your bucket in the following Hive-partitioned format:
<bucket>/<prefix>/export_id=<export_id>/tenant_id=<tenant_id>/session_id=<session_id>/runs/year=<year>/month=<month>/day=<day>
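For example, once an export has completed you can list the exported Parquet files under this layout with the AWS CLI (add --endpoint-url for S3-compatible stores such as GCS):
# List every exported file for a given export; placeholders match the scheme above.
aws s3 ls s3://<bucket>/<prefix>/export_id=<export_id>/ --recursive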

Importing Data into other systems

Importing Parquet data from S3 is supported by the majority of analytical systems. See below for documentation links:

BigQuery

To import your data into BigQuery, see Loading Data from Parquet and also Hive Partitioned loads.
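As a rough sketch, if your destination is a GCS bucket, a Hive-partitioned load with the bq CLI might look like this (the dataset and table names are hypothetical; check the flags against the linked BigQuery documentation):
# Load the exported Parquet files, keeping year/month/day as partition columns.
bq load \
  --source_format=PARQUET \
  --autodetect \
  --hive_partitioning_mode=AUTO \
  --hive_partitioning_source_uri_prefix="gs://<bucket>/<prefix>/export_id=<export_id>/tenant_id=<tenant_id>/session_id=<session_id>/runs" \
  my_dataset.langsmith_runs \
  "gs://<bucket>/<prefix>/export_id=<export_id>/tenant_id=<tenant_id>/session_id=<session_id>/runs/*"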

Snowflake

You can load data into Snowflake from S3 by following the Load from Cloud Document.

RedShift

You can COPY data from S3 or Parquet into Amazon Redshift by following the AWS COPY command documentation.

Clickhouse

You can directly query data in S3 / Parquet format in Clickhouse. As an example, if using GCS, you can query the data as follows:
SELECT count(distinct id) FROM s3('https://storage.googleapis.com/<bucket>/<prefix>/export_id=<export_id>/**',
 'access_key_id', 'access_secret', 'Parquet')
See Clickhouse S3 Integration Documentation for more information.

DuckDB

You can query the data from S3 in-memory with SQL using DuckDB. See S3 import Documentation.
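For example, with the DuckDB CLI and its httpfs extension you can count the exported runs directly from S3. The credentials and path are placeholders, and the recursive glob assumes a reasonably recent DuckDB version:
# Query the exported Parquet files in place on S3 (requires the httpfs extension).
duckdb -c "
INSTALL httpfs; LOAD httpfs;
SET s3_region='us-east-1';
SET s3_access_key_id='YOUR_S3_ACCESS_KEY_ID';
SET s3_secret_access_key='YOUR_S3_SECRET_ACCESS_KEY';
SELECT count(DISTINCT id) FROM read_parquet('s3://<bucket>/<prefix>/export_id=<export_id>/**/*.parquet');
"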

Error Handling

Debugging Destination Errors

The destinations API endpoint will validate that the destination and credentials are valid and that write access is present for the bucket. If you receive an error and would like to debug it, you can use the AWS CLI to test connectivity to the bucket. You should be able to write a file with the CLI using the same data that you supplied to the destinations API above. AWS S3:
aws configure

# set the same access key credentials and region as you used for the destination
> AWS Access Key ID: <access_key_id>
> AWS Secret Access Key: <secret_access_key>
> Default region name [us-east-1]: <region>

# List buckets
aws s3 ls /

# test write permissions
touch ./test.txt
aws s3 cp ./test.txt s3://<bucket-name>/tmp/test.txt
GCS Compatible Buckets: You will need to supply the endpoint URL with the --endpoint-url option. For GCS, the endpoint_url is typically https://storage.googleapis.com:
aws configure

# set the same access key credentials and region as you used for the destination
> AWS Access Key ID: <access_key_id>
> AWS Secret Access Key: <secret_access_key>
> Default region name [us-east-1]: <region>

# List buckets
aws s3 --endpoint-url=<endpoint_url> ls /

# test write permissions
touch ./test.txt
aws s3 --endpoint-url=<endpoint_url> cp ./test.txt s3://<bucket-name>/tmp/test.txt

Monitoring Runs

You can monitor your runs using the List Runs API. If a run fails with a known error, the error will be added to the run's errors field.

Common Errors

Here are some common errors:
  • Access denied: The blob store credentials or bucket are not valid. This error occurs when the provided access key and secret key combination doesn't have the necessary permissions to access the specified bucket or perform the required operations.
  • Bucket is not valid: The specified blob store bucket is not valid. This error is thrown when the bucket doesn't exist or there is not enough access to perform writes on the bucket.
  • Key ID you provided does not exist: The blob store credentials provided are not valid. This error occurs when the access key ID used for authentication is not a valid key.
  • Invalid endpoint: The endpoint_url provided is invalid. This error is raised when the specified endpoint is an invalid endpoint. Only S3 compatible endpoints are supported, for example https://storage.googleapis.com for GCS, https://play.min.io for MinIO, etc. If using AWS, you should omit the endpoint_url.
