Plan limits apply: note that data export is only supported on the LangSmith Plus or Enterprise tiers.
Destinations
Currently we support exporting to an S3 bucket or an S3 API compatible bucket that you provide. Data will be exported in Parquet columnar format. This format allows you to easily import the data into other systems. The data export will contain the same data fields as the Run data format.
Exporting data
Destinations - Providing an S3 bucket
To export LangSmith data, you will need to provide an S3 bucket to export the data to. The following information is needed for the export:
- Bucket Name: the name of the S3 bucket to export data to.
- Prefix: the root prefix within the bucket to export data to.
- S3 Region: the region of the bucket - required for AWS S3 buckets.
- Endpoint URL: the endpoint URL for the S3 bucket - required for S3 API compatible buckets.
- Access Key: the access key for the S3 bucket.
- Secret Key: the secret key for the S3 bucket.
Prepare the destination
For self-hosted and EU region deployments: for self-hosted installations or organizations in the EU region, update the LangSmith URL appropriately in the requests below. For the EU region, use eu.api.smith.langchain.com.
Required permissions
Both the backend and queue services need write access to the destination bucket:
- When a destination is created, the backend service attempts to write a test file to the destination bucket. If it has permission, it will delete the test file (delete access is optional).
- The queue service is responsible for executing the bulk export and uploading the files to the bucket.
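To register the destination, send a request like the one below. This is a minimal sketch: the /api/v1/bulk-exports/destinations path and the payload field names (destination_type, display_name, config, credentials) are assumptions based on the fields described above, so confirm the exact schema against the API reference. For an S3 API compatible bucket, supply endpoint_url inside config instead of region.

```bash
# Sketch only: endpoint path and field names are assumptions; all values are placeholders.
curl --request POST \
  --url 'https://api.smith.langchain.com/api/v1/bulk-exports/destinations' \
  --header 'Content-Type: application/json' \
  --header 'X-API-Key: YOUR_API_KEY' \
  --data '{
    "destination_type": "s3",
    "display_name": "my-s3-destination",
    "config": {
      "bucket_name": "your-bucket-name",
      "prefix": "your-prefix",
      "region": "us-east-1"
    },
    "credentials": {
      "access_key_id": "YOUR_ACCESS_KEY_ID",
      "secret_access_key": "YOUR_SECRET_ACCESS_KEY"
    }
  }'
```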
Use the id returned in the response to reference this destination in subsequent bulk export operations.
If you receive an error when creating the destination, see Debugging Destination Errors for details on how to debug the issue.
Credentials Configuration
Requires LangSmith Helm version >= 0.10.34 (application version >= 0.10.91)
In addition to access_key_id and secret_access_key, we support the following additional credential formats:
- To use temporary credentials that include an AWS session token, additionally provide the credentials.session_token key when creating the bulk export destination.
- (Self-hosted only): to use environment-based credentials, for example AWS IAM Roles for Service Accounts (IRSA), omit the credentials key from the request when creating the bulk export destination. In this case, the standard Boto3 credential locations are checked in the order defined by the library.
AWS S3 bucket
For AWS S3, you can omit the endpoint_url and provide a region that matches the bucket's region. For example:
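A sketch for an AWS S3 bucket, under the same assumed endpoint and payload shape as the destination request above; all values are placeholders.

```bash
# Sketch: AWS S3 destination - region set, endpoint_url omitted.
curl --request POST \
  --url 'https://api.smith.langchain.com/api/v1/bulk-exports/destinations' \
  --header 'Content-Type: application/json' \
  --header 'X-API-Key: YOUR_API_KEY' \
  --data '{
    "destination_type": "s3",
    "display_name": "aws-s3-destination",
    "config": {
      "bucket_name": "your-bucket-name",
      "prefix": "your-prefix",
      "region": "us-east-1"
    },
    "credentials": {
      "access_key_id": "YOUR_ACCESS_KEY_ID",
      "secret_access_key": "YOUR_SECRET_ACCESS_KEY"
    }
  }'
```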
Google GCS XML S3 compatible bucket
When using Google's GCS bucket, you need to use the XML S3 compatible API and supply the endpoint_url, which is typically https://storage.googleapis.com.
Here is an example of the API request when using the GCS XML API which is compatible with S3:
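The following is a sketch under the same assumptions as the destination request above; the access key and secret stand in for GCS interoperability (HMAC) credentials, and all values are placeholders.

```bash
# Sketch: GCS via the XML S3-compatible API - endpoint_url supplied, region omitted.
curl --request POST \
  --url 'https://api.smith.langchain.com/api/v1/bulk-exports/destinations' \
  --header 'Content-Type: application/json' \
  --header 'X-API-Key: YOUR_API_KEY' \
  --data '{
    "destination_type": "s3",
    "display_name": "gcs-destination",
    "config": {
      "bucket_name": "your-gcs-bucket-name",
      "prefix": "your-prefix",
      "endpoint_url": "https://storage.googleapis.com"
    },
    "credentials": {
      "access_key_id": "YOUR_HMAC_ACCESS_KEY",
      "secret_access_key": "YOUR_HMAC_SECRET"
    }
  }'
```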
Create an export job
To export data, you will need to create an export job. This job specifies the destination, the project, the date range, and the filter expression for the data to export. The filter expression is used to narrow down the set of runs exported and is optional; if the filter field is not set, all runs will be exported. Refer to our filter query language and examples to determine the correct filter expression for your export. You can use the following cURL command to create the job:
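A minimal sketch of the request, assuming the /api/v1/bulk-exports endpoint and the field names shown (bulk_export_destination_id, session_id, start_time, end_time); confirm the exact schema against the API reference.

```bash
# Sketch only: endpoint path and field names are assumptions; all values are placeholders.
curl --request POST \
  --url 'https://api.smith.langchain.com/api/v1/bulk-exports' \
  --header 'Content-Type: application/json' \
  --header 'X-API-Key: YOUR_API_KEY' \
  --data '{
    "bulk_export_destination_id": "your-destination-id",
    "session_id": "your-tracing-project-id",
    "start_time": "2025-01-01T00:00:00Z",
    "end_time": "2025-01-02T00:00:00Z"
  }'
# Add an optional "filter" field to the payload to narrow the set of exported runs.
```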
The session_id is also known as the Tracing Project ID, which can be copied from the individual project view by clicking into the project in the Tracing Projects list. Use the id returned in the response to reference this export in subsequent bulk export operations.
Scheduled exports
Requires LangSmith Helm version >= 0.10.42 (application version >= 0.10.109)
To create a scheduled export, set interval_hours and remove end_time:
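For example, a sketch under the same assumptions as the export request above, with interval_hours added and end_time omitted; all values are placeholders.

```bash
# Sketch: create a scheduled export - interval_hours set, end_time omitted.
curl --request POST \
  --url 'https://api.smith.langchain.com/api/v1/bulk-exports' \
  --header 'Content-Type: application/json' \
  --header 'X-API-Key: YOUR_API_KEY' \
  --data '{
    "bulk_export_destination_id": "your-destination-id",
    "session_id": "your-tracing-project-id",
    "start_time": "2025-07-16T00:00:00Z",
    "interval_hours": 6
  }'
```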
- interval_hours must be between 1 hour and 168 hours (1 week) inclusive.
- For spawned exports, the first time range exported is start_time=(scheduled_export_start_time), end_time=(start_time + interval_hours). Then start_time=(previous_export_end_time), end_time=(this_export_start_time + interval_hours), and so on.
- end_time must be omitted for scheduled exports. end_time is still required for non-scheduled exports.
- Scheduled exports can be stopped by cancelling the export.
- Exports that have been spawned by a scheduled export have the source_bulk_export_id attribute filled.
  - If desired, these spawned bulk exports must be canceled separately from the source scheduled bulk export - canceling the source bulk export does not cancel the spawned bulk exports.
- Spawned exports run at end_time + 10 minutes to account for any runs that are submitted with an end_time in the recent past.
For example, with start_time=2025-07-16T00:00:00Z and interval_hours=6:
| Export | Start Time | End Time | Runs At |
|---|---|---|---|
| 1 | 2025-07-16T00:00:00Z | 2025-07-16T06:00:00Z | 2025-07-16T06:10:00Z |
| 2 | 2025-07-16T06:00:00Z | 2025-07-16T12:00:00Z | 2025-07-16T12:10:00Z |
| 3 | 2025-07-16T12:00:00Z | 2025-07-16T18:00:00Z | 2025-07-16T18:10:00Z |
Monitoring the Export Job
Monitor Export Status
To monitor the status of an export job, use the cURL command below, replacing {export_id} with the ID of the export you want to monitor. This command retrieves the current status of the specified export job.
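A sketch of the status request; the /api/v1/bulk-exports/{export_id} path is an assumption, and the EXPORT_ID shell variable stands in for {export_id}.

```bash
EXPORT_ID="your-export-id"  # the id returned when the export job was created
curl --request GET \
  --url "https://api.smith.langchain.com/api/v1/bulk-exports/$EXPORT_ID" \
  --header 'X-API-Key: YOUR_API_KEY'
```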
List Runs for an Export
An export is typically broken up into multiple runs, each corresponding to a specific date partition to export. To list all runs associated with a specific export, use the following cURL command:
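A sketch, assuming a /runs sub-resource on the export; the path is an assumption and EXPORT_ID is a placeholder.

```bash
EXPORT_ID="your-export-id"
curl --request GET \
  --url "https://api.smith.langchain.com/api/v1/bulk-exports/$EXPORT_ID/runs" \
  --header 'X-API-Key: YOUR_API_KEY'
```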
List All Exports
To retrieve a list of all export jobs, use the following cURL command:
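A sketch, assuming the same collection endpoint used when creating exports.

```bash
curl --request GET \
  --url 'https://api.smith.langchain.com/api/v1/bulk-exports' \
  --header 'X-API-Key: YOUR_API_KEY'
```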
Stop an Export
To stop an existing export, use the following cURL command, replacing {export_id} with the ID of the export job you want to cancel. Note that once a job has been cancelled it cannot be restarted; you will need to create a new export job instead.
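A sketch of the cancellation request; the PATCH method and the "Cancelled" status value are assumptions, so confirm them against the API reference.

```bash
EXPORT_ID="your-export-id"
curl --request PATCH \
  --url "https://api.smith.langchain.com/api/v1/bulk-exports/$EXPORT_ID" \
  --header 'Content-Type: application/json' \
  --header 'X-API-Key: YOUR_API_KEY' \
  --data '{"status": "Cancelled"}'
```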
Partitioning Scheme
Data will be exported into your bucket in a Hive partitioned format.
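As a purely illustrative sketch of what a Hive-partitioned layout looks like (the actual partition keys used by the export are not reproduced here; the names below are assumptions):

```
your-bucket-name/your-prefix/.../year=<YYYY>/month=<MM>/day=<DD>/<file>.parquet
```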
Importing Data into other systems
Importing data from S3 in Parquet format is commonly supported by the majority of analytical systems. See below for documentation links:
BigQuery
To import your data into BigQuery, see Loading Data from Parquet and also Hive Partitioned loads.
Snowflake
You can load data into Snowflake from S3 by following the Load from Cloud document.
RedShift
You can COPY data from S3 in Parquet format into Amazon Redshift by following the AWS COPY command documentation.
Clickhouse
You can directly query data in S3 / Parquet format in Clickhouse. As an example, if using GCS, you can query the data as follows:
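A sketch using ClickHouse's s3 table function pointed at the GCS XML endpoint; the bucket, prefix, and HMAC credentials are placeholders.

```bash
# Sketch: query exported Parquet files in GCS from clickhouse-client via the s3() table function.
clickhouse-client --query "
  SELECT count(*)
  FROM s3(
    'https://storage.googleapis.com/your-bucket-name/your-prefix/**/*.parquet',
    'YOUR_HMAC_ACCESS_KEY',
    'YOUR_HMAC_SECRET',
    'Parquet'
  )
"
```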
DuckDB
You can query the data from S3 in-memory with SQL using DuckDB. See the S3 import documentation.
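For example, a sketch using the DuckDB CLI with the httpfs extension; the bucket, prefix, region, and credentials are placeholders.

```bash
# Sketch: query the exported Parquet files on S3 in-memory with DuckDB.
duckdb <<'SQL'
INSTALL httpfs;
LOAD httpfs;
SET s3_region='us-east-1';
SET s3_access_key_id='YOUR_ACCESS_KEY_ID';
SET s3_secret_access_key='YOUR_SECRET_ACCESS_KEY';
SELECT count(*) FROM read_parquet('s3://your-bucket-name/your-prefix/**/*.parquet');
SQL
```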
Error Handling
Debugging Destination Errors
The destinations API endpoint will validate that the destination and credentials are valid and that write access to the bucket is present. If you receive an error and would like to debug it, you can use the AWS CLI to test connectivity to the bucket. You should be able to write a file with the CLI using the same data that you supplied to the destinations API above. For AWS S3:
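For instance (placeholder bucket, prefix, region, and credentials; use the same credentials you supplied to the destinations API):

```bash
# Sketch: verify write access to the bucket with the AWS CLI.
export AWS_ACCESS_KEY_ID=YOUR_ACCESS_KEY_ID
export AWS_SECRET_ACCESS_KEY=YOUR_SECRET_ACCESS_KEY
echo "test" > test.txt
aws s3 cp test.txt s3://your-bucket-name/your-prefix/test.txt --region us-east-1
```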
For S3 API compatible buckets, include the --endpoint-url option. For GCS, the endpoint_url is typically https://storage.googleapis.com:
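The same write test, pointed at the GCS XML endpoint (placeholders as above):

```bash
# Sketch: verify write access to an S3 API compatible bucket (here GCS) via --endpoint-url.
aws s3 cp test.txt s3://your-bucket-name/your-prefix/test.txt \
  --endpoint-url https://storage.googleapis.com
```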
Monitoring Runs
You can monitor your runs using the List Runs API. If a run fails with a known error, the error will be added to the errors field of the run.
Common Errors
Here are some common errors:
| Error | Description |
|---|---|
| Access denied | The blob store credentials or bucket are not valid. This error occurs when the provided access key and secret key combination doesn’t have the necessary permissions to access the specified bucket or perform the required operations. |
| Bucket is not valid | The specified blob store bucket is not valid. This error is thrown when the bucket doesn’t exist or there is not enough access to perform writes on the bucket. |
| Key ID you provided does not exist | The blob store credentials provided are not valid. This error occurs when the access key ID used for authentication is not a valid key. |
| Invalid endpoint | The endpoint_url provided is invalid. This error is raised when the specified endpoint is an invalid endpoint. Only S3 compatible endpoints are supported, for example https://storage.googleapis.com for GCS, https://play.min.io for minio, etc. If using AWS, you should omit the endpoint_url. |