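This guide walks you through common issues you may encounter when running a self-hosted instance of LangSmith. While running LangSmith, you may encounter unexpected 500 errors, slow performance, or other problems. This guide will help you diagnose and resolve them.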

Getting helpful information

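To diagnose and resolve an issue, you will first need to retrieve some relevant information. The following sections explain how to do this for a Kubernetes or Docker setup, as well as how to pull helpful browser information. Generally, the main services you will want to analyze are:
  • langsmith-backend: Handles CRUD API requests, business logic, requests from the frontend and SDKs, preparation of traces for ingestion, and the hub API.
  • langsmith-platform-backend: Handles authentication, run ingestion, and other high-volume tasks.
  • langsmith-queue: Handles incoming traces and feedback, asynchronous ingestion and persistence to the datastore, data integrity checks, and retries during database errors or connection issues.
For more details on these services, see the architecture overview.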

Kubernetes

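The first step in troubleshooting is to gather important debugging information about your LangSmith deployment. Service logs, Kubernetes events, and the resource utilization of containers can help identify the root cause of an issue. You can run our k8s troubleshooting script, which pulls all of the relevant Kubernetes information and writes it to a folder for investigation. The script also compresses this folder into a zip file for sharing. Here is an example of how to run the script, assuming your LangSmith deployment was brought up in the langsmith namespace: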
bash get_k8s_debugging_info.sh --namespace langsmith
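You can then inspect the contents of the generated folder for any relevant errors or information. If you would like the LangSmith team to assist in debugging, share this zip file with the team.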
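If the script is not at hand, the same kinds of information (events, pod status, resource usage, logs) can be gathered manually with standard kubectl commands. The following is a minimal sketch, not the official script: the function names `debug_dir_name` and `collect_debug_info` and the output folder layout are hypothetical, and `kubectl top` assumes metrics-server is installed in the cluster.

```shell
# Hypothetical helper: build the output folder name from a namespace and a date.
debug_dir_name() {
  echo "langsmith-debug-$1-$2"
}

# Hypothetical helper (not get_k8s_debugging_info.sh): gather events, pod
# status, resource usage, and per-pod logs for one namespace into a folder.
collect_debug_info() {
  ns="$1"
  out="$(debug_dir_name "$ns" "$(date +%Y%m%d)")"
  mkdir -p "$out"
  kubectl get events -n "$ns" --sort-by=.lastTimestamp > "$out/events.txt"
  kubectl get pods -n "$ns" -o wide > "$out/pods.txt"
  # Assumes metrics-server is available; ignored if it is not.
  kubectl top pods -n "$ns" --containers > "$out/top.txt" || true
  for pod in $(kubectl get pods -n "$ns" -o name); do
    # "pod/foo" becomes "foo.log"
    kubectl logs -n "$ns" "$pod" --all-containers --tail=1000 \
      > "$out/${pod#pod/}.log" || true
  done
}

# Usage: collect_debug_info langsmith
```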

Docker

If running on Docker, you can check the logs of your deployment by running the following command:
docker compose logs >> logs.txt
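The combined log file can be large, so it may help to narrow it to a single service and to error-like lines. A minimal sketch, assuming the service is named `langchain-backend` in your docker-compose.yaml; the helper name `filter_errors` and the search pattern are illustrative choices, not part of LangSmith.

```shell
# Keep only lines that look like errors (case-insensitive). Reads stdin.
# "|| true" keeps the exit status at 0 when nothing matches.
filter_errors() {
  grep -iE 'error|exception|traceback' || true
}

# Usage (adjust the service name to match your docker-compose.yaml):
#   docker compose logs --no-color --timestamps --since 1h langchain-backend \
#     | filter_errors > backend-errors.txt
```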

Browser errors

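If you are experiencing an issue that surfaces as a browser error, it may also be helpful to inspect a HAR file, which may contain key information. To obtain a HAR file, you can follow this guide, which explains the short process for various browsers. You can then use the Google HAR analyzer to inspect the file, or send the HAR file to the LangSmith team for help with troubleshooting.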

Common issues

DB::Exception: Cannot reserve 1.00 MiB, not enough space: While executing WaitForAsyncInsert. (NOT_ENOUGH_SPACE)

This error occurs when ClickHouse runs out of disk space. You will need to increase the disk space available to ClickHouse.

Kubernetes

In Kubernetes, you will need to increase the size of the ClickHouse PVC. To achieve this, you can perform the following steps:
  1. Get the storage class of the PVC: kubectl get pvc data-langsmith-clickhouse-0 -n <namespace> -o jsonpath='{.spec.storageClassName}'
  2. Ensure the storage class has allowVolumeExpansion: true: kubectl get sc <storage-class-name> -o jsonpath='{.allowVolumeExpansion}'
    • If it is false, some storage classes can be updated to allow volume expansion.
    • To update the storage class, you can run kubectl patch sc <storage-class-name> -p '{"allowVolumeExpansion": true}'
    • If this fails, you may need to create a new storage class with the correct settings.
  3. Edit your PVC to have the new size: kubectl edit pvc data-langsmith-clickhouse-0 -n <namespace> or kubectl patch pvc data-langsmith-clickhouse-0 -p '{"spec":{"resources":{"requests":{"storage":"100Gi"}}}}' -n <namespace>
  4. Update your Helm chart langsmith_config.yaml to the new size (e.g., 100Gi).
  5. Delete the ClickHouse StatefulSet: kubectl delete statefulset langsmith-clickhouse --cascade=orphan -n <namespace>
  6. Apply the Helm chart with the updated size (You can follow the upgrade guide here)
  7. Your PVC should now have the new size. Verify by running kubectl get pvc -n <namespace> and kubectl exec langsmith-clickhouse-0 -n <namespace> -- bash -c "df"
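Steps 1 through 3 above can be chained so that the PVC is only patched when its storage class allows expansion. This is a sketch under assumptions: the function names `pvc_patch_json` and `resize_clickhouse_pvc` are hypothetical, and the PVC name is the one used in the steps above.

```shell
# Build the JSON patch body that requests a new PVC size.
pvc_patch_json() {
  printf '{"spec":{"resources":{"requests":{"storage":"%s"}}}}' "$1"
}

# Hypothetical wrapper around steps 1-3: look up the storage class, refuse
# to continue unless it allows volume expansion, then patch the PVC.
resize_clickhouse_pvc() {
  ns="$1"
  size="$2"
  pvc="data-langsmith-clickhouse-0"
  sc="$(kubectl get pvc "$pvc" -n "$ns" -o jsonpath='{.spec.storageClassName}')"
  allowed="$(kubectl get sc "$sc" -o jsonpath='{.allowVolumeExpansion}')"
  if [ "$allowed" != "true" ]; then
    echo "storage class $sc does not allow volume expansion" >&2
    return 1
  fi
  kubectl patch pvc "$pvc" -n "$ns" -p "$(pvc_patch_json "$size")"
}

# Usage: resize_clickhouse_pvc langsmith 100Gi
```

Steps 4 through 6 still need to be carried out afterwards so that the Helm chart and the StatefulSet agree with the resized PVC.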

Docker

In Docker, you will need to increase the size of the ClickHouse volume. To achieve this, you can perform the following steps:
  1. Stop your instance of LangSmith: docker compose down
  2. If using a bind mount, you will need to increase the size of the mount point.
  3. If using a Docker volume, you will need to allocate more space to the volume or to Docker itself.
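Before and after resizing, it can be useful to confirm how full the underlying filesystem is. A minimal sketch: the helper name `disk_usage_percent` is hypothetical, and `/var/lib/docker` is only the usual default Docker data root, which may differ on your host.

```shell
# Print how full the filesystem holding a path is, as an integer percentage.
# Uses POSIX output format (df -P) so the value is always in column 5.
disk_usage_percent() {
  df -P "$1" | awk 'NR==2 { sub("%", "", $5); print $5 }'
}

# Usage:
#   disk_usage_percent /var/lib/docker   # assumed default Docker data root
#   docker system df -v                  # per-volume usage as seen by Docker
```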

error: Dirty database version ‘version’. Fix and force version

This error occurs when the ClickHouse database is in an inconsistent state with our migrations. You will need to reset to an earlier database version and then rerun your upgrade/migrations.

Kubernetes

  1. Force migration to an earlier version, where version = dirty version - 1.
kubectl exec -it deployments/langsmith-backend -- bash -c 'migrate -source "file://clickhouse/migrations" -database "clickhouse://$CLICKHOUSE_HOST:$CLICKHOUSE_NATIVE_PORT?username=$CLICKHOUSE_USER&password=$CLICKHOUSE_PASSWORD&database=$CLICKHOUSE_DB&x-multi-statement=true&x-migrations-table-engine=MergeTree&secure=$CLICKHOUSE_TLS" force <version>'
  2. Rerun your upgrade/migrations.

Docker

  1. Force migration to an earlier version, where version = dirty version - 1.
docker compose exec langchain-backend migrate -source "file://clickhouse/migrations" -database "clickhouse://$CLICKHOUSE_HOST:$CLICKHOUSE_NATIVE_PORT?username=$CLICKHOUSE_USER&password=$CLICKHOUSE_PASSWORD&database=$CLICKHOUSE_DB&x-multi-statement=true&x-migrations-table-engine=MergeTree&secure=$CLICKHOUSE_TLS" force <version>
  2. Rerun your upgrade/migrations.
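The value to pass as <version> in the commands above is the dirty version minus one. A small sketch of that arithmetic, assuming the error message contains the dirty version as a plain integer (for example "Dirty database version 15"); the helper name `force_target_version` is hypothetical.

```shell
# Extract the dirty version number from the migrate error message and
# print the version to force (dirty version - 1).
# Returns non-zero if no version number is found in the message.
force_target_version() {
  dirty="$(printf '%s\n' "$1" \
    | sed -n 's/.*Dirty database version \([0-9][0-9]*\).*/\1/p')"
  [ -n "$dirty" ] || return 1
  echo $((dirty - 1))
}

# Usage:
#   force_target_version "error: Dirty database version 15. Fix and force version"
```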

413 - Request Entity Too Large

This error occurs when the request size exceeds the maximum allowed size. You will need to increase the maximum request size in your Nginx configuration.

Kubernetes

  1. Edit your langsmith_config.yaml and increase the frontend.maxBodySize value. This might look something like this:
frontend:
  maxBodySize: "100M"
  2. Apply your changes to the cluster.
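To pick a suitable limit, it can help to compare the size of a failing request payload against the configured value. The sketch below assumes that frontend.maxBodySize uses Nginx size notation, where a k or K suffix means kibibytes and an m or M suffix means mebibytes; the helper names `nginx_size_to_bytes` and `payload_fits` are hypothetical.

```shell
# Convert an Nginx-style size ("512k", "100M", or plain bytes) to bytes.
nginx_size_to_bytes() {
  case "$1" in
    *[kK]) echo $(( ${1%[kK]} * 1024 )) ;;
    *[mM]) echo $(( ${1%[mM]} * 1024 * 1024 )) ;;
    *)     echo "$1" ;;
  esac
}

# Succeeds if the file at $1 is no larger than the limit given as $2.
payload_fits() {
  size=$(( $(wc -c < "$1") ))
  [ "$size" -le "$(nginx_size_to_bytes "$2")" ]
}

# Usage:
#   payload_fits ./trace-payload.json 100M || echo "payload exceeds limit"
```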

Details: code: 497, message: default: Not enough privileges. To execute this query, it’s necessary to have the grant CREATE ROW POLICY ON default.feedbacks_rmt

This error occurs when your user does not have the necessary permissions to create row policies in ClickHouse. When deploying with Docker, you also need to copy the users.xml file from the GitHub repo. This file sets the <access_management> tag, which allows the user to create row policies. Below is the default users.xml file that we expect to be used.
<clickhouse>
    <users>
        <default>
            <access_management>1</access_management>
            <named_collection_control>1</named_collection_control>
            <show_named_collections>1</show_named_collections>
            <show_named_collections_secrets>1</show_named_collections_secrets>
            <profile>default</profile>
        </default>
    </users>
    <profiles>
        <default>
            <async_insert>1</async_insert>
            <async_insert_max_data_size>2000000</async_insert_max_data_size>
            <wait_for_async_insert>0</wait_for_async_insert>
            <parallel_view_processing>1</parallel_view_processing>
            <allow_simdjson>0</allow_simdjson>
            <lightweight_deletes_sync>0</lightweight_deletes_sync>
        </default>
    </profiles>
</clickhouse>
In some environments, your mount point may not be writable by the container. In these cases we suggest building a custom image with the users.xml file included. Example Dockerfile:
FROM clickhouse/clickhouse-server:24.8
COPY ./users.xml /etc/clickhouse-server/users.d/users.xml
Then take the following steps:
  1. Build your custom image.
docker build -t <image-name> .
  2. Update your docker-compose.yaml to use the custom image. Make sure to remove the users.xml mount point.
langchain-clickhouse:
  image: <image-name>
  3. Restart your instance of LangSmith.
docker compose down --volumes
docker compose up
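Before building the image, and again after the restart, you can check that access management is actually enabled. A minimal sketch: the helper name `has_access_management` is hypothetical, and the service name `langchain-clickhouse` is taken from the docker-compose.yaml fragment above.

```shell
# Succeeds if the given users.xml enables access management.
has_access_management() {
  grep -q '<access_management>1</access_management>' "$1"
}

# Usage:
#   has_access_management ./users.xml && echo "access_management enabled"
#
# After the restart, list the grants of the connected user from inside
# the running container:
#   docker compose exec langchain-clickhouse clickhouse-client --query "SHOW GRANTS"
```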

ClickHouse fails to start up when running a cluster with AquaSec

In some environments, AquaSec may prevent ClickHouse from starting up correctly. This may manifest as the ClickHouse pod not emitting any logs and failing to get marked as ready. Generally this is due to LD_PRELOAD being set by AquaSec, which interferes with ClickHouse. To resolve this, you can add the following environment variable to your ClickHouse deployment:

Kubernetes

Edit your langsmith_config.yaml (or corresponding config file) and set the AQUA_SKIP_LD_PRELOAD environment variable:
clickhouse:
  statefulSet:
    extraEnv:
      - name: AQUA_SKIP_LD_PRELOAD
        value: "true"

Docker

Edit your docker-compose.yaml and set the AQUA_SKIP_LD_PRELOAD environment variable:
langchain-clickhouse:
  environment:
    - AQUA_SKIP_LD_PRELOAD=true
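After redeploying, you can confirm that the variable reached the ClickHouse container by inspecting its environment. A minimal sketch: the helper name `has_aqua_skip` is hypothetical, and the pod and service names are the ones used elsewhere in this guide.

```shell
# Reads `env` output on stdin; succeeds if the AquaSec override is present.
has_aqua_skip() {
  grep -q '^AQUA_SKIP_LD_PRELOAD=true$'
}

# Usage:
#   kubectl exec langsmith-clickhouse-0 -n <namespace> -- env | has_aqua_skip && echo "set"
#   docker compose exec -T langchain-clickhouse env | has_aqua_skip && echo "set"
```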
