Getting helpful information
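To diagnose and resolve an issue, you will first need to retrieve some relevant information. The following sections explain how to do this for a Kubernetes or Docker setup, as well as how to pull helpful browser information.
Generally, the main services you will want to analyze are:
- langsmith-backend: handles CRUD API requests, business logic, requests from the frontend and SDK, preparation of traces for ingestion, and the Hub API.
- langsmith-platform-backend: handles authentication, run ingestion, and other high-volume tasks.
- langsmith-queue: handles incoming traces and feedback, asynchronous ingestion and persistence to the datastore, data integrity checks, and retries during database errors or connection issues.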
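On Kubernetes, you can pull recent logs from these three services with a short script. The Python sketch below makes several assumptions:
- kubectl is on your PATH.
- Each service runs as a Deployment with the same name as the service.
- The namespace defaults to langsmith.

The helper names log_commands and collect_logs are hypothetical.

```python
import subprocess

# Assumption: each service runs as a Deployment named exactly like the service.
SERVICES = ["langsmith-backend", "langsmith-platform-backend", "langsmith-queue"]


def log_commands(namespace: str = "langsmith", tail: int = 500) -> list[list[str]]:
    """Build one `kubectl logs` command per service (nothing is executed here)."""
    return [
        ["kubectl", "logs", f"deployment/{svc}", "-n", namespace, f"--tail={tail}"]
        for svc in SERVICES
    ]


def collect_logs(namespace: str = "langsmith", tail: int = 500) -> dict[str, str]:
    """Run each command and return its output keyed by service name."""
    logs = {}
    for svc, cmd in zip(SERVICES, log_commands(namespace, tail)):
        result = subprocess.run(cmd, capture_output=True, text=True, check=False)
        # Keep stderr on failure so errors (e.g. a wrong namespace) are visible too.
        logs[svc] = result.stdout if result.returncode == 0 else result.stderr
    return logs
```

Note that kubectl logs on a deployment reads from a single pod. If a service runs several replicas, list its pods and query each one.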
Kubernetes
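The first step in troubleshooting is to gather important debugging information about your LangSmith deployment. Service logs, Kubernetes events, and container resource utilization can help identify the root cause of an issue. You can run our k8s troubleshooting script, which pulls all relevant Kubernetes information and outputs it to a folder for investigation. The script also compresses this folder into a zip file for sharing. Here is an example of how to run this script, assuming your LangSmith deployment was brought up in the langsmith namespace: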
Docker
If running on Docker, you can check your deployment's logs by running the following command:
Browser errors
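If you experience an issue that surfaces as a browser error, it may also help to inspect a HAR file, which can contain key information. To get a HAR file, you can follow this guide, which explains the short process for various browsers. You can then inspect the file with the Google HAR analyzer, or send the HAR file to the LangSmith team for help with troubleshooting.
Common issues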
DB::Exception: Cannot reserve 1.00 MiB, not enough space: While executing WaitForAsyncInsert. (NOT_ENOUGH_SPACE)
This error occurs when ClickHouse runs out of disk space. You will need to increase the disk space available to ClickHouse.
Kubernetes
In Kubernetes, you will need to increase the size of the ClickHouse PVC. To achieve this, perform the following steps:
- Get the storage class of the PVC:
  kubectl get pvc data-langsmith-clickhouse-0 -n <namespace> -o jsonpath='{.spec.storageClassName}'
- Ensure the storage class has allowVolumeExpansion: true:
  kubectl get sc <storage-class-name> -o jsonpath='{.allowVolumeExpansion}'
  - If it is false, some storage classes can be updated to allow volume expansion. To update the storage class, you can run:
    kubectl patch sc <storage-class-name> -p '{"allowVolumeExpansion": true}'
  - If this fails, you may need to create a new storage class with the correct settings.
- Edit your PVC to have the new size:
  kubectl edit pvc data-langsmith-clickhouse-0 -n <namespace>
  or
  kubectl patch pvc data-langsmith-clickhouse-0 -p '{"spec":{"resources":{"requests":{"storage":"100Gi"}}}}' -n <namespace>
- Update the size in your Helm values file langsmith_config.yaml to the new size (e.g. 100Gi).
- Delete the ClickHouse StatefulSet, leaving its pods running:
  kubectl delete statefulset langsmith-clickhouse --cascade=orphan -n <namespace>
- Apply the Helm chart with the updated size (you can follow the upgrade guide here).
- Your PVC should now have the new size. Verify by running:
  kubectl get pvc -n <namespace>
  and
  kubectl exec langsmith-clickhouse-0 -n <namespace> -- bash -c "df"
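You can also check the df output from the verification step programmatically. The Python sketch below makes two assumptions:
- df uses its default output format: 1K blocks, with the mount point in the last column.
- ClickHouse data is mounted at /var/lib/clickhouse.

parse_df and free_gib are hypothetical helper names.

```python
def parse_df(output: str) -> dict[str, dict[str, int]]:
    """Parse default `df` output (1K blocks) into {mount_point: sizes in KiB}."""
    stats = {}
    for line in output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue  # ignore wrapped or malformed lines
        try:
            size_kb, used_kb, avail_kb = (int(x) for x in fields[1:4])
        except ValueError:
            continue  # e.g. "-" shown for pseudo filesystems
        mount = " ".join(fields[5:])
        stats[mount] = {"size_kb": size_kb, "used_kb": used_kb, "avail_kb": avail_kb}
    return stats


def free_gib(stats: dict[str, dict[str, int]], mount: str = "/var/lib/clickhouse") -> float:
    """Available space on `mount` in GiB (the mount path is an assumption)."""
    return stats[mount]["avail_kb"] / (1024 * 1024)
```

To use it, pass the stdout of the kubectl exec ... df command to parse_df. For example, capture it with subprocess.run(..., capture_output=True, text=True).stdout.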
Docker
In Docker, you will need to increase the size of the ClickHouse volume. To achieve this, perform the following steps:
- Stop your instance of LangSmith:
  docker compose down
- If using a bind mount, you will need to increase the size of the mount point.
- If using a Docker volume, you will need to allocate more space to the volume or to Docker.
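Before restarting, you can confirm that the host has enough room with Python's shutil.disk_usage. This is a minimal sketch:
- The 10 GiB threshold and the path in the usage line are placeholders, not values from the LangSmith compose file.
- For a named volume on Linux, point it at Docker's data root (/var/lib/docker by default).
- On Docker Desktop, volumes live inside a VM, so a host-side check does not reflect their space.

```python
import shutil


def check_free_space(path: str, min_free_gib: float = 10.0) -> tuple[float, bool]:
    """Return (free GiB, meets threshold) for the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)  # named tuple (total, used, free), in bytes
    free = usage.free / 1024**3
    return free, free >= min_free_gib
```

Usage: `free, ok = check_free_space("./clickhouse-data")`, where ./clickhouse-data is a hypothetical bind-mount path.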
error: Dirty database version ‘version’. Fix and force version
This error occurs when the ClickHouse database is in a state inconsistent with our migrations. You will need to reset to an earlier database version and then rerun your upgrade/migrations.
Kubernetes
- Force migration to an earlier version, where version = dirty version - 1.
- Rerun your upgrade/migrations.
Docker
- Force migration to an earlier version, where version = dirty version - 1.
- Rerun your upgrade/migrations.
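The target version is simply the dirty version minus one. The sketch below extracts it from the error text. It assumes the message has the form shown in the heading, with a number in place of 'version'. That wording matches the dirty-state error of the golang-migrate tool. force_target_version is a hypothetical helper.

```python
import re

# Matches e.g. "error: Dirty database version 42. Fix and force version."
DIRTY_RE = re.compile(r"Dirty database version (\d+)\. Fix and force version")


def force_target_version(error_message: str) -> int:
    """Return the version to force: dirty version - 1."""
    match = DIRTY_RE.search(error_message)
    if match is None:
        raise ValueError("message is not a dirty-version error")
    return int(match.group(1)) - 1
```

For a message reporting dirty version 42, the function returns 41. Force the migration to that version, then rerun your upgrade/migrations.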
413 - Request Entity Too Large
This error occurs when the request size exceeds the maximum allowed size. You will need to increase the maximum request size in your Nginx configuration.
Kubernetes
- Edit your langsmith_config.yaml and increase the frontend.maxBodySize value. This might look something like this:
- Apply your changes to the cluster.
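You can check whether a given payload would hit the limit with a short script. The sketch below converts an Nginx-style size string to bytes: plain bytes, or a k/K or m/M suffix, 1024-based. It then compares that limit with the size of a JSON body. It rests on these assumptions:
- frontend.maxBodySize is passed through to Nginx's client_max_body_size.
- The request body is approximated as UTF-8 JSON. Real SDK requests may be encoded differently.
- A value of 0 means unlimited, as it does for client_max_body_size.

```python
import json

# Nginx size suffixes are 1024-based; no suffix means bytes.
UNITS = {"": 1, "k": 1024, "m": 1024 * 1024}


def parse_nginx_size(value: str) -> int:
    """Convert an Nginx size string such as '25M' or '512k' to bytes."""
    value = value.strip()
    unit = value[-1].lower() if value and value[-1].isalpha() else ""
    number = value[:-1] if unit else value
    if unit not in UNITS or not number.isdigit():
        raise ValueError(f"unsupported size: {value!r}")
    return int(number) * UNITS[unit]


def fits_in_limit(payload: dict, limit: str) -> bool:
    """True if the JSON-encoded payload is within `limit` (0 disables the check)."""
    limit_bytes = parse_nginx_size(limit)
    body_size = len(json.dumps(payload).encode("utf-8"))
    return limit_bytes == 0 or body_size <= limit_bytes
```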
Details: code: 497, message: default: Not enough privileges. To execute this query, it’s necessary to have the grant CREATE ROW POLICY ON default.feedbacks_rmt
This error occurs when your user does not have the necessary permissions to create row policies in ClickHouse. When using the Docker deployment, you also need to copy the users.xml file from the GitHub repo. This file adds the <access_management> tag, which allows the user to create row policies. Below is the default users.xml file that we expect to be used.
Alternatively, you can build a custom ClickHouse image with the users.xml file included.
Example Dockerfile:
- Build your custom image.
- Update your docker-compose.yaml to use the custom image. Make sure to remove the users.xml mount point.
- Restart your instance of LangSmith.
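Before building the image, you can check that your users.xml actually grants access management. The sketch below uses the standard library XML parser. It assumes the usual ClickHouse layout: a <users> element directly under the root, with one child element per user. It also assumes access management is enabled with the value 1. The helper name is hypothetical.

```python
import xml.etree.ElementTree as ET


def users_missing_access_management(users_xml: str) -> list[str]:
    """Return the names of users whose <access_management> is not set to 1."""
    root = ET.fromstring(users_xml)
    users = root.find("users")  # each child of <users> is one user, named by its tag
    if users is None:
        return []
    missing = []
    for user in users:
        flag = user.findtext("access_management", default="0").strip()
        if flag != "1":
            missing.append(user.tag)
    return missing
```

An empty list means every user defined in the file can create row policies.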
ClickHouse fails to start up when running a cluster with AquaSec
In some environments, AquaSec may prevent ClickHouse from starting up correctly. This may manifest as the ClickHouse pod not emitting any logs and failing to get marked as ready. Generally, this is due to LD_PRELOAD being set by AquaSec, which interferes with ClickHouse. To resolve this, you can add the following environment variable to your ClickHouse deployment:
Kubernetes
Edit your langsmith_config.yaml (or corresponding config file) and set the AQUA_SKIP_LD_PRELOAD environment variable:
Docker
Edit your docker-compose.yaml and set the AQUA_SKIP_LD_PRELOAD environment variable:
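To confirm whether LD_PRELOAD is present in the ClickHouse process environment, you can parse /proc/<pid>/environ on Linux. That file holds NUL-separated KEY=VALUE entries. The sketch below makes these assumptions:
- You can read the file, for example /proc/1/environ inside the container, if you can exec into it.
- AquaSec's injection is visible there.

The helper names are hypothetical. The sketch only reports whether AQUA_SKIP_LD_PRELOAD is set, not what value it should take.

```python
def parse_environ(raw: bytes) -> dict[str, str]:
    """Parse NUL-separated KEY=VALUE entries, as found in /proc/<pid>/environ."""
    env = {}
    for entry in raw.split(b"\0"):
        if not entry:
            continue  # the file ends with a trailing NUL
        key, sep, value = entry.partition(b"=")
        if sep:
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def aqua_preload_status(raw: bytes) -> dict[str, bool]:
    """Report whether LD_PRELOAD is non-empty and whether the skip variable exists."""
    env = parse_environ(raw)
    return {
        "ld_preload_set": bool(env.get("LD_PRELOAD")),
        "aqua_skip_set": "AQUA_SKIP_LD_PRELOAD" in env,
    }
```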