Connect to an external ClickHouse database

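ClickHouse is a high-performance, column-oriented database system. It allows fast data ingestion and is optimized for analytical queries. LangSmith uses ClickHouse as the primary data store for traces and feedback. By default, self-hosted LangSmith uses an internal ClickHouse database that is bundled with the LangSmith instance. This runs as a stateful set in the same Kubernetes cluster as the LangSmith application, or as a Docker container on the same host as the LangSmith application. However, you can configure LangSmith to use an external ClickHouse database for easier management and scaling. With an external ClickHouse database, you manage backups, scaling, and other operational tasks for the database yourself. Although ClickHouse is not yet a native service in Azure, AWS, or Google Cloud, you can run LangSmith with an external ClickHouse database in the following ways: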

LangSmith-managed ClickHouse
ClickHouse Cloud, provisioned either directly or through a cloud provider marketplace
On a VM in your cloud provider

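Using either of the first two options (LangSmith-managed ClickHouse or ClickHouse Cloud) provisions a ClickHouse service outside your VPC. However, both options support private endpoints, which means you can direct traffic to the ClickHouse service without exposing it to the public internet (for example, via AWS PrivateLink or GCP Private Service Connect). Additionally, you can configure sensitive information not to be stored in ClickHouse. Please reach out to support@langchain.dev for more information.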

Requirements

A provisioned ClickHouse instance that your LangSmith application will have network access to (see above for options).
A user with admin access to the ClickHouse database. This user will be used to create the necessary tables, indexes, and views.
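We support both standalone ClickHouse and externally managed clustered deployments. For clustered deployments, ensure all nodes are running the same version. Note that clustered setups are not supported with bundled ClickHouse installations.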
We only support ClickHouse versions >= 23.9. Use of ClickHouse versions >= 24.2 requires LangSmith v0.6 or later.
We rely on a few configuration parameters to be set on your ClickHouse instance. These are detailed below:

<profiles>
    <default>
        <async_insert>1</async_insert> <!-- Turn on async insert -->
        <async_insert_max_data_size>25000000</async_insert_max_data_size> <!-- Flush data to disk after 25MB. You may need to adjust this based on your workload. -->
        <wait_for_async_insert>0</wait_for_async_insert> <!-- Disable waiting for async insert by default -->
        <parallel_view_processing>1</parallel_view_processing> <!-- Enable parallel view processing -->
        <materialize_ttl_after_modify>0</materialize_ttl_after_modify> <!-- Disable TTL materialization after modify -->
        <wait_for_async_insert_timeout>120</wait_for_async_insert_timeout> <!-- Set the timeout for waiting for async insert -->
        <lightweight_deletes_sync>0</lightweight_deletes_sync> <!-- Disable lightweight deletes sync -->
        <allow_materialized_view_with_bad_select>1</allow_materialized_view_with_bad_select> <!-- Allow materialized views with legacy SELECT statements that cause CH to fail -->
    </default>
</profiles>

Our system has been tuned to work with the above configuration parameters. Changing these parameters may result in unexpected behavior.
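As a rough aid, the version and settings requirements above can be checked with a small script. The following is a minimal sketch, not part of LangSmith: the names `EXPECTED_SETTINGS`, `SETTINGS_QUERY`, `find_mismatches`, and `version_problems` are hypothetical. It assumes you run `SETTINGS_QUERY` against ClickHouse's `system.settings` table with a client of your choice and pass the resulting name/value pairs in as strings.

```python
# Expected values, copied from the profile above (compared as strings).
EXPECTED_SETTINGS = {
    "async_insert": "1",
    "async_insert_max_data_size": "25000000",
    "wait_for_async_insert": "0",
    "parallel_view_processing": "1",
    "materialize_ttl_after_modify": "0",
    "wait_for_async_insert_timeout": "120",
    "lightweight_deletes_sync": "0",
    "allow_materialized_view_with_bad_select": "1",
}

# Query to run with your own client; its rows feed find_mismatches().
SETTINGS_QUERY = (
    "SELECT name, value FROM system.settings WHERE name IN ("
    + ", ".join(f"'{name}'" for name in EXPECTED_SETTINGS)
    + ")"
)


def find_mismatches(actual):
    """Return {name: (expected, found)} for settings that differ or are missing."""
    mismatches = {}
    for name, expected in EXPECTED_SETTINGS.items():
        found = actual.get(name)  # None when the server did not report it
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


def parse_major_minor(version):
    """Parse a version such as '24.3.2.23' or 'v0.6.1' into (major, minor)."""
    parts = version.lstrip("v").split(".")
    return int(parts[0]), int(parts[1])


def version_problems(clickhouse_version, langsmith_version):
    """Apply the two version rules stated in the requirements list."""
    problems = []
    ch = parse_major_minor(clickhouse_version)
    ls = parse_major_minor(langsmith_version)
    if ch < (23, 9):
        problems.append("ClickHouse must be >= 23.9")
    elif ch >= (24, 2) and ls < (0, 6):
        problems.append("ClickHouse >= 24.2 requires LangSmith v0.6 or later")
    return problems
```

An empty result from both functions suggests the instance meets the stated requirements. If you have tuned `async_insert_max_data_size` for your workload, expect it to appear as a mismatch.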

HA Replicated ClickHouse Cluster

By default, the setup process above only works with a single-node ClickHouse cluster.

If you would like to use a multi-node ClickHouse cluster for HA, we support this with additional required configuration. This setup can use a ClickHouse cluster with multiple nodes where data is replicated via ZooKeeper or ClickHouse Keeper. For more information on ClickHouse replication, see the ClickHouse Data Replication docs. To set up LangSmith with a replicated multi-node ClickHouse cluster:

You need a ClickHouse cluster that is set up with Keeper or ZooKeeper for data replication and has the appropriate settings. See the ClickHouse Replication Setup docs.
You need to set the cluster setting in the LangSmith Configuration section so that it matches your ClickHouse cluster name. LangSmith will then use the Replicated table engines when running the ClickHouse migrations.
If, in addition to HA, you would like to load balance among the ClickHouse nodes (to distribute reads or writes), we suggest using a load balancer or DNS load balancing to round-robin among your ClickHouse servers.
Note: You must enable the cluster setting before launching LangSmith for the first time and running the ClickHouse migrations. This is required because the tables must be created with a Replicated table engine rather than the non-replicated engine type.
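Because the cluster setting must match the ClickHouse cluster name exactly, it can help to confirm the name before the first launch. The sketch below is illustrative only: `CLUSTERS_QUERY` and `hosts_for_cluster` are hypothetical names. It assumes you run the query against ClickHouse's `system.clusters` table with your own client and pass the rows in as `(cluster, host_name)` tuples.

```python
# Query to run with your own client; system.clusters lists configured clusters.
CLUSTERS_QUERY = "SELECT cluster, host_name FROM system.clusters"


def hosts_for_cluster(rows, cluster_name):
    """Return the sorted host names in cluster_name, or raise if it is unknown."""
    hosts = sorted({host for cluster, host in rows if cluster == cluster_name})
    if not hosts:
        known = sorted({cluster for cluster, _ in rows})
        raise ValueError(
            f"cluster {cluster_name!r} not found; known clusters: {known}"
        )
    return hosts
```

For example, calling `hosts_for_cluster(rows, "my_cluster_name")` raises a `ValueError` listing the known cluster names if the configured value contains a typo.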

When running migrations with cluster enabled, the migration will create the Replicated table engine flavor. This means that data will be replicated among the servers in the cluster. This is a master-master setup where any server can process reads, writes, or merges.
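To illustrate what "Replicated flavor" means, ClickHouse names each replicated engine by prefixing its MergeTree-family counterpart with `Replicated`. The helper below is a hypothetical sketch of that naming convention only. It is not how the LangSmith migrations are implemented, which this page does not describe.

```python
def replicated_engine_name(engine):
    """Map a MergeTree-family engine name to its Replicated counterpart."""
    if engine.startswith("Replicated"):
        return engine  # already the replicated flavor
    if not engine.endswith("MergeTree"):
        raise ValueError(f"{engine} is not a MergeTree-family engine")
    return "Replicated" + engine
```

Under this convention, `MergeTree` corresponds to `ReplicatedMergeTree` and `ReplacingMergeTree` to `ReplicatedReplacingMergeTree`.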

For an example setup of a replicated ClickHouse cluster, refer to the replicated ClickHouse section in the LangSmith Helm chart repo, under examples.

LangSmith-managed ClickHouse

If using LangSmith-managed ClickHouse, you will need to set up a VPC peering connection between the LangSmith VPC and the ClickHouse VPC. Please reach out to support@langchain.dev for more information.
You will also need to set up Blob Storage. You can read more about Blob Storage in the Blob Storage documentation.

ClickHouse installations managed by LangSmith use the SharedMergeTree engine, which automatically clusters them and separates compute from storage.

For more information, refer to the managed ClickHouse page.

Parameters

You will need to provide several parameters to your LangSmith installation to configure an external ClickHouse database. These parameters include:

Host: The hostname or IP address of the ClickHouse database
HTTP Port: The port that the ClickHouse database listens on for HTTP connections
Native Port: The port that the ClickHouse database listens on for native connections
Database: The name of the ClickHouse database that LangSmith should use
Username: The username to use to connect to the ClickHouse database
Password: The password to use to connect to the ClickHouse database
Cluster (Optional): The name of the ClickHouse cluster if using an external ClickHouse cluster. When set, LangSmith will run migrations on the cluster and replicate data across instances.

Important considerations for clustered deployments:

Clustered setups must be configured on a fresh schema. Existing standalone ClickHouse instances cannot be converted to clustered mode.
Clustering is only supported with externally managed ClickHouse deployments. It is not compatible with bundled ClickHouse installations as these do not include required ZooKeeper configurations.
When using a clustered deployment, LangSmith will automatically:
Run database migrations across all nodes in the cluster
Configure tables for data replication across the cluster

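Note that although data is replicated across nodes, LangSmith does not configure distributed tables or handle query routing. All queries are directed to the specified host. If you need load balancing or query distribution, you must handle it at the infrastructure level.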

Configuration

With these parameters in hand, you can configure your LangSmith instance to use the provisioned ClickHouse database. You can do this by modifying the config.yaml file for your LangSmith Helm Chart installation or the .env file for your Docker installation.

clickhouse:
  external:
    enabled: true
    host: "host"
    port: "http port"
    nativePort: "native port"
    user: "default"
    password: "password"
    database: "default"
    tls: false
    cluster: "my_cluster_name"  # Optional: Set this if using an external ClickHouse cluster

Once configured, you should be able to reinstall your LangSmith instance. If everything is configured correctly, your LangSmith instance should now be using your external ClickHouse database.
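Before reinstalling, you may want to confirm that the host and HTTP port are reachable from where LangSmith runs. The following is a minimal sketch using only the Python standard library and the `/ping` endpoint of the ClickHouse HTTP interface. The function names `build_ping_url` and `ping` are hypothetical, and the `tls` flag is assumed to correspond to the `tls` value in the configuration above.

```python
import http.client
import urllib.request


def build_ping_url(host, http_port, tls=False):
    """Build the URL of the ClickHouse HTTP interface health endpoint."""
    scheme = "https" if tls else "http"
    return f"{scheme}://{host}:{http_port}/ping"


def ping(host, http_port, tls=False, timeout=5.0):
    """Return True if the server answers /ping with 'Ok.', False otherwise."""
    url = build_ping_url(host, http_port, tls)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode().strip() == "Ok."
    except (OSError, http.client.HTTPException):
        # Covers connection errors, timeouts, and HTTP error responses.
        return False
```

A `False` result only indicates that the HTTP port did not respond as expected. It does not validate the native port, the credentials, or the database name.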
