LangSmith 部署中的各类服务会以日志、指标和追踪的形式发送遥测数据。你可能已经在 Kubernetes 集群中部署了遥测收集器,也可能正准备部署一个用于监控你的应用。 本文介绍如何配置 OTel Collector,以便从 LangSmith 收集遥测数据。这里的概念同样可以迁移到其他收集器,例如 FluentdFluentBit
本节仅适用于 Kubernetes 部署。

接收器

日志

以下示例展示 Sidecar 收集器如何从所属 pod 读取日志,并排除非业务容器的日志。Sidecar 在此场景下很实用,因为需要访问各容器的文件系统;你也可以选择使用 DaemonSet。
filelog:
  exclude:
    - "**/otc-container/*.log"
  include:
    - /var/log/pods/${POD_NAMESPACE}_${POD_NAME}_${POD_UID}/*/*.log
  include_file_name: false
  include_file_path: true
  operators:
    - id: container-parser
      type: container
  retry_on_failure:
    enabled: true
  start_at: end
env:
  - name: POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
  - name: POD_NAMESPACE
    valueFrom:
      fieldRef:
        fieldPath: metadata.namespace
  - name: POD_UID
    valueFrom:
      fieldRef:
        fieldPath: metadata.uid
volumes:
  - name: varlogpods
    hostPath:
      path: /var/log/pods
volumeMounts:
  - name: varlogpods
    mountPath: /var/log/pods
    readOnly: true
此配置要求具备目标命名空间内 pod 的 getlistwatch 权限。

指标

可以使用 Prometheus 端点抓取指标。为避免重复抓取,可部署单实例 Gateway 收集器。下面的配置会抓取所有遵循默认命名的 LangSmith 服务:
prometheus:
  config:
    scrape_configs:
      - job_name: langsmith-services
        metrics_path: /metrics
        scrape_interval: 15s
        # Only scrape endpoints in the LangSmith namespace
        kubernetes_sd_configs:
          - role: endpoints
            namespaces:
              names: [<langsmith-namespace>]
        relabel_configs:
          # Only scrape services with the name langsmith-.*
          - source_labels: [__meta_kubernetes_service_name]
            regex: "langsmith-.*"
            action: keep
          # Only scrape ports with the following names
          - source_labels: [__meta_kubernetes_endpoint_port_name]
            regex: "(backend|platform|playground|redis-metrics|postgres-metrics|metrics)"
            action: keep
          # Promote useful metadata into regular labels
          - source_labels: [__meta_kubernetes_service_name]
            target_label: k8s_service
          - source_labels: [__meta_kubernetes_pod_name]
            target_label: k8s_pod
          # Replace the default "host:port" as Prom's instance label
          - source_labels: [__address__]
            target_label: instance
此配置要求具备目标命名空间内 pod、service、endpoint 的 getlistwatch 权限。

跟踪

若要采集追踪数据,需要启用 OTLP 接收器。以下配置在 4318 端口监听 HTTP,在 4317 端口监听 gRPC:
otlp:
  protocols:
    grpc:
      endpoint: 0.0.0.0:4317
    http:
      endpoint: 0.0.0.0:4318

处理器

推荐的 OTEL 处理器

使用 OTel 收集器时,推荐启用以下处理器:

导出器

导出器只需指向目标外部端点。下列配置示例展示如何分别设置日志、指标和追踪的端点:
otlphttp/logs:
  endpoint: <your_logs_endpoint>
otlphttp/metrics:
  endpoint: <your_metrics_endpoint>
otlphttp/traces:
  endpoint: <your_traces_endpoint>
OTel Collector 同样支持直接导出到 Datadog 端点。

示例收集器配置:日志 Sidecar

mode: sidecar
image: otel/opentelemetry-collector-contrib
config:
  receivers:
    filelog:
      exclude:
        - "**/otc-container/*.log"
      include:
        - /var/log/pods/${POD_NAMESPACE}_${POD_NAME}_${POD_UID}/*/*.log
      include_file_name: false
      include_file_path: true
      operators:
        - id: container-parser
          type: container
      retry_on_failure:
        enabled: true
      start_at: end
  processors:
    batch:
      send_batch_size: 8192
      timeout: 10s
    memory_limiter:
      check_interval: 1m
      limit_percentage: 90
      spike_limit_percentage: 80
  exporters:
    otlphttp/logs:
      endpoint: <your-endpoint>
  service:
    pipelines:
      logs/langsmith:
        receivers: [filelog]
        processors: [batch, memory_limiter]
        exporters: [otlphttp/logs]
env:
  - name: POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
  - name: POD_NAMESPACE
    valueFrom:
      fieldRef:
        fieldPath: metadata.namespace
  - name: POD_UID
    valueFrom:
      fieldRef:
        fieldPath: metadata.uid
volumes:
  - name: varlogpods
    hostPath:
      path: /var/log/pods
volumeMounts:
  - name: varlogpods
    mountPath: /var/log/pods
    readOnly: true

示例收集器配置:指标与追踪网关

mode: deployment
image: otel/opentelemetry-collector-contrib
config:
  receivers:
    prometheus:
      config:
        scrape_configs:
          - job_name: langsmith-services
            metrics_path: /metrics
            scrape_interval: 15s
            # Only scrape endpoints in the LangSmith namespace
            kubernetes_sd_configs:
              - role: endpoints
                namespaces:
                  names: [<langsmith-namespace>]
            relabel_configs:
              # Only scrape services with the name langsmith-.*
              - source_labels: [__meta_kubernetes_service_name]
                regex: "langsmith-.*"
                action: keep
              # Only scrape ports with the following names
              - source_labels: [__meta_kubernetes_endpoint_port_name]
                regex: "(backend|platform|playground|redis-metrics|postgres-metrics|metrics)"
                action: keep
              # Promote useful metadata into regular labels
              - source_labels: [__meta_kubernetes_service_name]
                target_label: k8s_service
              - source_labels: [__meta_kubernetes_pod_name]
                target_label: k8s_pod
              # Replace the default "host:port" as Prom's instance label
              - source_labels: [__address__]
                target_label: instance
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317
        http:
          endpoint: 0.0.0.0:4318
  processors:
    batch:
      send_batch_size: 8192
      timeout: 10s
    memory_limiter:
      check_interval: 1m
      limit_percentage: 90
      spike_limit_percentage: 80
  exporters:
    otlphttp/metrics:
      endpoint: <metrics_endpoint>
    otlphttp/traces:
      endpoint: <traces_endpoint>
  service:
    pipelines:
      metrics/langsmith:
        receivers: [prometheus]
        processors: [batch, memory_limiter]
        exporters: [otlphttp/metrics]
      traces/langsmith:
        receivers: [otlp]
        processors: [batch, memory_limiter]
        exporters: [otlphttp/traces]

Connect these docs programmatically to Claude, VSCode, and more via MCP for real-time answers.