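This loader covers how to load data from GMail. There are many ways you could load data from GMail; this implementation is currently fairly opinionated. It first looks for all messages that you have sent, then narrows those down to messages in which you are replying to a previous email. It then fetches that previous email and creates a training example consisting of the previous email followed by your reply.

Note the limitations of this approach. For example, every example uses only the single previous email as context.

To use: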
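  • Set up a Google developer account: go to the Google Developer Console, create a project, and enable the Gmail API for that project. This gives you a credentials.json file that you will need later.
  • Install the Google Client Library by running: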
pip install -qU  google-auth google-auth-oauthlib google-auth-httplib2 google-api-python-client
import os.path

from google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials
from google_auth_oauthlib.flow import InstalledAppFlow

SCOPES = ["https://www.googleapis.com/auth/gmail.readonly"]


creds = None
# The file email_token.json stores the user's access and refresh tokens, and is
# created automatically when the authorization flow completes for the first
# time.
if os.path.exists("email_token.json"):
    creds = Credentials.from_authorized_user_file("email_token.json", SCOPES)
# If there are no (valid) credentials available, let the user log in.
if not creds or not creds.valid:
    if creds and creds.expired and creds.refresh_token:
        creds.refresh(Request())
    else:
        flow = InstalledAppFlow.from_client_secrets_file(
            # Path to your OAuth client secrets file. To create one, see
            # https://cloud.google.com/docs/authentication/getting-started
            "creds.json",
            SCOPES,
        )
        creds = flow.run_local_server(port=0)
    # Save the credentials for the next run
    with open("email_token.json", "w") as token:
        token.write(creds.to_json())
from langchain_community.chat_loaders.gmail import GMailLoader
loader = GMailLoader(creds=creds, n=3)
data = loader.load()
# Sometimes there can be errors that are silently ignored
len(data)
2
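The pairing rule described at the top of this page (the previous email, followed by your reply) can be illustrated with plain Python. This is only a minimal sketch under stated assumptions, not the loader's actual implementation: the `Email` record, its `in_reply_to` field, and the `build_reply_pairs` helper are hypothetical names invented for illustration and do not mirror the Gmail API schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Email:
    """Hypothetical, simplified email record (not the Gmail API schema)."""

    id: str
    sender: str
    body: str
    in_reply_to: Optional[str] = None  # id of the email this one answers


def build_reply_pairs(emails, me):
    """Return (previous_email, my_reply) pairs for every reply sent by `me`."""
    by_id = {e.id: e for e in emails}
    pairs = []
    for email in emails:
        if email.sender != me or email.in_reply_to is None:
            continue  # not sent by me, or not a reply to anything
        previous = by_id.get(email.in_reply_to)
        if previous is None:
            continue  # the email being answered is unavailable, so skip it
        pairs.append((previous, email))
    return pairs


inbox = [
    Email("1", "alice@example.com", "Can you review my PR?"),
    Email("2", "me@example.com", "Sure, will do today.", in_reply_to="1"),
    Email("3", "me@example.com", "Unprompted note to self."),
]
pairs = build_reply_pairs(inbox, me="me@example.com")
print([(prev.body, reply.body) for prev, reply in pairs])
```

In this sketch, sent messages that are not replies produce no example, which is one reason the number of examples can be smaller than the number of messages examined.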
from langchain_community.chat_loaders.utils import (
    map_ai_messages,
)
# Treat messages sent by hchase@langchain.com as the AI messages.
# This means the LLM will be trained to respond as if it were hchase.
training_data = list(
    map_ai_messages(data, sender="Harrison Chase <hchase@langchain.com>")
)
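To make the role mapping concrete, here is a small self-contained sketch of the idea: messages from one chosen sender are relabeled as AI messages, and everything else stays a human message. It assumes the sender is compared by exact string equality. The `Message` class and `mark_ai_messages` function are hypothetical stand-ins written for this illustration; they are not LangChain classes and should not be read as the library's implementation.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical stand-in for a chat message (not a LangChain class)."""

    sender: str
    content: str
    role: str = "human"


def mark_ai_messages(session, sender):
    """Return a new list in which messages from `sender` have role 'ai'."""
    return [
        Message(m.sender, m.content, "ai" if m.sender == sender else m.role)
        for m in session
    ]


session = [
    Message("Alice <alice@example.com>", "Can you review my PR?"),
    Message("Harrison Chase <hchase@langchain.com>", "Sure, will do today."),
]
mapped = mark_ai_messages(session, sender="Harrison Chase <hchase@langchain.com>")
print([m.role for m in mapped])
```

Under the exact-match assumption, the `sender` string has to include the display name as well as the address; passing only the bare address would leave every message labeled as human in this sketch.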
