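Azure Cosmos DB for Apache Gremlin is a graph database service that can store massive graphs with billions of vertices and edges. You can query these graphs with millisecond latency and evolve the graph structure easily. Gremlin is a graph traversal language and virtual machine developed by Apache TinkerPop, a project of the Apache Software Foundation.
This notebook shows how to use LLMs to provide a natural language interface to a graph database that you can query with Gremlin.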

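Setting up

Install the dependency:
!pip3 install gremlinpython
You will need an Azure Cosmos DB Graph database instance. One option is to create a free Cosmos DB Graph database instance in Azure. When you create the Cosmos DB account and the graph, use /type as the partition key.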
cosmosdb_name = "mycosmosdb"
cosmosdb_db_id = "graphtesting"
cosmosdb_db_graph_id = "mygraph"
cosmosdb_access_Key = "longstring=="
import nest_asyncio
from langchain_community.chains.graph_qa.gremlin import GremlinQAChain
from langchain_community.graphs import GremlinGraph
from langchain_community.graphs.graph_document import GraphDocument, Node, Relationship
from langchain_core.documents import Document
from langchain_openai import AzureChatOpenAI
graph = GremlinGraph(
    url=f"wss://{cosmosdb_name}.gremlin.cosmos.azure.com:443/",
    username=f"/dbs/{cosmosdb_db_id}/colls/{cosmosdb_db_graph_id}",
    password=cosmosdb_access_Key,
)
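The connection above hard-codes the access key in the notebook. A minimal sketch of one alternative is shown here, assuming the settings live in environment variables; the variable names (COSMOSDB_NAME and so on) and the helper load_cosmos_settings are hypothetical and not part of LangChain or gremlinpython.

```python
import os


def load_cosmos_settings(env=os.environ):
    """Build GremlinGraph connection arguments from environment variables.

    The variable names are assumptions made for this sketch; use whatever
    names your deployment already defines.
    """
    name = env["COSMOSDB_NAME"]
    db_id = env["COSMOSDB_DB_ID"]
    graph_id = env["COSMOSDB_GRAPH_ID"]
    return {
        # Same URL and username formats as in the cell above.
        "url": f"wss://{name}.gremlin.cosmos.azure.com:443/",
        "username": f"/dbs/{db_id}/colls/{graph_id}",
        "password": env["COSMOSDB_ACCESS_KEY"],
    }
```

With those variables set, the graph could then be created with graph = GremlinGraph(**load_cosmos_settings()), which keeps the key out of the saved notebook.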

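Seeding the database

Assuming your database is empty, you can populate it using GraphDocuments. For Gremlin, always add a property called label to each node; if no label is set, Node.type is used as the label. For Cosmos DB, natural IDs make sense because they are visible in the graph explorer.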
source_doc = Document(
    page_content="Matrix is a movie where Keanu Reeves, Laurence Fishburne and Carrie-Anne Moss acted."
)
movie = Node(id="The Matrix", properties={"label": "movie", "title": "The Matrix"})
actor1 = Node(id="Keanu Reeves", properties={"label": "actor", "name": "Keanu Reeves"})
actor2 = Node(
    id="Laurence Fishburne", properties={"label": "actor", "name": "Laurence Fishburne"}
)
actor3 = Node(
    id="Carrie-Anne Moss", properties={"label": "actor", "name": "Carrie-Anne Moss"}
)
rel1 = Relationship(
    id=5, type="ActedIn", source=actor1, target=movie, properties={"label": "ActedIn"}
)
rel2 = Relationship(
    id=6, type="ActedIn", source=actor2, target=movie, properties={"label": "ActedIn"}
)
rel3 = Relationship(
    id=7, type="ActedIn", source=actor3, target=movie, properties={"label": "ActedIn"}
)
rel4 = Relationship(
    id=8,
    type="Starring",
    source=movie,
    target=actor1,
    properties={"label": "Starring"},
)
rel5 = Relationship(
    id=9,
    type="Starring",
    source=movie,
    target=actor2,
    properties={"label": "Starring"},
)
rel6 = Relationship(
    id=10,
    type="Starring",
    source=movie,
    target=actor3,
    properties={"label": "Starring"},
)
graph_doc = GraphDocument(
    nodes=[movie, actor1, actor2, actor3],
    relationships=[rel1, rel2, rel3, rel4, rel5, rel6],
    source=source_doc,
)
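# gremlinpython drives its own asyncio event loop, which clashes with the loop
# already running in a notebook. nest_asyncio.apply() allows the nested loop.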
nest_asyncio.apply()

# Add the document to the Cosmos DB graph.
graph.add_graph_documents([graph_doc])
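For intuition about what ends up in the graph, the sketch below renders a node and a relationship as plain Gremlin addV/addE statements. It is an illustration only: the queries GremlinGraph actually sends may differ, the helper names (quote, add_vertex, add_edge) are hypothetical, and writing the label into a type property is an assumption made here to match the /type partition key.

```python
def quote(value):
    """Escape a value for use inside a single-quoted Gremlin string literal."""
    return str(value).replace("\\", "\\\\").replace("'", "\\'")


def add_vertex(node_id, label, **props):
    """Return a Gremlin statement that creates one vertex."""
    parts = [
        f"g.addV('{quote(label)}')",
        f".property('id', '{quote(node_id)}')",
        # Assumption: mirror the label into 'type' for the /type partition key.
        f".property('type', '{quote(label)}')",
    ]
    parts += [f".property('{quote(k)}', '{quote(v)}')" for k, v in props.items()]
    return "".join(parts)


def add_edge(source_id, label, target_id):
    """Return a Gremlin statement that links two existing vertices by id."""
    return (
        f"g.V('{quote(source_id)}')"
        f".addE('{quote(label)}')"
        f".to(g.V('{quote(target_id)}'))"
    )


print(add_vertex("The Matrix", "movie", title="The Matrix"))
print(add_edge("Keanu Reeves", "ActedIn", "The Matrix"))
```

The ActedIn edges point from actor to movie and the Starring edges point the other way, so each relationship in the example can be traversed from either side.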

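Refresh graph schema information

If the schema of the database changes, you can refresh the schema information the chain uses to generate Gremlin statements.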
graph.refresh_schema()
print(graph.schema)

Querying the graph

We can now use the Gremlin QA chain to ask questions of the graph:
chain = GremlinQAChain.from_llm(
    AzureChatOpenAI(
        temperature=0,
        azure_deployment="gpt-4-turbo",
    ),
    graph=graph,
    verbose=True,
)
chain.invoke("Who played in The Matrix?")
chain.invoke("How many people played in The Matrix?")
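To sanity-check the chain's answers, it can help to compare them with hand-written traversals. The sketch below builds Gremlin queries for the two questions above, assuming the labels and property names used when seeding the database (movie, title, ActedIn, name). The helpers actors_in, count_actors_in, and run_query are hypothetical; run_query relies on gremlinpython's Client with the GraphSON v2 serializer, which is the setup commonly used with Cosmos DB, and needs a reachable instance.

```python
def _quote(value):
    """Escape a value for use inside a single-quoted Gremlin string literal."""
    return str(value).replace("\\", "\\\\").replace("'", "\\'")


def _actors_traversal(title):
    # ActedIn edges run from actor to movie, so follow them backwards with in().
    return f"g.V().has('movie', 'title', '{_quote(title)}').in('ActedIn')"


def actors_in(title):
    """Gremlin query returning the names of actors in the given movie."""
    return _actors_traversal(title) + ".values('name')"


def count_actors_in(title):
    """Gremlin query returning how many actors played in the given movie."""
    return _actors_traversal(title) + ".count()"


def run_query(url, username, password, query):
    """Submit a query string to the graph and return the list of results."""
    # Imported lazily so the query builders work without gremlinpython installed.
    from gremlin_python.driver import client, serializer

    gremlin_client = client.Client(
        url,
        "g",
        username=username,
        password=password,
        message_serializer=serializer.GraphSONSerializersV2d0(),
    )
    try:
        return gremlin_client.submit(query).all().result()
    finally:
        gremlin_client.close()


print(actors_in("The Matrix"))
print(count_actors_in("The Matrix"))
```

Calling run_query with the same url, username, and password passed to GremlinGraph and the output of actors_in("The Matrix") should list the actors seeded earlier, which gives a baseline for judging the Gremlin the LLM generates.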
