Open
Description
There does not appear to be a way to pass document_info
to the LexicalGraphBuilder when running the SimpleKGPipeline
with text input (from_pdf=False).
For example, when I run:
import asyncio
import os
from langchain_neo4j import Neo4jGraph
from neo4j_graphrag.embeddings.openai import OpenAIEmbeddings
from neo4j_graphrag.experimental.components.text_splitters.fixed_size_splitter import (
    FixedSizeSplitter,
)
from neo4j_graphrag.experimental.pipeline.kg_builder import SimpleKGPipeline
from neo4j_graphrag.llm import OpenAILLM
OPENAI_MODEL = "gpt-4o-mini"
NODE_TYPES = [
    "Person",
    "Organization",
    "Location",
    "Event",
    "Legislation",
    "Claim",
    "Topic",
]
PROMPT_TEMPLATE = """
You are a research assistant tasked with extracting information from news articles
to assemble a knowledge graph that can be used to help readers make better sense of
the content, context, and implications of news stories via Q&A.
Extract nodes and relationships from the following input text.
Return result as JSON using the following format:
{{"nodes": [ {{"id": "0", "label": "Person", "properties": {{"name": "John"}} }}],
"relationships": [{{"type": "KNOWS", "start_node_id": "0", "end_node_id": "1", "properties": {{"since": "2024-08-01"}} }}] }}
- Assign a unique ID (string) to each node, and reuse it to define relationships
- Respect the source and target node types for relationships and their directions
- Use only information from the input text to create graph components and properties
- Create as many nodes and relationships as needed to sufficiently characterize the input text
- Your output should only contain the JSON object, nothing else
Use only the following nodes and relationships (if provided):
{schema}
Input text:
{text}
"""
async def main():
    graph = Neo4jGraph(
        url=os.getenv("NEO4J_URI"),
        username=os.getenv("NEO4J_USERNAME"),
        password=os.getenv("NEO4J_PASSWORD"),
        refresh_schema=False,
        database=os.getenv("NEO4J_DATABASE"),
    )
    llm = OpenAILLM(
        model_name=OPENAI_MODEL,
        model_params={"response_format": {"type": "json_object"}, "temperature": 0},
    )
    kg_pipeline = SimpleKGPipeline(
        llm=llm,
        driver=graph._driver,
        text_splitter=FixedSizeSplitter(chunk_size=500, chunk_overlap=100),
        embedder=OpenAIEmbeddings(),
        entities=NODE_TYPES,
        prompt_template=PROMPT_TEMPLATE,
        from_pdf=False,
        perform_entity_resolution=True,
    )
    pipeline_result = await kg_pipeline.run_async(text="Some long article text here...")

if __name__ == "__main__":
    asyncio.run(main())
I see in the logs:
neo4j_graphrag.experimental.components.lexical_graph - INFO - Document node not created in the lexical graph because no document metadata is provided
Is there some way around this?
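To make the behavior I am describing concrete, here is a minimal toy sketch of what the log message suggests is happening. It is not the library's code: `DocumentInfo`, `build_lexical_graph`, and the node and relationship dictionaries are hypothetical stand-ins I made up for illustration. The only thing taken from the real run is the log message itself.

```python
import logging
from dataclasses import dataclass, field
from typing import Optional

logger = logging.getLogger("lexical_graph_sketch")


@dataclass
class DocumentInfo:
    # Hypothetical stand-in for the document metadata the builder seems to expect.
    path: str
    metadata: dict = field(default_factory=dict)


def build_lexical_graph(chunks: list, document_info: Optional[DocumentInfo] = None) -> dict:
    """Toy model: chunk nodes are always created, the document node only with metadata."""
    nodes = [{"label": "Chunk", "index": i, "text": text} for i, text in enumerate(chunks)]
    relationships = []

    if document_info is None:
        # Mirrors the message seen in the logs when running with text input.
        logger.info(
            "Document node not created in the lexical graph "
            "because no document metadata is provided"
        )
        return {"nodes": nodes, "relationships": relationships}

    # With metadata available, add one document node and link every chunk to it.
    nodes.append({"label": "Document", "path": document_info.path, **document_info.metadata})
    for i in range(len(chunks)):
        relationships.append({"type": "FROM_DOCUMENT", "chunk_index": i, "document": document_info.path})
    return {"nodes": nodes, "relationships": relationships}
```

What I am looking for is the equivalent of the second branch when calling the pipeline with `text=...`: some supported way to supply the document metadata so that the chunks end up linked to a document node.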