Document node not created in lexical graph when running SimpleKGPipeline with text input (i.e. not PDF) #353

There does not appear to be a way to pass `document_info` to the [LexicalGraphBuilder](https://github.com/neo4j/neo4j-graphrag-python/blob/main/src/neo4j_graphrag/experimental/components/lexical_graph.py#L60) when running the `SimpleKGPipeline` with text input.

For example, when I run:
```python
import asyncio
import os

from langchain_neo4j import Neo4jGraph
from neo4j_graphrag.embeddings.openai import OpenAIEmbeddings
from neo4j_graphrag.experimental.components.text_splitters.fixed_size_splitter import (
    FixedSizeSplitter,
)
from neo4j_graphrag.experimental.pipeline.kg_builder import SimpleKGPipeline
from neo4j_graphrag.llm import OpenAILLM

OPENAI_MODEL = "gpt-4o-mini"
NODE_TYPES = [
    "Person",
    "Organization",
    "Location",
    "Event",
    "Legislation",
    "Claim",
    "Topic",
]
PROMPT_TEMPLATE = """
You are a research assistant tasked with extracting information from news articles 
to assemble a knowledge graph that can be used to help readers make better sense of
the content, context, and implications of news stories via Q&A.

Extract nodes and relationships from the following input text.

Return result as JSON using the following format:
{{"nodes": [ {{"id": "0", "label": "Person", "properties": {{"name": "John"}} }}],
"relationships": [{{"type": "KNOWS", "start_node_id": "0", "end_node_id": "1", "properties": {{"since": "2024-08-01"}} }}] }}

- Assign a unique ID (string) to each node, and reuse it to define relationships
- Respect the source and target node types for relationships and their directions
- Use only information from the input text to create graph components and properties
- Create as many nodes and relationships as needed to sufficiently characterize the input text
- Your output should only contain the JSON object, nothing else

Use only the following nodes and relationships (if provided):
{schema}

Input text:
{text}
"""


async def main():
    graph = Neo4jGraph(
        url=os.getenv("NEO4J_URI"),
        username=os.getenv("NEO4J_USERNAME"),
        password=os.getenv("NEO4J_PASSWORD"),
        refresh_schema=False,
        database=os.getenv("NEO4J_DATABASE"),
    )
    llm = OpenAILLM(
        model_name=OPENAI_MODEL,
        model_params={"response_format": {"type": "json_object"}, "temperature": 0},
    )
    kg_pipeline = SimpleKGPipeline(
        llm=llm,
        driver=graph._driver,
        text_splitter=FixedSizeSplitter(chunk_size=500, chunk_overlap=100),
        embedder=OpenAIEmbeddings(),
        entities=NODE_TYPES,
        prompt_template=PROMPT_TEMPLATE,
        from_pdf=False,
        perform_entity_resolution=True,
    )
    pipeline_result = await kg_pipeline.run_async(text="Some long article text here...")


if __name__ == "__main__":
    asyncio.run(main())
```

I see in the logs:
```
neo4j_graphrag.experimental.components.lexical_graph - INFO - Document node not created in the lexical graph because no document metadata is provided
```
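
My reading of that log line is that the builder only emits a Document node (and the chunk-to-document links) when some document info is passed in, and the text path never supplies it. The sketch below is a toy model of that behaviour, not the library's code: `DocInfo`, `build_lexical_graph`, and the `Document` / `Chunk` / `FROM_DOCUMENT` names are hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DocInfo:
    """Hypothetical stand-in for the document metadata the builder expects."""
    path: str
    metadata: dict = field(default_factory=dict)


def build_lexical_graph(chunks: list, document_info: Optional[DocInfo] = None) -> dict:
    """Toy model: a Document node is created only if document_info is given."""
    nodes = []
    relationships = []
    if document_info is not None:
        nodes.append({"label": "Document", "id": document_info.path})
    for index, text in enumerate(chunks):
        chunk_id = f"chunk-{index}"
        nodes.append({"label": "Chunk", "id": chunk_id, "text": text})
        # Chunks can only be linked to a document that actually exists.
        if document_info is not None:
            relationships.append(
                {"type": "FROM_DOCUMENT", "start": chunk_id, "end": document_info.path}
            )
    return {"nodes": nodes, "relationships": relationships}


# Text input, as in my run above: chunks only, no Document node.
text_only = build_lexical_graph(["first chunk", "second chunk"])

# What I would like to be able to do: supply document info alongside the text.
with_doc = build_lexical_graph(
    ["first chunk", "second chunk"], DocInfo(path="article.txt")
)
```

In this model the fix would simply be a way to hand a `DocInfo`-like object to the pipeline together with `text=...`.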

Is there some way around this?
