get_structured_schema function misses some labels/types on large databases #350

In the current implementation of the [`get_structured_schema`](https://github.com/neo4j/neo4j-graphrag-python/blob/main/src/neo4j_graphrag/schema.py#L228) function, the calls to `apoc.meta.data` do not specify `sample: -1`, so sampling occurs with the default skip rate (see the [apoc.meta.data documentation](https://neo4j.com/docs/apoc/current/overview/apoc.meta/apoc.meta.data/)). On large databases, this sampling means some labels and relationship types are never discovered and are missing from the resulting schema.

## Steps to Reproduce
To reproduce the issue, access the [offshoreleaks](https://demo.neo4jlabs.com:7473/browser/?dbms=neo4j://offshoreleaks@demo.neo4jlabs.com&db=offshoreleaks) database on the [Neo4j Dataset Demo server](https://github.com/neo4j-graph-examples/demo.neo4jlabs.com?tab=readme-ov-file) with the following settings:

- server_url = https://demo.neo4jlabs.com:7473/
- username = offshoreleaks
- password = offshoreleaks
- database_name = offshoreleaks
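To confirm the symptom independently of neo4j-graphrag, one option is to compare the labels and relationship types reported by `db.labels()` and `db.relationshipTypes()` with those returned by a default (sampled) `apoc.meta.data()` call. The sketch below is a minimal, hypothetical check: `RunQuery` and `find_missing` are illustrative names, and it assumes the two `db.*` procedures list the labels and types actually present in the database.

```python
from typing import Callable

# Hypothetical adapter type: runs one Cypher query and returns its rows as dicts.
RunQuery = Callable[[str], list[dict]]

ALL_LABELS_QUERY = "CALL db.labels() YIELD label RETURN label"
ALL_REL_TYPES_QUERY = (
    "CALL db.relationshipTypes() YIELD relationshipType RETURN relationshipType"
)
# apoc.meta.data() without a config map, i.e. with APOC's default sampling.
SAMPLED_META_QUERY = (
    "CALL apoc.meta.data() YIELD label, elementType "
    "RETURN DISTINCT label, elementType"
)


def find_missing(run_query: RunQuery) -> dict[str, set[str]]:
    """Return labels and relationship types absent from the sampled metadata."""
    labels = {row["label"] for row in run_query(ALL_LABELS_QUERY)}
    rel_types = {row["relationshipType"] for row in run_query(ALL_REL_TYPES_QUERY)}

    meta_rows = run_query(SAMPLED_META_QUERY)
    # For relationship rows, apoc.meta.data reports the relationship type in `label`.
    seen_labels = {r["label"] for r in meta_rows if r["elementType"] == "node"}
    seen_rels = {r["label"] for r in meta_rows if r["elementType"] == "relationship"}

    return {
        "labels": labels - seen_labels,
        "relationship_types": rel_types - seen_rels,
    }
```

With the official Neo4j Python driver (5.x), `run_query` could be `lambda q: [r.data() for r in driver.execute_query(q, database_="offshoreleaks").records]`, where `driver` comes from `GraphDatabase.driver(uri, auth=("offshoreleaks", "offshoreleaks"))` and `uri` is the demo server's Bolt address. Any non-empty set in the result is a candidate for a label or type that the sampled schema misses.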

## Suggested Fix
In [schema.py](https://github.com/neo4j/neo4j-graphrag-python/blob/main/src/neo4j_graphrag/schema.py#L31), the `apoc.meta.data` calls in `NODE_PROPERTIES_QUERY`, `REL_PROPERTIES_QUERY`, and `REL_QUERY` could be updated to pass this config:
`CALL apoc.meta.data({sample: -1})`
This change would ensure that all nodes and relationships are scanned, so no labels, relationship types, or properties are missed, which matters most on large databases.

However, `sample: -1` forces a full scan of the database, which can significantly slow down schema extraction on large datasets. To keep this flexible, the behavior could be exposed as an optional parameter of `get_structured_schema` (e.g., `skip_sampling=True`), letting users choose based on their needs.
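One possible shape for the optional flag is sketched below, assuming the `apoc.meta.data` config is passed as a Cypher parameter (`$config`) instead of being written into the query text. The query is a simplified stand-in for `NODE_PROPERTIES_QUERY` (the actual query in schema.py may differ), and `RunQuery`, `meta_data_config`, and `get_node_properties` are hypothetical names; only `skip_sampling` comes from the proposal above. The same pattern would apply to `REL_PROPERTIES_QUERY` and `REL_QUERY`.

```python
from typing import Any, Callable

# Hypothetical adapter type: runs a Cypher query with parameters, returns rows as dicts.
RunQuery = Callable[[str, dict[str, Any]], list[dict[str, Any]]]

# Simplified stand-in for NODE_PROPERTIES_QUERY: the config map arrives as $config.
NODE_PROPERTIES_QUERY = (
    "CALL apoc.meta.data($config) "
    "YIELD label, elementType, type, property "
    "WHERE NOT type = 'RELATIONSHIP' AND elementType = 'node' "
    "WITH label AS nodeLabels, collect({property: property, type: type}) AS properties "
    "RETURN {labels: nodeLabels, properties: properties} AS output"
)


def meta_data_config(skip_sampling: bool) -> dict[str, Any]:
    """Build the apoc.meta.data config map.

    `sample: -1` disables sampling (full scan); an empty map keeps APOC's defaults.
    """
    return {"sample": -1} if skip_sampling else {}


def get_node_properties(
    run_query: RunQuery, skip_sampling: bool = False
) -> list[dict[str, Any]]:
    """Hypothetical helper: fetch node label/property metadata."""
    rows = run_query(NODE_PROPERTIES_QUERY, {"config": meta_data_config(skip_sampling)})
    return [row["output"] for row in rows]
```

Keeping the query text fixed and varying only the parameter means both modes share one query, and the default (`skip_sampling=False`) leaves today's sampling behavior unchanged. With the Neo4j Python driver, `run_query` could be `lambda q, p: [r.data() for r in driver.execute_query(q, p, database_="offshoreleaks").records]`.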

Would you agree with this approach? If so, I’d be happy to open a PR implementing the change.
