Add support for existing graph nodes and relationships to the entity and relationship extractor by ali-sedaghatbaf · Pull Request #532 · neo4j/neo4j-graphrag-python

ali-sedaghatbaf · 2026-05-26T15:01:27Z

Description

This pull request enhances the entity and relation extraction process by allowing the system to incorporate information about existing nodes and relationships in the knowledge graph. When provided, these existing entities are included in the prompt to the language model, instructing it to reuse IDs for matching entities instead of creating duplicates. The changes also ensure that sensitive or unnecessary properties (like embeddings) are excluded from the prompt and add comprehensive tests for the new functionality.

The LLMEntityRelationExtractor now accepts an optional existing_graph parameter in its main methods (run, run_for_chunk, and extract_for_chunk). When supplied, the extractor serializes existing nodes and relationships (excluding embedding_properties) and includes them in the prompt to guide the language model to reuse entity IDs.
The ERExtractionTemplate prompt now conditionally adds a section listing existing nodes and relationships, formatted as JSON, and provides clear instructions to the LLM about reusing IDs for existing entities. If no existing entities are provided, this section is omitted.
New unit tests verify that:
- Existing entities are included in the prompt when provided.
- embedding_properties are excluded from the prompt.
- The "Existing graph entities" section is omitted when no existing entities are provided.
- The prompt template formats the existing entity information correctly under various scenarios.
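The behaviour described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `Node`, `Relationship`, `serialize_existing`, and `existing_section` are hypothetical stand-ins for the library's types and template logic, and the instruction wording is invented for the example.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a graph node (not the library's class)."""
    id: str
    label: str
    properties: dict[str, Any] = field(default_factory=dict)
    embedding_properties: Optional[dict[str, list[float]]] = None


@dataclass
class Relationship:
    """Hypothetical stand-in for a graph relationship."""
    start_node_id: str
    end_node_id: str
    type: str
    properties: dict[str, Any] = field(default_factory=dict)
    embedding_properties: Optional[dict[str, list[float]]] = None


def serialize_existing(nodes: list[Node], rels: list[Relationship]):
    """Convert nodes and relationships to plain dicts, leaving out embedding_properties."""
    node_dicts = [
        {"id": n.id, "label": n.label, "properties": n.properties} for n in nodes
    ]
    rel_dicts = [
        {
            "start_node_id": r.start_node_id,
            "end_node_id": r.end_node_id,
            "type": r.type,
            "properties": r.properties,
        }
        for r in rels
    ]
    return node_dicts, rel_dicts


def existing_section(nodes: list[Node], rels: list[Relationship]) -> str:
    """Build the prompt section, or return an empty string when nothing is supplied."""
    if not nodes and not rels:
        return ""
    node_dicts, rel_dicts = serialize_existing(nodes, rels)
    payload = json.dumps({"nodes": node_dicts, "relationships": rel_dicts}, indent=2)
    return (
        "Existing graph entities:\n"
        f"{payload}\n"
        "If an extracted entity matches one of these, reuse its id "
        "instead of creating a new one.\n"
    )
```

Returning an empty string for the empty case keeps the template free of a dangling header, which mirrors the "section is omitted" behaviour the unit tests check.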

Type of Change

Complexity

Complexity: low

How Has This Been Tested?

Unit tests
E2E tests
Manual tests

Checklist

The following requirements should have been met (depending on the changes in the branch):

Documentation has been updated
Unit tests have been updated
E2E tests have been updated
Examples have been updated
New files have copyright header
CLA (https://neo4j.com/developer/cla/) has been signed
CHANGELOG.md updated if appropriate

NathalieCharbel

Great stuff!

I'd like to further discuss the following points that are still unclear to me:

- How do we accumulate existing graphs from previous runs, and how is the caller expected to handle the existing graph?
- How is the approach affected by node IDs being prefixed in post_process_chunk -> update_ids?
- Does the approach hold end to end? Should we prove that it works as intended with a small test that makes "real" LLM calls?

NathalieCharbel · 2026-05-29T13:43:10Z

                schema,
                examples,
                lexical_graph_builder,
+                existing_graphs[i] if existing_graphs else None,


So each chunk sees only its caller-supplied existing_graphs[i]. Are we assuming that the graphs are accumulated across previous runs and that this is the caller's responsibility? Either way, if the same new entity appears in chunk 0 and chunk 5 of the same run, two distinct IDs will be created, so are we only trying to deduplicate against externally provided (prior/persisted) state, never against the current batch? I am just trying to understand the approach and its possible drawbacks. This could be a legitimate design choice, but I don't think it is clear in the docstring.
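To make the scenario in this comment concrete, here is a minimal sketch. Only the expression `existing_graphs[i] if existing_graphs else None` comes from the diff; `pair_chunks_with_existing` and `toy_extract` are hypothetical, and each subgraph is reduced to a name-to-ID mapping. The toy extractor stands in for the LLM and simply mints a per-chunk ID for any name it was not given.

```python
from typing import Optional, Sequence


def pair_chunks_with_existing(
    chunks: Sequence[str],
    existing_graphs: Optional[Sequence[dict[str, str]]] = None,
) -> list[tuple[str, Optional[dict[str, str]]]]:
    """Pair each chunk with its own caller-supplied subgraph (name -> ID)."""
    if existing_graphs is not None and len(existing_graphs) != len(chunks):
        raise ValueError("existing_graphs must have the same length as chunks")
    return [
        (chunk, existing_graphs[i] if existing_graphs else None)
        for i, chunk in enumerate(chunks)
    ]


def toy_extract(chunk: str, existing: Optional[dict[str, str]], chunk_index: int) -> dict[str, str]:
    """Toy extractor: reuse a known ID when the name is listed, else mint a per-chunk ID."""
    existing = existing or {}
    return {name: existing.get(name, f"{chunk_index}:{name}") for name in chunk.split()}


chunks = ["Alice Bob", "Bob Carol"]
existing_graphs = [{"Alice": "n1"}, {}]  # "Bob" is new and appears in both chunks

pairs = pair_chunks_with_existing(chunks, existing_graphs)
results = [toy_extract(chunk, graph, i) for i, (chunk, graph) in enumerate(pairs)]
```

In this toy run "Alice" keeps the persisted ID, while "Bob" ends up with one ID per chunk because neither chunk was told about the other. That is the within-batch gap the comment asks about.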

NathalieCharbel · 2026-05-29T14:16:09Z

            lexical_graph_config (Optional[LexicalGraphConfig], optional): Lexical graph configuration to customize node labels and relationship types in the lexical graph.
            schema (GraphSchema | None): Definition of the schema to guide the LLM in its extraction.
            examples (str): Examples for few-shot learning in the prompt.
+            existing_graphs (Optional[list[Neo4jGraph]]): One subgraph per chunk, each containing nodes and relationships already in the knowledge graph that are relevant to that chunk. When provided, the LLM is instructed to reuse their IDs for matching entities instead of creating new ones. Must have the same length as chunks.


Would this be negatively impacted by node IDs being rewritten by update_ids (where node IDs are rewritten to ensure their uniqueness across chunks)?

No, don't think so.
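The question in this thread can be illustrated with a small sketch. It assumes, purely for illustration, that post-processing prefixes every node ID with the chunk ID; `prefix_ids` is a hypothetical stand-in and is not claimed to match what update_ids does in the library or how this PR treats reused IDs.

```python
def prefix_ids(
    node_ids: list[str],
    chunk_id: str,
    keep: frozenset[str] = frozenset(),
) -> list[str]:
    """Prefix IDs with the chunk ID, leaving any ID listed in `keep` unchanged."""
    return [nid if nid in keep else f"{chunk_id}:{nid}" for nid in node_ids]


existing_ids = frozenset({"person-42"})
extracted = ["person-42", "0"]  # one reused existing ID, one newly minted ID

# Prefixing everything changes the reused ID, so it no longer equals the stored one.
naive = prefix_ids(extracted, "chunk-7")

# Skipping known existing IDs keeps the reused ID intact and still makes new IDs unique.
guarded = prefix_ids(extracted, "chunk-7", keep=existing_ids)
```

Whether the naive case can actually occur depends on how reused IDs are handled during post-processing, which this thread does not show.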

NathalieCharbel · 2026-05-29T14:23:27Z

+        existing_nodes=[],
+        existing_rels=[],
+    )
+    assert "Existing graph entities" not in prompt


While all of these unit tests are great to have, it feels a bit difficult to prove the approach, and that the LLM is doing the right thing in picking the existing nodes/rels, without testing it with real LLM calls on a very small dataset.
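One possible shape for the check such a test could make is sketched below. The LLM call itself is out of scope here: `reuse_report` is a hypothetical helper that only compares the node IDs returned by an extraction run against the IDs that were supplied as existing, and the sample IDs are made up.

```python
def reuse_report(
    extracted_ids: list[str],
    existing_ids: list[str],
    expected_reused: list[str],
) -> dict[str, set[str]]:
    """Summarise which existing IDs were reused, which expected ones were missed, and which IDs are new."""
    extracted = set(extracted_ids)
    existing = set(existing_ids)
    reused = extracted & existing
    return {
        "reused": reused,
        "missed": set(expected_reused) - reused,
        "new": extracted - existing,
    }


# Made-up IDs: the run reused "n1", failed to reuse "n2", and created "0".
report = reuse_report(
    extracted_ids=["n1", "0"],
    existing_ids=["n1", "n2", "n3"],
    expected_reused=["n1", "n2"],
)
```

On a very small dataset, an end-to-end test could assert that `report["missed"]` is empty for entities the text clearly mentions.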

ali-sedaghatbaf added 3 commits May 26, 2026 16:34

Add support for existing graph to ER extractor

9783a54

Add unit tests

71cea93

Consider one sub-graph per chunk

95fdc43

ali-sedaghatbaf marked this pull request as ready for review May 28, 2026 14:19

ali-sedaghatbaf requested a review from a team as a code owner May 28, 2026 14:19

NathalieCharbel reviewed May 29, 2026

Add e2e test

7170548

ali-sedaghatbaf closed this Jun 2, 2026

ali-sedaghatbaf wants to merge 4 commits into main from support-existing-graph


2 participants
