fix(kg-pipeline): always create entity-to-chunk relationships regardless of create_lexical_graph#491

Open
ADunfield wants to merge 2 commits into neo4j:main from ADunfield:fix/470-entity-to-chunk-relationships

Conversation

@ADunfield

Contributor

Summary

Fixes #470.

When LLMEntityRelationExtractor is instantiated with create_lexical_graph=False and no lexical_graph_config is provided, FROM_CHUNK relationships between extracted entities and their source chunks are silently never created. This breaks any downstream query or pipeline step that traverses the entity→chunk link.

Root cause

In run(), lexical_graph_builder is conditionally assigned:

lexical_graph_builder = None
lexical_graph = None
if self.create_lexical_graph:
    config = lexical_graph_config or LexicalGraphConfig()
    lexical_graph_builder = LexicalGraphBuilder(config=config)
    ...
elif lexical_graph_config:
    lexical_graph_builder = LexicalGraphBuilder(config=lexical_graph_config)

When create_lexical_graph=False and lexical_graph_config=None (the default), lexical_graph_builder stays None. Later, post_process_chunk() guards its call to process_chunk_extracted_entities() behind if lexical_graph_builder:, so no FROM_CHUNK relationships are created.
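The control flow above can be reproduced in isolation. The following is a minimal sketch, not the library code: StubConfig, StubBuilder, buggy_select_builder and post_process are hypothetical stand-ins that only model the branching and the guard described here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class StubConfig:
    # Hypothetical stand-in for LexicalGraphConfig; only one field is modeled.
    rel_type: str = "FROM_CHUNK"


@dataclass
class StubBuilder:
    # Hypothetical stand-in for LexicalGraphBuilder.
    config: StubConfig

    def link_entities_to_chunk(
        self, entity_ids: List[str], chunk_id: str
    ) -> List[Tuple[str, str, str]]:
        # One (entity, relationship type, chunk) triple per extracted entity.
        return [(e, self.config.rel_type, chunk_id) for e in entity_ids]


def buggy_select_builder(
    create_lexical_graph: bool, config: Optional[StubConfig]
) -> Optional[StubBuilder]:
    # Mirrors the pre-fix branching: the builder only exists if the flag is
    # set or a config was passed explicitly.
    builder = None
    if create_lexical_graph:
        builder = StubBuilder(config or StubConfig())
    elif config:
        builder = StubBuilder(config)
    return builder


def post_process(
    builder: Optional[StubBuilder], entity_ids: List[str], chunk_id: str
) -> List[Tuple[str, str, str]]:
    # Mirrors the `if lexical_graph_builder:` guard: no builder, no links.
    if builder:
        return builder.link_entities_to_chunk(entity_ids, chunk_id)
    return []


# Flag off and no config: the entity-to-chunk link is silently dropped.
print(post_process(buggy_select_builder(False, None), ["e1"], "c0"))
```

Passing either create_lexical_graph=True or an explicit config to the stand-in yields the expected triple, which matches the observation that the bug only appears with the default arguments.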

Fix

Always instantiate LexicalGraphBuilder before the conditional:

config = lexical_graph_config or LexicalGraphConfig()
lexical_graph_builder = LexicalGraphBuilder(config=config)
lexical_graph = None
if self.create_lexical_graph:
    lexical_graph_result = await lexical_graph_builder.run(
        text_chunks=chunks, document_info=document_info
    )
    lexical_graph = lexical_graph_result.graph

create_lexical_graph continues to control whether document/chunk nodes and NEXT_CHUNK/FROM_DOCUMENT relationships are added to the graph — it no longer suppresses entity→chunk relationship creation. The lexical_graph_config parameter still customises all relationship types and node labels as before.
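The intended split of responsibilities can be sketched without the library. This is an illustration under stated assumptions: SketchConfig and run_fixed are hypothetical names, the function is synchronous for brevity, and graphs are reduced to plain lists of ids and triples.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]


@dataclass
class SketchConfig:
    # Hypothetical stand-in for LexicalGraphConfig.
    entity_to_chunk_rel_type: str = "FROM_CHUNK"


def run_fixed(
    chunk_ids: List[str],
    entities_by_chunk: Dict[str, List[str]],
    create_lexical_graph: bool,
    config: Optional[SketchConfig] = None,
) -> Tuple[List[str], List[Triple]]:
    # The config (and so the builder, in the real code) always exists now.
    config = config or SketchConfig()
    nodes: List[str] = []
    rels: List[Triple] = []

    # The flag only controls chunk nodes and chunk-to-chunk relationships.
    if create_lexical_graph:
        nodes.extend(chunk_ids)
        rels.extend(
            (a, "NEXT_CHUNK", b) for a, b in zip(chunk_ids, chunk_ids[1:])
        )

    # Entity-to-chunk links are created regardless of the flag.
    for chunk_id in chunk_ids:
        for entity_id in entities_by_chunk.get(chunk_id, []):
            nodes.append(entity_id)
            rels.append((entity_id, config.entity_to_chunk_rel_type, chunk_id))
    return nodes, rels


nodes, rels = run_fixed(["c0", "c1"], {"c0": ["e1"]}, create_lexical_graph=False)
print(nodes)  # only the entity node
print(rels)   # the entity-to-chunk link is still present
```

With create_lexical_graph=False the sketch returns the entity-to-chunk triple but no chunk nodes, which is the behavior the new tests in this PR assert.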

Changes

  • src/neo4j_graphrag/experimental/components/entity_relation_extractor.py — restructure conditional in run()
  • tests/unit/experimental/components/test_entity_relation_extractor.py:
    • Updates test_extractor_llm_badly_formatted_json_gets_fixed, which previously asserted the buggy (now fixed) result relationships == []
    • Adds test_entity_to_chunk_rels_created_when_lexical_graph_disabled — core regression test
    • Adds test_no_chunk_nodes_when_lexical_graph_disabled — verifies chunk nodes still absent
    • Adds test_entity_to_chunk_rels_created_with_custom_lexical_graph_config — verifies custom rel type respected

Testing

pytest tests/unit/experimental/components/test_entity_relation_extractor.py -v
# 37 passed

All 37 tests pass (34 existing + 3 new).

…ess of create_lexical_graph

When LLMEntityRelationExtractor was run with create_lexical_graph=False and
no lexical_graph_config, lexical_graph_builder was left as None. This caused
post_process_chunk() to skip process_chunk_extracted_entities(), so FROM_CHUNK
relationships between extracted entities and their source chunks were never
created, silently breaking downstream graph queries that traverse entity→chunk.

Fix: always instantiate LexicalGraphBuilder (with default or provided config)
before the conditional block. The create_lexical_graph flag continues to control
whether document/chunk nodes and NEXT_CHUNK/FROM_DOCUMENT relationships are
added — it no longer suppresses entity→chunk relationship creation.

Also updates the existing test that was asserting the (now-fixed) buggy behavior
of relationships == [], and adds three new regression tests.

Closes neo4j#470
@ADunfield ADunfield requested a review from a team as a code owner March 18, 2026 07:36

@AmirLayegh AmirLayegh left a comment

Contributor

Nice work, thanks!
With this fix, the lexical graph relationships are correctly present in the returned Neo4jGraph object. Note, however, that when using Neo4jWriter these relationships will still not be persisted to the database, since create_lexical_graph=False skips lexical_graph_builder.run().

Can you also please fix the CI checks?
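One way to make the persistence concern visible before writing is to look for relationships whose endpoints are missing from the graph's node list. The sketch below assumes, without confirming it against Neo4jWriter, that a relationship can only be stored when both endpoint nodes exist; Rel and dangling_relationships are hypothetical helpers, not part of neo4j_graphrag.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Rel:
    # Hypothetical minimal relationship record: ids plus a type.
    start_id: str
    end_id: str
    type: str


def dangling_relationships(
    node_ids: Iterable[str], relationships: Iterable[Rel]
) -> List[Rel]:
    # A relationship is "dangling" if either endpoint is not a known node.
    known = set(node_ids)
    return [
        r for r in relationships
        if r.start_id not in known or r.end_id not in known
    ]


# With the lexical graph disabled, the chunk node "c0" is not in the graph,
# so the entity-to-chunk relationship points at a missing node.
rels = [Rel("e1", "c0", "FROM_CHUNK")]
print(dangling_relationships(["e1"], rels))
```

If the chunk nodes were written by an earlier pipeline step, the check would need to consult the database rather than the in-memory node list.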
