Skip to content

Variable-length grouped traversal is too expensive on a large on-disk graph and gets killed #475

@tae898

Description

@tae898

Ladybug version

v0.16.1

What operating system are you using?

Ubuntu 22.04

What happened?

Summary

This query is too expensive to execute reliably on a large on-disk Ladybug database, and the process gets killed during execution:

MATCH (a:NodeType01)-[:EdgeType01*0..3]->(b:NodeType01)
RETURN b.active AS active, count(b) AS total, avg(b.score) AS avg_score
ORDER BY total DESC, active

Environment

  • LadybugDB 0.16.1
  • Python 3.12.13
  • Linux x86_64
  • on-disk database

Dataset size

  • 10,000,000 nodes
  • 77,790,000 edges
  • 10 node labels
  • 10 relationship types

Observed behavior

On this dataset size, running the query causes the process to be terminated with exit code 137.

Expected behavior

The query should either complete successfully or fail gracefully with a normal handled error.

Actual behavior

The whole process is killed while executing this query on the large dataset.

Are there known steps to reproduce?

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions