
Remove misleading LanceDB naming from Lance integration code #33

Description

@FANNG1

The current implementation appears to be built on the Lance Python package (distributed as pylance) rather than the LanceDB Python client, but several identifiers and docstrings still use lancedb / LanceDB. This can mislead readers into thinking the integration depends on, or targets, the LanceDB client API.

Examples found in the repo:

  • LanceDBScanOperator
  • _lancedb_table_factory_function
  • _lancedb_count_result_function
  • tests/io/lancedb/...
  • docstrings that refer to "LanceDB table" or say the functions require LanceDB
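Occurrences like these can be collected mechanically before renaming. The sketch below is a hypothetical helper, not part of the repo: `find_lancedb_mentions` and the sample file paths are illustrative. It does a case-insensitive search for "lancedb" line by line over a mapping of path to file contents. A hit is only a candidate. A genuine reference to the LanceDB client should stay as-is.

```python
import re

# Case-insensitive match, so LanceDB, lancedb and LANCEDB are all caught.
LANCEDB_PATTERN = re.compile(r"lancedb", re.IGNORECASE)


def find_lancedb_mentions(files: dict[str, str]) -> list[tuple[str, int, str]]:
    """Return (path, 1-based line number, stripped line) for each line mentioning LanceDB."""
    hits: list[tuple[str, int, str]] = []
    for path, text in sorted(files.items()):
        for lineno, line in enumerate(text.splitlines(), start=1):
            if LANCEDB_PATTERN.search(line):
                hits.append((path, lineno, line.strip()))
    return hits


# Hypothetical file contents for illustration; paths are not taken from the repo.
SAMPLE_FILES = {
    "example/scan.py": 'class LanceDBScanOperator:\n    """Scan a LanceDB table."""\n',
    "example/utils.py": "import lance\n\ndef open_dataset(uri):\n    return lance.dataset(uri)\n",
}

hits = find_lancedb_mentions(SAMPLE_FILES)
for hit in hits:
    print(hit)
```

To scan a real checkout, the mapping could be built with `{str(p): p.read_text() for p in pathlib.Path(root).rglob("*.py")}`. Directory names such as tests/io/lancedb would need a separate check on the paths themselves.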

At runtime, the code imports and uses lance, e.g. lance.dataset(...), lance.LanceDataset, and lance.LanceOperation. The project dependencies include pylance and lance-namespace*, but not lancedb.
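The runtime-dependency claim can be checked statically. The pylance distribution is imported as `lance`, and the LanceDB client is imported as `lancedb` (for example `lancedb.connect(...)`). The helper below is a hypothetical sketch: `imported_top_level_modules` and both sample snippets are illustrative and are not copied from the repo. It parses source with the standard-library `ast` module and reports which top-level packages are imported, without importing them.

```python
import ast


def imported_top_level_modules(source: str) -> set[str]:
    """Return the top-level package names imported by a Python source string."""
    modules: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                modules.add(alias.name.split(".")[0])
        # level == 0 skips relative imports such as "from . import x".
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules.add(node.module.split(".")[0])
    return modules


# Illustrative snippet shaped like the calls listed above (pylance is imported as "lance").
LANCE_STYLE = '''
import lance
from lance import LanceDataset

def open_dataset(uri) -> LanceDataset:
    return lance.dataset(uri)
'''

# Contrasting snippet in the style of the LanceDB client API.
LANCEDB_STYLE = "import lancedb\ndb = lancedb.connect('/tmp/db')\n"

lance_mods = imported_top_level_modules(LANCE_STYLE)
lancedb_mods = imported_top_level_modules(LANCEDB_STYLE)
print(sorted(lance_mods), sorted(lancedb_mods))
```

Running the same check over the integration modules would show whether `lancedb` appears anywhere as an import, which is the question the naming should answer.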

Proposal:

  • Rename internal symbols from lancedb / LanceDB to lance / Lance where they describe Lance dataset behavior.
  • Update docstrings and examples to say "Lance dataset/table" instead of "LanceDB table" unless a true LanceDB client integration is intended.
  • Consider moving tests/io/lancedb to a Lance-specific path, or at least renaming the tests over time to reduce confusion.
  • Keep backward compatibility in mind for any public or semi-public names. Note, however, that the current package API appears to expose read_lance, merge_columns, merge_columns_df, create_scalar_index, and compact_files, not LanceDBScanOperator, which suggests most of these renames would affect internal code only.

This is mostly a clarity issue, but it matters because Lance and LanceDB are distinct layers and the current naming makes the integration boundary harder to understand.

Metadata

Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
