GitHub - yeongseon/excel-dbapi: PEP 249 (DB-API 2.0) compliant driver for querying Excel files with SQL

excel-dbapi

A local-first worksheet query engine with a Python DB-API 2.0 (PEP 249) interface. Query, insert, update, and delete rows in .xlsx workbooks using SQL — no database server required.

This is not a document-preservation tool. excel-dbapi treats worksheets as row-oriented datasets. It does not guarantee preservation of Excel formatting, charts, images, conditional formatting, or formulas — particularly with the pandas engine, which rewrites the entire workbook on save. If you need a round-trip-safe Excel editor, use openpyxl directly.

Documentation

SQL Specification — authoritative feature matrix and SQL subset reference (v1.0)
Usage Guide — engine comparison, configuration, advanced patterns
10-Minute Quickstart
Engine Benchmarks — row limits, performance characteristics, preservation matrix
Project Roadmap
Development Guide
Operations Notes

Limitations

Before you begin, understand what excel-dbapi is not:

Not full SQL — a documented SQL subset (see SQL Specification)
Not a document-preservation tool — the pandas engine drops all formatting, charts, images, and formulas on save; openpyxl preserves formatting but not all Excel features survive round-trips through SQL DML
No concurrent writes — single-writer model; advisory PID-based file locking is provided but not ACID
Not for large datasets — designed for worksheets up to ~50k rows; beyond that, use a real database
No transactional rollback guarantees — rollback restores an in-memory snapshot, not a WAL-based recovery; a crash mid-save can lose data
Identifier grammar — both unquoted Unicode identifiers (이름, naïve) and double-quoted identifiers ("Full Name", "이름") are supported

If you need relational guarantees, concurrent access, or large-scale data, use SQLite or PostgreSQL.

SQL Feature Set (Stable)

All features below are Stable — covered by the SQL Spec v1.0 contract and will not have breaking changes.

SELECT with aliases, arithmetic/CASE expressions, DISTINCT, WHERE, GROUP BY, HAVING, ORDER BY, LIMIT, OFFSET
JOINs: INNER, LEFT, RIGHT, FULL OUTER, CROSS
Aggregates: COUNT, SUM, AVG, MIN, MAX, COUNT(DISTINCT col), FILTER (WHERE ...)
Scalar functions: UPPER, LOWER, LENGTH, TRIM, SUBSTR, COALESCE, NULLIF, CONCAT, ABS, ROUND, REPLACE, YEAR, MONTH, DAY
CAST(expr AS type) — INTEGER, REAL, TEXT, DATE, DATETIME, BOOLEAN
Subqueries: WHERE col [NOT] IN (SELECT ...), EXISTS (SELECT ...), scalar subqueries
Set operations: UNION, UNION ALL, INTERSECT, EXCEPT
DML: INSERT (single/multi-row, INSERT ... SELECT), UPSERT (ON CONFLICT), UPDATE, DELETE
DDL: CREATE TABLE, DROP TABLE, ALTER TABLE ADD/DROP/RENAME COLUMN
Parameters: ? positional placeholders (qmark paramstyle)

Experimental Features

These features are implemented but may change semantics or be removed in a future release:

Window functions (ROW_NUMBER, RANK, DENSE_RANK, running aggregates) — SQL Spec § Window Functions
CTEs (WITH ... AS) — non-recursive only — SQL Spec § CTEs

For the full feature matrix with per-feature notes, see docs/SQL_SPEC.md § Authoritative Feature Matrix.

Who is this for?

Data analysts who want to query Excel files with SQL instead of manual filtering
Citizen developers automating small workflows with familiar SQL syntax
Educators teaching SQL concepts without setting up a database
Prototypers building quick data pipelines before moving to a real database

Installation

pip install excel-dbapi

See CHANGELOG for release history.

Quick Start

from excel_dbapi.connection import ExcelConnection

# Open an Excel file and query it
with ExcelConnection("sample.xlsx") as conn:
    cursor = conn.cursor()
    cursor.execute("SELECT * FROM Sheet1")
    print(cursor.fetchall())

Insert, Update, Delete

with ExcelConnection("sample.xlsx") as conn:
    cursor = conn.cursor()

    # Insert with parameter binding (recommended)
    cursor.execute("INSERT INTO Sheet1 (id, name) VALUES (?, ?)", (1, "Alice"))

    # Update
    cursor.execute("UPDATE Sheet1 SET name = 'Ann' WHERE id = 1")

    # Delete
    cursor.execute("DELETE FROM Sheet1 WHERE id = 2")

Multi-row Insert

with ExcelConnection("sample.xlsx") as conn:
    cursor = conn.cursor()

    # Insert multiple rows at once
    cursor.execute("INSERT INTO Sheet1 VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol')")

    # INSERT...SELECT: copy rows from another sheet
    cursor.execute("INSERT INTO Sheet2 (id, name) SELECT id, name FROM Sheet1 WHERE id > 1")

Create and Drop Sheets

with ExcelConnection("sample.xlsx") as conn:
    cursor = conn.cursor()
    cursor.execute("CREATE TABLE NewSheet (id, name)")
    cursor.execute("DROP TABLE NewSheet")

Engine Options

Engine	Description	Dependency	Preserves formatting
openpyxl (default)	Cell-level read/write	openpyxl	✅ Yes
pandas	DataFrame-based operations	pandas, openpyxl	❌ No — rewrites entire workbook
graph	Microsoft Graph API (remote)	httpx	✅ (cell values only)

conn = ExcelConnection("sample.xlsx", engine="openpyxl")  # default
conn = ExcelConnection("sample.xlsx", engine="pandas")

Engine Capability Matrix

Capability	openpyxl	pandas	graph
Read support	✅	✅	✅
Write support	✅	✅	✅ (opt-in, `readonly=False`)
Preserves formatting/charts/images	✅	❌ (rewrites workbook)	✅ (updates cell values only)
Transactions (commit/rollback)	✅ (in-memory snapshot)	✅ (in-memory snapshot)	❌ (writes are immediate)
`data_only=False` (read formulas)	✅	❌	❌
File locking	✅ (advisory PID-based)	✅ (advisory PID-based)	N/A (remote)
Remote/cloud access	❌	❌	✅ (Microsoft Graph)
`get_workbook()` access	✅	❌	❌
Formula injection defense	✅ (default on)	✅ (default on)	✅ (default on)

Choose openpyxl (default) for local files where you need formatting preservation and formula access. Choose pandas when you prefer DataFrame-based workflows and don't need formatting. Choose graph for remote Excel on OneDrive/SharePoint via Microsoft Graph API.

For detailed engine comparison and benchmarks, see the Usage Guide and Benchmarks.

WHERE Operators

Operator	Example	Description
`=`, `!=`, `<>`	`WHERE id = 1`	Equality / inequality
`>`, `>=`, `<`, `<=`	`WHERE score >= 80`	Comparison
`IS NULL` / `IS NOT NULL`	`WHERE name IS NOT NULL`	NULL checks
`IN`	`WHERE name IN ('Alice', 'Bob')`	Set membership
`BETWEEN`	`WHERE score BETWEEN 70 AND 90`	Inclusive range
`LIKE` / `ILIKE`	`WHERE name LIKE 'A%'`	Pattern matching (ILIKE = case-insensitive)
`NOT LIKE` / `NOT ILIKE`	`WHERE name NOT LIKE 'A%'`	Negated pattern matching
`NOT IN`	`WHERE id NOT IN (1, 2)`	Negated set membership
`NOT BETWEEN`	`WHERE x NOT BETWEEN 1 AND 5`	Negated range
`AND` / `OR` / `NOT`	`WHERE x = 1 AND y = 2`	Logical connectives

NULL semantics: Comparisons with NULL follow SQL three-valued logic (TRUE / FALSE / UNKNOWN). WHERE x = NULL returns no rows; use IS NULL instead.

LIKE patterns: % matches any sequence of characters, _ matches any single character.

with ExcelConnection("sample.xlsx") as conn:
    cursor = conn.cursor()

    # IN operator
    cursor.execute("SELECT * FROM Sheet1 WHERE name IN ('Alice', 'Bob')")

    # BETWEEN operator
    cursor.execute("SELECT * FROM Sheet1 WHERE score BETWEEN 70 AND 90")

    # LIKE operator
    cursor.execute("SELECT * FROM Sheet1 WHERE name LIKE 'A%'")

    # All operators support parameter binding
    cursor.execute("SELECT * FROM Sheet1 WHERE name IN (?, ?)", ("Alice", "Bob"))
    cursor.execute("SELECT * FROM Sheet1 WHERE score BETWEEN ? AND ?", (70, 90))
    cursor.execute("SELECT * FROM Sheet1 WHERE name LIKE ?", ("A%",))

Compound Queries (Set Operations)

with ExcelConnection("sample.xlsx") as conn:
    cursor = conn.cursor()

    cursor.execute("SELECT id FROM t1 UNION SELECT id FROM t2")
    cursor.execute("SELECT id FROM t1 UNION ALL SELECT id FROM t2")
    cursor.execute("SELECT id FROM t1 INTERSECT SELECT id FROM t2")
    cursor.execute("SELECT id FROM t1 EXCEPT SELECT id FROM t2")

Safety Defaults

Formula Injection Defense

By default, excel-dbapi sanitizes cell values on write (INSERT/UPDATE) to prevent formula injection attacks. Strings starting with =, +, -, @, \t, or \r are automatically prefixed with a single quote (') so they are stored as plain text, not executed as formulas.

# Default: sanitization ON (recommended)
with ExcelConnection("sample.xlsx") as conn:
    cursor = conn.cursor()
    cursor.execute("INSERT INTO Sheet1 (id, name) VALUES (?, ?)",
                   (1, "=SUM(A1:A10)"))
    # Stored as: '=SUM(A1:A10)  (safe, not executed as formula)

# Opt out if you intentionally write formulas
with ExcelConnection("sample.xlsx", sanitize_formulas=False) as conn:
    cursor = conn.cursor()
    cursor.execute("INSERT INTO Sheet1 (id, formula) VALUES (?, ?)",
                   (1, "=SUM(A1:A10)"))
    # Stored as: =SUM(A1:A10)  (executed as formula in Excel)

Transactions

with ExcelConnection("sample.xlsx", autocommit=False) as conn:
    cursor = conn.cursor()
    cursor.execute("UPDATE Sheet1 SET name = 'Ann' WHERE id = 1")
    conn.rollback()

Important: Transactions use in-memory snapshots, not write-ahead logging. rollback() restores the last committed snapshot in memory — it does not undo writes already flushed to disk. When autocommit=True (the default), each write is saved immediately and rollback() is not supported. The graph engine does not support transactions at all; writes are applied immediately to the remote workbook.

Cursor Metadata

with ExcelConnection("sample.xlsx") as conn:
    cursor = conn.cursor()
    cursor.execute("SELECT id, name FROM Sheet1")
    print(cursor.description)
    print(cursor.rowcount)

Troubleshooting

"Column 'xyz' not found"

The column name in your SQL doesn't match any header in the sheet.

ProgrammingError: Column 'nmae' not found in Sheet1. Available columns: ['id', 'name', 'email']

Fix: Check the spelling. Column names must match the first row (header) of the sheet exactly.

"Table 'SheetX' not found"

The sheet name in your SQL doesn't match any sheet in the workbook.

ProgrammingError: Table 'Shee1' not found. Available sheets: ['Sheet1', 'Sheet2']

Fix: Check the sheet name spelling. Sheet names are resolved case-insensitively.

PandasEngine drops formatting

PandasEngine reads data into a DataFrame and writes it back. This process drops Excel formatting, charts, images, and formulas.

Fix: Use the default openpyxl engine if you need to preserve formatting.

Integer vs. string comparison (Pandas)

The Pandas engine preserves Python types. If a column contains integers, WHERE id = '2' (string) won't match — use WHERE id = 2 (no quotes).

Fix: Omit quotes around numeric values in WHERE clauses when using the Pandas engine.

Experimental: Remote Excel via Microsoft Graph API

Status: Experimental — API may change in future releases.

excel-dbapi can access remote Excel files on OneDrive/SharePoint via the Microsoft Graph API.

Supported Graph DSNs are ID-based:

msgraph://drives/{drive_id}/items/{item_id}
sharepoint://sites/{site_name}/drives/{drive_id}/items/{item_id}
onedrive://me/drive/items/{item_id}

pip install excel-dbapi[graph]

from excel_dbapi.connection import ExcelConnection

conn = ExcelConnection(
    "msgraph://drives/{drive_id}/items/{item_id}",
    engine="graph",
    credential=your_credential,
    autocommit=True,
)
cursor = conn.cursor()
cursor.execute("SELECT * FROM Sheet1")
print(cursor.fetchall())
conn.close()

The Graph backend is read-only by default. Write operations require explicit opt-in and a credential/token provider with appropriate Graph API permissions.

Graph metadata sync is best-effort for write operations: if worksheet mutation succeeds but metadata sync fails, excel-dbapi keeps the worksheet change and logs a warning.

For DSN formats and dependency choices, see the Usage Guide Graph section.

Related Projects

sqlalchemy-excel — SQLAlchemy dialect that uses excel-dbapi as its DB-API 2.0 driver. Use create_engine("excel:///file.xlsx") for full ORM support.

Examples

examples/basic_usage.py
examples/write_operations.py
examples/transactions.py
examples/advanced_query.py
examples/pandas_engine.py

License

MIT License

Name		Name	Last commit message	Last commit date
Latest commit History 143 Commits
.github		.github
docs		docs
examples		examples
src/excel_dbapi		src/excel_dbapi
tests		tests
.editorconfig		.editorconfig
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
AGENTS.md		AGENTS.md
CHANGELOG.md		CHANGELOG.md
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
SECURITY.md		SECURITY.md
SUPPORT.md		SUPPORT.md
cliff.toml		cliff.toml
codecov.yml		codecov.yml
logo.svg		logo.svg
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Documentation

Limitations

SQL Feature Set (Stable)

Experimental Features

Who is this for?

Installation

Quick Start

Insert, Update, Delete

Multi-row Insert

Create and Drop Sheets

Engine Options

Engine Capability Matrix

WHERE Operators

Compound Queries (Set Operations)

Safety Defaults

Formula Injection Defense

Transactions

Cursor Metadata

Troubleshooting

"Column 'xyz' not found"

"Table 'SheetX' not found"

PandasEngine drops formatting

Integer vs. string comparison (Pandas)

Experimental: Remote Excel via Microsoft Graph API

Related Projects

Examples

License

About

Uh oh!

Releases 8

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Documentation

Limitations

SQL Feature Set (Stable)

Experimental Features

Who is this for?

Installation

Quick Start

Insert, Update, Delete

Multi-row Insert

Create and Drop Sheets

Engine Options

Engine Capability Matrix

WHERE Operators

Compound Queries (Set Operations)

Safety Defaults

Formula Injection Defense

Transactions

Cursor Metadata

Troubleshooting

"Column 'xyz' not found"

"Table 'SheetX' not found"

PandasEngine drops formatting

Integer vs. string comparison (Pandas)

Experimental: Remote Excel via Microsoft Graph API

Related Projects

Examples

License

About

Topics

Resources

License

Code of conduct

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases 8

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages