Atompack

Append-only molecule storage for atomistic ML datasets.

Atompack is a Python package with a Rust core crate for writing, reading, and distributing molecular structures along with their forces, energies, charges, stresses, and custom properties. It is designed for dataset creation, training-time random access, batched loading, and simple distribution as .atp files or shard directories.

Documentation · Hugging Face datasets

Installation

pip install atompack

Hugging Face support ships in the base package.

Install from source

uv pip install "git+https://github.com/LeMaterial/atompack.git@main#subdirectory=atompack-py"

Quick Start

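import atompack
import numpy as np

# Two atoms (C and O); positions have shape (n_atoms, 3)
positions = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]], dtype=np.float32)
atomic_numbers = np.array([6, 8], dtype=np.uint8)

# Build a molecule and attach its labels
mol = atompack.Molecule.from_arrays(positions, atomic_numbers)
mol.energy = -123.456
mol.forces = np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]], dtype=np.float32)

# Create the database file, append the molecule, and commit with flush()
db = atompack.Database("data.atp", overwrite=True)
db.add_molecule(mol)
db.flush()

# Reopen read-only and read the molecule back by index
db = atompack.Database.open("data.atp")
print(db[0].energy)

# Fetch a flat batch by index and inspect its positions array
batch = db.get_molecules_flat([0])
print(batch["positions"].shape)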

Database.open(path) is read-only and mmap-backed by default. Reopen with Database.open(path, mmap=False) when you want to append molecules.
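
The append workflow described above can be sketched as a small helper. This is a minimal sketch, not part of Atompack: `append_and_commit` is a hypothetical name, and it relies only on the `add_molecule` and `flush` calls shown in the Quick Start. It assumes the database passed in was reopened in writable mode.

```python
from typing import Any, Iterable


def append_and_commit(db: Any, molecules: Iterable[Any]) -> int:
    """Append molecules to a writable database, then commit once.

    Hypothetical helper: `db` is assumed to expose add_molecule() and
    flush(), as atompack.Database does in the Quick Start.
    """
    count = 0
    for mol in molecules:
        db.add_molecule(mol)
        count += 1
    if count:
        # A single flush() after the loop commits all appended molecules.
        db.flush()
    return count


# Intended usage (assumes data.atp already exists):
#   db = atompack.Database.open("data.atp", mmap=False)
#   append_and_commit(db, [mol])
```

Flushing once after the loop, rather than after every molecule, keeps the number of explicit commits low. Whether that matters for throughput depends on the workload.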

Hugging Face Hub

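import atompack

# Open a shard from the Hub (read-only) and read a single molecule
db = atompack.hub.open("LeMaterial/Atompack", "omat/train")
print(len(db))
print(db[0].energy)
db.close()

# Open another shard and fetch several molecules in one call
db = atompack.hub.open("LeMaterial/Atompack", "omol/train")
batch = db.get_molecules([0, 1, 2])
print(len(batch))
db.close()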

Atompack is commonly used to reopen remote datasets directly from the Hub, especially from LeMaterial/Atompack. Typical shard layouts include omat/train and omol/train.
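
The examples above call `db.close()` by hand. One way to make sure the handle is released even if a read raises is the standard library's `contextlib.closing`, which only requires a `close()` method. This is a minimal sketch: `first_energies` is a hypothetical helper, and whether the database object also works directly as a context manager is not assumed here.

```python
from contextlib import closing
from typing import Any, List


def first_energies(db: Any, n: int) -> List[float]:
    """Read the energy of the first n molecules, then close the database.

    Hypothetical helper: relies only on len(db), db[i].energy, and
    db.close(), all of which appear in the examples above.
    """
    with closing(db):
        limit = min(n, len(db))
        return [db[i].energy for i in range(limit)]


# Intended usage:
#   db = atompack.hub.open("LeMaterial/Atompack", "omat/train")
#   print(first_energies(db, 3))
```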

If you want a local copy first:

local_path = atompack.hub.download("LeMaterial/Atompack", "omat/train")
db = atompack.hub.open_path(local_path)

Features

Append-friendly storage with explicit flush() commits
Read-only mmap mode for fast indexed access on static datasets
Batch-oriented Python APIs for NumPy and ASE ingestion
Built-in support for common atomistic ML fields and custom properties
Hugging Face Hub helpers for upload, download, and read-only reopening
Optional compression with none, lz4, and zstd

Performance

Atompack is optimized for read-heavy atomistic ML workloads: random indexed reads, multiprocessing data loading, and immutable dataset snapshots. The maintained benchmarks show strong read behavior, strong batch-write throughput, and storage efficiency that stays close to compact array-oriented formats.
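
As an illustration of the random indexed reads mentioned above, a training loop can draw shuffled index batches and fetch each one with `get_molecules`. This is a sketch under assumptions: `shuffled_batches` is a hypothetical helper, not an Atompack API, and it assumes `get_molecules` accepts a list of integer indices in any order, which the Hub example shows only for `[0, 1, 2]`.

```python
import random
from typing import Any, Iterator, List, Optional


def shuffled_batches(
    db: Any, batch_size: int, seed: Optional[int] = None
) -> Iterator[List[Any]]:
    """Yield batches that together cover every index once, in random order."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    indices = list(range(len(db)))
    # A dedicated Random instance keeps the shuffle reproducible per seed.
    random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        # Assumption: get_molecules accepts an arbitrary list of indices.
        yield db.get_molecules(indices[start:start + batch_size])


# Intended usage:
#   db = atompack.Database.open("data.atp")
#   for batch in shuffled_batches(db, batch_size=32, seed=0):
#       ...
```

The last batch may be smaller than `batch_size` when the dataset length is not a multiple of it.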

For the benchmark narrative and current figures, see the release blog post and performance docs.

Documentation

Development

This repository uses uv for Python tooling:

# From the repo root
make ci-py
make py-dev
make docs

Or run the Python tools directly:

cd atompack-py
uv sync --extra dev --locked
uv run --extra dev --locked ruff format python
uv run --extra dev --locked ruff check python
uv run --extra dev --locked --with "maturin>=1.4,<2.0" maturin develop
uv run --extra dev --locked pytest

Rust entrypoints:

cargo run -p atompack --example basic_usage
cargo run -p atompack --release --bin atompack-bench -- --help

License

Apache-2.0
