
Physics Informed Generation (PIGEN) of Crystal Structures

PIGEN (Physics-Informed Generation) is a framework for generating novel crystal structures by integrating physics-informed sampling, chemically guided control, and structural evaluation into a denoising diffusion model.

Building on DiffCSP, PIGEN introduces two conditioning metrics — compactness and local-environment diversity (MLED) — which guide the generative process toward physically plausible yet structurally diverse configurations. Conditioning on these descriptors consistently increases the fraction of novel crystal frameworks across diffusion architectures, including DiffCSP and MatterGen, showing that the approach generalises beyond a single backbone.

The model enables generation beyond known chemical spaces and supports out-of-distribution extrapolation, yielding a higher proportion of stable, unique structures per batch compared to previous approaches.

This repository accompanies the preprint:

A. Vasylenko et al., "Physics-informed diffusion models for extrapolating crystal structures beyond known motifs", arXiv:2510.23181 (2025).

Setup environment

conda env create -f environment.yml
conda activate pigen

All dependencies are managed via conda; setup.py is only for local package registration.

System Requirements

PIGEN has been tested on:

  • Linux (x86_64) — recommended for full reproducibility and GPU training
  • macOS (ARM, Apple Silicon) — supported for CPU inference and development
  • CUDA ≥ 12.1 — for GPU acceleration

Note for macOS (ARM):
Some PyTorch Geometric extensions (torch-scatter, torch-sparse, etc.) are not available through conda.
Create the macOS environment, then install them manually with pip:

conda env create -f environment.osx-arm64.yml
conda activate pigen
python -c "import torch; print('torch version:', torch.__version__)"
pip install torch_geometric
pip install pyg-lib torch-scatter torch-sparse torch-cluster -f https://data.pyg.org/whl/torch-<your_torch_version>+cpu.html
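After installing, you can confirm the packages are discoverable with a small check (a sketch, not part of PIGEN; it locates packages on `sys.path` without importing them, so it won't crash on broken native extensions):

```python
# Post-install sanity check: report which PyG packages are missing.
import importlib.util


def missing_packages(names):
    """Return the subset of `names` that cannot be found on sys.path."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    pyg_packages = ["torch_geometric", "torch_scatter", "torch_sparse", "torch_cluster"]
    missing = missing_packages(pyg_packages)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All PyG packages found.")
```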

Install package (editable mode)

pip install -e .

Data used for training can be accessed at

https://huggingface.co/datasets/UoLiverpool/Alex_MP_20_M_LED/

Model training

To re-train the model with the default dataset, Alex_MP_20_M_LED, run:

python pigen/train.py

This will use the default data and conditioning properties and is equivalent to

python pigen/train.py --data_name Alex_MP_20_M_LED --prop ['entropy_sum', 'target_energy']

where 'entropy_sum' is the internal name for MLED and 'target_energy' stands for compactness.

Model Inference

You can use your own trained model or download a pretrained checkpoint from: huggingface.co/DeepDrew/PIGEN/

After downloading, place the checkpoint file in:

checkpoints/

This ensures pigen/generate.py can locate it.

Run

cd pigen
python generate.py

which generates example structures in the examples/ folder. This should take less than 10 minutes.
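A quick way to check that generation produced output is to count the structure files (a sketch: the output directory name comes from above, but the file extension is an assumption — adjust it to whatever format PIGEN actually writes):

```python
# Hypothetical post-generation check: count structure files in examples/.
from pathlib import Path


def count_outputs(out_dir="examples", pattern="*.cif"):
    """Count generated structure files matching `pattern` in `out_dir`."""
    return len(list(Path(out_dir).glob(pattern)))


if __name__ == "__main__":
    n = count_outputs()
    print(f"{n} structure file(s) found in examples/")
```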

Code Base and Key Contributions

This repository builds on DiffCSP, an open-source implementation of denoising diffusion probabilistic models for crystal structure prediction. We have further developed and extended it as described below.

  • Physics-informed logic integrated into the sampling process
  • Conditional generation with target-guided control via classifier-free guidance
  • Featurised dataset with local chemical and structural environment features, enabling out-of-distribution extrapolation
  • Chemistry-informed structure evaluation tools
  • Modular refactoring for better reproducibility and configuration management
  • Support for PyTorch Distributed Data Parallel to accelerate large-scale training across multiple GPUs or nodes
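The classifier-free guidance mentioned above can be sketched numerically as follows (an illustrative toy example, not PIGEN's implementation; the function and variable names are hypothetical):

```python
import numpy as np


def guided_noise(eps_cond, eps_uncond, w):
    """Classifier-free guidance: combine conditional and unconditional
    noise predictions, extrapolating away from the unconditional one.
    w = 0 recovers the purely conditional prediction; larger w pushes
    samples harder toward the conditioning targets (e.g. compactness, MLED)."""
    return (1.0 + w) * eps_cond - w * eps_uncond


# Toy noise predictions for a single denoising step
eps_cond = np.array([0.2, -0.1])
eps_uncond = np.array([0.5, 0.0])
print(guided_noise(eps_cond, eps_uncond, w=2.0))  # [-0.4 -0.3]
```

At each denoising step the model is evaluated twice — once with the conditioning signal and once without — and the two predictions are blended with the guidance weight before the update is applied.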

Tests

Run:

pytest tests

Run with Docker

Note: The Docker image is intentionally left with a flexible entry point (/bin/bash) to allow the user to either train or generate as needed, following the instructions below. This design choice supports both CPU and GPU environments.

docker build -t pigen .

To run with CPU only:

docker run --rm pigen

To run with GPU:

docker run --rm --gpus all pigen

Project structure

├── checkpoints
├── data
│   └── Alex_MP_20_M_LED/
├── environment.yml
├── log
├── pigen
│   ├── __init__.py
│   ├── assets/
│   ├── common/
│   ├── eval/
│   ├── generate.py
│   ├── normalization
│   ├── partial_sample.py
│   ├── settings.py
│   └── train.py
├── README.md
├── setup.py
├── tests
│   ├── dummy_data/
│   ├── dummy_logs/
│   ├── fixtures/
│   ├── conftest.py
│   ├── test_dependecies.py
│   ├── test_dummy_training.py
│   ├── test_pd_structure_parsing.py
│   └── test_torch_installation.py
└── verify_environment_installs.py

Citing

If you use this code or metrics, please consider citing: A. Vasylenko et al., "Physics-informed diffusion models for extrapolating crystal structures beyond known motifs", arXiv:2510.23181

License and Credit

We gratefully acknowledge the authors of DiffCSP for their contribution to the research and open-source community. This project is licensed under the MIT License, consistent with DiffCSP.
All modifications are © 2025 Andrij Vasylenko.
