[DRAFT] FEAT: ChargE3Net Fine-Tuning Pipeline on LeMatRho #8
Draft
speckhard wants to merge 7 commits into LeMaterial:main from feat/charge3net
Conversation
- Add wandb integration with `--wandb-project`/`--wandb-entity`/`--no-wandb` flags
- Add `compute_rmse()` and `compute_nrmse()` validation metrics
- Log per-step train loss and per-epoch train/val metrics to W&B
- Load `WANDB_API_KEY` from `.env` via python-dotenv
- Add `submit_charge3net.sh` for Jean Zay A100 GPU jobs
- Add `.gitignore` (excludes `.env`, checkpoints, wandb, etc.)
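A minimal sketch of the two validation metrics named above. The function names come from the PR; the signatures and the padding-mask convention are assumptions for illustration (the PR only says the tests cover "padding-mask correctness"), and `compute_nrmse` here normalizes by the target RMS, which is one common convention.

```python
import numpy as np

def compute_rmse(pred, target, mask=None):
    """Root-mean-square error over unmasked elements only."""
    pred, target = np.asarray(pred, float), np.asarray(target, float)
    mask = np.ones(target.shape, bool) if mask is None else np.asarray(mask, bool)
    diff = (pred - target)[mask]  # padded grid points are excluded
    return float(np.sqrt(np.mean(diff ** 2)))

def compute_nrmse(pred, target, mask=None):
    """RMSE normalized by the RMS of the target (assumed convention)."""
    target = np.asarray(target, float)
    mask = np.ones(target.shape, bool) if mask is None else np.asarray(mask, bool)
    denom = float(np.sqrt(np.mean(target[mask] ** 2)))
    return compute_rmse(pred, target, mask) / denom
```

Masking before the mean (rather than zeroing padded entries) matters: zeroed padding would shrink both error and denominator and silently bias the metric.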
- `save_checkpoint` now includes `global_step`
- `load_checkpoint` restores model, optimizer, scheduler, epoch, `best_nmape`, `global_step`
- SLURM script auto-detects `latest.pt` and passes `--resume-from`
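A sketch of the checkpoint schema these commits describe. The fields (`epoch`, `best_nmape`, `global_step`) are from the PR; the helper names and the other key names are illustrative, not the PR's actual code. In practice the dict would be passed to `torch.save` / come from `torch.load`.

```python
def make_checkpoint(model_state, optimizer_state, scheduler_state,
                    epoch, best_nmape, global_step):
    """Bundle everything needed to resume training exactly where it stopped."""
    return {
        "model": model_state,
        "optimizer": optimizer_state,
        "scheduler": scheduler_state,
        "epoch": epoch,
        "best_nmape": best_nmape,
        "global_step": global_step,  # newly included by this PR
    }

def read_checkpoint(ckpt):
    """Unpack a checkpoint; tolerate older files written before global_step existed."""
    return (ckpt["model"], ckpt["optimizer"], ckpt["scheduler"],
            ckpt["epoch"], ckpt["best_nmape"], ckpt.get("global_step", 0))
```

Defaulting `global_step` to 0 on load keeps resuming backward-compatible with checkpoints written before this change.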
**speckhard** (author): @mfranckel let me try to create a job script for adastra so you can run this yourself.
**speckhard** (author): Sorry, this still needs to be cleaned up before you review it; I just wanted to make you aware of it.
`feat/charge3net` — ChargE3Net Fine-Tuning Pipeline
Overview
Fine-tunes ChargE3Net (Koker et al., npj Computational Materials 2024) on LeMatRho charge density data. The approach:

1. Clone the AIforGreatGood/charge3net repo as a sibling directory. This provides both the model code (`E3DensityModel`) and the pre-trained Materials Project checkpoint (`models/charge3net_mp.pt`, 23 MB, 1.9M params, trained for 245 epochs / 407k steps on MP charge densities).
2. Fine-tune on LeMatRho, resuming from `latest.pt` when available.

The training pipeline lives in `charge3net_ft/`, a self-contained package within this repo.

Setup
On Jean Zay, submit with `sbatch submit_charge3net.sh`; it auto-resumes from `$SCRATCH/charge3net_checkpoints/latest.pt` if present.

New Files
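The auto-resume behavior described above can be sketched as a small helper. The actual PR implements this in the bash job script; the checkpoint filename and `--resume-from` flag come from the PR description, while the helper itself is a hypothetical Python equivalent.

```python
from pathlib import Path

def resume_args(ckpt_dir):
    """Return CLI args to resume from latest.pt if it exists, else none."""
    latest = Path(ckpt_dir) / "latest.pt"
    return ["--resume-from", str(latest)] if latest.exists() else []
```

Branching on the file's existence (rather than always passing the flag) lets the very first job of a fresh run start from the pre-trained MP checkpoint instead of failing on a missing path.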
- `charge3net_ft/data.py`: loads `ase.Atoms` + density grids and builds charge3net-compatible graph dicts via `KdTreeGraphConstructor`. A per-worker table cache means each chunk file is read from disk once per worker, not once per sample.
- `charge3net_ft/model.py`: wraps `E3DensityModel`. Handles instantiation with MP checkpoint hyperparameters and loading from 3 checkpoint formats (legacy PyTorch Lightning, new charge3net, raw state_dict).
- `charge3net_ft/train.py`
- `submit_charge3net.sh`: SLURM job script; auto-resumes from `latest.pt` if present.
- `tests/test_metrics.py`: tests `compute_nmape`, `compute_rmse`, `compute_nrmse`, including padding-mask correctness.
- `tests/test_data.py`: tests `_parse_grid_json`, `_row_to_atoms_and_density`, and `_build_parquet_index` using synthetic Parquet files; no real data needed.
- `tests/test_model.py`

Modified Files
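A sketch of how `model.py` might dispatch across the three checkpoint formats it supports. Only the three-format behavior comes from the PR; the marker keys (`"state_dict"` for Lightning exports, `"model"` for the new format) and the `"model."` prefix stripping are assumptions for illustration.

```python
def extract_state_dict(ckpt):
    """Normalize any of the three supported checkpoint layouts to a raw state_dict."""
    if "state_dict" in ckpt:
        # Legacy PyTorch Lightning export: parameters live under "state_dict",
        # typically prefixed with the LightningModule attribute name.
        return {k.removeprefix("model."): v for k, v in ckpt["state_dict"].items()}
    if "model" in ckpt:
        # New charge3net training checkpoint: nested under "model".
        return ckpt["model"]
    # Otherwise assume the file is already a raw state_dict.
    return ckpt
```

Normalizing at load time keeps the rest of the pipeline agnostic to which tool wrote the checkpoint.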
- `pyproject.toml`: adds `torch>=2.0`, `numpy>=1.24`, `charge3net` (git dep); loosened aggressive version pins.
- `.gitignore`: adds `*.pt` to prevent accidental checkpoint commits.

Training Results (30 epochs across 3 SLURM jobs)
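For context on the metric below: NMAPE is presumably the normalized mean absolute error ChargE3Net reports, i.e. total absolute density error divided by total absolute target density, as a percentage. A minimal sketch, assuming that convention (the exact definition in `train.py` may differ):

```python
import numpy as np

def compute_nmape(pred, target):
    """Sum of absolute errors over sum of absolute targets, as a percentage."""
    pred, target = np.asarray(pred, float), np.asarray(target, float)
    return float(100.0 * np.abs(pred - target).sum() / np.abs(target).sum())
```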
Best checkpoint: `best.pt` at epoch 30 (8.10% val NMAPE). The model is still improving; no plateau observed. Note: these are val-set numbers (used for checkpoint selection). Test-set numbers will be reported after the next training run using the new three-way split.

Git Log