Caliby provides a Python API for programmatic use in scripts and notebooks, without needing Hydra or the CLI. Install Caliby as usual (`uv pip install -e .` or `uv pip install "git+https://github.com/ProteinDesignLab/caliby.git"`) and import directly:

```python
from caliby import load_model
```

- Quick start
- Loading a model
- Cleaning input structures
- Sequence design
- Ensemble-conditioned sequence design
- Scoring
- Positional constraints
- Additional sampling options
- Sidechain packing
- Self-consistency evaluation with AlphaFold2
- One-shot convenience functions
- API reference
## Quick start

```python
from caliby import load_model

# Load the model once and reuse it.
model = load_model("caliby")

# Design 4 sequences for a PDB file.
results = model.sample(
    ["examples/example_data/native_pdbs/7xhz.cif"],
    num_seqs_per_pdb=4,
    out_dir="outputs/seq_des",
)

# results is a dict with keys: "example_id", "out_pdb", "seq", "U", "input_seq"
for name, seq, energy in zip(results["example_id"], results["seq"], results["U"]):
    print(f"{name}: {seq} (U={energy:.2f})")
```

## Loading a model

Use `load_model()` to load a model checkpoint. The model is returned as a `CalibyModel` object that you can reuse across multiple calls.
```python
from caliby import load_model

# Load by model name (weights are auto-downloaded from HuggingFace on first use).
model = load_model("caliby")             # Default model
model = load_model("soluble_caliby")     # Excludes transmembrane proteins
model = load_model("soluble_caliby_v1")  # Trained on monomers and interfaces

# Load from a custom checkpoint path.
model = load_model("/path/to/custom_model.ckpt")

# Specify device (defaults to "cuda" if available, else "cpu").
model = load_model("caliby", device="cuda:1")
```

Available model names are listed in the Download model weights section of the README.
## Cleaning input structures

We recommend cleaning input PDB/CIF files before sequence design, especially if the structures came from another pipeline or if you plan to generate ensembles with Protpardelle-1c. The cleaner fixes blank chain IDs, removes unresolved atoms, keeps only protein chains, filters to residue names supported by downstream tools, and writes cleaned mmCIF files. This helps avoid downstream parsing, chain ID, and residue alignment issues.
```python
from caliby import clean_pdbs, load_model

cleaned_pdb_paths = clean_pdbs(
    ["protein1.pdb", "protein2.cif"],
    out_dir="outputs/cleaned_pdbs",
    num_workers=4,
)

model = load_model("caliby")
results = model.sample(cleaned_pdb_paths, num_seqs_per_pdb=4)
```

## Sequence design

Design sequences for one or more PDB/CIF files:
```python
model = load_model("caliby")
results = model.sample(
    ["protein1.pdb", "protein2.cif"],
    num_seqs_per_pdb=4,  # 4 sequences per structure
    batch_size=4,        # batch size for processing
    out_dir="outputs/",  # directory for output CIF files (temp dir if None)
)

# Access results.
results["example_id"]  # e.g. ["protein1_0", "protein1_1", ...]
results["seq"]         # designed sequences
results["U"]           # Potts energy of each designed sequence
results["out_pdb"]     # paths to output CIF files
results["input_seq"]   # native sequence of the input structure
```

To design sequences for all PDBs in a directory:
```python
from pathlib import Path

pdb_dir = Path("examples/example_data/native_pdbs")
pdb_paths = [str(p) for p in sorted(pdb_dir.glob("*.cif"))]
results = model.sample(pdb_paths, num_seqs_per_pdb=4, out_dir="outputs/seq_des")
```

## Ensemble-conditioned sequence design

Designing on synthetic structural ensembles (rather than a single static structure) produces sequences that are both more diverse and more likely to fold into the target structure.
Use `generate_ensembles()` to generate conformer ensembles via Protpardelle-1c partial diffusion:
```python
from caliby import generate_ensembles, load_model

# Generate 32 conformers per PDB.
pdb_to_conformers = generate_ensembles(
    ["examples/example_data/native_pdbs/7xhz.cif"],
    out_dir="outputs/ensembles",
    num_samples_per_pdb=32,
    batch_size=8,
)
# pdb_to_conformers: {"7xhz": ["outputs/ensembles/.../sample_0.pdb", ...]}

# Run ensemble-conditioned sequence design.
model = load_model("caliby")
results = model.ensemble_sample(
    pdb_to_conformers,
    num_seqs_per_pdb=4,
    out_dir="outputs/seq_des_ensemble",
)
```

We recommend generating at least 32 conformers per PDB, but 16 or 8 can also give good results.
If you have your own ensemble of conformers, pass them as a dict mapping PDB name to a list of file paths. The first path in each list is treated as the primary conformer:
```python
pdb_to_conformers = {
    "my_protein": [
        "ensembles/my_protein/my_protein.pdb",  # primary conformer (first)
        "ensembles/my_protein/conformer_1.pdb",
        "ensembles/my_protein/conformer_2.pdb",
        # ... up to max_num_conformers (default 32)
    ],
}
results = model.ensemble_sample(
    pdb_to_conformers,
    num_seqs_per_pdb=4,
    out_dir="outputs/seq_des_ensemble",
)
```

All conformers must have matching residue indices and chain IDs. If you encounter a "Residue index / chain ID mismatch" error, see the FAQ in the README.
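Before calling `ensemble_sample()`, a quick sanity check can catch mismatched conformers early. The helper below is an illustrative sketch, not part of the Caliby API; it assumes you can extract `(chain_id, residue_index)` pairs from each file with a structure parser of your choice (e.g. gemmi or Biopython).

```python
def check_conformer_consistency(conformer_residues):
    """conformer_residues: one [(chain_id, residue_index), ...] list per conformer.

    Returns True if every conformer has exactly the same residue identifiers,
    in the same order, as the primary (first) conformer.
    """
    primary = conformer_residues[0]
    return all(residues == primary for residues in conformer_residues[1:])


primary = [("A", 1), ("A", 2), ("B", 1)]
good = [("A", 1), ("A", 2), ("B", 1)]
bad = [("A", 1), ("A", 3), ("B", 1)]  # residue index mismatch

print(check_conformer_consistency([primary, good]))       # True
print(check_conformer_consistency([primary, good, bad]))  # False
```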
## Scoring

Score the native sequences of PDB/CIF files:
```python
model = load_model("caliby")
results = model.score(["protein.pdb"])

results["example_id"]  # PDB identifiers
results["seq"]         # native sequences
results["U"]           # global Potts energy
results["U_i"]         # per-residue energy contributions
```

Score a sequence against an ensemble of conformer backbones:
```python
pdb_to_conformers = {
    "my_protein": [
        "ensembles/my_protein/my_protein.pdb",  # primary conformer
        "ensembles/my_protein/conformer_1.pdb",
        "ensembles/my_protein/conformer_2.pdb",
    ],
}
results = model.score_ensemble(pdb_to_conformers)
```

The sequence from the primary conformer is scored; sequences of additional conformers are ignored.
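Per-residue energies are handy for spotting poorly scoring regions to redesign. A minimal sketch, assuming each entry of `results["U_i"]` is a per-residue energy list (verify the shape in your Caliby version); the helper itself is illustrative, not part of Caliby:

```python
def worst_residues(u_i, k=5):
    """Return indices of the k highest-energy (least favorable) residues."""
    return sorted(range(len(u_i)), key=lambda i: u_i[i], reverse=True)[:k]


# Example per-residue energies for one scored structure.
u_i = [0.1, -0.5, 2.3, 0.0, 1.7, -0.2]
print(worst_residues(u_i, k=2))  # [2, 4]
```

Positions flagged this way can then be targeted with the positional constraints described below, e.g. via `pos_restrict_aatype`.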
## Positional constraints

Use `make_constraints()` to build a constraint DataFrame, then pass it to `sample()` or `ensemble_sample()`:
```python
from caliby import make_constraints

constraints = make_constraints({
    "7xhz": {
        "fixed_pos_seq": "A6-15,A20-50",
        "fixed_pos_scn": "A6-15",
    },
})

results = model.sample(
    ["examples/example_data/native_pdbs/7xhz.cif"],
    num_seqs_per_pdb=4,
    pos_constraint_df=constraints,
)
```

You can also pass a `pd.DataFrame` directly with a `pdb_key` column and any subset of the constraint columns described below.
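For instance, such a DataFrame could be built directly with pandas. Treating unset cells (NaN) as "unconstrained" is an assumption here; verify it against the output of `make_constraints()` for your version.

```python
import pandas as pd

# One row per pdb_key; constraint columns not set for a row are left as NaN.
constraints = pd.DataFrame([
    {"pdb_key": "7xhz", "fixed_pos_seq": "A6-15,A20-50", "fixed_pos_scn": "A6-15"},
    {"pdb_key": "8huz", "pos_restrict_aatype": "A6:QR,A7:QR"},
])
print(list(constraints.columns))
# ['pdb_key', 'fixed_pos_seq', 'fixed_pos_scn', 'pos_restrict_aatype']
```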
Fix certain residue positions so they retain their native sequence during design. Residue positions should use `label_seq_id` (not `auth_seq_id`). In PyMOL, run `set cif_use_auth, off` before loading a PDB to view positions in this numbering.
```python
constraints = make_constraints({
    "7xhz": {
        "fixed_pos_seq": "A6-15,A20-50",  # fix sequence at these positions
        "fixed_pos_scn": "A6-15",         # also fix sidechains (must be a subset of fixed_pos_seq)
    },
})
```

Restrict which amino acids are allowed at specific positions:
```python
constraints = make_constraints({
    "8huz": {
        "pos_restrict_aatype": "A6:QR,A7:QR,A8:QR,A9:QR,A10:QR,A11:QR",
    },
})
```

Override the sequence at specific positions before conditioning:
```python
constraints = make_constraints({
    "7xz3": {
        "fixed_pos_override_seq": "A36:C,A37:C,A38:C,A39:C,A40:C",
    },
})
```

Tie sampling across residue positions (e.g., for homooligomers):
```python
constraints = make_constraints({
    "my_trimer": {
        "symmetry_pos": "A10,B10,C10|A11,B11,C11|A12,B12,C12",
    },
})
```

Positions separated by commas within a group are tied together. Groups are separated by `|`.
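To make the format concrete, here is a tiny parser for the symmetry string (an illustrative helper, not part of Caliby):

```python
def parse_symmetry_pos(spec):
    """Parse 'A10,B10,C10|A11,B11,C11' into groups of tied positions."""
    return [group.split(",") for group in spec.split("|")]


groups = parse_symmetry_pos("A10,B10,C10|A11,B11,C11|A12,B12,C12")
print(groups[0])    # ['A10', 'B10', 'C10']
print(len(groups))  # 3
```

For the trimer example above, each group ties the same residue position across chains A, B, and C, so all three chains receive identical amino acids at those positions.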
Multiple constraint types can be combined for the same PDB, and constraints for multiple PDBs can be specified in a single call:
```python
constraints = make_constraints({
    "7xhz": {
        "fixed_pos_seq": "A6-15",
        "pos_restrict_aatype": "A20:AVG,A21:AVG",
    },
    "8huz": {
        "pos_restrict_aatype": "A6:QR,A7:QR",
    },
})
results = model.sample(pdb_paths, num_seqs_per_pdb=4, pos_constraint_df=constraints)
```

When using `ensemble_sample()`, constraints must be expanded so that every conformer in the ensemble gets a matching row. Use `make_ensemble_constraints()` to handle this automatically:
```python
from caliby import make_ensemble_constraints

# pdb_to_conformers from generate_ensembles() or your own ensemble
pdb_to_conformers = {
    "7xhz": ["7xhz.cif", "conformer_0.pdb", "conformer_1.pdb"],
}

constraints = make_ensemble_constraints(
    {"7xhz": {"fixed_pos_seq": "A6-15", "pos_restrict_aatype": "A20:AVG"}},
    pdb_to_conformers,
)

results = model.ensemble_sample(
    pdb_to_conformers,
    num_seqs_per_pdb=4,
    pos_constraint_df=constraints,
)
```

This is equivalent to calling `make_constraints()` and then manually replicating each row for every conformer path, but is less error-prone.
## Additional sampling options

Globally exclude certain amino acids from all designed positions:
```python
results = model.sample(pdb_paths, omit_aas=["C", "M"])
```

By default, Caliby anneals the sampling temperature from 1.0 to 0.01. Raise the final temperature for more diverse sequences:
```python
results = model.sample(pdb_paths, temperature=0.1)
```

Temperatures from 0.1 to 0.2 also perform reasonably, but this is case-dependent.
Any parameter from the sampling config can be overridden via `sampling_overrides`:
```python
results = model.sample(
    pdb_paths,
    sampling_overrides={
        "potts_sampling_cfg": {
            "potts_sweeps": 1000,
            "regularization": "LCP",
        },
    },
)
```

## Sidechain packing

Pack sidechains onto backbone structures using a diffusion-based packer. Note that sidechain packing uses a different model checkpoint from sequence design:
```python
packer = load_model("caliby_packer_010")  # 0.1 Å noise (recommended)
results = packer.sidechain_pack(
    ["protein.pdb"],
    out_dir="outputs/packed",
)

results["example_id"]  # PDB identifiers
results["out_pdb"]     # paths to packed output CIF files
```

Available packer models: `caliby_packer_000` (0.0 Å), `caliby_packer_010` (0.1 Å, recommended), `caliby_packer_030` (0.3 Å).
## Self-consistency evaluation with AlphaFold2

Evaluate designed sequences by folding them with single-sequence AlphaFold2 and comparing against the design input. Requires the `af2` extra:
```bash
uv pip install -e ".[af2]"
```

```python
model = load_model("caliby")

# Design sequences.
results = model.sample(pdb_paths, num_seqs_per_pdb=2, out_dir="outputs/designed")

# Fold and evaluate.
sc_results = model.self_consistency_eval(
    results["out_pdb"],
    out_dir="outputs/af2_eval",
    num_models=5,
    num_recycles=3,
)

# sc_results: {"protein_0": {"sc_ca_rmsd": 1.2, "avg_ca_plddt": 85.3, "tmalign_score": 0.95}, ...}
for example_id, metrics in sc_results.items():
    print(f"{example_id}: scRMSD={metrics['sc_ca_rmsd']:.2f}, "
          f"pLDDT={metrics['avg_ca_plddt']:.1f}, TM={metrics['tmalign_score']:.3f}")
```

## One-shot convenience functions

For quick, one-off runs where you don't need to reuse the model, Caliby provides module-level convenience functions that load the model, run the task, and return results:
```python
from caliby import (
    caliby_sample,
    caliby_ensemble_sample,
    caliby_score,
    caliby_score_ensemble,
    caliby_sidechain_pack,
)

# These load the model each time; prefer load_model() for repeated calls.
results = caliby_sample(["protein.pdb"], num_seqs_per_pdb=4)
results = caliby_score(["protein.pdb"])
results = caliby_sidechain_pack(["protein.pdb"], model_name="caliby_packer_010")
```

## API reference

### `load_model()`

Load a Caliby model for reuse across multiple calls.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `model_name` | `str` | `"caliby"` | Model name or path to `.ckpt` file |
| `device` | `str \| None` | `None` | Torch device (defaults to `"cuda"` if available) |
| `sampling_cfg_path` | `str \| None` | `None` | Custom sampling YAML config path |
### `clean_pdbs()`

Clean input PDB/CIF files and write cleaned mmCIF copies.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `pdb_paths` | `list[str]` | required | Paths to input PDB/CIF files |
| `out_dir` | `str \| None` | `None` | Output directory for cleaned mmCIF files |
| `num_workers` | `int` | `1` | Number of parallel workers |
Returns: `list[str]` of cleaned mmCIF paths in the same order as `pdb_paths`.

### `sample()`

Design sequences for one or more PDB/CIF files.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `pdb_paths` | `list[str]` | required | Paths to PDB/CIF files |
| `out_dir` | `str \| None` | `None` | Output directory (temp dir if `None`) |
| `num_seqs_per_pdb` | `int \| None` | `1` | Sequences per structure |
| `batch_size` | `int \| None` | `4` | Batch size |
| `omit_aas` | `list[str] \| None` | `None` | Amino acids to exclude |
| `num_workers` | `int \| None` | `8` | Data loading workers |
| `temperature` | `float \| None` | `0.01` | Final Potts sampling temperature |
| `verbose` | `bool \| None` | `True` | Print constraint info |
| `pos_constraint_df` | `DataFrame \| None` | `None` | Positional constraints |
| `sampling_overrides` | `dict \| None` | `None` | Advanced config overrides |
Returns: dict with keys `"example_id"`, `"out_pdb"`, `"seq"`, `"U"`, `"input_seq"`.
### `ensemble_sample()`

Run ensemble-conditioned sequence design. Same parameters as `sample()`, plus:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `pdb_to_conformers` | `dict[str, list[str]]` | required | Maps PDB name to conformer paths (first is primary) |
| `use_primary_res_type` | `bool` | `True` | Use residue types from the primary conformer |
### `score()`

Score the native sequences of PDB/CIF files.

| Parameter | Type | Default | Description |
|---|---|---|---|
| `pdb_paths` | `list[str]` | required | Paths to PDB/CIF files |
| `batch_size` | `int \| None` | `4` | Batch size |
| `num_workers` | `int \| None` | `8` | Data loading workers |
| `sampling_overrides` | `dict \| None` | `None` | Advanced config overrides |
Returns: dict with keys `"example_id"`, `"seq"`, `"U"`, `"U_i"`.
### `score_ensemble()`

Score a sequence against an ensemble of conformer backbones.

| Parameter | Type | Default | Description |
|---|---|---|---|
| `pdb_to_conformers` | `dict[str, list[str]]` | required | Maps PDB name to conformer paths |
| `num_workers` | `int \| None` | `8` | Data loading workers |
| `sampling_overrides` | `dict \| None` | `None` | Advanced config overrides |
Returns: dict with keys `"example_id"`, `"seq"`, `"U"`, `"U_i"`.
### `sidechain_pack()`

Pack sidechains onto backbone structures.

| Parameter | Type | Default | Description |
|---|---|---|---|
| `pdb_paths` | `list[str]` | required | Paths to PDB/CIF files |
| `out_dir` | `str \| None` | `None` | Output directory (temp dir if `None`) |
| `batch_size` | `int \| None` | `4` | Batch size |
| `num_workers` | `int \| None` | `8` | Data loading workers |
| `sampling_overrides` | `dict \| None` | `None` | Advanced config overrides |
Returns: dict with keys `"example_id"`, `"out_pdb"`.
### `self_consistency_eval()`

Fold designed sequences with single-sequence AlphaFold2 and compare against the design input.

| Parameter | Type | Default | Description |
|---|---|---|---|
| `designed_pdbs` | `list[str]` | required | Paths to designed PDB/CIF files |
| `out_dir` | `str \| None` | `None` | Output directory (temp dir if `None`) |
| `num_models` | `int` | `5` | Number of AF2 models to sample |
| `sample_models` | `bool` | `True` | Randomly sample from the 5 AF2 models |
| `num_recycles` | `int` | `3` | AF2 recycling iterations |
| `use_multimer` | `bool` | `False` | Use AF2-Multimer |
Returns: dict mapping `example_id` to `{"sc_ca_rmsd", "avg_ca_plddt", "tmalign_score"}`.
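The returned metrics can be filtered directly, for example to keep only designs passing self-consistency cutoffs. The thresholds below are illustrative, not recommendations from Caliby:

```python
# Mock results with the shape documented above.
sc_results = {
    "protein_0": {"sc_ca_rmsd": 1.2, "avg_ca_plddt": 85.3, "tmalign_score": 0.95},
    "protein_1": {"sc_ca_rmsd": 4.8, "avg_ca_plddt": 62.0, "tmalign_score": 0.70},
}

# Keep designs with low scRMSD and high pLDDT (example thresholds).
passing = [
    eid for eid, m in sc_results.items()
    if m["sc_ca_rmsd"] < 2.0 and m["avg_ca_plddt"] > 80.0
]
print(passing)  # ['protein_0']
```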
### `make_constraints()`

Build a positional constraint DataFrame from a dict.
```python
make_constraints({"pdb_key": {"fixed_pos_seq": "A1-50", ...}})
```

### `make_ensemble_constraints()`

Build a constraint DataFrame expanded across ensemble conformers. Wraps `make_constraints()` and replicates each PDB's constraints for every conformer.
```python
make_ensemble_constraints(
    {"7xhz": {"fixed_pos_seq": "A6-15"}},
    {"7xhz": ["7xhz.cif", "conf_0.pdb", "conf_1.pdb"]},
)
```

### `generate_ensembles()`

Generate conformer ensembles via Protpardelle-1c partial diffusion.

| Parameter | Type | Default | Description |
|---|---|---|---|
| `pdb_paths` | `list[str]` | required | Paths to input PDB/CIF files |
| `out_dir` | `str` | required | Output directory |
| `num_samples_per_pdb` | `int` | `32` | Conformers per structure |
| `batch_size` | `int` | `8` | Batch size for Protpardelle |
| `model_params_path` | `str \| None` | `None` | Model weights directory |
| `sampling_yaml_path` | `str \| None` | `None` | Protpardelle config path |
| `seed` | `int` | `0` | Random seed |

Returns: dict mapping PDB stem to list of generated conformer file paths.