
Commit e4ed1ec

Dec 3rd
1 parent 6f691fa commit e4ed1ec

94 files changed

Lines changed: 3651 additions & 209 deletions

File tree

MANIFEST.in

Lines changed: 0 additions & 8 deletions
This file was deleted.

NEW_REPO/VERSION

Lines changed: 0 additions & 1 deletion
This file was deleted.

NEW_REPO/__init__.py

Lines changed: 0 additions & 4 deletions
This file was deleted.

README.md

Lines changed: 102 additions & 79 deletions
@@ -1,79 +1,102 @@
1-
# NEW_REPO
2-
3-
This repository serves only as a Python template for new projects.
4-
5-
## Create a new repository
6-
7-
- Create a [new repo](https://github.com/new) and select `CellProfiling/cell-pro-template` as template repository.
8-
- Clone your new repo.
9-
- Search and replace all occurrences of `NEW_REPO`, `AUTHOR_NAME` and `AUTHOR_EMAIL`. Replace `NEW_REPO` with the name of the new repo.
10-
- Add package requirements in `install_requires` in [`setup.py`](setup.py) and in [`requirements.txt`](requirements.txt) as needed.
11-
- Update this `README.md` with a description of and instructions for your new repo.
12-
13-
## Development
14-
15-
- Install and set up development environment.
16-
17-
```sh
18-
pip install -r requirements_dev.txt
19-
```
20-
21-
This will install all requirements.
22-
It will also install this package in development mode, so that code changes take effect immediately without reinstalling.
23-
24-
- Here's a list of development tools we use.
25-
- [black](https://pypi.org/project/black/)
26-
- [flake8](https://pypi.org/project/flake8/)
27-
- [pydocstyle](https://pypi.org/project/pydocstyle/)
28-
- [pylint](https://pypi.org/project/pylint/)
29-
- [pytest](https://pypi.org/project/pytest/)
30-
- [tox](https://pypi.org/project/tox/)
31-
- It's recommended to use the corresponding code formatter and linters also in your code editor to get instant feedback. A popular editor that can do this is [`vscode`](https://code.visualstudio.com/).
32-
- Run all tests, check formatting and linting.
33-
34-
```sh
35-
tox
36-
```
37-
38-
- Run a single tox environment.
39-
40-
```sh
41-
tox -e lint
42-
```
43-
44-
- Reinstall all tox environments.
45-
46-
```sh
47-
tox -r
48-
```
49-
50-
- Run pytest and all tests.
51-
52-
```sh
53-
pytest
54-
```
55-
56-
- Run pytest and calculate coverage for the package.
57-
58-
```sh
59-
pytest --cov-report term-missing --cov=NEW_REPO
60-
```
61-
62-
- Continuous integration is supported by default via [GitHub Actions](https://help.github.com/en/actions). GitHub Actions is free for public repos and comes with 2000 free Ubuntu build minutes per month for private repos.
63-
64-
- To activate continuous integration testing on Travis CI, add a `.travis.yml` file with the following contents to the repo.
65-
66-
```yaml
67-
dist: xenial
68-
language: python
69-
cache: pip
70-
python:
71-
- "3.6"
72-
- "3.7"
73-
- "3.8"
74-
install:
75-
- pip install -U tox-travis
76-
script: tox
77-
```
78-
79-
Note that Travis CI is free for public repos, but requires a subscription for private repos.
1+
# ProtVL Inference Pipeline
2+
3+
A multi-GPU inference pipeline for generating protein expression images using ProtVL.
4+
5+
## Overview
6+
7+
This script performs conditional image generation for proteins. It takes reference microscopy channels (DAPI, tubulin, ER) as input and generates predicted protein expression patterns. It supports distributed inference across multiple GPUs via HuggingFace Accelerate.
8+
9+
## Requirements
10+
11+
- Python 3.x
12+
- PyTorch
13+
- HuggingFace Diffusers & Accelerate
14+
- timm
15+
- NumPy, Pandas, SciPy
16+
- tifffile
17+
- tqdm
18+
19+
## Usage
20+
21+
CPU:
22+
```bash
23+
python ordinary_sampler_standalone.py \
24+
--csv_file_path p4ha2_example.csv \
25+
--model_path ./checkpoint-1020000/ \
26+
--vae_path ./vae \
27+
--antibody_map_path ./antibody_map.pkl \
28+
--cell_line_map_path ./cell_line_dict.pkl \
30+
--output_dir ./example_output \
31+
--batch_size 16 \
32+
--num_workers 4 \
33+
--num_inference_steps 100
34+
```
35+
36+
Single GPU:
37+
```bash
38+
python ordinary_sampler_standalone.py \
39+
--csv_file_path p4ha2_example.csv \
40+
--model_path ./checkpoint-1020000/ \
41+
--vae_path ./vae \
42+
--antibody_map_path ./antibody_map.pkl \
43+
--cell_line_map_path ./cell_line_dict.pkl \
45+
--output_dir ./example_output \
46+
--batch_size 16 \
47+
--num_workers 4 \
48+
--num_inference_steps 100
49+
```
50+
51+
Multi-GPU with Accelerate:
52+
```bash
53+
accelerate launch --num_processes 4 ordinary_sampler_standalone.py \
54+
--csv_file_path p4ha2_example.csv \
55+
--model_path ./checkpoint-1020000/ \
56+
--vae_path ./vae \
57+
--antibody_map_path ./antibody_map.pkl \
58+
--cell_line_map_path ./cell_line_dict.pkl \
60+
--output_dir ./example_output \
61+
--batch_size 16 \
62+
--num_workers 4 \
63+
--num_inference_steps 100
64+
```
65+
66+
67+
### Key Arguments
68+
69+
| Argument | Default | Description |
70+
|----------|---------|-------------|
71+
| `--model_path` | Required | Path to pretrained DiT model |
72+
| `--vae_path` | Required | Path to VAE checkpoint |
73+
| `--csv_file_path` | Required | CSV with image paths and metadata |
74+
| `--cell_line_map_path` | Required | Cell line name-to-index mapping |
75+
| `--antibody_map_path` | Required | Antibody name-to-index mapping |
76+
| `--output_dir` | `output` | Output directory for generated images |
77+
| `--batch_size` | 4 | Samples per GPU |
78+
| `--num_inference_steps` | 50 | Diffusion sampling steps |
79+
| `--mixed_precision` | `no` | Mixed precision mode (`no`, `fp16`, `bf16`) |
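
The table above can be read as a command-line contract. The following is a minimal sketch of a parser covering only the listed arguments, assuming the script uses `argparse`; the real parser in `ordinary_sampler_standalone.py` is not shown in this diff and may define more flags (for example `--num_workers` and `--log_dir`) or different types. The function name `build_parser` is hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser for the arguments in the table (not the script's own)."""
    parser = argparse.ArgumentParser(description="ProtVL inference (sketch)")
    # Rows marked "Required" in the table.
    for flag in ("--model_path", "--vae_path", "--csv_file_path",
                 "--cell_line_map_path", "--antibody_map_path"):
        parser.add_argument(flag, required=True)
    # Optional rows, using the defaults documented in the table.
    parser.add_argument("--output_dir", default="output")
    parser.add_argument("--batch_size", type=int, default=4)
    parser.add_argument("--num_inference_steps", type=int, default=50)
    parser.add_argument("--mixed_precision", default="no",
                        choices=["no", "fp16", "bf16"])
    return parser


# Parse only the required flags so the documented defaults are visible.
args = build_parser().parse_args([
    "--model_path", "./checkpoint-1020000/",
    "--vae_path", "./vae",
    "--csv_file_path", "p4ha2_example.csv",
    "--cell_line_map_path", "./cell_line_dict.pkl",
    "--antibody_map_path", "./antibody_map.pkl",
])
print(args.output_dir, args.batch_size, args.num_inference_steps, args.mixed_precision)
```

Restricting `--mixed_precision` with `choices` makes a misplaced value such as an output path fail at parse time instead of later in the run.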
80+
81+
## Input Format
82+
83+
**CSV file** must contain columns:
84+
- `image_path`: Path to input TIFF
85+
- `cell_line_name`: Cell line identifier
86+
- `gene_name`: Target protein/antibody name
87+
88+
**Image format**: TIFF with 3 or 4 channels (DAPI, optional antibody, tubulin, ER), shape (H, W, C), and pixel values normalized to the range -1 to 1
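
Before launching a long run, it can help to check that the CSV header carries the three required columns. The sketch below uses only the standard library; the helper name `missing_columns` and the example row values (`img_001.tif`, `U2OS`, `P4HA2`) are hypothetical, and the actual script may validate its input differently.

```python
import csv
import io

# Column names taken from the list above.
REQUIRED_COLUMNS = ("image_path", "cell_line_name", "gene_name")


def missing_columns(csv_text: str) -> list:
    """Return the required columns that are absent from the CSV header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    return [col for col in REQUIRED_COLUMNS if col not in header]


# Hypothetical one-row CSV with all required columns present.
example = "image_path,cell_line_name,gene_name\nimg_001.tif,U2OS,P4HA2\n"
print(missing_columns(example))  # prints []
```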
89+
90+
## Output
91+
92+
For each input image, the script generates:
93+
- `{basename}_{cell_line}_{protein}_pred.tif`: Predicted protein + reference channels
94+
- `{basename}_{cell_line}_{protein}_real.tif`: Ground truth + reference channels (if available)
95+
96+
Output TIFFs have 4 channels in order: DAPI, Protein, Tubulin, ER
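
The naming pattern above can be sketched as follows. This assumes `{basename}` means the input file name without its directory or extension, which the README does not state explicitly; the helper `output_names` and the example values are hypothetical.

```python
from pathlib import Path


def output_names(image_path: str, cell_line: str, protein: str) -> tuple:
    """Build the predicted and ground-truth file names for one input image."""
    # Assumption: {basename} is the file stem (no directory, no extension).
    base = Path(image_path).stem
    prefix = f"{base}_{cell_line}_{protein}"
    return f"{prefix}_pred.tif", f"{prefix}_real.tif"


pred_name, real_name = output_names("data/img_001.tif", "U2OS", "P4HA2")
print(pred_name)  # prints img_001_U2OS_P4HA2_pred.tif
print(real_name)  # prints img_001_U2OS_P4HA2_real.tif
```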
97+
98+
## Logging
99+
100+
Synchronized logs across all GPUs are written to `--log_dir`:
101+
- `inference_log_{timestamp}.txt`: Human-readable log
102+
- `metrics_{timestamp}.json`: Machine-parseable metrics

antibody_map.pkl

149 KB
Binary file not shown.

cell_line_map.pkl

503 Bytes
Binary file not shown.
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
1+
{
2+
"_class_name": "DiTTransformer2DModelWithCrossAttention",
3+
"_diffusers_version": "0.33.1",
4+
"_name_or_path": "/scratch/groups/emmalu/marvinli/twisted_diffusion/latent_diffusion_edm/output_crossattn_L/checkpoint-895000",
5+
"activation_fn": "gelu-approximate",
6+
"attention_bias": true,
7+
"attention_head_dim": 64,
8+
"cross_attention_dim": null,
9+
"dropout": 0.0,
10+
"in_channels": 32,
11+
"norm_elementwise_affine": false,
12+
"norm_eps": 1e-05,
13+
"norm_num_groups": 32,
14+
"norm_type": "ada_norm_zero_continuous",
15+
"num_attention_heads": 16,
16+
"num_cell_labels": 42,
17+
"num_embeds_ada_norm": 1000,
18+
"num_layers": 24,
19+
"num_protein_labels": 12810,
20+
"out_channels": 16,
21+
"patch_size": 2,
22+
"positional_embeddings": null,
23+
"sample_size": 64,
24+
"upcast_attention": false
25+
}
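
A few derived quantities make this config easier to read. The sketch below assumes the custom `DiTTransformer2DModelWithCrossAttention` class follows the usual DiT convention, where the hidden width is `num_attention_heads * attention_head_dim` and the latent is split into square patches of side `patch_size`; the class itself is not shown in this diff, so treat the results as an interpretation rather than a guarantee.

```python
# Values copied from the config above.
config = {
    "attention_head_dim": 64,
    "num_attention_heads": 16,
    "num_layers": 24,
    "patch_size": 2,
    "sample_size": 64,
}

# Assumed DiT convention: hidden width = heads * per-head dimension.
hidden_size = config["num_attention_heads"] * config["attention_head_dim"]

# Assumed patchification: a sample_size x sample_size latent cut into
# patch_size x patch_size patches, one token per patch.
patches_per_side = config["sample_size"] // config["patch_size"]
num_tokens = patches_per_side ** 2

print(hidden_size)  # prints 1024
print(num_tokens)   # prints 1024
```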
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
1+
{
2+
"_class_name": "DiTTransformer2DModelWithCrossAttention",
3+
"_diffusers_version": "0.33.1",
4+
"activation_fn": "gelu-approximate",
5+
"attention_bias": true,
6+
"attention_head_dim": 64,
7+
"cross_attention_dim": null,
8+
"decay": 0.9995,
9+
"dropout": 0.0,
10+
"in_channels": 32,
11+
"inv_gamma": 1.0,
12+
"min_decay": 0.0,
13+
"norm_elementwise_affine": false,
14+
"norm_eps": 1e-05,
15+
"norm_num_groups": 32,
16+
"norm_type": "ada_norm_zero_continuous",
17+
"num_attention_heads": 16,
18+
"num_cell_labels": 42,
19+
"num_embeds_ada_norm": 1000,
20+
"num_layers": 24,
21+
"num_protein_labels": 12810,
22+
"optimization_step": 1020000,
23+
"out_channels": 16,
24+
"patch_size": 2,
25+
"positional_embeddings": null,
26+
"power": 0.75,
27+
"sample_size": 64,
28+
"upcast_attention": false,
29+
"update_after_step": 0,
30+
"use_ema_warmup": true
31+
}
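
The extra fields in this second config (`decay`, `inv_gamma`, `power`, `min_decay`, `update_after_step`, `use_ema_warmup`, `optimization_step`) look like the state of an exponential moving average of the weights. The sketch below shows how such fields are commonly combined into an effective decay, based on my understanding of the warmup schedule used by the Diffusers `EMAModel` helper; it is an assumption that this checkpoint was produced that way, and the function `ema_decay` is hypothetical.

```python
def ema_decay(optimization_step: int, *, decay: float = 0.9995,
              min_decay: float = 0.0, inv_gamma: float = 1.0,
              power: float = 0.75, update_after_step: int = 0,
              use_ema_warmup: bool = True) -> float:
    """Effective EMA decay at a given step (defaults copied from the config)."""
    step = max(0, optimization_step - update_after_step - 1)
    if step <= 0:
        return 0.0
    if use_ema_warmup:
        # Warmup: the decay rises toward 1 as training progresses.
        value = 1 - (1 + step / inv_gamma) ** -power
    else:
        value = (1 + step) / (10 + step)
    # Clamp into [min_decay, decay].
    return max(min(value, decay), min_decay)


print(ema_decay(1020000))  # prints 0.9995
```

Under these assumptions, the warmup term has long since exceeded the cap by step 1020000, so the effective decay equals the configured `decay`.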
