Commit 2ffa7bf

Release Inference code and ckpt
1 parent a37d9c6 commit 2ffa7bf

98 files changed

Lines changed: 17108 additions & 11 deletions

LICENSE.txt

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 MIT License

-Copyright (c) 2025 Johannes Schusterbauer
+Copyright (c) 2025 CompVis - Computer Vision and Learning LMU Munich

 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal

README.md

Lines changed: 56 additions & 10 deletions
@@ -18,26 +18,72 @@
 </p>


-<p align="center">
-<a href="https://compvis.github.io/SCFlow/"><img src="docs/static/figures/badge-website.svg" alt="Website"></a>
-<a href="https://arxiv.org/abs/2508.03402"><img src="https://img.shields.io/badge/arXiv-PDF-b31b1b" alt="Paper"></a>
-</p>
+<a href="https://compvis.github.io/SCFlow/"><img src="docs/static/figures/badge-website.svg" alt="Website"></a>
+<a href="https://arxiv.org/abs/2508.03402"><img src="https://img.shields.io/badge/arXiv-PDF-b31b1b" alt="Paper"></a>
+<a href="https://huggingface.co/CompVis/SCFlow"><img src="https://img.shields.io/badge/HuggingFace-Weights-orange" alt="Weights"></a>
+
+This repository contains the official implementation of the paper "SCFlow: Implicitly Learning Style and Content Disentanglement with Flow Models".
+We propose a flow-matching framework that learns an invertible mapping between style-content mixtures and their separate representations, avoiding explicit disentanglement objectives. Together with the method, we have curated a 510k synthetic dataset consisting of 10k content instances and 51 distinct styles.
+

 <p align="center">
-<img src="docs/static/images/teaser.jpg" alt="Cover" width="75%">
+<img src="docs/static/images/teaser.jpg" alt="Cover" width="80%">
 </p>


-<!--
+
+## 🛠️ Setup
+Create the environment with conda:
+```bash
+conda create -n scflow python=3.10
+conda activate scflow
+pip install -r requirements.txt
+```
+The environment was tested on `Ubuntu 22.04.5 LTS` with `CUDA 12.1`. You can *optionally* install Jupyter Notebook to run the notebook provided in [`notebooks`](https://github.com/CompVis/SCFlow/tree/main/notebooks).
+
+Download the model checkpoints:
+```bash
+mkdir ckpts
+cd ckpts
+
+# model checkpoint
+wget -O scflow_last.ckpt "https://huggingface.co/CompVis/SCFlow/resolve/main/scflow_last.ckpt?download=true"
+
+# unclip checkpoint for visualization
+wget -O sd21-unclip-l.ckpt "https://huggingface.co/CompVis/SCFlow/resolve/main/sd21-unclip-l.ckpt?download=true"
+```
+## 🔥 Usage
+Inference forward (merge content and style):
+```bash
+bash scripts/inference_forward.sh
+```
+Inference reverse (disentangle content and style from a given reference):
+```bash
+bash scripts/inference_reverse.sh
+```
+
+Training (coming soon):
+```bash
+bash ...
+```
+
+## 🗂️ Dataset
+Coming soon

 ## 🎓 Citation

+
 If you use this codebase or otherwise find our work valuable, please cite our paper:
 ```bibtex
-TBD
-``` -->
+@article{ma2025scflow,
+title={SCFlow: Implicitly Learning Style and Content Disentanglement with Flow Models},
+author={Ma, Pingchuan and Yang, Xiaopei and Li, Yusong and Gui, Ming and Krause, Felix and Schusterbauer, Johannes and Ommer, Bj{\"o}rn},
+journal={arXiv preprint arXiv:2508.03402},
+year={2025}
+}
+```

 ## 🔥 Updates and Backlogs
 - [x] **[06.08.2025]** [ArXiv](https://arxiv.org/abs/2508.03402) paper available.
-- [ ] Release Inference code and ckpt
-- [ ] Host the dataset and training code
+- [x] **[12.08.2025]** Release Inference code and ckpt
+- [ ] Host the dataset and training code
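
The checkpoint download step in the README's Setup section can also be scripted from Python. The following is a minimal sketch using only the standard library. It assumes the two checkpoint filenames and the `resolve/main` URL layout shown in the `wget` commands. The helper names `resolve_url` and `download_checkpoints` are hypothetical and are not part of the repository.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Repository and filenames as they appear in the README's wget commands.
HF_REPO = "CompVis/SCFlow"
CHECKPOINTS = ("scflow_last.ckpt", "sd21-unclip-l.ckpt")


def resolve_url(filename, repo=HF_REPO, revision="main"):
    """Build the direct-download URL for a file hosted in a Hugging Face repo."""
    return f"https://huggingface.co/{repo}/resolve/{revision}/{filename}"


def download_checkpoints(target_dir="ckpts", filenames=CHECKPOINTS):
    """Download each checkpoint into target_dir, skipping files already present."""
    out = Path(target_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for name in filenames:
        dest = out / name
        if not dest.exists():
            # Large files: this call blocks until the transfer completes.
            urlretrieve(resolve_url(name), dest)
        paths.append(dest)
    return paths
```

Calling `download_checkpoints()` from the repository root would mirror the `mkdir ckpts` and `wget` steps. Skipping existing files keeps repeated runs cheap, though it does not detect partially downloaded files.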

configs/ViT-L-14_stats.th

6.91 KB
Binary file not shown.

configs/inference.yaml

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
+model:
+scale_factor: 0.6304
+fm:
+target: scflow.cfm.FlowMatching
+params:
+sigma_min: 1.0e-08
+net_cfg:
+target: scflow.models.kakaomodels.prior.PriorTransformer
+params:
+xf_width: 2048
+xf_layers: 12
+xf_heads: 32
+xf_final_ln: true
+clip_dim: 1536
+
+train:
+lr: 1.0e-05
+weight_decay: 0.0
+lr_scheduler_patience: 20
+cal_metrics: true
+ema_rate: 0.999
+ema_update_every: 1
+ema_update_after_step: 1000
+use_ema_for_sampling: true
+checkpoint_callback_params:
+every_n_train_steps: 800000
+save_top_k: -1
+verbose: False
+save_last: false
+auto_insert_metric_name: false
+trainer_params:
+max_epochs: 40
+num_sanity_val_steps: 0
+accumulate_grad_batches: 1
+log_every_n_steps: 50
+limit_val_batches: 64
+val_check_interval: 20000
+precision: 16
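
The `sigma_min` entry under `scflow.cfm.FlowMatching` suggests a conditional flow-matching objective. The implementation of that class is not part of this excerpt, so the sketch below assumes the common linear (optimal-transport) path in which `sigma_min` sets the residual weight of the source sample at `t = 1`. The function names `cfm_pair` and `euler_integrate` are hypothetical, and plain Python lists stand in for tensors.

```python
def cfm_pair(x0, x1, t, sigma_min=1.0e-08):
    """Return the interpolated point x_t and the regression target u_t.

    Assumed path: x_t = (1 - (1 - sigma_min) * t) * x0 + t * x1,
    whose time derivative is u_t = x1 - (1 - sigma_min) * x0.
    """
    scale = 1.0 - (1.0 - sigma_min) * t
    x_t = [scale * a + t * b for a, b in zip(x0, x1)]
    u_t = [b - (1.0 - sigma_min) * a for a, b in zip(x0, x1)]
    return x_t, u_t


def euler_integrate(velocity, x0, steps=10):
    """Integrate dx/dt = velocity(x, t) from t = 0 to t = 1 with explicit Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        v = velocity(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy usage: a constant velocity field equal to the target u_t carries x0 to (almost) x1.
source = [1.0, -2.0]
target = [0.5, 3.0]
_, u = cfm_pair(source, target, t=0.3)
endpoint = euler_integrate(lambda x, t: u, source, steps=10)
```

During training, a network would regress `u_t` from `(x_t, t)`. At inference the learned velocity replaces the constant field used here, and integrating in the opposite time direction gives the reverse mapping.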
Lines changed: 83 additions & 0 deletions
@@ -0,0 +1,83 @@
+model:
+base_learning_rate: 1.0e-04
+target: scflow.ldm.models.diffusion.ddpm.ImageEmbeddingConditionedLatentDiffusion
+params:
+embedding_dropout: 0.25
+parameterization: "v"
+linear_start: 0.00085
+linear_end: 0.0120
+log_every_t: 200
+timesteps: 1000
+first_stage_key: "jpg"
+cond_stage_key: "txt"
+image_size: 96
+channels: 4
+cond_stage_trainable: false
+conditioning_key: crossattn-adm
+scale_factor: 0.18215
+monitor: val/loss_simple_ema
+use_ema: False
+
+embedder_config:
+target: scflow.ldm.modules.encoders.modules.ClipImageEmbedder
+params:
+model: "ViT-L/14"
+
+noise_aug_config:
+target: scflow.ldm.modules.encoders.modules.CLIPEmbeddingNoiseAugmentation
+params:
+clip_stats_path: "configs/ViT-L-14_stats.th"
+timestep_dim: 768
+noise_schedule_config:
+timesteps: 1000
+beta_schedule: squaredcos_cap_v2
+
+unet_config:
+target: scflow.ldm.modules.diffusionmodules.openaimodel.UNetModel
+params:
+num_classes: "sequential"
+adm_in_channels: 1536
+use_checkpoint: True
+image_size: 32 # unused
+in_channels: 4
+out_channels: 4
+model_channels: 320
+attention_resolutions: [ 4, 2, 1 ]
+num_res_blocks: 2
+channel_mult: [ 1, 2, 4, 4 ]
+num_head_channels: 64
+use_spatial_transformer: True
+use_linear_in_transformer: True
+transformer_depth: 1
+context_dim: 1024
+legacy: False
+
+first_stage_config:
+target: scflow.ldm.models.autoencoder.AutoencoderKL
+params:
+embed_dim: 4
+monitor: val/rec_loss
+ddconfig:
+attn_type: "vanilla-xformers"
+double_z: true
+z_channels: 4
+resolution: 256
+in_channels: 3
+out_ch: 3
+ch: 128
+ch_mult:
+- 1
+- 2
+- 4
+- 4
+num_res_blocks: 2
+attn_resolutions: [ ]
+dropout: 0.0
+lossconfig:
+target: torch.nn.Identity
+
+cond_stage_config:
+target: scflow.ldm.modules.encoders.modules.FrozenOpenCLIPEmbedder
+params:
+freeze: True
+layer: "penultimate"
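
Both configs pair a dotted `target` path with a `params` mapping (the YAML indentation was lost in this view). Configs of this shape are commonly consumed by a small factory that imports the target and calls it with the params. The repository's own loader is not shown in this commit excerpt, so the sketch below is an assumption about that pattern, with hypothetical helper names, demonstrated on a standard-library class so that it runs without the `scflow` package.

```python
import importlib


def get_obj_from_str(path):
    """Resolve a dotted path such as 'package.module.ClassName' to the object it names."""
    module_name, attr = path.rsplit(".", 1)
    return getattr(importlib.import_module(module_name), attr)


def instantiate_from_config(cfg):
    """Call cfg['target'] with cfg['params'] as keyword arguments."""
    if "target" not in cfg:
        raise KeyError("config block needs a 'target' key")
    factory = get_obj_from_str(cfg["target"])
    # An empty 'params:' key parses to None in YAML, so fall back to no arguments.
    return factory(**(cfg.get("params") or {}))


# Stand-in for a parsed YAML block; a real block would name a model class instead.
example_cfg = {
    "target": "fractions.Fraction",
    "params": {"numerator": 3, "denominator": 4},
}
obj = instantiate_from_config(example_cfg)
```

Nested blocks such as `unet_config` or `first_stage_config` would typically be passed through as plain dictionaries and instantiated by the parent class in the same way.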

image_samples/Cubism/02316.png

333 KB

image_samples/Cubism/09728.png

1.42 MB

image_samples/Cyberpunk/02316.png

395 KB

image_samples/Cyberpunk/09728.png

392 KB
458 KB
