This repository provides inference code and open weights for the sound effect generative models developed at Sony AI. The current public release includes four models addressing the text-to-audio (T2A) and video-to-audio (V2A) tasks:
- Audio encoder/decoder (Woosh-AE): High-quality latent encoder/decoder providing latents for generative modeling and decoding audio from generated latents.
- Text conditioning (Woosh-CLAP): Multimodal text-audio alignment model providing token latents for diffusion model conditioning.
- T2A generation (Woosh-Flow and Woosh-DFlow): Original and distilled LDMs generating audio unconditionally or from a given text prompt.
- V2A generation (Woosh-VFlow): Multimodal LDM generating audio from a video sequence, with optional text prompts.
Start by installing uv:

```
pip install uv
```

Then install the Woosh environment with either CPU support:

```
uv sync --extra cpu
```

or CUDA support:

```
uv sync --extra cuda
```

Open model weights are available for all Woosh models trained on public datasets. You can download and unzip the pretrained weights from the releases page, or use the GitHub CLI:

```
gh release download v1.0.0
unzip '*.zip'
```

The checkpoints should be located in folders named checkpoints/MODEL_NAME, each containing config and weight files.
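Assuming the checkpoints/MODEL_NAME layout described above, a quick sanity check can confirm that each unzipped checkpoint folder contains both a config and a weight file. This helper is not part of the repository — the function name and the weight-file extensions it looks for are our assumptions:

```python
from pathlib import Path


def check_checkpoints(root="checkpoints"):
    """Return the names of checkpoint folders missing a config or weight file.

    Hypothetical helper: assumes each checkpoints/MODEL_NAME folder should
    contain a file with "config" in its name and a weight file with a common
    extension (.pt, .ckpt, or .safetensors).
    """
    incomplete = []
    for model_dir in sorted(Path(root).iterdir()):
        if not model_dir.is_dir():
            continue
        names = [p.name for p in model_dir.iterdir()]
        has_config = any("config" in n for n in names)
        has_weights = any(n.endswith((".pt", ".ckpt", ".safetensors")) for n in names)
        if not (has_config and has_weights):
            incomplete.append(model_dir.name)
    return incomplete
```

An empty return value means every model folder looks complete.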
We provide audio samples to be used as inputs to our test_Woosh-*.py test scripts. You can download and unzip the file samples.zip from the releases page, or use the GitHub CLI:

```
gh release download v1.0.0 -p 'samples.zip'
unzip samples.zip
```

An inference test script is provided for every model. Just run any of the following:

```
uv run test_Woosh-AE.py
uv run test_Woosh-Flow.py
uv run test_Woosh-DFlow.py
uv run test_Woosh-VFlow.py
uv run test_Woosh-DVFlow.py
uv run test_Woosh-CLAP.py
```

The generated audio/video will be written to outputs/ as .wav/.mp4 files.
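To verify a test run produced output, you can enumerate the generated files under outputs/. This is a hypothetical convenience helper, not part of the repository; it only assumes the .wav/.mp4 extensions stated above:

```python
from pathlib import Path


def list_outputs(root="outputs"):
    """Return sorted paths of generated .wav/.mp4 files under `root`.

    Hypothetical helper based on the documented output location and formats.
    """
    out = Path(root)
    if not out.exists():
        return []
    return sorted(str(p) for p in out.rglob("*") if p.suffix in {".wav", ".mp4"})
```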
Check our tech report on arxiv.org for a description of all models.
Two basic Gradio demos, for the Woosh-Flow and Woosh-DFlow models, are available. To launch a Gradio demo locally, run one of the following:

```
uv run gradio_Woosh-Flow.py
uv run gradio_Woosh-DFlow.py
```

Then open a web browser on the same machine and access the demo at http://127.0.0.1:7860.
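If the browser cannot reach the demo, a minimal reachability check can tell whether anything is listening. This sketch assumes the demo is served over plain HTTP on port 7860 (Gradio's default); the helper name is ours:

```python
import urllib.request


def demo_is_up(url="http://127.0.0.1:7860"):
    """Return True if a local Gradio demo responds at `url`.

    Hypothetical check assuming Gradio's default local address and port.
    """
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except OSError:  # connection refused, timeout, etc.
        return False
```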
Woosh models can be served via our API server. Check the API folder for usage details.
For details about model architecture, training and evaluation, please check our tech report available on arxiv.org.
```
@misc{hadjeres2026,
  title={Woosh: A Sound Effects Foundation Model},
  author={Gaetan Hadjeres and Marc Ferras and Khaled Koutini and Benno Weck and Alexandre Bittar and Thomas Hummel and Zineb Lahrichi and Hakim Missoum and Joan Serrà and Yuki Mitsufuji},
  year={2026},
  eprint={2604.01929},
  archivePrefix={arXiv},
  primaryClass={cs.SD},
  url={https://arxiv.org/abs/2604.01929},
}
```

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update tests as appropriate.
- The majority of the code in this repository is released under the MIT license. The video-to-audio Woosh-VFlow and Woosh-DVFlow models use adapted code from MM-AUDIO and MotionFormer; the code for these models is made available under Apache v2 license terms.
- The open weights in the releases page are released under the CC-BY-NC license.
- The test audio and video samples in the releases page contain their individual license terms in the corresponding download file.