This project integrates FastSpeech2 and HiFi-GAN for on-device TTS (Text-to-Speech) on iOS. It currently supports both Japanese and English, with one voice per language.
Model export to Core ML and some challenging implementation parts were assisted by ChatGPT and Gemini.
- Japanese: espnet/kan-bayashi_jsut_fastspeech2
- English: espnet/kan-bayashi_ljspeech_fastspeech2
We use OpenJTalk to extract Japanese phonemes from the input text. For English, graphemes are converted to phonemes using the CMU Pronouncing Dictionary. In both cases, the resulting phoneme sequence is the input to the FastSpeech2 model.
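The English grapheme-to-phoneme step can be sketched as a dictionary lookup. This is a minimal illustration: the embedded two-word dictionary and the function name are hypothetical, and a real app would load the full cmudict file from the bundle and fall back to a G2P model for out-of-vocabulary words.

```swift
import Foundation

// A tiny illustrative subset of the CMU Pronouncing Dictionary.
// A real implementation loads the full cmudict file instead.
let miniCMUDict: [String: [String]] = [
    "HELLO": ["HH", "AH0", "L", "OW1"],
    "WORLD": ["W", "ER1", "L", "D"],
]

/// Convert an English sentence into a flat ARPAbet phoneme sequence.
/// Words missing from the dictionary are silently skipped here;
/// production code would need an out-of-vocabulary fallback.
func graphemesToPhonemes(_ text: String) -> [String] {
    text.uppercased()
        .components(separatedBy: CharacterSet.alphanumerics.inverted)
        .filter { !$0.isEmpty }
        .flatMap { miniCMUDict[$0] ?? [] }
}
```

The phoneme strings would then be mapped to the integer IDs expected by the exported FastSpeech2 model.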
To improve naturalness, the mel-spectrogram output from FastSpeech2 is passed to HiFi-GAN to synthesize waveform audio.
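The two-stage synthesis described above (FastSpeech2 producing a mel-spectrogram, HiFi-GAN turning it into audio) might look roughly like this with Core ML. All model file names, input/output feature names ("phonemes", "mel", "waveform"), and shapes below are placeholders; use the interfaces Xcode generates from your exported models.

```swift
import CoreML

// Hedged sketch of the two-stage pipeline. Model and feature names
// are assumptions, not the project's actual generated API.
func synthesize(phonemeIDs: [Int32]) throws -> [Float] {
    let acoustic = try MLModel(contentsOf: URL(fileURLWithPath: "FastSpeech2.mlmodelc"))
    let vocoder = try MLModel(contentsOf: URL(fileURLWithPath: "HiFiGAN.mlmodelc"))

    // Stage 1: phoneme IDs -> mel-spectrogram.
    let ids = try MLMultiArray(shape: [1, NSNumber(value: phonemeIDs.count)],
                               dataType: .int32)
    for (i, v) in phonemeIDs.enumerated() { ids[i] = NSNumber(value: v) }
    let melOut = try acoustic.prediction(
        from: try MLDictionaryFeatureProvider(dictionary: ["phonemes": ids]))
    let mel = melOut.featureValue(for: "mel")!.multiArrayValue!

    // Stage 2: mel-spectrogram -> raw waveform samples.
    let wavOut = try vocoder.prediction(
        from: try MLDictionaryFeatureProvider(dictionary: ["mel": mel]))
    let wav = wavOut.featureValue(for: "waveform")!.multiArrayValue!
    return (0..<wav.count).map { wav[$0].floatValue }
}
```

The returned samples would then be wrapped in an AVAudioPCMBuffer (or similar) for playback at the vocoder's sample rate.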
This project relies on the following external libraries and tools:
- OpenJTalkForiOS: used for extracting Japanese phonemes from input text. Follow the installation instructions in the repo to integrate it into your Xcode project.
- ❌ Xcode Simulator is not supported.
- ✅ Tested only on iPhone 15 Pro.
- Other devices are untested.
This project is provided as-is. Please use and test at your own risk — we do not provide support.
This project is licensed under the Apache License 2.0.
The models and code are based on the ESPnet pretrained models listed above. Modifications include conversion to Core ML format and integration with iOS.