1 | 1 | # Audio & Speech |
2 | | -- Speech-Resources: [[Github:zh-cn](https://github.com/ddlBoJack/Speech-Resources)] |
3 | | -- Awesome-Speech-Pretraining: [[Github:zh-cn](https://github.com/ddlBoJack/Awesome-Speech-Pretraining)] |
| 2 | +- [Speech-Resources](https://github.com/ddlBoJack/Speech-Resources): Labs, companies, resources, internships, and more in the speech field; recommendations and self-nominations are welcome |
| 3 | +- [metame-ai/awesome-audio-plaza](https://github.com/metame-ai/awesome-audio-plaza): Daily tracking of awesome audio papers, including music generation, zero-shot tts, asr, audio generation |
| 4 | +- [SpeechTasks](https://github.com/WangHelin1997/SpeechTasks): This is a list of speech tasks and datasets, which can provide training data for Generative AI, AIGC, AI model training, intelligent speech tool development, and speech applications. |
| 5 | +- [ai-audio-startups](https://github.com/csteinmetz1/ai-audio-startups): Community list of startups working with AI in audio and music technology |
| 6 | +- [speech_rankings](https://github.com/mutiann/speech_rankings): A CSRankings-like index for speech researchers |
| 7 | +- [INTERSPEECH-2023-Papers](https://github.com/DmitryRyumin/INTERSPEECH-2023-Papers): INTERSPEECH 2023 Papers: A complete collection of influential and exciting research papers from the INTERSPEECH 2023 conference. |
| 8 | + |
| 9 | +## SSL |
| 10 | +- [Awesome-Speech-Pretraining](https://github.com/ddlBoJack/Awesome-Speech-Pretraining): Paper, Code and Statistics for Self-Supervised Learning and Pre-Training on Speech. |
| 11 | +- [facebookresearch/fairseq](https://github.com/facebookresearch/fairseq): Facebook AI Research Sequence-to-Sequence Toolkit written in Python. |
| 12 | + |
| 13 | +## ASR |
| 14 | +- [kaldi](https://github.com/kaldi-asr/kaldi): Kaldi Speech Recognition Toolkit |
| 15 | +- Next-gen Kaldi |
| 16 | + - [k2-fsa/icefall](https://github.com/k2-fsa/icefall): The icefall project contains speech-related recipes for various datasets using k2-fsa and lhotse. |
| 17 | + - [lhotse-speech/lhotse](https://github.com/lhotse-speech/lhotse): Tools for handling speech data in machine learning projects. |
| 18 | +- [openai/whisper](https://github.com/openai/whisper): Robust Speech Recognition via Large-Scale Weak Supervision |
| 19 | +- [awesome-whisper](https://github.com/sindresorhus/awesome-whisper): Awesome list for Whisper — an open-source AI-powered speech recognition system developed by OpenAI |
| 20 | + |
| 21 | +## Generation |
| 22 | +- [open-mmlab/Amphion](https://github.com/open-mmlab/Amphion): Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. |
| 23 | +- [facebookresearch/audiocraft](https://github.com/facebookresearch/audiocraft): Audiocraft is a library for audio processing and generation with deep learning. |
| 24 | +- [NVIDIA/NeMo](https://github.com/NVIDIA/NeMo): NeMo: a framework for generative AI |
| 25 | + |
| 26 | +## Audio/Speech LLM |
| 27 | +- [QwenLM/Qwen-Audio](https://github.com/QwenLM/Qwen-Audio): The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud. |
| 28 | +- [awesome-large-audio-models](https://github.com/EmulationAI/awesome-large-audio-models): Collection of resources on the applications of Large Language Models (LLMs) in Audio AI. |
| 29 | +- [Large-Audio-Models](https://github.com/liusongxiang/Large-Audio-Models): Keep track of big models in audio domain, including speech, singing, music etc. |
| 30 | + |
| 31 | +## Dataset |
| 32 | +- [speech-datasets-collection](https://github.com/RevoSpeechTech/speech-datasets-collection): a curated list of speech datasets (110+ datasets, 75+ easy to download) |
| 33 | +- [ai-audio-datasets](https://github.com/Yuan-ManX/ai-audio-datasets): This is a list of datasets consisting of speech, music, and sound effects |
| 34 | +- [ULCA-asr-dataset-corpus](https://github.com/Open-Speech-EkStep/ULCA-asr-dataset-corpus): asr dataset corpus collection |
| 35 | +- [coqui-ai/open-speech-corpora](https://github.com/coqui-ai/open-speech-corpora): A list of accessible speech corpora for ASR, TTS, and other Speech Technologies |
| 36 | +- [voice_datasets](https://github.com/jim-schwoebel/voice_datasets): A comprehensive list of open-source datasets for voice and sound computing (95+ datasets). |
| 37 | +- [audio-datasets](https://github.com/DagsHub/audio-datasets): open-source audio datasets |
| 38 | +- [speech_dataset](https://github.com/double22a/speech_dataset): The dataset of Speech Recognition |
| 39 | +- [k2-fsa/libriheavy](https://github.com/k2-fsa/libriheavy): Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context |
| 40 | +- [facebookresearch/libri-light](https://github.com/facebookresearch/libri-light) |