Speech-AI-Forge is a project developed around TTS generation models
Robust Speech Recognition via Large-Scale Weak Supervision
Code for openai.fm, a demo for the OpenAI Speech API
PersonaPlex code
StreamSpeech is a seamless model for offline speech recognition
Use Microsoft Edge's online text-to-speech service from Python
A robust, efficient, low-latency speech-to-text library
A high-quality rapid TTS voice cloning model
Port of OpenAI's Whisper model in C/C++
Industrial-level controllable zero-shot text-to-speech system
Towards Human-Level Text-to-Speech through Style Diffusion
Audio foundation model excelling in audio understanding
End-to-end speech processing toolkit
Qwen3-ASR is an open-source series of ASR models
Qwen3-TTS is an open-source series of TTS models
TTS with kokoro and onnx runtime
Open-source multi-speaker long-form text-to-speech model
A TTS that fits in your CPU (and pocket)
Synchronized Translation for Videos
MARS5 speech model (TTS) from CAMB.AI
Towards Human-Sounding Speech
SOTA discrete acoustic codec models with 40/75 tokens per second
Run local LLMs like llama, deepseek, kokoro, etc. inside your browser
Open-source text-to-speech tool that supports extra-long text
Speakr is a personal, self-hosted web application