A suite of advanced multi-modal LLMs
Automatically translates the text of a video based on a subtitle file
Management of Yandex Station and other smart home devices
SoTA open-source TTS
Multi-lingual large voice generation model, providing inference
Reading and speaking
Scalable generative AI framework built for researchers and developers
LLM-based reinforcement learning audio editing model
Bailing is a voice dialogue robot similar to GPT-4o
Generate audiobooks from e-books
Han Language Processing
Reading book source
Instant voice cloning by MIT and MyShell. Audio foundation model
VITS2 backbone with multilingual-bert
C++ inference library for multiple SVC/TTS
Unofficial Go (Golang) bindings for the Hugging Face Inference API
Official PyTorch Implementation
Anki flashcards on Android
Ito, smart dictation in every application
Chat & pretrained large audio language model proposed by Alibaba Cloud
Multi-modal large language model designed for audio understanding
Virtual AI anchor that combines state-of-the-art technology
Foundational Models for State-of-the-Art Speech and Text Translation
Conversational voice AI agents
Easy-to-use Speech Toolkit including Self-Supervised Learning model