Vosk is an offline, open-source speech recognition toolkit. It enables speech recognition for 20+ languages and dialects: English, Indian English, German, French, Spanish, Portuguese, Chinese, Russian, Turkish, Vietnamese, Italian, Dutch, Catalan, Arabic, Greek, Farsi, Filipino, Ukrainian, Kazakh, Swedish, Japanese, Esperanto, Hindi, Czech, Polish. More to come.

Vosk models are small (about 50 MB) but provide continuous large-vocabulary transcription, zero-latency response with a streaming API, reconfigurable vocabulary, and speaker identification. Bindings are available for various programming languages, including Python, Java, Node.js, C#, C++, Rust, Go, and others.

Vosk supplies speech recognition for chatbots, smart home appliances, and virtual assistants. It can also produce subtitles for movies and transcripts of lectures and interviews. Vosk scales from small devices like a Raspberry Pi or an Android smartphone to big clusters.
Features
- Supports 20+ languages and dialects
- Works offline, even on lightweight devices
- Installs with a single command: pip3 install vosk
- Portable per-language models are only about 50 MB each
- Provides a streaming API for the best user experience
- Allows quick reconfiguration of vocabulary for best accuracy
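
As a rough illustration of the streaming API and the reconfigurable vocabulary listed above, here is a minimal Python sketch. It assumes the vosk package is installed, that a model has been unpacked into a local directory (the name "model" below is a placeholder), and that the input is a mono 16-bit PCM WAV file. The helper names transcribe_stream, read_chunks, and main are made up for this example; only Model, KaldiRecognizer, AcceptWaveform, Result, and FinalResult come from the Vosk Python bindings. Restricting the vocabulary with a phrase list is expected to work only with models that support a dynamic vocabulary.

```python
import json
import wave


def transcribe_stream(recognizer, chunks):
    """Feed audio chunks to a recognizer and yield each finalized utterance.

    `recognizer` is anything with the KaldiRecognizer methods
    AcceptWaveform / Result / FinalResult.
    """
    for chunk in chunks:
        # AcceptWaveform returns True once an utterance boundary is detected.
        if recognizer.AcceptWaveform(chunk):
            text = json.loads(recognizer.Result()).get("text", "")
            if text:
                yield text
    # Flush whatever is still buffered after the last chunk.
    text = json.loads(recognizer.FinalResult()).get("text", "")
    if text:
        yield text


def read_chunks(wav_path, frames_per_chunk=4000):
    """Yield raw PCM chunks from a WAV file (assumed mono, 16-bit)."""
    with wave.open(wav_path, "rb") as wf:
        while True:
            data = wf.readframes(frames_per_chunk)
            if not data:
                break
            yield data


def main(wav_path="speech.wav", model_dir="model", phrases=None):
    """Transcribe a file; paths and defaults here are placeholders."""
    from vosk import Model, KaldiRecognizer  # installed via: pip3 install vosk

    with wave.open(wav_path, "rb") as wf:
        rate = wf.getframerate()

    model = Model(model_dir)
    if phrases:
        # Optional: limit recognition to a phrase list, given as a JSON array.
        # "[unk]" lets the recognizer mark out-of-list speech.
        grammar = json.dumps(list(phrases) + ["[unk]"])
        rec = KaldiRecognizer(model, rate, grammar)
    else:
        rec = KaldiRecognizer(model, rate)

    for text in transcribe_stream(rec, read_chunks(wav_path)):
        print(text)
```

To try it, call main("speech.wav", "model") with your own file and model directory, or pass phrases=["turn on the light", "turn off the light"] to narrow the vocabulary. The streaming loop is kept separate from model loading so the same function can be fed from a microphone or a network socket instead of a file.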