voice to text free download

Showing 293 open source projects for "voice to text"

View related business solutions

Gen AI apps are built with MongoDB Atlas
The database for AI-powered applications.

MongoDB Atlas is the developer-friendly database used to build, scale, and run gen AI and LLM-powered apps—without needing a separate vector database. Atlas offers built-in vector search, global availability across 115+ regions, and flexible document modeling. Start building AI apps faster, all in one place.

Start Free
Activate Self-Service Analytics from Your Data Warehouse | Kubit
Unlock the Power of Your Data

Kubit delivers enterprise-grade customer journey analytics—directly from your existing data warehouse. Enable every team with self-service insights, driving faster, data-driven decisions with minimal engineering overhead or data movement.

Learn More
1

Voice-Pro

Comprehensive Gradio WebUI for audio processing

Voice-Pro is the best gradio WebUI for transcription, translation and text-to-speech. It can be easily installed with one click. Create a virtual environment using Miniconda, running completely separate from the Windows system (fully portable). Supports real-time transcription and translation, as well as batch mode.

1 Review

Downloads: 38 This Week

Last Update: 2025-12-05
See Project
2

clone-voice

A sound cloning tool with a web interface, using your voice

Clone-voice is a local voice-cloning tool that lets you synthesize speech in any target voice or convert one recording into another voice using the same timbre. It is built around Coqui’s XTTS-v2 model, so it inherits multilingual support and modern neural TTS quality while wrapping it in a user-friendly desktop workflow. The app is designed to be very easy to use: you download a precompiled package, double-click app.exe, and it launches a browser-based web interface where you control...

Downloads: 10 This Week

Last Update: 2025-11-28
See Project
3

TTS Voice Wizard

Speech to Text to Speech, sends text as OSC messages

Speech to Text to Speech. Song now playing. Sends text as OSC messages to VRChat to display on avatar. (STTTS) (Speech to TTS) (VRC STT System) Use TTS Voice Wizard's accessibility features to improve your VRChat experience (it works outside of VRChat too!) You can convert your Speech-to-Text and back to Speech through various Speech Recognition and Text-to-Speech methods.

Downloads: 18 This Week

Last Update: 2025-11-02
See Project
4

GLM-4-Voice

GLM-4-Voice | End-to-End Chinese-English Conversational Model

GLM-4-Voice is an open-source speech-enabled model from ZhipuAI, extending the GLM-4 family into the audio domain. It integrates advanced voice recognition and generation with the multimodal reasoning capabilities of GLM-4, enabling smooth natural interaction via spoken input and output. The model supports real-time speech-to-text transcription, spoken dialogue understanding, and text-to-speech synthesis, making it suitable for conversational AI, virtual assistants, and accessibility applications. ...

Downloads: 2 This Week

Last Update: 2 days ago
See Project
ClouDNS provides fast and secure Free DNS Hosting and Premium DNS Hosting with Global Anycast DNS Network.
Forever Free DNS Hosting with Dynamic DNS

ClouDNS is the biggest European provider of globally Managed DNS services, including GeoDNS, Anycast DNS and DDoS protected DNS. We also offer Domain names, Monitoring services, SSL Certificates, and Google Workspace.

Learn More
5

Real-Time Voice Cloning

Clone a voice in 5 seconds to generate arbitrary speech in real-time

Real-Time Voice Cloning is an influential deep-learning repository that demonstrates how to clone a voice from just a few seconds of audio and then generate arbitrary speech in that voice in near real time. It implements the SV2TTS pipeline (“Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis”) in three stages: a speaker encoder, a synthesizer, and a vocoder.

Downloads: 7 This Week

Last Update: 2025-12-15
See Project
6

OpenAI.fm

Code for openai.fm, a demo for the OpenAI Speech API

OpenAI.fm is an official interactive demo application built to showcase the OpenAI Speech API and its advanced text-to-speech capabilities, providing developers and creators with a hands-on web interface to convert text into high-quality, customizable audio using state-of-the-art TTS models. Developed using Next.js and the OpenAI Speech API, this demo illustrates how the latest neural voice models can produce natural, expressive speech with adjustable styles and voices, highlighting features like emotional range, tone, and real-time playback. ...

Downloads: 185 This Week

Last Update: 2026-01-28
See Project
7

Parlant

The behavior guidance framework for customer-facing LLM agents

Parlant is a lightweight speech-to-text and text-to-speech framework designed for real-time AI-driven voice applications.

Downloads: 2 This Week

Last Update: 5 days ago
See Project
8

Applio

A simple, high-quality voice conversion tool focused on ease of use

Applio is a high-quality voice conversion toolkit designed to make modern RVC/VITS-based voice cloning accessible to non-experts. It focuses strongly on ease of use: installation scripts for Windows, Linux, and macOS set up dependencies and then launch a browser-based Gradio interface. Within that interface, users can train and run voice conversion models for tasks like singing conversion, speech-to-speech transformation, and voice cloning. The project is structured to be flexible through...

Downloads: 39 This Week

Last Update: 2026-01-20
See Project
9

Voicebox

The open-source voice synthesis studio powered by Qwen3-TTS

Voicebox is a local-first voice synthesis studio that aims to bring professional, DAW-like voice generation workflows to a desktop app while keeping models and voice data entirely on your machine. It positions itself as an open-source alternative to cloud voice platforms by emphasizing privacy, offline use, and freedom from subscriptions or usage caps. The tool supports downloading voice models, cloning voices from short audio samples, and generating speech locally, then organizing the...

Downloads: 8 This Week

Last Update: 4 days ago
See Project
Atera IT Autopilot
Ensure operational efficiency at any scale with 24/7 autonomous IT support.

IT Autopilot takes the pressure off your team by handling first-tier support across the channels your end users already live in — email, chat, Slack, Teams, and your Customer Portal. It doesn’t just respond to end-user queries, issues, and crises — it solves them.

Learn More
10

LuxTTS

A high-quality rapid TTS voice cloning model

LuxTTS is an open-source text-to-speech (TTS) system focused on delivering high-quality, rapid voice synthesis and voice cloning that runs extremely fast and efficiently on consumer hardware. It implements a lightweight architecture based on ZipVoice and optimized sampling techniques so that it can generate speech at speeds up to roughly 150 times real-time on a single GPU and faster than real-time on CPU, all while producing audio at high fidelity with 48 kHz quality. ...

Downloads: 7 This Week

Last Update: 2026-01-28
See Project
11

RealtimeSTT

A robust, efficient, low-latency speech-to-text library

RealtimeSTT is a Python-based realtime speech-to-text engine emphasizing low latency, wake-word detection, voice activity detection, and automatic speech segmentation. It provides asynchronous callbacks, nanosecond-precision timestamps, and CLI tools, suitable for building voice assistants, meeting transcribers, or live caption systems.

Downloads: 4 This Week

Last Update: 2025-07-03
See Project
12

IndexTTS2

Industrial-level controllable zero-shot text-to-speech system

IndexTTS is a modern, zero-shot text-to-speech (TTS) system engineered to deliver high-quality, natural-sounding speech synthesis with few requirements and strong voice-cloning capabilities. It builds on state-of-the-art models such as XTTS and other modern neural TTS backbones, improving them with a conformer-based speech conditional encoder and upgrading the decoder to a high-quality vocoder (BigVGAN2), leading to clearer and more natural audio output.

Downloads: 8 This Week

Last Update: 2025-11-27
See Project
13

Qwen3-TTS

Qwen3-TTS is an open-source series of TTS models

Qwen3-TTS is an open-source text-to-speech (TTS) project built around the Qwen3 large language model family, focused on generating high-quality, natural-sounding speech from plain text input. It provides researchers and developers with tools to transform text into expressive, intelligible audio, supporting multiple languages and voice characteristics tuned for clarity and fluidity.

Downloads: 5 This Week

Last Update: 2026-01-26
See Project
14

OpenVoice

Instant voice cloning by MIT and MyShell. Audio foundation model

OpenVoice is a versatile instant voice cloning system that can replicate a speaker’s tone color from just a short audio clip and then generate speech in multiple languages. It is designed not only to match the timbre of the reference voice, but also to give granular control over style parameters such as emotion, accent, rhythm, pauses, and intonation. The model supports cross-lingual and even zero-shot cross-lingual voice cloning, so a speaker recorded in one language can be made to speak...

Downloads: 8 This Week

Last Update: 2025-11-28
See Project
15

ebook2audiobook

Generate audiobooks from e-books, voice cloning & 1107+ languages

ebook2audiobook is a tool to convert legally obtained eBooks (non-DRM) into fully narrated audiobooks, complete with chapters and metadata. It automates the pipeline: it reads the eBook file, splits it into appropriate segments (chapters, paragraphs), uses text-to-speech (TTS) models to synthesize audio, optionally applies voice cloning, and outputs a final audiobook — ideal for people who prefer listening over reading, or for accessibility purposes. The tool supports a wide array of underlying TTS backends (XTTSv2, Bark, VITS, Fairseq, Tacotron2, YourTTS and more), which gives flexibility depending on hardware availability, voice preference, and language. ...

Downloads: 45 This Week

Last Update: 2026-02-02
See Project
16

VibeVoice ComfyUI

ComfyUI integration for Microsoft's VibeVoice text-to-speech model

VibeVoice ComfyUI is a comprehensive wrapper that integrates Microsoft’s VibeVoice text-to-speech models directly into ComfyUI workflows. It exposes VibeVoice as a set of custom nodes so you can build single-speaker and multi-speaker voice generation pipelines visually, combining TTS with other audio or generative components. The integration supports high-quality single-speaker synthesis as well as scripted multi-speaker conversations, with optional voice cloning from audio samples for each speaker. ...

Downloads: 3 This Week

Last Update: 2025-11-28
See Project
17

PersonaPlex

PersonaPlex code

...PersonaPlex also supports persona and voice control, allowing developers to define the role and speaking style of the agent using text prompts and voice conditioning, making it suitable for applications like customized voice assistants, interactive character agents, or domain-specific conversational tools. Internally, it processes continuous audio streams in a hybrid input format so that speech understanding and generation occur jointly.

Downloads: 12 This Week

Last Update: 6 days ago
See Project
18

Sopro TTS

A lightweight text-to-speech model with zero-shot voice cloning

Sopro TTS is an open-source text-to-speech (TTS) project that implements a lightweight model capable of producing speech from text with zero-shot voice cloning, meaning it can mimic a speaker’s voice from only a few seconds of reference audio. Built with a 169 million-parameter architecture that uses dilated convolutions and cross-attention layers instead of large Transformer stacks, it achieves relatively fast real-time performance even on CPUs (about a 0.25 real-time factor measured on an M3 base). ...

Downloads: 0 This Week

Last Update: 2026-02-06
See Project
19

ElevenLabs Python

The official Python SDK for the ElevenLabs API

elevenlabs-python is the official Python SDK for the ElevenLabs API, giving developers a convenient way to access ElevenLabs’ high-quality, lifelike voices. The library wraps the HTTP API into a typed Python client, so you can perform text-to-speech, streaming, voice cloning, voice management, and agents-related operations with simple method calls. It exposes ElevenLabs’ main models such as Eleven Multilingual v2, Eleven Flash v2.5, and Eleven Turbo v2.5, each targeting different trade-offs between latency, cost, and quality. The SDK is designed for quick setup: after installing the package and setting an API key, you can generate speech in multiple languages and play or process the resulting audio bytes. ...

Downloads: 4 This Week

Last Update: 5 days ago
See Project
20

VideoChat

Real-time voice interactive digital human

VideoChat is a real-time voice-interactive “digital human” system that combines automatic speech recognition, large language models, text-to-speech, and talking-head generation into a single conversational pipeline. It supports both pure end-to-end voice solutions based on multimodal large language models (GLM-4-Voice feeding directly into talking-head generation) and a more traditional cascaded pipeline using ASR → LLM → TTS → talking head.

Downloads: 0 This Week

Last Update: 2025-12-18
See Project
21

sherpa-onnx

Speech-to-text, text-to-speech, and speaker recognition

Speech-to-text, text-to-speech, and speaker recognition using next-gen Kaldi with onnxruntime without an Internet connection. Support embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift, Dart, JavaScript, Flutter.

Downloads: 216 This Week

Last Update: 6 hours ago
See Project
22

Spark TTS

Spark-TTS Inference Code

...The project supports zero-shot voice cloning, meaning it can imitate a new speaker’s voice without dedicated training for that specific voice, and works across languages, including English and Chinese, even in cross-lingual code-switching scenarios. Spark-TTS allows users to control speech characteristics like gender, pitch, and speaking rate to customize synthesized output and support virtual speaker creation.

Downloads: 3 This Week

Last Update: 2026-02-04
See Project
23

MetaVoice-1B

Foundational model for human-like, expressive TTS

MetaVoice — in the form of its source repository “metavoice-src” — is a large-scale text-to-speech (TTS) model. Specifically, the base model (MetaVoice-1B) uses around 1.2 billion parameters and has been trained on a massive dataset — reportedly around 100,000 hours of speech data. The goal is to provide human-like, expressive, and flexible TTS: able to generate natural-sounding speech that can handle diverse inputs and likely generalize over voice styles, intonation, prosody, and perhaps multiple languages or accents. ...

Downloads: 0 This Week

Last Update: 2025-11-28
See Project
24

EasyVoice

Open source text-to-speech tool, supports extra-long text

easyVoice is an open-source text-to-speech platform aimed at turning long-form text and novels into high-quality audio, with a strong focus on usability and scalability. It provides a web interface where users can paste or upload large texts and generate speech and subtitles in a single workflow, even for works exceeding 100,000 characters. The system supports multi-role voice acting, letting users assign different neural voices to different characters or narrative roles and configure parameters such as rate, pitch, and volume per role. ...

Downloads: 4 This Week

Last Update: 2026-01-26
See Project
25

Handy STT

A free, open source, and extensible speech-to-text application

Handy is a free, open-source, offline speech-to-text application built for privacy, accessibility, and extensibility. Developed using Tauri (Rust + React/TypeScript), it runs natively across Windows, macOS, and Linux while performing local speech recognition without sending any audio to cloud servers. Handy allows users to start transcription instantly using a configurable keyboard shortcut—press to record, release to transcribe—and automatically pastes the resulting text into any active text field. ...

Downloads: 48 This Week

Last Update: 3 days ago
See Project