Showing 293 open source projects for "voice to text"

  • 1
    Handy STT

    A free, open source, and extensible speech-to-text application

    Handy is a free, open-source, offline speech-to-text application built for privacy, accessibility, and extensibility. Developed using Tauri (Rust + React/TypeScript), it runs natively across Windows, macOS, and Linux while performing local speech recognition without sending any audio to cloud servers. Handy allows users to start transcription instantly using a configurable keyboard shortcut—press to record, release to transcribe—and automatically pastes the resulting text into any active text field. ...
    Downloads: 48 This Week
  • 2
    chatgpt-on-wechat

    A chatbot built on a large language model

    chatgpt-on-wechat turns your WeChat client (including personal accounts) into an intelligent chatbot powered by large language models like ChatGPT, enabling automated replies, context-aware conversations, and media handling directly inside chats. It receives text and voice messages from private and group chats, forwards them to an AI model using official APIs, and returns replies that feel natural and contextually relevant, creating more engaging interactions without manual typing. Beyond simple text, the bot supports voice recognition and automatic voice or text responses, image generation based on descriptions, and independent memory of multi-turn conversations per user or group. ...
    Downloads: 3 This Week
  • 3
    OpenAI-Compatible Edge-TTS API

    Free, high-quality text-to-speech API endpoint to replace OpenAI

    OpenAI-Compatible Edge-TTS API is a local, OpenAI-compatible text-to-speech API that uses edge-tts—Microsoft Edge’s online TTS service—as the backend. The project emulates the /v1/audio/speech endpoint used by OpenAI, so any client that can talk to the OpenAI TTS API can be redirected to this service with minimal changes. It exposes parameters for input text, voice selection, audio format, and playback speed, mirroring the OpenAI interface while mapping popular OpenAI voice names to equivalent Edge voices. ...
    Downloads: 0 This Week
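As a rough illustration of the request shape described above, the sketch below builds a POST to the /v1/audio/speech endpoint with the input text, voice, audio format, and speed parameters. The base URL, port, API key placeholder, and the helper names `build_speech_request` and `synthesize` are assumptions for illustration and are not taken from the project; the JSON field names follow the OpenAI speech API.

```python
import json
import urllib.request

# Assumption: the service is running locally at this address; adjust as needed.
BASE_URL = "http://localhost:5050"


def build_speech_request(text, voice="alloy", response_format="mp3",
                         speed=1.0, api_key="your_api_key_here"):
    """Build (but do not send) an OpenAI-style text-to-speech request."""
    payload = {
        "model": "tts-1",                    # OpenAI model name; the backend may ignore it
        "input": text,                       # text to synthesize
        "voice": voice,                      # OpenAI voice name, mapped to an Edge voice
        "response_format": response_format,  # audio format
        "speed": speed,                      # playback speed multiplier
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def synthesize(text, out_path, **kwargs):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(text, **kwargs)
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as f:
        f.write(response.read())
```

Calling `synthesize("Hello", "hello.mp3", voice="nova")` would need the service to be reachable; `build_speech_request` on its own only constructs the request.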
  • 4
    Bolna

    Conversational voice AI agents

    Bolna is an end-to-end open-source platform for building conversational voice AI agents, enabling developers to create voice-first conversational assistants efficiently.
    Downloads: 1 This Week
  • 5
    KrillinAI

    Video translation and dubbing tool powered by LLMs

    ...It integrates several stages of the pipeline: video acquisition (either from local files or remote via download tools), speech recognition (ASR), subtitle segmentation and alignment, machine translation (with context-aware translation to preserve semantics), and voice cloning + text-to-speech (TTS) to produce dubbed audio tracks. KrillinAI supports both landscape and portrait videos, which makes it suitable for a wide range of platforms — from YouTube to TikTok or other vertical-video sites — and ensures correct formatting and layout for the final video. The tool offers “one-click” workflows and desktop versions, lowering the barrier for users who may not be familiar with video editing or audio processing pipelines.
    Downloads: 9 This Week
  • 6
    Bailing

    Bailing is a voice dialogue robot similar to GPT-4o

    Bailing is an open-source voice-dialogue assistant designed to deliver natural voice-based conversations by combining automatic speech recognition (ASR), voice activity detection (VAD), a large language model (LLM), and text-to-speech (TTS) in a single pipeline. Its goal is to offer a “voice-first” chat experience similar to what one might expect from a system like GPT-4o, but fully open and deployable by users.
    Downloads: 0 This Week
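The VAD, ASR, LLM, and TTS stages named above can be pictured as one chain. The following is a minimal sketch of that chaining, not Bailing's actual code: the `VoicePipeline` class, its field names, and the energy-threshold VAD are hypothetical stand-ins, and the three model stages are passed in as plain callables.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class VoicePipeline:
    """Hypothetical VAD -> ASR -> LLM -> TTS chain with pluggable stages."""
    asr: Callable[[List[float]], str]   # audio samples -> transcript
    llm: Callable[[str], str]           # transcript -> reply text
    tts: Callable[[str], bytes]         # reply text -> audio bytes
    energy_threshold: float = 0.01      # assumed mean-square energy cutoff

    def has_speech(self, samples: List[float]) -> bool:
        """Crude energy-based VAD: mean squared amplitude above a threshold."""
        if not samples:
            return False
        energy = sum(s * s for s in samples) / len(samples)
        return energy >= self.energy_threshold

    def handle(self, samples: List[float]) -> Optional[bytes]:
        """Run one audio chunk through the chain; skip chunks without speech."""
        if not self.has_speech(samples):
            return None
        transcript = self.asr(samples)
        reply = self.llm(transcript)
        return self.tts(reply)
```

A real deployment would replace the three callables with actual model calls and would likely use a trained VAD rather than an energy threshold.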
  • 7
    GLM-TTS

    Controllable & emotion-expressive zero-shot TTS

    GLM-TTS is an advanced text-to-speech synthesis system built on large language model technologies that focuses on producing high-quality, expressive, and controllable spoken output, including features like emotion modulation and zero-shot voice cloning. It uses a two-stage architecture where a generative LLM first converts text into intermediate speech token sequences and then a Flow-based neural model converts those tokens into natural audio waveforms, enabling rich prosody and voice character even for unseen speakers. ...
    Downloads: 1 This Week
  • 8
    Translate-Subtitle-File

    Subtitle Creation Assistant

    Machine translation assistant for subtitle groups. Function 1: translate subtitle files (.srt, .ass, .vtt). Function 2: voice to text (drag in a video or audio file to recognize subtitles). Latest version: v4.1.0 (update time 2021 2 May 23). Twelve translation service providers can be configured, such as Google, Baidu, Tencent, Caiyun, IBM, Azure, and Amazon. Six voice service providers can be configured: Alibaba Cloud, Xunfei, Tencent Cloud, IBM, Azure, and Amazon. Advantages: 1. ...
    Downloads: 9 This Week
  • 9
    Speakr

    Speakr is a personal, self-hosted web application

    Speakr is an open-source, real-time text-to-speech (TTS) web application that allows users to convert written text into natural-sounding speech in just a few clicks. It provides a clean, user-friendly interface where users can input text, choose a voice style or language, and immediately hear the output, making it ideal for accessibility, content creation, and learning applications.
    Downloads: 3 This Week
  • 10
    VOIP-VOICE-TO-TEXT&ANALYS

    Convert VoIP calls to text and analyze them with AI

    The VoIP voice-to-text software for Issabel is an AI-based solution that converts calls into accurate Persian text. After each call, the audio file is sent to the GPT-4o AI engine, which produces editable transcripts. The software also provides AI-powered call analysis, extracting key points, customer requests, satisfaction levels, and sensitive topics, all of which are stored in the database.
    Downloads: 13 This Week
  • 11
    DiscordBotClient

    A patched version of Discord, with bot login support

    A patched version of Discord, with bot login support. Discord Bot Client allows you to use your bot, just like any other user account, except for Friends and Groups.
    Downloads: 103 This Week
  • 12
    FastRTC

    The Python library for real-time communication

    ...It abstracts away much of the complexity that typically comes with implementing WebRTC by providing a simple interface — e.g. a Stream class — that can be mounted within a web backend (for example a FastAPI application). This makes it particularly well suited for building real-time voice (or video) interfaces for applications such as AI assistants, live chat, or collaborative audio/video tools. FastRTC also integrates nicely with UI frameworks (e.g. via a web demo using Gradio), so developers can rapidly prototype and deploy real-time streaming applications without deep knowledge of low-level WebRTC internals. Because voice-enabled AI agents often involve many moving parts (speech-to-text, text processing, text-to-speech, streaming, session/chat management), FastRTC helps by handling the streaming aspect, leaving the rest to be plugged in modularly.
    Downloads: 2 This Week
  • 13
    VoxCPM

    TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning

    VoxCPM is a tokenizer-free text-to-speech system that models speech in a continuous space, aiming for extremely realistic, context-aware synthesis and true-to-life zero-shot voice cloning. Instead of converting speech into discrete tokens, it uses an end-to-end diffusion-autoregressive architecture built on the MiniCPM-4 backbone, combining hierarchical language modeling, finite scalar quantization (FSQ), and local Diffusion Transformers.
    Downloads: 2 This Week
  • 14
    Kitten TTS

    State-of-the-art TTS model under 25MB

    KittenTTS is an open-source, ultra-lightweight, high-quality text-to-speech model with just 15 million parameters and a binary size under 25 MB, designed for real-time CPU-based deployment across diverse platforms. Features: ultra-lightweight (model size under 25 MB); CPU-optimized (runs without a GPU on any device); high-quality voices (several premium voice options available); fast inference (optimized for real-time speech synthesis).
    Downloads: 5 This Week
  • 15
    CosyVoice

    Multi-lingual large voice generation model, providing inference

    CosyVoice is a multilingual large voice generation model that offers a full-stack solution for training, inference, and deployment of high-quality TTS systems. The model supports multiple languages, including Chinese, English, Japanese, Korean, and a range of Chinese dialects such as Cantonese, Sichuanese, Shanghainese, Tianjinese, and Wuhanese. It is designed for zero-shot voice cloning and cross-lingual or mix-lingual scenarios, so a single reference voice can be used to synthesize speech...
    Downloads: 0 This Week
  • 16
    MeloTTS

    High-quality multi-lingual text-to-speech library by MyShell.ai

    MeloTTS is an open-source text-to-speech (TTS) system that generates natural-sounding speech from text input. It utilizes advanced machine-learning models to produce high-quality audio outputs.
    Downloads: 4 This Week
  • 17
    Signal Desktop

    Private messenger for Windows, Mac, and Linux

    ...We can't read your messages or listen to your calls, and no one else can either. Privacy isn’t an optional mode, it’s just the way that Signal works. Every message, every call, every time. Share text, voice messages, photos, videos, GIFs and files for free. Signal uses your phone's data connection so you can avoid SMS and MMS fees. Make crystal-clear voice and video calls to people who live across town, or across the ocean, with no long-distance charges. Add a new layer of expression to your conversations with encrypted stickers. ...
    Downloads: 27 This Week
  • 18
    Dicio assistant

    Dicio assistant app for Android

    Dicio is a free and open source voice assistant for Android that focuses on strong privacy by running its understanding and response generation directly on the device whenever possible. It supports multiple input and output methods, including hotword-based voice input using the Vosk speech-to-text engine and a graphical interface for users who prefer to tap instead of talk.
    Downloads: 4 This Week
  • 19
    MiniMax-MCP

    Official MiniMax Model Context Protocol (MCP) server

    MiniMax-MCP is the official Model Context Protocol (MCP) server for accessing MiniMax’s multimodal generative APIs from MCP-compatible clients. It acts as a bridge between tools like Claude Desktop, Cursor, Windsurf, OpenAI Agents, and the MiniMax platform, exposing capabilities such as text-to-speech, voice cloning, image generation, text-to-image, video generation, image-to-video, text-to-video, and music generation. The server is written in Python and distributed under the MIT license, with a pyproject.toml and uv-based workflow that makes installation and execution reproducible. Configuration is handled through JSON files that tell MCP clients how to launch the server (typically via uvx minimax-mcp) and which environment variables to use for the API key, host, and output directory. ...
    Downloads: 0 This Week
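To make the JSON-based client configuration described above more concrete, here is a hedged sketch that generates such a file from Python. The `mcpServers` layout follows the convention used by common MCP clients; the server label "MiniMax" and the environment variable names (`MINIMAX_API_KEY`, `MINIMAX_API_HOST`, `MINIMAX_MCP_BASE_PATH`) are assumptions that should be checked against the project's README, and `build_mcp_config` is a hypothetical helper.

```python
import json


def build_mcp_config(api_key, api_host, output_dir):
    """Return a client config dict that launches the server via `uvx minimax-mcp`."""
    return {
        "mcpServers": {
            "MiniMax": {                      # label chosen for illustration
                "command": "uvx",
                "args": ["minimax-mcp"],
                "env": {
                    # Assumed variable names for the API key, host, and output directory.
                    "MINIMAX_API_KEY": api_key,
                    "MINIMAX_API_HOST": api_host,
                    "MINIMAX_MCP_BASE_PATH": output_dir,
                },
            }
        }
    }


if __name__ == "__main__":
    # Placeholder values; substitute real ones before use.
    config = build_mcp_config("your_api_key_here", "https://api.example.com", "/tmp/minimax-output")
    print(json.dumps(config, indent=2))
```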
  • 20
    MegaTTS 3

    Official PyTorch Implementation

    MegaTTS3 is an open-source text-to-speech (TTS) and voice-cloning system from ByteDance that aims to deliver high-quality, expressive speech synthesis, including zero-shot voice cloning of previously unseen speakers. Its backbone is a lightweight diffusion transformer (roughly 0.45B parameters), which enables efficient inference while still producing high-fidelity audio.
    Downloads: 0 This Week
  • 21
    Qwen-Audio

    Chat & pretrained large audio language model proposed by Alibaba Cloud

    Qwen-Audio is a large audio-language model developed by Alibaba Cloud, built to accept various types of audio input (speech, natural sounds, music, singing) along with text input, and to output text. There is also an instruction-tuned version, Qwen-Audio-Chat, which supports multi-round conversational interaction, combined audio and text input, creative tasks, and reasoning over audio. It uses multi-task training over more than 30 different audio tasks and achieves strong performance across multiple benchmarks without task-specific fine-tuning. ...
    Downloads: 1 This Week
  • 22
    MLX-Audio

    A text-to-speech, speech-to-text and speech-to-speech library

    MLX-Audio is a speech library built on Apple’s MLX framework and optimized for Apple Silicon machines (M-series Macs). It focuses on text-to-speech and speech-to-speech workflows, with APIs and a command-line interface that make it easy to generate high-quality audio from text. Because it uses MLX and targets Apple Silicon, inference is fast and can take advantage of hardware acceleration and quantization for efficient on-device performance. The project provides a straightforward CLI (mlx_audio.tts.generate) as well as a Python API for programmatic generation of audio, including parameters for voice choice, speed, language hints, output format, and sample rate. ...
    Downloads: 3 This Week
  • 23
    AI Runner

    Offline inference engine for art, real-time voice conversations

    AI Runner is an offline inference engine designed to run a collection of AI workloads on your own machine, including image generation for art, real-time voice conversations, LLM-powered chatbots and automated workflows. It is implemented as a desktop-oriented Python application and emphasizes privacy and self-hosting, allowing users to work with text-to-speech, speech-to-text, text-to-image and multimodal models without sending data to external services. At the core of its LLM stack is a mode-based architecture with specialized “modes” such as Author, Code, Research, QA and General, and a workflow manager that automatically routes user requests to the right agent based on the task. ...
    Downloads: 10 This Week
  • 24
    pyttsx3

    Offline text-to-speech synthesis for Python

    pyttsx3 is an offline text-to-speech library for Python that wraps native speech engines instead of calling cloud APIs. It is designed to work entirely without an internet connection, making it suitable for local automation, kiosks, accessibility tools, and embedded applications. On Windows it uses SAPI5, on Linux it typically uses eSpeak or eSpeak-NG, and on macOS it can use NSSpeechSynthesizer or AVSpeechSynthesizer, giving it broad cross-platform compatibility.
    Downloads: 8 This Week
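The per-platform engine mapping described above can be sketched as follows. The driver names `sapi5`, `nsss`, and `espeak` are the ones pyttsx3 accepts in `pyttsx3.init(driverName=...)`; the helper names `pick_driver` and `speak` are hypothetical, and pyttsx3 normally picks a suitable driver on its own when none is given, so the explicit choice here is only for illustration.

```python
import sys


def pick_driver(platform=sys.platform):
    """Map a sys.platform string to a pyttsx3 driver name."""
    if platform.startswith("win"):
        return "sapi5"      # Windows: SAPI5
    if platform == "darwin":
        return "nsss"       # macOS: NSSpeechSynthesizer
    return "espeak"         # Linux and others: eSpeak / eSpeak-NG


def speak(text, rate=None):
    """Speak text through the local engine; requires `pip install pyttsx3`."""
    import pyttsx3  # third-party; imported lazily so pick_driver works without it

    engine = pyttsx3.init(driverName=pick_driver())
    if rate is not None:
        engine.setProperty("rate", rate)  # speaking rate in words per minute
    engine.say(text)
    engine.runAndWait()                   # blocks until speech has finished
```

Calling `speak("Hello from an offline engine", rate=150)` needs a working local speech engine but no network connection.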
  • 25
    edge-tts

    Use Microsoft Edge's online text-to-speech service from Python

    edge-tts is a Python module and command-line tool that gives you direct access to Microsoft Edge’s online text-to-speech service without needing the Edge browser, Windows, or any API key. It wraps the same cloud voices used by Edge, exposing them through a simple CLI (edge-tts, edge-playback) and a Python API, so you can script high-quality speech generation in your own applications. The tool lets you list available voices, specify locale and voice name, and generate audio files in common formats like MP3 or WAV. ...
    Downloads: 23 This Week
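As a small sketch of the two entry points mentioned above, the snippet below builds an `edge-tts` command line and, separately, calls the Python API. The helper names, the example voice `en-US-AriaNeural`, and the output path are illustrative choices; the Python path needs the `edge-tts` package installed and network access to the online service.

```python
import asyncio


def build_cli_command(text, voice, out_path):
    """Return the argv list for the edge-tts CLI (suitable for subprocess.run)."""
    return ["edge-tts", "--voice", voice, "--text", text, "--write-media", out_path]


async def synthesize(text, voice, out_path):
    """Generate speech with the Python API and save it to out_path."""
    import edge_tts  # third-party; imported lazily so the CLI helper works without it

    communicate = edge_tts.Communicate(text, voice)
    await communicate.save(out_path)


if __name__ == "__main__":
    # Requires network access to the online TTS service.
    asyncio.run(synthesize("Hello there", "en-US-AriaNeural", "hello.mp3"))
```

Available voices can be listed from the command line with `edge-tts --list-voices` before picking one.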