[go: up one dir, main page]

Showing 136 open source projects for "voice recognition"

View related business solutions
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • Run your private office with the ONLYOFFICE Icon
    Run your private office with the ONLYOFFICE

    Secure office and productivity apps

    A Comprehensive Alternative to Office 365 for Business
    Learn More
  • 1
    TTS Voice Wizard

    TTS Voice Wizard

    Speech to Text to Speech, sends text as OSC messages

    Speech to Text to Speech. Song now playing. Sends text as OSC messages to VRChat to display on avatar. (STTTS) (Speech to TTS) (VRC STT System) Use TTS Voice Wizard's accessibility features to improve your VRChat experience (it works outside of VRChat too!) You can convert your Speech-to-Text and back to Speech through various Speech Recognition and Text-to-Speech methods. You can send what you say as OSC messages to VRChat to be displayed on your avatar using KillFrenzyAvatarText or VRChats Chatbox. ...
    Downloads: 18 This Week
    Last Update:
    See Project
  • 2
    GLM-4-Voice

    GLM-4-Voice

    GLM-4-Voice | End-to-End Chinese-English Conversational Model

    GLM-4-Voice is an open-source speech-enabled model from ZhipuAI, extending the GLM-4 family into the audio domain. It integrates advanced voice recognition and generation with the multimodal reasoning capabilities of GLM-4, enabling smooth natural interaction via spoken input and output. The model supports real-time speech-to-text transcription, spoken dialogue understanding, and text-to-speech synthesis, making it suitable for conversational AI, virtual assistants, and accessibility applications. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 3
    Whisper

    Whisper

    Robust Speech Recognition via Large-Scale Weak Supervision

    OpenAI Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitasking model that can perform multilingual speech recognition, speech translation, and language identification. A Transformer sequence-to-sequence model is trained on various speech processing tasks, including multilingual speech recognition, speech translation, spoken language identification, and voice activity detection. ...
    Downloads: 70 This Week
    Last Update:
    See Project
  • 4
    sherpa-onnx

    sherpa-onnx

    Speech-to-text, text-to-speech, and speaker recognition

    Speech-to-text, text-to-speech, and speaker recognition using next-gen Kaldi with onnxruntime without an Internet connection. Support embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift, Dart, JavaScript, Flutter.
    Downloads: 216 This Week
    Last Update:
    See Project
  • The #1 AI-Powered eLearning Platform Icon
    The #1 AI-Powered eLearning Platform

    For users seeking a platform to generate online courses using AI

    Transform your content into engaging eLearning experiences with Coursebox, the #1 AI-powered eLearning authoring tool. Our platform automates the course creation process, allowing you to design a structured course in seconds. Simply make edits, add any missing elements, and your course is ready to go. Whether you want to publish privately, share publicly, sell your course, or export it to your LMS, Coursebox has you covered.
    Learn More
  • 5
    Vocode

    Vocode

    Build voice-based LLM agents. Modular + open source

    Vocode is an open source library that makes it easy to build voice-based LLM apps. Using Vocode, you can build real-time streaming conversations with LLMs and deploy them to phone calls, Zoom meetings, and more. You can also build personal assistants or apps like voice-based chess. Vocode provides easy abstractions and integrations so that everything you need is in a single library.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 6
    Fun Audio Chat

    Fun Audio Chat

    Large Audio Language Model built for natural interactions

    Fun Audio Chat is an interactive voice-first conversational AI platform designed to let users engage in natural spoken dialogue with large language models in real time, turning speech into context-aware responses while maintaining a smooth back-and-forth experience. It combines speech recognition, audio processing, and AI generation so users can speak simply and receive spoken replies, enabling applications such as virtual assistants, voice bots, and hands-free chat interfaces. ...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 7
    Parlant

    Parlant

    The behavior guidance framework for customer-facing LLM agents

    Parlant is a lightweight speech-to-text and text-to-speech framework designed for real-time AI-driven voice applications.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 8
    VideoChat

    VideoChat

    Real-time voice interactive digital human

    VideoChat is a real-time voice-interactive “digital human” system that combines automatic speech recognition, large language models, text-to-speech, and talking-head generation into a single conversational pipeline. It supports both pure end-to-end voice solutions based on multimodal large language models (GLM-4-Voice feeding directly into talking-head generation) and a more traditional cascaded pipeline using ASR → LLM → TTS → talking head.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    Bolna

    Bolna

    Conversational voice AI agents

    Bolna is an end-to-end open-source platform for building conversational voice AI agents, enabling developers to create voice-first conversational assistants efficiently.
    Downloads: 1 This Week
    Last Update:
    See Project
  • All-in-one solution to control corporate spending Icon
    All-in-one solution to control corporate spending

    Issuance in seconds. Full spending control. Perfect for media buying.

    Wallester Business is a leading world-class solution to optimize your company’s financial processes! Issuing virtual and physical corporate expense cards with an IBAN account, expense monitoring, limit regulation, convenient accounting, subscription control — manage your finance on all-in-one platform in real time! Wallester Business benefits your business growth!
    Learn More
  • 10
    chatgpt-on-wechat

    chatgpt-on-wechat

    A chatbot built based on a large model

    chatgpt-on-wechat turns your WeChat client (including personal accounts) into an intelligent chatbot powered by large language models like ChatGPT, enabling automated replies, context-aware conversations, and media handling directly inside chats. It receives text and voice messages from private and group chats, forwards them to an AI model using official APIs, and returns replies that feel natural and contextually relevant, creating more engaging interactions without manual typing. Beyond simple text, the bot supports voice recognition and automatic voice or text responses, image generation based on descriptions, and independent memory of multi-turn conversations per user or group. ...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 11
    Alan AI

    Alan AI

    In-App assistant SDK to build a multimodal conversational UX websites

    ...A powerful web-based IDE where you can write, test and debug dialog scenarios for your voice assistant or chatbot. Alan's AI-backend powered by the industry’s best Automatic Speech Recognition (ASR), Natural Language Understanding (NLU) and Speech Synthesis. The Alan Cloud provisions and handles the infrastructure required to maintain your voice deployments and perform all the voice processing tasks. To voice enable your app, you only need to get the Alan Client SDK and drop it to your app. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 12
    ElatoAI

    ElatoAI

    Realtime AI Voice Agents with SoTA Multimodal AI models on Arduino ESP

    ElatoAI is a real-time AI voice agent platform built around IoT hardware (ESP32) that enables continuous speech-to-speech conversations using state-of-the-art multimodal voice models with minimal latency and global performance via edge computing. The system integrates voice synthesis and recognition by connecting an ESP32 device through secure WebSockets to edge server functions written in Deno, allowing users to speak naturally with AI agents hosted through cloud APIs including OpenAI’s Realtime API, Gemini’s Live API, xAI’s Grok Voice Agent API, and others. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 13
    Qwen2-Audio

    Qwen2-Audio

    Repo of Qwen2-Audio chat & pretrained large audio language model

    Qwen2-Audio is a large audio-language model by Alibaba Cloud, part of the Qwen series. It is trained to accept various audio signal inputs (including speech, sounds, etc.) and perform both voice chat and audio analysis, producing textual responses. It supports two major modes: Voice Chat (interactive voice only input) and Audio Analysis (audio + text instructions), with both base and instruction-tuned models. It is evaluated on many benchmarks (speech recognition, translation, sound classification, emotion, etc.), and offers pretrained models (e.g. 7B) released via ModelScope and Hugging Face. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 14
    TEN Framework

    TEN Framework

    TEN, a voice agent framework to create conversational AI.

    TEN (Transformative Extensions Network) is a voice agent framework for creating conversational AI applications, focusing on high performance and modularity.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 15
    Leku

    Leku

    Map location picker component for Android

    ...Component library for Android that uses Google Maps and returns a latitude, longitude and an address based on the location picked with the Activity provided. Note that you have the voice_search_extra_language that is used for the language of the voice recognition. Replace it with the allowed voice recognition locale for your language. We encourage you to add these languages to this component, please fork this project and submit new languages with a PR. It's possible to hide or show some of the information shown after selecting a location. Using the bundle parameter LocationPickerActivity. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    Alan AI for Android

    Alan AI for Android

    Assistant SDK to build a multimodal conversational UX for Android

    ...A powerful web-based IDE where you can write, test and debug dialog scenarios for your voice assistant or chatbot. Alan's AI-backend powered by the industry’s best Automatic Speech Recognition (ASR), Natural Language Understanding (NLU) and Speech Synthesis. The Alan Cloud provisions and handles the infrastructure required to maintain your voice deployments and perform all the voice processing tasks. Voice enable your app, you only need to get the Alan Client SDK and drop it into your app. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 17
    Handy STT

    Handy STT

    A free, open source, and extensible speech-to-text application

    ...To further refine accuracy and responsiveness, Handy integrates Silero’s Voice Activity Detection (VAD) for silence filtering, ensuring only speech segments are processed.
    Downloads: 48 This Week
    Last Update:
    See Project
  • 18
    Alan AI for iOS

    Alan AI for iOS

    In-App assistant SDK to build a multimodal conversational UX for iOS

    ...A powerful web-based IDE where you can write, test and debug dialog scenarios for your voice assistant or chatbot. Alan's AI-backend powered by the industry’s best Automatic Speech Recognition (ASR), Natural Language Understanding (NLU) and Speech Synthesis. The Alan Cloud provisions and handles the infrastructure required to maintain your voice deployments and perform all the voice processing tasks. Voice enable your app, you only need to get the Alan Client SDK and drop it into your app. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    XiaoZhi AI Chatbot

    XiaoZhi AI Chatbot

    Build your own AI friend

    xiaozhi-esp32 is an open-source project that guides users in building their own AI-powered conversational companion using the ESP32 microcontroller. The project provides detailed instructions on assembling the hardware, setting up the software, and integrating AI models to enable natural language interactions. This DIY approach offers an accessible entry point into AI and hardware development.
    Downloads: 239 This Week
    Last Update:
    See Project
  • 20
    KrillinAI

    KrillinAI

    Video translation and dubbing tool powered by LLMs

    KrillinAI is an end-to-end content localization, translation, and dubbing tool aimed at helping creators transform videos into multiple languages with minimal manual effort. It integrates several stages of the pipeline: video acquisition (either from local files or remote via download tools), speech recognition (ASR), subtitle segmentation and alignment, machine translation (with context-aware translation to preserve semantics), and voice cloning + text-to-speech (TTS) to produce dubbed audio tracks. KrillinAI supports both landscape and portrait videos, which makes it suitable for a wide range of platforms — from YouTube to TikTok or other vertical-video sites — and ensures correct formatting and layout for the final video. ...
    Downloads: 9 This Week
    Last Update:
    See Project
  • 21
    Bailing

    Bailing

    Bailing is a voice dialogue robot similar to GPT-4o

    Bailing is an open-source voice-dialogue assistant designed to deliver natural voice-based conversations by combining automatic speech recognition (ASR), voice activity detection (VAD), a large language model (LLM), and text-to-speech (TTS) in a single pipeline. Its goal is to offer a “voice-first” chat experience similar to what one might expect from a system like GPT-4o, but fully open and deployable by users.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    Gemini Next Chat

    Gemini Next Chat

    Deploy your private Gemini application for free with one click

    ...It is built with Next.js/TypeScript and targets developers and hobbyists who want a self-hosted solution for interacting with advanced multimodal models (text, image, voice). It supports features like image recognition, voice-based conversation, plugins (web search, ArXiv search, weather, etc.), and client apps (tray app) for greater convenience. The project emphasizes “one-click” deployment, aiming to make it easy to spin up a custom chat front end without deep infra-setup. It’s licensed under MIT and has an active community of contributors; documentation and release notes note support for newer features like mixed image+text generation. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    Voxal voice changer

    Voxal voice changer

    Transform your voice in real-time voxal voice changer

    Voxal Voice Changer is a program that allows you to modify your voice by applying various effects (e.g. pitch change, echo, etc.) in real-time. Effects can be added in any sequence and in any combination, allowing you to distort your voice beyond recognition. Take your audio to the next level! Our powerful Voice Changer software lets you morph your voice in real-time with stunning AI-powered quality.
    Leader badge">
    Downloads: 19 This Week
    Last Update:
    See Project
  • 24
    Lumo iPhone App

    Lumo iPhone App

    iOS application for Lumo

    ...Built with SwiftUI, the iOS app wraps a secure web-powered interface and communicates with the Lumo service in a way that ensures zero-access encryption, meaning even Proton cannot read or log user chats—only the device holder can decrypt them. It includes native features like on-device voice recording and speech recognition, flexible navigation, and payment integration for subscription plans if users choose expanded capabilities. The app’s architecture uses a combination of WebView and JavaScript bridges to power responsive chat UI while retaining strong data protection principles.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    Textream

    Textream

    Textream is a free macOS teleprompter app for streamers, interviewers

    Textream is an open-source, free macOS teleprompter application designed for streamers, podcasters, presenters, and interviewers who want a smooth, distraction-free way to stay on script. It runs natively on macOS and leverages on-device speech recognition to highlight each word in real time as you speak, keeping your focus where it belongs — on delivery rather than memorization. The interface supports multiple modes of use, such as classic constant-scroll auto-scrolling, voice-activated scrolling that pauses when you’re silent, and direct word tracking that syncs the displayed script to your spoken pace. ...
    Downloads: 2 This Week
    Last Update:
    See Project