[go: up one dir, main page]

Speech to Text Software for Windows

View 45 business solutions

Browse free open source Speech to Text software and projects for Windows below. Use the toggles on the left to filter open source Speech to Text software by OS, license, language, programming language, and project status.

  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • The Leading All-In-One Vacation Rental Software for Property Managers Icon
    The Leading All-In-One Vacation Rental Software for Property Managers

    Hostaway helps you grow your property management business by automating and streamlining every aspect of your business

    The dashboard and mobile app allows users to manage their marketing, sales, accounting, reporting, payment and communication needs all in one place. As premium partners of channels such as VRBO, Booking.com, Airbnb, Homeaway and Expedia, with the ability to manage advanced setups, no other platform gives you the type of control and peace of mind that a Hostaway user has. The software is designed with teams in mind - it's easy to train staff and keep them happy while improving business at the same time! Hostaway also provides a booking engine, wordpress website and both marketing and sales tools for managing your valuable direct bookings.
    Learn More
  • 1
    sherpa-onnx

    sherpa-onnx

    Speech-to-text, text-to-speech, and speaker recognition

    Speech-to-text, text-to-speech, and speaker recognition using next-gen Kaldi with onnxruntime without an Internet connection. Support embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift, Dart, JavaScript, Flutter.
    Downloads: 216 This Week
    Last Update:
    See Project
  • 2
    Buzz

    Buzz

    Transcribe and translate audio offline on your personal computer

    Buzz transcribes and translates audio to text offline using OpenAI's Whisper. Import audio and video files into Buzz and export them as TXT, SRT, or VTT files. Buzz supports Whisper, Whisper.cpp, Faster Whisper, Whisper-compatible models from the Hugging Face repository, and the OpenAI Whisper API. Get linux versions from: - https://flathub.org/apps/io.github.chidiwilliams.Buzz - https://snapcraft.io/buzz Home page of Buzz https://github.com/chidiwilliams/buzz Note for Windows: App is not signed, you will get a warning when you install it. Select More info -> Run anyway.
    Leader badge">
    Downloads: 3,922 This Week
    Last Update:
    See Project
  • 3
    CMU Sphinx

    CMU Sphinx

    Speech Recognition Toolkit

    Thank you for visiting! ----> Maintenance and improvement work has MOVED to https://cmusphinx.github.io/ Please go there for the most recent software and documentation. <---- CMUSphinx is a speaker-independent large vocabulary continuous speech recognizer released under BSD style license. It is also a collection of open source tools and resources that allows researchers and developers to build speech recognition systems.
    Leader badge">
    Downloads: 514 This Week
    Last Update:
    See Project
  • 4
    Whisper

    Whisper

    Robust Speech Recognition via Large-Scale Weak Supervision

    OpenAI Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitasking model that can perform multilingual speech recognition, speech translation, and language identification. A Transformer sequence-to-sequence model is trained on various speech processing tasks, including multilingual speech recognition, speech translation, spoken language identification, and voice activity detection. These tasks are jointly represented as a sequence of tokens to be predicted by the decoder, allowing a single model to replace many stages of a traditional speech-processing pipeline. The multitask training format uses a set of special tokens that serve as task specifiers or classification targets.
    Downloads: 70 This Week
    Last Update:
    See Project
  • Free Website Monitoring Service | UptimeRobot Icon
    Free Website Monitoring Service | UptimeRobot

    The free online uptime monitoring service with an App is available for iOS and Android.

    With the Free Plan, you can monitor up to 50 URLs, check for a website's content (using the keyword monitor), ping your server or monitor your ports in 5-minute intervals. You can create a status page to showcase your uptime. SMS or Call alerts can be bought anytime.
    Learn More
  • 5
    Handy STT

    Handy STT

    A free, open source, and extensible speech-to-text application

    Handy is a free, open-source, offline speech-to-text application built for privacy, accessibility, and extensibility. Developed using Tauri (Rust + React/TypeScript), it runs natively across Windows, macOS, and Linux while performing local speech recognition without sending any audio to cloud servers. Handy allows users to start transcription instantly using a configurable keyboard shortcut—press to record, release to transcribe—and automatically pastes the resulting text into any active text field. Its backend leverages OpenAI’s Whisper models for GPU-accelerated speech recognition and Parakeet V3 for efficient CPU-only transcription with automatic language detection. To further refine accuracy and responsiveness, Handy integrates Silero’s Voice Activity Detection (VAD) for silence filtering, ensuring only speech segments are processed.
    Downloads: 48 This Week
    Last Update:
    See Project
  • 6
    DeepSpeech

    DeepSpeech

    Open source embedded speech-to-text engine

    DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers. DeepSpeech is an open-source Speech-To-Text engine, using a model trained by machine learning techniques based on Baidu's Deep Speech research paper. Project DeepSpeech uses Google's TensorFlow to make the implementation easier. A pre-trained English model is available for use and can be downloaded following the instructions in the usage docs. If you want to use the pre-trained English model for performing speech-to-text, you can download it (along with other important inference material) from the DeepSpeech releases page.
    Downloads: 23 This Week
    Last Update:
    See Project
  • 7
    Vibe

    Vibe

    Transcribe on your own

    Vibe is an open-source project by thewh1teagle designed to deliver a collaborative and interactive social application experience, though its specifics depend on its evolving community scope; its development often focuses on connecting users through dynamic features that can include chat, shared spaces, and immersive interactions. The repository typically includes backend logic, frontend integration, and real-time communication stacks to support live user engagement, performance optimizations, and modular features that adapt to social workflows. Because open-source social platforms benefit from transparency and community contribution, Vibe’s codebase allows developers to experiment with new social features, customize existing components, and build integrations with popular services for authentication, media sharing, and notifications. Projects like Vibe often emphasize scalability, responsive design, and extensibility so that communities of users can grow without major rewrites.
    Downloads: 19 This Week
    Last Update:
    See Project
  • 8
    Whishper

    Whishper

    Transcribe any audio to text, translate and edit subtitles 100% locall

    Open-source, local-first audio transcription and subtitling suite with a simple web UI. Thanks to open-source technologies, Whishper can run 100% offline. Your data never leaves your computer. Whishper allows you to translate your transcriptions to and from more than 60 languages thanks to Argos Translate and LibreTranslate. Download the transcriptions in many formats (json, txt, vtt, srt). Easily edit your subtitles right in the Web-UI.
    Downloads: 19 This Week
    Last Update:
    See Project
  • 9
    TTS Voice Wizard

    TTS Voice Wizard

    Speech to Text to Speech, sends text as OSC messages

    Speech to Text to Speech. Song now playing. Sends text as OSC messages to VRChat to display on avatar. (STTTS) (Speech to TTS) (VRC STT System) Use TTS Voice Wizard's accessibility features to improve your VRChat experience (it works outside of VRChat too!) You can convert your Speech-to-Text and back to Speech through various Speech Recognition and Text-to-Speech methods. You can send what you say as OSC messages to VRChat to be displayed on your avatar using KillFrenzyAvatarText or VRChats Chatbox. The app can translate your speech from one language to over 20 other support languages. There are 100+ different voices with various customization options so you can pick a voice that best suits you. Display the current song you are listening to on Spotify or via your browser. Display tracker and controller battery life in conjunction with XSOverlay. Use in conjunction with HRtoVRChat_OSC to enable you to display your heartrate in VRChat's Chatbox.
    Downloads: 18 This Week
    Last Update:
    See Project
  • Supercharge Your Manufacturing with Easy MRP and MES Software Icon
    Supercharge Your Manufacturing with Easy MRP and MES Software

    Designed for SME manufacturers who want to reduce wasteful manual processing, save time and increase profits.

    Flowlens eliminates stock-outs, shortage and overstocks, avoiding costly production delays. Stay in control of inventory levels and keep production running smoothly with real-time visibility and easy-to-use stock management. Import bulk data with ease.
    Learn More
  • 10
    Translate-Subtitle-File

    Translate-Subtitle-File

    Subtitle Creation Assistant

    Subtitle group machine translation assistant - [Function 1: Translate subtitle file] .srt .ass .vtt [Function 2: Voice to text] (Drag in video or audio to recognize subtitles) (The latest version v4.1.0 Update time 2021 2 May 23) 12 translation service providers can be configured, such as Google, Baidu, Tencent, Caiyun, IBM, Azure, Amazon, etc. (6 voice service providers can be configured: Alibaba Cloud, Xunfei, Tencent Cloud, IBM, Azure, Amazon ) Advantages: 1. You can use multiple service providers, 2. You can configure your own API Key to use your own account's free quota, such as Tencent's free translation quota of 5 million characters per month, IBM's 500-minute speech-to-text free quota (tern. best The domain name has expired and I don't want to renew it.) Azure speech-to-text and DeepL free version have problems, it is normal to not use it, please wait for the next version to fix. Machine translation of subtitle files, use machine translation to process files.
    Downloads: 9 This Week
    Last Update:
    See Project
  • 11
    RealtimeSTT

    RealtimeSTT

    A robust, efficient, low-latency speech-to-text library

    RealtimeSTT is a Python-based realtime speech-to-text engine emphasizing low latency, wake-word detection, voice activity detection, and automatic speech segmentation. It provides asynchronous callbacks, nanosecond-precision timestamps, and CLI tools, suitable for building voice assistants, meeting transcribers, or live caption systems.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 12
    Scribe

    Scribe

    Free, open-source, and offline speech-to-text & voice control app.

    > Scribe is a free and open-source desktop assistant that brings powerful speech-to-text and voice control capabilities directly to your PC. It allows you to dictate text into any application, create custom voice commands, launch programs, and automate your workflow with text replacements. > Designed with privacy as a top priority, Scribe works completely offline. Your voice data never leaves your computer. Powered by the Vosk engine, it supports multiple languages and provides high-quality recognition without an internet connection. > Scribe is the perfect tool for anyone looking to boost productivity, improve accessibility, or simply interact with their computer in a new, hands-free way.
    Downloads: 89 This Week
    Last Update:
    See Project
  • 13

    SoundTranscriber

    SoundTranscriber can be used to generate automatic transcription / aut

    SoundTranscriber can be used to generate automatic transcription / aut
    Downloads: 6 This Week
    Last Update:
    See Project
  • 14
    Whisper Batch Transcriber

    Whisper Batch Transcriber

    Unlimited, private and free Speech-To-Text program

    ## About: Automatically transcribe all of your voice recordings into clean, organized, neat text files. It's free, fully automated, unlimited, using state-of-the-art speech-to-text technology. Works 100% offline on your computer, privately and locally. ## Usecases: Convert speeches, podcasts, webinars, monologues, storytellings and other audio speech into a formatted .txt file. One sentence per new line. ## Notes: - Its 2GB in size and requires 2-6GB of GPU VRAM too. (basically you need atleast a mid-range gaming PC to use this.) - Its fairly slow to start (10min) and transcribe, this is normal behavior. - Includes a python installer to install Python on your computer so you can directly run the 'whisper_transcriber.py' file like you would an .exe by double-clicking it. (I did this because compiling to exe made it slower) - I made it as easy as possible for a layperson to use, so despite its crude looks, its as good as a GUI application experience. Enjoy freedom!
    Downloads: 9 This Week
    Last Update:
    See Project
  • 15
    Speech Recognition in English & Polish

    Speech Recognition in English & Polish

    Speech recognition software for English & Polish languages

    Software for speech recognition in English & Polish languages. Basic versions of SkryBot: 1. SkryBot Home Speech (English Language) - https://sourceforge.net/projects/skrybotdomowy/files/ReleasesEnglish/InstalatorSkryBotHomeSpeechDemo-2.6.9.18117.exe/download 2. SkryBot DoMowy (Polish Language) - https://sourceforge.net/projects/skrybotdomowy/files/ReleasesPolish/InstalatorSkryBotDoMowyDemo-2.4.9.18117.exe/download More help: https://sourceforge.net/p/skrybotdomowy/wiki/ Domain advanced versions (Polish Language) 1. SkryBot Prawo - for judicial professionals. 2. SkryBot Administracyjny - for civil and government administration. 3. SkryBot Medycyna Rodzinna - for physicians Professional version of SkryBot (commercial) offers you: 1. Audio conversion and cutting sound files into smaller ones. 2. Searching for words or phrases in sound files (recognized by SkryBot). 3. Editing sound files and automatic cutting off long silence parts in audio file.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 16
    VATSG

    VATSG

    Video automatic transcribe and translated subtitle generator

    It generates srt format subtitle from videofile which can be any source language that whisper support , and then make translated subtitle file of your target language which deepl support. This is the subtitle generator(VATSG) which use [moviepy](https://github.com/Zulko/moviepy) to generate mp3 and then use [faster-whisper](https://github.com/guillaumekln/faster-whisper) to get text recognition and then use deepl-api to generate your target language subtitle file(srt format) If you are a general user who want to view any video file and mp3 file to your language, It will provide way. It's very easy to use because it has simple gui and very intuitive. So you can easily use it for any purpose. Now, you can choose to download either window installer setup type or uninstalled type. Enjoy and support my consistent development!
    Downloads: 5 This Week
    Last Update:
    See Project
  • 17
    ILA - teachable voice assistant

    ILA - teachable voice assistant

    ILA is a fully customizable and teachable voice assistant for Java

    ILA stands for (kind of) intelligent, learning assistant and is a speech recognition system aka voice assistant very similar to Siri, Google Now and Cortana. ILA is fully customizable and you can teach her/him/it new things by yourself like executing system commands, opening web pages, programs and apps or just some basic conversation :-) ILA runs on Java und thus is compatible to Windows, Mac and Linux. It is designed to integrate with your home enviroment and for example build up your own, free and open Amazon Echo replacement ;-) Right now the key components of ILA are the open source speech recognition CMU Sphinx-4, Google (Speech Recognition/Text-To-Speech) and MaryTTS (Text-To-Speech). The goal is to make ILA completely free of Google by improving all aspects of the open source systems. Since version 3.3 users can also write own add-ons to extend ILA. ILA's successor is the SEPIA Framework: https://sepia-framework.github.io/ Hope you enjoy ILA - Florian
    Downloads: 1 This Week
    Last Update:
    See Project
  • 18
    AzioSpeech Recognition and Translation

    AzioSpeech Recognition and Translation

    AzioSpeech Recognition and Translation

    Starting from version 1.2.1.0, the project has been renamed to AzioSpeech Recognition and Translation and is officially published in the Microsoft Store at: https://apps.microsoft.com/detail/9PFV5DG73198 A desktop application built with Avalonia UI that provides real-time speech recognition and translation using Azure Speech Services. Convert spoken words into text and translate them into multiple languages with professional-grade accuracy. Important Setup Requirements Before using this application, you MUST have: 1. Azure Account Setup Active Azure Subscription - Create a free account at portal.azure.com Azure Speech Service Resource - You must create your own Speech Service within your Azure subscription Valid API Key & Region - Obtain these credentials from your Azure Speech Service resource 2. Windows Privacy Settings CRITICAL: Microphone Access Required You must grant microphone access through Windows settings
    Downloads: 2 This Week
    Last Update:
    See Project
  • 19
    XR3Capture

    XR3Capture

    Take screen shots of your computer!

    Comments: Capture your computer screen a lot easier with this app. System Requirements: Java 1.8.0_45++ required. GitHub (https://github.com/goxr3plus/XR3Capture)
    Downloads: 1 This Week
    Last Update:
    See Project
  • 20
    Conversations

    Conversations

    App in java for chatting to a generative A.I. (involving tts and stt)

    Java application for chatting to generative AI Llama3. * The user can speak into the microphone (speechToText), edit the recognized text and send it to the AI. * The AI ​​responds and the server returns that response in real time, and the sentences converted to audio (textToSpeech), and the application broadcasts them through the speaker. The application is prepared so that only one user occupies the server's resources, so if the server is busy, in theory it will not let you connect. There is a demo video that shows how it works: https://frojasg1.com:8443/resource_counter/resourceCounter?operation=countAndForward&url=https%3A%2F%2Ffrojasg1.com%2Fdemos%2Faplicaciones%2Fchat%2F20240815.Demo.Chat.mp4%3Forigin%3Dsourceforge&origin=web
    Downloads: 1 This Week
    Last Update:
    See Project
  • 21
    JAVT - Just Another Voice Transformer

    JAVT - Just Another Voice Transformer

    Just Another Speech Recognition and Text to Speech software.

    JAVT or Just Another Voice Transformer (formerly, it is called Just Another Video Transcriber) is a Speech Recognition software that also support text to Speech and simple media conversion. JAVT allows you to convert from video files to audio wav file using ffmpeg, and then transcribe the audio file to text using either Microsoft SAPI or CMU Sphinx. You can also open a text file and allow JAVT to read it out for you through text to speech conversion.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 22
    Anthromorphic Scribe

    Anthromorphic Scribe

    Provides speech to text gui to sphinx4

    It provides an interactive speech to text application that uses sphinx 4. With this you can use pre-recorded audio, record your own voice and convert incompatible audio/video to be compatible with sphinx 4. It currently supports U.S English by using hub4 acoustic and language model.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    Coqui STT

    Coqui STT

    The deep learning toolkit for speech-to-text

    Coqui STT is a fast, open-source, multi-platform, deep-learning toolkit for training and deploying speech-to-text models. Coqui STT is battle-tested in both production and research. Multiple possible transcripts, each with an associated confidence score. Experience the immediacy of script-to-performance. With Coqui text-to-speech, production times go from months to minutes. With Coqui, the post is a pleasure. Effortlessly clone the voices of your talent and have the clone handle the problems in post. With Coqui, dubbing is a delight. Effortlessly clone the voice of your talent into another language and let the clone do the dub. With text-to-speech, experience the immediacy of script-to-performance. Cast from a wide selection of high-quality, directable, emotive voices or clone a voice to suit your needs. With Coqui text-to-speech, production times go from months to minutes.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    JKVSRG Eng Amharic Translator

    JKVSRG Eng Amharic Translator

    Translation between English and Amharic

    JKVSRG English and Amharic translator translates English to Amharic and Amharic to English. User can transliterate English to Amharic with the help of Amharic Letter System which is included in this software. In addition, User can also translate English text file to Amharic text file and vice-versa while comparing it with other Amharic Translators. This translator is an interactive and user friendly translator because of its special feature speech synthesize (text to speech). It runs on all windows operation system. It takes less memory and few seconds to install. When using it, users ensure internet connection and speaker in their system.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    JKVSRG Tamil and Amharic Translator

    JKVSRG Tamil and Amharic Translator

    Attractive and user friendly Tamil and Amharic translator

    JKVSRG Tamil and Amharic translator translates between Tamil and Amharic. User can transliterate English to Tamil and English to Amharic with the help of Tamil and Amharic Letter System which is included in this software. In addition, User can also translate multilingual text file to Tamil and Amharic text file. It is an interactive and user friendly translator because of its special feature speech synthesize (text to speech). It runs on windows. It takes a smaller amount of memory and few seconds to install. Users make sure internet connection and speaker in their system while using it.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • Next