Showing 373 open source projects for "transcribe audio to text"

  • 1
    SpeechRecognition

    Speech recognition module for Python

    Library for performing speech recognition, with support for several engines and APIs, online and offline. Recognize speech input from the microphone, transcribe an audio file, save audio data to an audio file. Show extended recognition results, calibrate the recognizer energy threshold for ambient noise levels (see recognizer_instance.energy_threshold for details). Listening to a microphone in the background, various other useful recognizer features. The easiest way to install this is using pip install SpeechRecognition. ...
    Downloads: 20 This Week
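The energy-threshold idea mentioned in the description can be illustrated without the library. This is only a sketch of the general concept, not SpeechRecognition's implementation: the names `rms_energy` and `is_speech` and the default threshold of 300 are hypothetical and chosen for illustration. It computes the root-mean-square energy of a frame of 16-bit PCM samples and compares it with a threshold.

```python
import math
import struct


def rms_energy(frame: bytes) -> float:
    """Return the root-mean-square amplitude of little-endian 16-bit PCM samples."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    # Unpack the raw bytes into signed 16-bit integers.
    samples = struct.unpack("<%dh" % count, frame[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def is_speech(frame: bytes, threshold: float = 300.0) -> bool:
    """Treat a frame as speech when its energy exceeds the threshold.

    The default threshold is a hypothetical value for this sketch.
    """
    return rms_energy(frame) > threshold


# Two synthetic frames: one quiet, one loud.
quiet = struct.pack("<4h", 10, -10, 10, -10)
loud = struct.pack("<4h", 1000, -1000, 1000, -1000)
```

Calibrating for ambient noise would then amount to measuring `rms_energy` on a stretch of background audio and setting the threshold somewhat above it.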
  • 2
    Buzz

    Transcribe and translate audio offline on your personal computer

    Buzz transcribes and translates audio to text offline using OpenAI's Whisper. Import audio and video files into Buzz and export them as TXT, SRT, or VTT files. Buzz supports Whisper, Whisper.cpp, Faster Whisper, Whisper-compatible models from the Hugging Face repository, and the OpenAI Whisper API. Linux versions are available from Flathub (https://flathub.org/apps/io.github.chidiwilliams.Buzz) and Snapcraft (https://snapcraft.io/buzz). The Buzz home page is https://github.com/chidiwilliams/buzz. Note for Windows: the app is not signed, so you will get a warning when you install it. ...
    Downloads: 3,922 This Week
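To make the SRT export mentioned above concrete, here is a minimal sketch of writing transcript segments as SubRip (SRT) text. It is not Buzz's code: the `(start, end, text)` tuple layout and the function names `srt_timestamp` and `to_srt` are assumptions made for this example.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments):
    """Render (start, end, text) segments as SRT cue blocks.

    Each cue is a 1-based index, a time range line, and the text;
    cues are separated by a blank line.
    """
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    return "\n".join(blocks)


# Hypothetical transcript segments in seconds.
example = to_srt([(0.0, 1.25, " Hello "), (1.25, 3.0, "world")])
```

WebVTT output differs mainly in its `WEBVTT` header line and in using a period instead of a comma before the milliseconds.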
  • 3
    Frescobaldi

    LilyPond sheet music text editor

    Frescobaldi is a free and open source LilyPond sheet music text editor. Designed to be powerful yet lightweight and easy to use, Frescobaldi offers a host of useful features such as a music view with advanced two-way Point & Click, MIDI capturing to enter music, a Snippet Manager, and many more. Frescobaldi is named after Girolamo Frescobaldi (1583-1643), an Italian composer of keyboard music in the late Renaissance and early Baroque period.
    Downloads: 36 This Week
  • 4
    AudioCraft

    Audiocraft is a library for audio processing and generation

    AudioCraft is a PyTorch library for text-to-audio and text-to-music generation, packaging research models and tooling for training and inference. It includes MusicGen for music generation conditioned on text (and optionally melody) and AudioGen for text-conditioned sound effects and environmental audio. Both models operate over discrete audio tokens produced by a neural codec (EnCodec), which acts like a tokenizer for waveforms and enables efficient sequence modeling. ...
    Downloads: 0 This Week
  • 5
    Podcastfy.ai

    Transforming Multimodal Content into Captivating Multilingual Audio

    Podcastfy is an open-source Python package that transforms multi-modal content (text, images) into engaging, multi-lingual audio conversations using GenAI. Input content includes websites, PDFs, YouTube videos, and images. Unlike UI-based tools focused primarily on note-taking or research synthesis (e.g. NotebookLM), Podcastfy focuses on the programmatic and bespoke generation of engaging, conversational transcripts and audio from a multitude of multi-modal sources, enabling customization and scale.
    Downloads: 10 This Week
  • 6
    AudioNotes

    Extract audio and video content and organize it into a Markdown note

    AudioNotes is an application (or proof-of-concept) that likely combines audio recording or playback with note-taking or annotation functionality — enabling users to record voice or audio and attach textual or timestamped notes, making it ideal for lectures, interviews, meetings, or personal memos. Such a tool offers a more expressive and flexible way to capture and revisit information: instead of just typed notes or raw audio, users get both audio context and structured notes. As an...
    Downloads: 0 This Week
  • 7
    Speakr

    Speakr is a personal, self-hosted web application

    Speakr is an open-source, real-time text-to-speech (TTS) web application that allows users to convert written text into natural-sounding speech in just a few clicks. It provides a clean, user-friendly interface where users can input text, choose a voice style or language, and immediately hear the output, making it ideal for accessibility, content creation, and learning applications.
    Downloads: 3 This Week
  • 8
    Moshi

    A speech-text foundation model for real time dialogue

    ...Moshi models two streams of audio: one corresponds to Moshi, and the other one to the user. At inference, the stream from the user is taken from the audio input, and the one for Moshi is sampled from the model's output. Along these two audio streams, Moshi predicts text tokens corresponding to its own speech, its inner monologue, which greatly improves the quality of its generation.
    Downloads: 3 This Week
  • 9
    PersonaPlex

    PersonaPlex code

    ...PersonaPlex also supports persona and voice control, allowing developers to define the role and speaking style of the agent using text prompts and voice conditioning, making it suitable for applications like customized voice assistants, interactive character agents, or domain-specific conversational tools. Internally, it processes continuous audio streams in a hybrid input format so that speech understanding and generation occur jointly.
    Downloads: 12 This Week
  • 10
    Translate-Subtitle-File

    Subtitle Creation Assistant

    Subtitle group machine translation assistant - [Function 1: Translate subtitle file] .srt .ass .vtt [Function 2: Voice to text] (Drag in video or audio to recognize subtitles) (The latest version v4.1.0 Update time 2021 2 May 23) 12 translation service providers can be configured, such as Google, Baidu, Tencent, Caiyun, IBM, Azure, Amazon, etc. (6 voice service providers can be configured: Alibaba Cloud, Xunfei, Tencent Cloud, IBM, Azure, Amazon ) Advantages: 1.
    Downloads: 9 This Week
  • 11
    Text to Chord

    Turn words into chords

    Convert words and sentences to 5 note chords you can use to inspire music creation. Have fun turning your name, your city name, your friends' names, your team's name, your pet's name into wild and original harmonies that go beyond serialism and classic jazz.
    Downloads: 1 This Week
  • 12
    Tagify

    Lightweight, efficient Tags input component in Vanilla JS

    Transforms an input field or a textarea into a Tags component in an easy, customizable way, with great performance, a small code footprint, and many features. Customizable HTML templates for the different areas of the component (wrapper, tags, dropdown, dropdown item, dropdown header, dropdown footer). Shows a suggestions list (flexible settings & styling) at full (component) width or next to the typed text (caret). Allows setting suggestions' aliases for easier fuzzy-searching....
    Downloads: 4 This Week
  • 13
    Screenity

    The most powerful screen recorder & annotation tool for Chrome

    ...Annotate your screen to give feedback, emphasize your clicks, edit your recording, and much more. Make unlimited recordings of your tab, desktop, any application, and camera. Annotate by drawing anywhere on the screen, adding text, and creating arrows. Highlight your clicks, focus on your mouse, or hide it from the recording. Individual microphone and computer audio controls, push to talk, and more. Custom countdowns, show controls only on hover, and many other customization options. Export as mp4, gif, and webm, or save the video directly to Google Drive. ...
    Downloads: 12 This Week
  • 14
    Peer Calls

    Group peer to peer video calls for everyone written in Go

    Peer Calls is a self-hosted, open-source WebRTC-based video and audio calling platform for group communication. Designed for simplicity and privacy, it allows anyone to run their own video conferencing service without relying on third-party providers. Peer Calls supports multi-user rooms, screen sharing, and chat, all delivered via a clean web interface. It’s great for small teams, communities, and educational groups seeking secure and customizable alternatives to mainstream conferencing tools.
    Downloads: 0 This Week
  • 15
    Omilo - a text to speech application

    Omilo is a simple text to speech application

    Omilo is a simple text-to-speech application for Windows and Linux using Festival, Flite, MaryTTS, and Piper voices.
    Downloads: 3 This Week
  • 16
    react-use

    Component for React

    Tracks device battery state. Plays audio and exposes its controls. Tracks geo location state of user's device. Triggers callback when user clicks outside target area. Tracks mouse hover state of some element. Display an element or video full-screen. Tracks location hash value. Tracks whether user is being inactive. Tracks an HTML element's intersection. Synthesizes speech from a text string.
    Downloads: 3 This Week
  • 17
    Russian Text-to-speech programs

    читание, чтение, говорение

    Three Russian text-to-speech programs (Chitanie, Chtenie, and Govorenie) that attempt to convert Russian text into Russian speech. For Windows; they may also run on Linux through Wine. Donations are welcome at paypal.me/alkbab
    Downloads: 0 This Week
  • 18
    Simple TTS Reader

    A small clipboard reader

    Simple TTS Reader is a small utility that reads text from your clipboard using Microsoft Speech API. Whenever you copy any text, the app instantly converts it into spoken words. Select your preferred speech engine from those installed on your system, such as Microsoft Zira, and adjust speed and volume for personalized playback. The application can also be minimized to the system tray. Plus, it is free and comes with an intuitive interface that makes it accessible to everyone.
    Downloads: 100 This Week
  • 19
    Equalizer APO

    A system-wide equalizer for Windows 7 / 8 / 8.1 / 10 / 11

    Equalizer APO is a parametric / graphic equalizer for Windows. It is implemented as an Audio Processing Object (APO) for the system effect infrastructure introduced with Windows Vista. Requirements: - Windows Vista or later (Windows 7 - 11 have been tested) - CPU architecture x64 (64 bit), x86 (32 bit) or ARM64 (on Windows 10/11) - applications must not bypass the system effect infrastructure (APIs like ASIO or WASAPI exclusive mode can't be used) Equalizer APO can be used in conjunction with Room EQ Wizard (http://www.roomeqwizard.com/), because it can read its filter text file format. ...
    Downloads: 85,974 This Week
  • 20
    Fx FloorBoard

    Graphical editor software for many Boss and Roland effect & synth units

    Editors for the BOSS GT-1, 3, 5, 6, 8, Pro, 10, 100, and 001 Guitar Multi-Effects Processors; the BOSS GT-1B, 6B, and 10B Bass Multi-Effects Processors; the Katana Amplifier; and various Boss and Roland guitar synths. This software can visually edit parameters on the Multi-Effects/Amp/Synth Processor via MIDI, USB, or Bluetooth (via an external device on some units).
    Downloads: 608 This Week
  • 21
    Subtitle-Workshop-Classic-v6.3.4

    Subtitle Editor derived from 6.0c, but with VLC and Hunspell checker

    Audio waveform, VLC Video Renderer, UTF8 coding, Audio stream detection and Selection, Resizeable screens, Hunspell spellcheck, Easy shortcut editing, user profiles and more than 70 filetypes supported.
    Downloads: 95 This Week
  • 22
    eGuideDog free software for the blind
    eGuideDog project develops free software for the blind. Currently, we focus on WebSpeech, Ekho TTS and WebAnywhere.
    Downloads: 146 This Week
  • 23
    htmid

    Generative Music For Beginners and Everyone Else

    Generative music is a fascinating and innovative approach to music creation that involves creating procedurally generated music that evolves and changes over time. Whether you're a beginner or a seasoned musician, this guide will introduce you to the world of generative music and show you how to create your own live music performances. Generative music is music that is ever-changing and created in real-time. It can be created by anyone, with or without musical experience. Learn how to...
    Downloads: 9 This Week
  • 24
    RemoteTTS

    Tool to remotely activate Text-To-Speech (TTS) on a server

    The tool provides a simple TCP/UDP interface to let a remote machine perform TTS outputs.
    Downloads: 0 This Week
  • 25
    Shutter Encoder

    Free professional video converter Windows|Mac|Linux

    Shutter Encoder is a video, audio and image converter based on FFmpeg and other great tools. It has been designed by video editors in order to be as accessible and efficient as possible. It's a Swiss Army knife for any video editor. Link to website & downloads: https://www.shutterencoder.com - Without conversion: Cut without re-encoding, Replace audio, Rewrap, Conform, Merge, Extract, Subtitling, Video inserts - Sound conversions: WAV, AIFF, FLAC, ALAC, MP3, AAC, AC3,...
    Downloads: 96 This Week