[go: up one dir, main page]

Showing 166 open source projects for "transcription"

View related business solutions
  • Gen AI apps are built with MongoDB Atlas Icon
    Gen AI apps are built with MongoDB Atlas

    The database for AI-powered applications.

    MongoDB Atlas is the developer-friendly database used to build, scale, and run gen AI and LLM-powered apps—without needing a separate vector database. Atlas offers built-in vector search, global availability across 115+ regions, and flexible document modeling. Start building AI apps faster, all in one place.
    Start Free
  • Intelligent Automation Solutions Built for Modern Finance Teams Icon
    Intelligent Automation Solutions Built for Modern Finance Teams

    We do CFO stuff.

    Digitally transform your business with workflow automation and integrated payment solutions. Digitally store and secure your data with advanced search and accessibility features that keeps your documents at the tip of your team’s fingers.
    Learn More
  • 1
    Amical

    Amical

    Open Source AI Dictation App

    ...It leverages both local and cloud-based AI models, letting users seamlessly switch between providers for the ideal balance of speed, precision, and control, and understands the context of each app in use to automatically format text in a tone and style appropriate to the platform. Users can enhance transcription accuracy with custom vocabulary tailored to industry jargon, proper nouns, and personal terms, and set up personalized voice shortcuts to trigger workflows or dictate across applications. Amical supports multilingual dictation with over 50 languages at native-level accuracy. Its features include a floating desktop widget for easy access, voice-activated commands, custom hotkeys, transcription history, and more.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 2
    Note67

    Note67

    A private, local meeting notes assistant

    note67 is a private, local meeting notes assistant application that combines audio capture, transcription, and AI-powered summarization to help users document conversations and meetings on their own devices without relying on cloud services. Built with a cross-platform architecture using Rust (via Tauri) for backend logic and a TypeScript/React frontend, it prioritizes privacy by performing audio transcription locally with Whisper models and generating summaries with locally-hosted AI, eliminating the need to send sensitive meeting content to external servers. ...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 3
    Handy STT

    Handy STT

    A free, open source, and extensible speech-to-text application

    ...Developed using Tauri (Rust + React/TypeScript), it runs natively across Windows, macOS, and Linux while performing local speech recognition without sending any audio to cloud servers. Handy allows users to start transcription instantly using a configurable keyboard shortcut—press to record, release to transcribe—and automatically pastes the resulting text into any active text field. Its backend leverages OpenAI’s Whisper models for GPU-accelerated speech recognition and Parakeet V3 for efficient CPU-only transcription with automatic language detection. ...
    Downloads: 48 This Week
    Last Update:
    See Project
  • 4
    Qwen3-ASR

    Qwen3-ASR

    Qwen3-ASR is an open-source series of ASR models

    ...This makes Qwen3-ASR suitable for voice-driven applications like AI assistants, dictation tools, speech analytics pipelines, and accessibility features, where accurate and fluid transcription is critical.
    Downloads: 9 This Week
    Last Update:
    See Project
  • Securely stream and govern industrial data to power intelligent operations with agentic insights. Icon
    Securely stream and govern industrial data to power intelligent operations with agentic insights.

    For IoT Developers, Solution Architects, Technical Architects, CTOs, OT/IT Engineers

    Trusted MQTT Platform — Fully-managed and cloud-native MQTT platform for bi-directional IoT data movement.
    Learn More
  • 5
    Concordia

    Concordia

    Crowdsourcing platform for full text transcription and tagging

    Concordia is a platform for crowdsourcing transcription and tagging of text in digitized images. It was developed by the Library of Congress so that volunteers of all backgrounds could transcribe and tag digitized images of manuscripts and typed materials from the Library’s collections that could not otherwise be done by optical character recognition.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 6
    BlogWizard

    BlogWizard

    Generate blog articles from video or audio

    ...For content creators, educators, or businesses producing audio/video content, blogwizard automates the tedious, manual process of transcription + blog writing, saving time while ensuring output quality.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 7
    Voice-Pro

    Voice-Pro

    Comprehensive Gradio WebUI for audio processing

    Voice-Pro is the best gradio WebUI for transcription, translation and text-to-speech. It can be easily installed with one click. Create a virtual environment using Miniconda, running completely separate from the Windows system (fully portable). Supports real-time transcription and translation, as well as batch mode.
    Downloads: 38 This Week
    Last Update:
    See Project
  • 8
    Whishper

    Whishper

    Transcribe any audio to text, translate and edit subtitles 100% locall

    Open-source, local-first audio transcription and subtitling suite with a simple web UI. Thanks to open-source technologies, Whishper can run 100% offline. Your data never leaves your computer. Whishper allows you to translate your transcriptions to and from more than 60 languages thanks to Argos Translate and LibreTranslate. Download the transcriptions in many formats (json, txt, vtt, srt).
    Downloads: 19 This Week
    Last Update:
    See Project
  • 9
    Basic Pitch

    Basic Pitch

    A lightweight audio-to-MIDI converter with pitch bend detection

    ...Provide a compatible audio file and a basic-pitch will generate a MIDI file, complete with pitch bends. The basic pitch is instrument-agnostic and supports polyphonic instruments, so you can freely enjoy transcription of all your favorite music, no matter what instrument is used. Basic pitch works best on one instrument at a time.
    Downloads: 33 This Week
    Last Update:
    See Project
  • Rental Property Management Software Made Easy Icon
    Rental Property Management Software Made Easy

    Save 15 hours a month, put your rental portfolio on autopilot and make accounting a breeze.

    DoorLoop is built for property managers, management companies, owners, landlords, investors, community associations, or anyone managing any property worldwide.
    Learn More
  • 10
    Meetily

    Meetily

    Privacy first, AI meeting assistant with 4x faster Parakeet/Whisper

    ...It’s built for organizations that want meeting intelligence without sending recordings or transcripts to third-party cloud services, which helps address compliance and data sovereignty requirements. The app supports live transcription with local model options (including Whisper- and Parakeet-based workflows) and presents the transcript as the meeting happens, making it useful both for note-taking and accessibility. After or during the session, it can produce structured, AI-generated summaries, and it’s designed to be flexible about where that summarization comes from, supporting local providers as well as external endpoints when allowed by policy.
    Downloads: 23 This Week
    Last Update:
    See Project
  • 11
    HeartMuLa

    HeartMuLa

    A Family of Open Sourced Music Foundation Models

    ...The project also includes HeartCodec, a music codec optimized for high reconstruction fidelity, enabling efficient tokenization and reconstruction workflows that are critical for training and generation pipelines. For text extraction from audio, it provides HeartTranscriptor, a Whisper-based model tuned specifically for lyrics transcription, which helps bridge generated or recorded audio back into structured text. It also introduces HeartCLAP, which aligns audio and text into a shared embedding space.
    Downloads: 23 This Week
    Last Update:
    See Project
  • 12
    Vosk Speech Recognition Toolkit

    Vosk Speech Recognition Toolkit

    Offline speech recognition API for Android, iOS, Raspberry Pi

    ...It can also create subtitles for movies, and transcription for lectures and interviews. Vosk scales from small devices like Raspberry Pi or Android smartphones to big clusters.
    Downloads: 99 This Week
    Last Update:
    See Project
  • 13
    AutoCut

    AutoCut

    Cut videos with a text editor

    ...This approach transforms video editing into a textual editing task, greatly lowering the barrier to editing for users who find traditional video editors complex or unintuitive. AutoCut supports multiple transcription backends, including Whisper and faster-whisper modes, allowing users to choose based on speed or accuracy needs. After editing the transcript text, the corresponding video clips are merged into the final output, and the tool also produces matching subtitle files. Its command-line interface can be integrated into scripts, making it suitable for automated workflows or batch processing.
    Downloads: 8 This Week
    Last Update:
    See Project
  • 14
    SoniTranslate

    SoniTranslate

    Synchronized Translation for Videos

    SoniTranslate is a video translation and dubbing system that produces synchronized target-language audio tracks for existing video content. It provides a web UI built with Gradio, allowing users to upload a video, choose source and target languages, and then run a pipeline that handles transcription, translation and re-synthesis of speech. Under the hood, it uses advanced speech and diarization models to separate speakers, align audio with timecodes and respect subtitle timing, which lets the generated dub track stay in sync with the original video structure. The project supports a wide range of languages for translation, spanning major world languages (English, Spanish, French, German, Chinese, Arabic, etc.) and many regional or less widely spoken languages, making it suitable for broad internationalization. ...
    Downloads: 42 This Week
    Last Update:
    See Project
  • 15
    Actions

    Actions

    Supercharge your shortcuts

    ...Want to run shortcuts directly from the iOS Lock Screen? Check out my Quick Launch app. And trigger shortcuts on your Mac from your iOS device with my Hyperduck app. And for high-quality transcription, see my Aiko app.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    # Radio Transcription Tool v3.1 A professional Python application for recording and transcribing Dutch and Belgian radio streams using OpenAI Whisper API, with advanced keyword extraction powered by KeyBERT. ## 🎯 Features - **Live Radio Recording**: Record streams from 40+ Dutch and Belgian radio stations - **Live Stream Listening**: Listen to radio streams without recording - **AI Transcription**: High-quality transcription using OpenAI Whisper API - **Smart Keyword Extraction**: Advanced phrase analysis with KeyBERT - **Professional UI**: Modern Tkinter interface with Bluvia branding - **Organized Output**: Timestamped folders with MP3 recordings and transcriptions - **API Key Management**: Built-in OpenAI API key configuration
    Downloads: 1 This Week
    Last Update:
    See Project
  • 17
    whisper.cpp

    whisper.cpp

    Port of OpenAI's Whisper model in C/C++

    whisper.cpp is a lightweight, C/C++ reimplementation of OpenAI’s Whisper automatic speech recognition (ASR) model—designed for efficient, standalone transcription without external dependencies. The entire high-level implementation of the model is contained in whisper.h and whisper.cpp. The rest of the code is part of the ggml machine learning library. The command downloads the base.en model converted to custom ggml format and runs the inference on all .wav samples in the folder samples. whisper.cpp supports integer quantization of the Whisper ggml models. ...
    Downloads: 392 This Week
    Last Update:
    See Project
  • 18
    WhisperLive

    WhisperLive

    A nearly-live implementation of OpenAI's Whisper

    WhisperLive is a “nearly live” implementation of OpenAI’s Whisper model focused on real-time transcription. It runs as a server–client system in which the server hosts a Whisper backend and clients stream audio to be transcribed with very low delay. The project supports multiple inference backends, including Faster-Whisper, NVIDIA TensorRT, and OpenVINO, allowing you to target GPUs and different CPU architectures efficiently. It can handle microphone input, pre-recorded audio files, and network streams such as RTSP and HLS, making it flexible for live events, monitoring, or accessibility workflows. ...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 19
    OpenAI .NET

    OpenAI .NET

    The official .NET library for the OpenAI API

    ...Every synchronous method has an async counterpart, and the library offers convenient streaming primitives for chat completions so you can process tokens as they arrive. It supports tool/function calling, structured outputs via JSON schema, audio input/output, image generation, embeddings, Whisper transcription, and assistants with retrieval augmented generation.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 20
    GLM-4-Voice

    GLM-4-Voice

    GLM-4-Voice | End-to-End Chinese-English Conversational Model

    ...It integrates advanced voice recognition and generation with the multimodal reasoning capabilities of GLM-4, enabling smooth natural interaction via spoken input and output. The model supports real-time speech-to-text transcription, spoken dialogue understanding, and text-to-speech synthesis, making it suitable for conversational AI, virtual assistants, and accessibility applications. GLM-4-Voice builds upon the bilingual strengths of the GLM architecture, supporting both Chinese and English, and is designed to handle long-form conversations with context retention. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 21
    RealtimeSTT

    RealtimeSTT

    A robust, efficient, low-latency speech-to-text library

    RealtimeSTT is a Python-based realtime speech-to-text engine emphasizing low latency, wake-word detection, voice activity detection, and automatic speech segmentation. It provides asynchronous callbacks, nanosecond-precision timestamps, and CLI tools, suitable for building voice assistants, meeting transcribers, or live caption systems.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 22
    Groq TypeScript / Node.s

    Groq TypeScript / Node.s

    The official Node.js / Typescript library for the Groq API

    Groq TypeScript / Node.s (also often referred to as “groq-sdk” on npm) is the official Node.js / TypeScript client library for Groq’s REST API, enabling JavaScript/TypeScript developers to integrate LLM and AI-powered services into web backends, serverless functions, or frontend apps. It exports strongly-typed interfaces for models, chat completions, file uploads (e.g. for audio transcription), and other endpoints, allowing for better type safety and developer experience when using Groq from TypeScript. The library also supports passing different input types (file streams, blobs, fetch responses) for media-related endpoints, making it flexible for diverse environments (backend, browser, serverless). With this SDK, developers can call Groq’s models, transcribe audio, perform file uploads — all with minimal boilerplate — which streamlines creation of AI-enabled applications in the JavaScript/TypeScript ecosystem.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 23
    Groq Python

    Groq Python

    The official Python Library for the Groq API

    Groq Python is the official Python SDK for the Groq REST API, giving Python developers straightforward access to Groq’s LLM, chat, audio, and other AI services. Through this library, you can call Groq’s models from Python code — for example to request chat completions, code generation, transcription, or any supported endpoint — using idiomatic Python syntax. The SDK handles authentication (via environment variable or parameter), defines proper type-safe request/response data types, and supports both synchronous and asynchronous usage patterns depending on your application needs. This makes it easy to integrate Groq-powered AI capabilities into backend services, data pipelines, research notebooks, or applications written in Python. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 24
    WhisperKit

    WhisperKit

    On-device Speech Recognition for Apple Silicon

    WhisperKit is a Swift package that integrates OpenAI's popular Whisper speech recognition model with Apple's CoreML framework for efficient, local inference on Apple devices. Whisper has pulled the future forward when fast, free and virtually error-free translation and transcription will be ubiquitous. It inspired numerous developers to improve and deploy it with minimal friction and maximum performance. We founded Argmax in November 2023 to empower developers and enterprises everywhere to deploy commercial-scale inference workloads on user devices. The fast-growing need for Whisper inference in production convinced us to take it on as our first project.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 25
    Scanopy

    Scanopy

    Clean network diagrams, One-time setup, zero upkeep

    Scanopy is a powerful multi-modal data capture and analysis toolkit that enables users to collect, process, and visualize structured and unstructured information from a variety of sources in a flexible pipeline. It is built to handle complex scanning tasks — such as OCR, document analysis, audio transcription, network data capture, and image extraction — while providing unified APIs and workflows that make managing heterogeneous data sources seamless. Developers can compose custom pipelines that chain together transforms, filters, and exporters, enabling automation of tedious data preparation steps and accelerating insights with minimal code. ...
    Downloads: 12 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • 5
  • Next