Search Results for "speech to speech translation"

Sort By:

Showing 1246 open source projects for "speech to speech translation"

View related business solutions

MongoDB Atlas runs apps anywhere
Deploy in 115+ regions with the modern database for every enterprise.

MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.

Start Free
Workable Hiring Software - Hire The Best People, Fast
Find the best candidates with the best recruitment software

Workable is the preferred software for today's recruiting industry and HR teams, trusted by over 6,000 companies to streamline their hiring processes. Finding the right person for the job has never been easier—users now possess the ability to manage multiple hiring pipelines at once, from posting a job to sourcing candidates. Workable is also seamlessly integrated between desktop and mobile, allowing admins full control and flexibility all in the ATS without needing additional software.

Learn More
1

Speech Note

Speech Note Linux app. Note taking, reading and translating

Speech Note is a Linux desktop and Sailfish OS application for taking, reading, and translating notes with integrated offline speech technology. It combines speech-to-text, text-to-speech, and machine translation in a single interface, allowing users to dictate notes, listen back to them, and translate them without ever sending data to the cloud.

Downloads: 6 This Week

Last Update: 2025-11-28
See Project
2

Fish Speech

SOTA Open Source TTS

Fish Speech is a state-of-the-art open-source text-to-speech project that has evolved into the OpenAudio series of advanced TTS models. The repository hosts the code and tooling for training, fine-tuning, and serving high-quality TTS, while the current flagship models (OpenAudio-S1 and S1-mini) are distributed via Fish Audio’s playground and Hugging Face.

Downloads: 8 This Week

Last Update: 2025-11-28
See Project
3

Whisper

Robust Speech Recognition via Large-Scale Weak Supervision

OpenAI Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitasking model that can perform multilingual speech recognition, speech translation, and language identification. A Transformer sequence-to-sequence model is trained on various speech processing tasks, including multilingual speech recognition, speech translation, spoken language identification, and voice activity detection. ...

Downloads: 70 This Week

Last Update: 2025-06-26
See Project
4

StreamSpeech

StreamSpeech is a seamless model for offline speech recognition

...The model supports eight tasks: offline ASR, speech-to-text translation, speech-to-speech translation, and TTS, as well as their streaming or simultaneous counterparts, all handled by the same underlying system. During simultaneous translation, StreamSpeech can optionally output intermediate ASR transcripts and text translations, giving users or downstream applications real-time visibility into what the system is hearing and how it is translating.

Downloads: 2 This Week

Last Update: 2025-11-28
See Project
Project Planning and Management Software | Planview
For Enterprise PMOs

Planview® ProjectAdvantage (formerly Sciforma) is an enterprise-centric project and portfolio management (PPM) software designed to enable change, drive innovation, and lead in a company's digital transformation. With ProjectAdvantage, teams can strategically track and monitor project data in order to make relevant decisions. It offers multiple features focused on strategic management, functional management, and execution management. A highly scalable and cost-effective solution, ProjectAdvantage is available in various deployment models.

Learn More
5

Vosk Speech Recognition Toolkit

Offline speech recognition API for Android, iOS, Raspberry Pi

Speech recognition bindings are implemented for various programming languages like Python, Java, Node.JS, C#, C++, Rust, Go and others. Vosk supplies speech recognition for chatbots, smart home appliances, and virtual assistants. It can also create subtitles for movies, and transcription for lectures and interviews. Vosk scales from small devices like Raspberry Pi or Android smartphones to big clusters.

Downloads: 99 This Week

Last Update: 2024-04-22
See Project
6

Voice-Pro

Comprehensive Gradio WebUI for audio processing

Voice-Pro is the best gradio WebUI for transcription, translation and text-to-speech. It can be easily installed with one click. Create a virtual environment using Miniconda, running completely separate from the Windows system (fully portable). Supports real-time transcription and translation, as well as batch mode.

1 Review

Downloads: 38 This Week

Last Update: 2025-12-05
See Project
7

Speech-AI-Forge

Speech-AI-Forge is a project developed around TTS generation model

Speech-AI-Forge is a full-stack project built around modern text-to-speech generation models, providing both an API server and a Gradio-based web UI for interactive use. At its core, it acts as a hub that wires together multiple speech-related capabilities, including TTS, speech-to-text and LLM-based control flows, behind a consistent interface.

Downloads: 4 This Week

Last Update: 2026-02-02
See Project
8

SoniTranslate

Synchronized Translation for Videos

SoniTranslate is a video translation and dubbing system that produces synchronized target-language audio tracks for existing video content. It provides a web UI built with Gradio, allowing users to upload a video, choose source and target languages, and then run a pipeline that handles transcription, translation and re-synthesis of speech. Under the hood, it uses advanced speech and diarization models to separate speakers, align audio with timecodes and respect subtitle timing, which lets the generated dub track stay in sync with the original video structure. ...

Downloads: 42 This Week

Last Update: 2025-11-28
See Project
9

ESPnet

End-to-end speech processing toolkit

ESPnet is a comprehensive end-to-end speech processing toolkit covering a wide spectrum of tasks, including automatic speech recognition (ASR), text-to-speech (TTS), speech translation (ST), speech enhancement, speaker diarization, and spoken language understanding. It uses PyTorch as its deep learning engine and adopts a Kaldi-style data processing pipeline for features, data formats, and experimental recipes.

Downloads: 0 This Week

Last Update: 2025-11-28
See Project
Cloud-Based Software Licensing - Zentitle by Nalpeiron
The #1 Software Licensing Solution. Release new Software License Models fast with no engineering. Increase software sales and drive up revenues.

1000’s software companies have used Zentitle to launch new software products fast and control their entitlements easily - many going from startup to IPO on our platform. Our software monetization infrastructure allows you to easily build or

Learn More
10

KrillinAI

Video translation and dubbing tool powered by LLMs

KrillinAI is an end-to-end content localization, translation, and dubbing tool aimed at helping creators transform videos into multiple languages with minimal manual effort. It integrates several stages of the pipeline: video acquisition (either from local files or remote via download tools), speech recognition (ASR), subtitle segmentation and alignment, machine translation (with context-aware translation to preserve semantics), and voice cloning + text-to-speech (TTS) to produce dubbed audio tracks. ...

Downloads: 9 This Week

Last Update: 2025-11-28
See Project
11

AzioSpeech Recognition and Translation

AzioSpeech Recognition and Translation

Starting from version 1.2.1.0, the project has been renamed to AzioSpeech Recognition and Translation and is officially published in the Microsoft Store at: https://apps.microsoft.com/detail/9PFV5DG73198 A desktop application built with Avalonia UI that provides real-time speech recognition and translation using Azure Speech Services. Convert spoken words into text and translate them into multiple languages with professional-grade accuracy.

Downloads: 2 This Week

Last Update: 1 day ago
See Project
12

Polyglot

Cross-platform AI language practice app

...It includes translation features, dark mode, playback of the user’s own recorded speech, and word highlighting that tracks the progress of synthesized audio to make following along easier. Polyglot also integrates additional AI providers, supports configurable conversation scenarios, and lets users personalize avatars, making the experience more engaging and flexible.

Downloads: 1 This Week

Last Update: 2025-11-28
See Project
13

Pot Desktop

A cross-platform software for text translation and recognition

Pot-Desktop is a cross-platform productivity tool aimed at helping users quickly translate, perform OCR (optical character recognition), and synthesize speech for selected text or images — all with minimal friction. It supports picking text via mouse selection (“highlight-and-translate”), clipboard listening, or screenshot-based OCR; this makes it ideal for reading webpages, documents, images — or any on-screen text — and instantly getting translations or text extraction. The tool supports external plugin extensions, which means its functionality can be expanded far beyond the built-in options: you can add translation engines, OCR backends, TTS engines, vocabulary export (e.g. for language learning), and more. ...

Downloads: 8 This Week

Last Update: 2025-11-28
See Project
14

Translate-Subtitle-File

Subtitle Creation Assistant

...You can use multiple service providers, 2. You can configure your own API Key to use your own account's free quota, such as Tencent's free translation quota of 5 million characters per month, IBM's 500-minute speech-to-text free quota (tern. best The domain name has expired and I don't want to renew it.) Azure speech-to-text and DeepL free version have problems, it is normal to not use it, please wait for the next version to fix. Machine translation of subtitle files, use machine translation to process files.

Downloads: 9 This Week

Last Update: 2025-09-09
See Project
15

Qwen2-Audio

Repo of Qwen2-Audio chat & pretrained large audio language model

...Code & examples provided with Hugging Face transformers, and usage via AutoProcessor, model classes etc. High performance on many standard benchmarks: ASR, speech-emotion recognition, vocal sound classification, speech translation etc.

Downloads: 1 This Week

Last Update: 2025-09-23
See Project
16

sherpa-onnx

Speech-to-text, text-to-speech, and speaker recognition

Speech-to-text, text-to-speech, and speaker recognition using next-gen Kaldi with onnxruntime without an Internet connection. Support embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift, Dart, JavaScript, Flutter.

Downloads: 216 This Week

Last Update: 1 hour ago
See Project
17

OpenAI.fm

Code for openai.fm, a demo for the OpenAI Speech API

OpenAI.fm is an official interactive demo application built to showcase the OpenAI Speech API and its advanced text-to-speech capabilities, providing developers and creators with a hands-on web interface to convert text into high-quality, customizable audio using state-of-the-art TTS models. Developed using Next.js and the OpenAI Speech API, this demo illustrates how the latest neural voice models can produce natural, expressive speech with adjustable styles and voices, highlighting features like emotional range, tone, and real-time playback. ...

Downloads: 185 This Week

Last Update: 2026-01-28
See Project
18

speech intonator

The purpose of the project is to develop audio processing algorithms

The initial version of the main branch of the project has been completed. The main name of the project is "Java audio mixer Summaha". The second name of the project is "Sound Arithmometer". Main purpose - production of musical sound remixes from a set of samples. The name "Summaha" rhymes well with 'Yamaha' and creates motivation and inspiration to achieve a sound quality comparable to with a well-known brand. Detailed documentation in 'read' signature files. Anyone who is...

Downloads: 0 This Week

Last Update: 2025-10-13
See Project
19

EasyVoice

Open source text-to-speech tool, supports extra-long text

easyVoice is an open-source text-to-speech platform aimed at turning long-form text and novels into high-quality audio, with a strong focus on usability and scalability. It provides a web interface where users can paste or upload large texts and generate speech and subtitles in a single workflow, even for works exceeding 100,000 characters. The system supports multi-role voice acting, letting users assign different neural voices to different characters or narrative roles and configure parameters such as rate, pitch, and volume per role. ...

Downloads: 4 This Week

Last Update: 2026-01-26
See Project
20

WhisperKit

On-device Speech Recognition for Apple Silicon

WhisperKit is a Swift package that integrates OpenAI's popular Whisper speech recognition model with Apple's CoreML framework for efficient, local inference on Apple devices. Whisper has pulled the future forward when fast, free and virtually error-free translation and transcription will be ubiquitous. It inspired numerous developers to improve and deploy it with minimal friction and maximum performance. We founded Argmax in November 2023 to empower developers and enterprises everywhere to deploy commercial-scale inference workloads on user devices. ...

Downloads: 5 This Week

Last Update: 2025-11-07
See Project
21

TTS Voice Wizard

Speech to Text to Speech, sends text as OSC messages

Speech to Text to Speech. Song now playing. Sends text as OSC messages to VRChat to display on avatar. (STTTS) (Speech to TTS) (VRC STT System) Use TTS Voice Wizard's accessibility features to improve your VRChat experience (it works outside of VRChat too!) You can convert your Speech-to-Text and back to Speech through various Speech Recognition and Text-to-Speech methods.

Downloads: 18 This Week

Last Update: 2025-11-02
See Project
22

SpeechRecognition

Speech recognition module for Python

Library for performing speech recognition, with support for several engines and APIs, online and offline. Recognize speech input from the microphone, transcribe an audio file, save audio data to an audio file. Show extended recognition results, calibrate the recognizer energy threshold for ambient noise levels (see recognizer_instance.energy_threshold for details).

Downloads: 20 This Week

Last Update: 2025-12-31
See Project
23

Transcoder

Hardware-accelerated video transcoding using Android MediaCodec APIs

Transcoder by DeepMedia is an AI-powered video-to-video speech translation engine that enables fully automated multilingual dubbing. Unlike traditional speech translation systems that rely on multi-stage pipelines, Transcoder directly translates one speaker’s video into another language while preserving facial expressions, lip-sync, and vocal identity. Designed for real-time use and production-grade pipelines, Transcoder combines advanced deep learning models with GPU acceleration to deliver high-quality translations across languages. ...

Downloads: 8 This Week

Last Update: 2025-03-25
See Project
24

Lingvo

Framework for building neural networks

...It has been used to implement state of the art architectures such as recurrent neural networks, Transformer models, variational autoencoder hybrids, and multi task systems. Lingvo includes reference models and configurations for domains like machine translation, automatic speech recognition, language modeling, image understanding, and 3D object detection. Centralized hyperparameter configuration files allow researchers to share exact experiment setups so others can retrain and compare results reliably.

Downloads: 0 This Week

Last Update: 2025-11-28
See Project
25

MeloTTS

High-quality multi-lingual text-to-speech library by MyShell.ai

MeloTTS is an open-source text-to-speech (TTS) system that generates natural-sounding speech from text input. It utilizes advanced machine-learning models to produce high-quality audio outputs.

Downloads: 4 This Week

Last Update: 2025-01-06
See Project