audio synthesis free download

Showing 45 open source projects for "audio synthesis"

Filters: Python, Linux

1

Kitten TTS

State-of-the-art TTS model under 25MB

KittenTTS is an open-source, ultra-lightweight, high-quality text-to-speech model with just 15 million parameters and a binary size under 25 MB. It is designed for real-time, CPU-based deployment across diverse platforms: it runs without a GPU on any device, offers several premium voice options, and is optimized for fast inference.

Downloads: 14 This Week

Last Update: 2025-08-08
2

Podcastfy.ai

Transforming Multimodal Content into Captivating Multilingual Audio

Podcastfy is an open-source Python package that transforms multimodal content (text, images) into engaging, multilingual audio conversations using GenAI. Input content includes websites, PDFs, YouTube videos, and images. Unlike UI-based tools focused primarily on note-taking or research synthesis (e.g. NotebookLM), Podcastfy focuses on the programmatic and bespoke generation of engaging, conversational transcripts and audio from a multitude of multimodal sources, enabling customization...

Downloads: 3 This Week

Last Update: 2024-11-16
3

Qwen2.5-Omni

Capable of understanding text, audio, vision, video

...-of-the-art performance in many multimodal benchmarks, particularly spoken language understanding, audio reasoning, and image/video understanding. It shows strong benchmark performance across modalities (audio understanding, speech recognition, image/video reasoning), often matching or outperforming single-modality models at a similar scale. It supports real-time streaming responses, including natural speech synthesis (text-to-speech), and chunked inputs for low-latency interaction.

Downloads: 3 This Week

Last Update: 2025-09-23
4

MeloTTS

High-quality multi-lingual text-to-speech library by MyShell.ai

MeloTTS is an open-source text-to-speech (TTS) system that generates natural-sounding speech from text input. It utilizes advanced machine-learning models to produce high-quality audio outputs.

Downloads: 2 This Week

Last Update: 2025-01-06
5

GLM-4-Voice

End-to-End Chinese-English Conversational Model

GLM-4-Voice is an open-source speech-enabled model from ZhipuAI, extending the GLM-4 family into the audio domain. It integrates advanced voice recognition and generation with the multimodal reasoning capabilities of GLM-4, enabling smooth natural interaction via spoken input and output. The model supports real-time speech-to-text transcription, spoken dialogue understanding, and text-to-speech synthesis, making it suitable for conversational AI, virtual assistants, and accessibility...

Downloads: 1 This Week

Last Update: 2025-10-04
6

VALL-E

PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech)

We introduce a language modeling approach for text-to-speech synthesis (TTS). Specifically, we train a neural codec language model (called VALL-E) using discrete codes derived from an off-the-shelf neural audio codec model, and regard TTS as a conditional language modeling task rather than continuous signal regression as in previous work. During the pre-training stage, we scale up the TTS training data to 60K hours of English speech, which is hundreds of times larger than existing systems. VALL...

Downloads: 7 This Week

Last Update: 2023-04-14
7

CSM (Conversational Speech Model)

A Conversational Speech Generation Model

The CSM (Conversational Speech Model) is a speech generation model developed by Sesame AI that creates RVQ audio codes from text and audio inputs. It uses a Llama backbone and a smaller audio decoder to produce audio codes for realistic speech synthesis. The model has been fine-tuned for interactive voice demos and is hosted on platforms like Hugging Face for testing. CSM offers a flexible setup and is compatible with CUDA-enabled GPUs for efficient execution.

Downloads: 4 This Week

Last Update: 2025-03-19
8

NÜWA - Pytorch

Implementation of NÜWA, an attention network for text-to-video synthesis

Implementation of NÜWA, a state-of-the-art attention network for text-to-video synthesis, in PyTorch. It also contains an extension into video and audio generation, using a dual decoder approach. It seems as though a diffusion-based method has taken the new throne for SOTA. However, I will continue on with NÜWA, extending it to use multi-headed codes + hierarchical causal transformer. I think that direction is untapped for improving on this line of work. In the paper, they also present a way...

Downloads: 1 This Week

Last Update: 2023-03-22
9

Deepvoice3_pytorch

PyTorch implementation of convolutional network-based text-to-speech synthesis

An open source implementation of Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning.

Downloads: 1 This Week

Last Update: 2024-08-13
10

Swami Project

A SoundFont editor and other software for editing, managing, and sharing sample-based MIDI instrument files for computer music composition. Support for other formats is planned.

3 Reviews

Downloads: 1 This Week

Last Update: 2019-03-09
11

Loris

C++ class library for sound analysis, synthesis, and morphing

Loris is a library for sound analysis, synthesis, and morphing, developed by Kelly Fitz and Lippold Haken at the CERL Sound Group. Loris includes a C++ class library, Python module, C-linkable interface, command line utilities, and documentation.

1 Review

Downloads: 6 This Week

Last Update: 2016-08-23
12

The MusicKit

The MusicKit & SndKit is an object-oriented software system for building music, sound, signal processing, and MIDI applications. The distribution is a comprehensive package that includes online documentation, code examples, utilities, applications, and scores.

Downloads: 4 This Week

Last Update: 2016-05-23
13

Nsound

A C++ library and Python module for audio synthesis featuring dynamic digital filters. Nsound lets you easily shape waveforms and write to disk or plot them. Nsound aims to be as powerful as Csound but easy to use.

Downloads: 2 This Week

Last Update: 2015-12-12
14

Steel TTS

A cross-platform wrapper for common text-to-speech engines in Python

Steel is a cross-platform package for using common text-to-speech (speech synthesis) engines in Python. Steel currently supports the following TTS software:
- Microsoft Speech API 5 (SAPI5)
- eSpeak
- NS Speech Synthesis
- FreeTTS

Documentation: http://sourceforge.net/p/steeltts/wiki/
Bug Tracker: http://sourceforge.net/p/steeltts/tickets/

If you are interested in contributing to the Steel TTS codebase, or would like to make a feature request, please contact the lead...

Downloads: 1 This Week

Last Update: 2016-03-15
15

Simpl

Simpl is an open source library for sinusoidal modelling written in the Python programming language and making use of SciPy.

Downloads: 0 This Week

Last Update: 2014-02-23
16

InproTK

An Incremental Spoken Dialogue Processing Toolkit

InproTK is an Incremental Spoken Dialogue Processing Toolkit, that is, a toolkit to help you build dialogue systems that listen and talk incrementally, allowing for advanced interactional behaviour. Please see our Wiki for more information: http://sourceforge.net/p/inprotk/wiki/

Downloads: 1 This Week

Last Update: 2015-06-16
17

pyespeak

Python to eSpeak speech synthesis

ctypes-based Python module for eSpeak speech synthesis (http://espeak.sf.net)

Downloads: 0 This Week

Last Update: 2017-10-28
18

gmf_synth

A graphical interface (GUI) for the FluidSynth SoundFont player

A graphical interface for software synthesizers and sound samplers. Currently, FluidSynth is supported, and a FluidSynth installation is required. It can be used to play SoundFont (SF2) and MIDI files. Written in Python and Qt4.

Downloads: 0 This Week

Last Update: 2016-10-27
19

Speect

Speect is a multilingual TTS system. It offers a full text-to-speech system with various APIs, as well as an environment for research and development of TTS systems and voices. It is written in ANSI C and uses a plug-in mechanism for extensions. Speect also includes an extensive set of Python bindings for quick implementation of new ideas; these bindings are derived from SWIG interface files and can easily be extended to other languages supported by SWIG. Speect is free and open...

Downloads: 0 This Week

Last Update: 2013-05-30
20

nxAlpha

SuperCollider Code for Livecoding Experimental Sound

Downloads: 0 This Week

Last Update: 2013-04-24
21

AudiOpen

AudiOpen is a framework and GUI for audio synthesis. Its goal is an environment where obtaining the waveform you want is intuitive, through a systematic, mathematical approach to defining audio and support for Python scripts.

1 Review

Downloads: 0 This Week

Last Update: 2015-07-23
22

slurry

slurry is a simple Python program that plays sounds at random. It is being created primarily for an experimental film screening in June 2010 and will continue to be developed after that.

Downloads: 0 This Week

Last Update: 2013-05-21
23

ASTA - Auto. Subtitle Timing Annotator

A collection of scripts and programs to automatically annotate video/audio for subtitles. It relies on a MARSYAS (Music Analysis, Retrieval and Synthesis for Audio Signals) plug-in to detect human voice in polyphonic recordings.

Downloads: 0 This Week

Last Update: 2014-04-24
24

pymbrola: a python phonemiser for MBROLA

pymbrola aims to be a universal text-to-phoneme engine which supports and promotes the use of the MBROLA TTS synthesizer.

Downloads: 0 This Week

Last Update: 2013-04-23
25

cwtext text to morse code converter

Convert text to International Morse Code. Input is ASCII text. Output can be: - . -..- - on the console, raw 8-bit PCM suitable for piping to /dev/audio, .wav files, or even (mp3|ogg). Good for headlines on your MP3 player or code practice.

7 Reviews

Downloads: 6 This Week

Last Update: 2013-03-22