audio video sync free download

676 projects for "audio video sync" with 1 filter applied:

BSD Clear Filters & Widen Search

MongoDB Atlas runs apps anywhere
Deploy in 115+ regions with the modern database for every enterprise.

MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.

Start Free
Optimize every aspect of hiring with Greenhouse Recruiting
Hire for what’s next.

What’s next for many of us is changing. Your company’s ability to hire great talent is as important as ever – so you’ll be ready for whatever’s ahead. Whether you need to scale your team quickly or improve your hiring process, Greenhouse gives you the right technology, know-how and support to take on what’s next.

Learn More
1

Shairport Sync

AirPlay audio player

Audio played by a Shairport Sync-powered device stays synchronized with the source and hence with similar devices playing the same source. In this way, synchronized multi-room audio is possible for players that support it, such as iTunes and the macOS Music app. Shairport Sync runs on Linux, FreeBSD and OpenBSD. It does not support AirPlay video or photo streaming.

Downloads: 3 This Week

Last Update: 14 hours ago
See Project
2

LTX-Video

Official repository for LTX-Video

LTX-Video is a sophisticated multimedia processing framework from Lightricks designed to handle high-quality video editing, compositing, and transformation tasks with performance and scalability. It provides runtime components that efficiently decode, encode, and manipulate video streams, frame buffers, and audio tracks while exposing a rich API for building customized editing features like transitions, effects, color grading, and keyframe automation. ...

Downloads: 8 This Week

Last Update: 2026-01-11
See Project
3

SoniTranslate

Synchronized Translation for Videos

SoniTranslate is a video translation and dubbing system that produces synchronized target-language audio tracks for existing video content. It provides a web UI built with Gradio, allowing users to upload a video, choose source and target languages, and then run a pipeline that handles transcription, translation and re-synthesis of speech. Under the hood, it uses advanced speech and diarization models to separate speakers, align audio with timecodes and respect subtitle timing, which lets the generated dub track stay in sync with the original video structure. ...

Downloads: 42 This Week

Last Update: 2025-11-28
See Project
4

HunyuanCustom

Multimodal-Driven Architecture for Customized Video Generation

HunyuanCustom is a multimodal video customization framework by Tencent Hunyuan, aimed at generating customized videos featuring particular subjects (people, characters) under flexible conditions, while maintaining subject/identity consistency. It supports conditioning via image, audio, video, and text, and can perform subject replacement in videos, generate avatars speaking given audio, or combine multiple subject images.

Downloads: 1 This Week

Last Update: 2025-10-15
See Project
The CI/CD Platform built for Mobile DevOps
For mobile app developers interested in a powerful CI/CD platform for mobile app development and mobile DevOps

Save time, money, and developer frustration with fast, flexible, and scalable mobile CI/CD that just works. Whether you swear by native or would rather go cross-platform, we have you covered. From Swift to Objective-C, Java to Kotlin, as well as Xamarin, Cordova, Ionic, React Native, and Flutter: Whatever you choose, we will automatically configure your initial workflows and have you building in minutes.

Learn More
5

HunyuanVideo-Foley

Multimodal Diffusion with Representation Alignment

HunyuanVideo-Foley is a multimodal diffusion model from Tencent Hunyuan for high-fidelity Foley (sound effects) audio generation synchronized to video scenes. It is designed to generate audio that matches both visual content and textual semantic cues, for use in video production, film, advertising, games, etc. The model architecture aligns audio, video, and text representations to produce realistic synchronized soundtracks. Produces high-quality 48 kHz audio output suitable for professional use. ...

Downloads: 1 This Week

Last Update: 2025-09-28
See Project
6

HunyuanVideo-Avatar

Tencent Hunyuan Multimodal diffusion transformer (MM-DiT) model

...Emotion control by extracting emotion reference images and transferring emotional style into video sequences.

Downloads: 1 This Week

Last Update: 2025-12-16
See Project
7

AI YouTube Shorts Generator

A python tool that uses GPT-4, FFmpeg, and OpenCV

AI-YouTube-Shorts-Generator is a Python-based tool that automates the creation of short-form vertical video clips (“shorts”) from longer source videos — ideal for adapting content for platforms like YouTube Shorts, Instagram Reels, or TikTok. It analyzes input video (whether a local file or a YouTube URL), transcribes audio (with optional GPU-accelerated speech-to-text), uses an AI model to identify the most compelling or engaging segments, and then crops/resizes the video and applies subtitle overlays, producing a polished short video without manual editing. ...

Downloads: 11 This Week

Last Update: 2026-02-05
See Project
8

HLS.js

HLS.js is a JavaScript library that plays HLS in browsers

HLS.js is a JavaScript library that implements an HTTP Live Streaming client. It relies on HTML5 video and MediaSource Extensions for playback. It works by transmuxing MPEG-2 Transport Stream and AAC/MP3 streams into ISO BMFF (MP4) fragments. Transmuxing is performed asynchronously using a Web Worker when available in the browser. HLS.js also supports HLS + fmp4, as announced during WWDC2016. HLS.js works directly on top of a standard HTML<video> element. HLS.js is written in ECMAScript6...

Downloads: 18 This Week

Last Update: 2025-11-18
See Project
9

BlogWizard

Generate blog articles from video or audio

BlogWizard is a demo/utility project built on top of Groq’s LLM infrastructure that converts video or audio content into well-structured blog posts, enabling creators to repurpose multimedia content into text — useful for SEO, accessibility, or reaching audiences that prefer reading. The tool uses transcription (e.g. via Whisper) to extract text from audio/video, then runs an LLM-based generation pipeline to transform that content into coherent, readable blog-format posts — with sections, formatting, and possibly metadata. ...

Downloads: 1 This Week

Last Update: 2025-12-19
See Project
Cloudbrink Personal SASE service
For companies looking for low maintenance, secure, high performance connectivity for hybrid and remote workers

Cloudbrink’s Personal SASE is a high-performance connectivity and security service that delivers a lightning-fast, in-office experience to the modern hybrid workforce anywhere. Combining high-performance ZTNA with Automated Moving Target Defense (AMTD), and Personal SD-WAN all connections are ultra-secure.

Learn More
10

Remotion

Make videos programmatically with React

...The framework supports exporting to standard video formats, audio synchronization, frame callbacks, and powerful tooling for previewing and debugging, so teams can iterate quickly and reliably.

Downloads: 3 This Week

Last Update: 1 day ago
See Project
11

FastRTC

The python library for real-time communication

FastRTC is a Python library designed to simplify real-time communication (RTC), especially for audio and video streaming applications. It abstracts away much of the complexity that typically comes with implementing WebRTC by providing a simple interface — e.g. a Stream class — that can be mounted within a web backend (for example a FastAPI application). This makes it particularly well suited for building real-time voice (or video) interfaces for applications such as AI assistants, live chat, or collaborative audio/video tools. ...

Downloads: 2 This Week

Last Update: 2025-11-28
See Project
12

LiveAvatar

Streaming Real-time Audio-Driven Avatar Generation

LiveAvatar is an open-source research and implementation project that provides a unified framework for real-time, streaming, interactive avatar video generation driven by audio and other control signals. It implements techniques from state-of-the-art diffusion-based avatar modeling to support infinite-length continuous video generation with low latency, enabling interactive AI avatars that maintain continuity and realism over extended sessions. The project co-designs algorithms and system optimizations, such as block-wise autoregressive processing and fast sampling strategies, to deliver real-time frame rates (e.g., ~45 FPS on appropriate GPU clusters) while handling non-stop generation without quality degradation. ...

Downloads: 2 This Week

Last Update: 2026-01-30
See Project
13

Kaset

The missing YouTube Music macOS app

Kaset is a social audio platform framework that allows users to host, share, and interact with audio content in community-oriented spaces, combining elements of podcasting, voice rooms, and feedback-driven discovery. It provides an interface where creators can upload episodes, host live or scheduled voice sessions, and cultivate listener communities through comments, reactions, and follow systems. The platform emphasizes audio discovery with playlists, curated channels, and trending audio...

Downloads: 4 This Week

Last Update: 2026-02-03
See Project
14

Membrane Core

The core of Membrane Framework, multimedia processing framework

membrane_core is the foundation of the Membrane multimedia framework for Elixir, providing the abstractions and runtime needed to build real-time audio and video pipelines. It models media processing as a graph of lightweight, supervised OTP processes—elements connected by links—so work is isolated, fault-tolerant, and easy to scale or reconfigure at runtime. The core defines a clear lifecycle and callback API for elements, plus concepts like buffers, events, and capabilities/format negotiation to keep components interoperable and type-safe. ...

Downloads: 5 This Week

Last Update: 2025-12-22
See Project
15

SALMONN family

A suite of advanced multi-modal LLMs

SALMONN is a family of advanced multi-modal large language models (LLMs) developed by ByteDance — designed to handle and integrate multiple data modalities (e.g. text, audio, video) rather than just plain text. The repository bundles different branches targeting specialized tasks (e.g. video-SALMONN, speech-quality assessment, general multimodal tasks), suggesting that the project is modular and extensible across domains. SALMONN aims to push the frontier of multi-modal AI by allowing models to process and reason over diverse inputs, which can be useful for applications such as video understanding, speech analytics, cross-modal retrieval, and general AI capable of interpreting rich, multi-sensory data. ...

Downloads: 1 This Week

Last Update: 2025-12-02
See Project
16

ffmpeg.wasm

FFmpeg for browser, powered by WebAssembly

ffmpeg.wasm is a pure WebAssembly (and JavaScript/TypeScript) port of FFmpeg that enables in-browser media recording, conversion, and streaming—letting developers perform video/audio processing entirely client-side without server uploads. Transpiled via Emscripten from FFmpeg and its codecs into WebAssembly. Supports both single-threaded and multi-threaded cores using web workers. Written in TypeScript for improved developer experience.

Downloads: 11 This Week

Last Update: 2025-08-12
See Project
17

Vidi2

Large Multimodal Models for Video Understanding and Editing

Vidi is a family of large multimodal models developed for deep video understanding and editing tasks, integrating vision, audio, and language to allow sophisticated querying and manipulation of video content. It’s designed to process long-form, real-world videos and answer complex queries such as “when in this clip does X happen?” or “where in the frame is object Y during that moment?” — offering temporal retrieval, spatio-temporal grounding (i.e. locating objects over time + space), and even video question answering. ...

Downloads: 1 This Week

Last Update: 2026-01-22
See Project
18

Laravel FFMpeg

This package provides an integration with FFmpeg for Laravel

...You can easily add a watermark using the addWatermark method. With the WatermarkFactory, you can open your watermark file from a specific disk, just like opening an audio or video file. When you discard the fromDisk method, it uses the default disk specified in the filesystems.php configuration file.

Downloads: 7 This Week

Last Update: 2025-04-01
See Project
19

Live API Web Console

A react-based starter app for using the Live API over websockets

Live API Web Console is a React starter that demonstrates how to use Gemini’s Live API over WebSockets to build real-time, multimodal experiences. The app includes modules for streaming audio playback, recording user media from the microphone, webcam, or even screen capture, and it surfaces a unified event log so you can debug the session as it flows. Configuration lives in a simple .env file and the project boots with standard web tooling, letting you experiment quickly with models, system...

Downloads: 6 This Week

Last Update: 2025-10-14
See Project
20

Story Flicks

Generate high-definition story short videos with one click using AI

...Because the project is open and modifiable, developers can customize the generation pipeline: adjust story structure, alter rendering parameters, tweak video quality or resolution, or integrate with other AI models (e.g. for audio, voice-over, or image-to-video). It’s especially useful as a starting template or experimentation ground for developers building automated content-creation tools.

Downloads: 5 This Week

Last Update: 2025-12-14
See Project
21

xgplayer

A HTML5 video player with a parser that saves traffic

xgplayer is a web-friendly, open-source media player library maintained by ByteDance, designed for playing audio/video streams in browsers or web applications with robust control, flexibility, and extensibility. It abstracts many of the lower-level complexities of HTML5 media, providing a consistent API for playback control, custom UI overlays, adaptive streaming, plugin hooks, and cross-browser compatibility. Because of its emphasis on modularity and extensibility, xgplayer can be embedded into modern web projects and customized — developers can add controls, custom buffering strategies, subtitle handling, adaptive bitrate streaming, or integrate with other web-based video infrastructures. ...

Downloads: 2 This Week

Last Update: 2025-12-01
See Project
22

Open Vision Agents by Stream

Build Vision Agents quickly with any model or video provider

Open Vision Agents by Stream is an open source framework from Stream for building real time, multimodal AI agents that watch, listen, and respond to live video streams. It focuses on combining video understanding models, such as YOLO and Roboflow based detectors, with real time large language models like OpenAI Realtime and Gemini Live to create interactive experiences. The framework uses Stream’s ultra low latency edge network so agents can join sessions quickly and maintain very low audio and video latency while processing frames and generating responses. ...

Downloads: 2 This Week

Last Update: 1 day ago
See Project
23

Network Audio System

Network transparent, client/server audio transport system

The Network Audio System is a network transparent, client/server audio transport system. It can be described as the audio equivalent of an X server. This project is currently in maintenance mode. Patches are accepted, releases will be sporadic.

">

Downloads: 28 This Week

Last Update: 2025-03-19
See Project
24

LTX-2

Python inference and LoRA trainer package for the LTX-2 audio–video

LTX-2 is a powerful, open-source toolkit developed by Lightricks that provides a modular, high-performance base for building real-time graphics and visual effects applications. It is architected to give developers low-level control over rendering pipelines, GPU resource management, shader orchestration, and cross-platform abstractions so they can craft visually compelling experiences without starting from scratch. Beyond basic rendering scaffolding, LTX-2 includes optimized math libraries,...

Downloads: 60 This Week

Last Update: 5 days ago
See Project
25

EasyVoice

Open source text-to-speech tool, supports extra-long text

...It offers streaming playback so audio starts almost immediately, even for very long inputs, and automatically generates subtitle files suitable for video production or translation workflows. Under the hood, easyVoice uses a modern stack with Vue 3 and Element Plus on the front end, Node.js and Express on the back end, and TTS engines such as Microsoft Azure TTS and OpenAI-compatible APIs, orchestrated through ffmpeg.

Downloads: 4 This Week

Last Update: 2026-01-26
See Project