HunyuanCustom is a multimodal video customization framework from Tencent Hunyuan. It generates customized videos featuring particular subjects (people, characters) under flexible conditions while maintaining subject and identity consistency. Conditioning is supported via image, audio, video, and text, enabling tasks such as replacing a subject in an existing video, generating an avatar that speaks given audio, or combining multiple subject images. The architecture builds on HunyuanVideo, with added modules for identity reinforcement and modality-specific condition injection.
Features
- Supports multimodal conditioning: text, image, audio, and video inputs
- Identity / subject consistency modules across frames (e.g. image ID enhancement, temporal concatenation of the reference-image latent; see the concatenation sketch after this list)
- Text-image fusion module based on LLaVA for improved multimodal understanding (a LLaVA-style fusion sketch follows the list)
- AudioNet module for hierarchical alignment of audio conditions (see the cross-attention sketch below)
- Video-driven condition injection via a patchify-based feature-alignment network (see the patchify sketch below)
- Applicable to single- and multi-subject scenarios, video editing/replacement, singing avatars, etc.
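
The temporal-concatenation idea behind the identity-consistency module can be pictured in a few lines of PyTorch: the reference image's latent is treated as an extra frame, so the video backbone attends to the subject at every denoising step. This is a minimal sketch with assumed tensor shapes and a made-up function name, not HunyuanCustom's actual implementation.

```python
import torch

def concat_identity_frame(video_latents: torch.Tensor,
                          id_latent: torch.Tensor) -> torch.Tensor:
    """Prepend the reference-image latent as an extra 'frame' so the
    diffusion transformer can attend to it alongside the real frames.

    video_latents: (B, C, T, H, W) latent video
    id_latent:     (B, C, H, W)    latent of the subject reference image
    """
    id_frame = id_latent.unsqueeze(2)                    # (B, C, 1, H, W)
    return torch.cat([id_frame, video_latents], dim=2)   # (B, C, T+1, H, W)

# Toy shapes, purely illustrative
video = torch.randn(1, 16, 8, 32, 32)
ref = torch.randn(1, 16, 32, 32)
print(concat_identity_frame(video, ref).shape)  # torch.Size([1, 16, 9, 32, 32])
```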
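
For the LLaVA-based text-image fusion, one plausible reading is that the prompt and the subject image are encoded jointly by a multimodal LLM, and the resulting token sequence conditions the video model. The sketch below uses a public Hugging Face LLaVA 1.5 checkpoint purely as an illustration; the checkpoint name, prompt template, image path, and the use of last-layer hidden states are assumptions, not the repository's pipeline.

```python
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"           # assumed checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

image = Image.open("subject.png")               # hypothetical reference image
prompt = "USER: <image>\nA person walking on a beach at sunset. ASSISTANT:"

inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# Last-layer hidden states mix image-patch and text tokens; a projection of
# these could serve as the multimodal conditioning sequence for the video DiT.
fused_tokens = out.hidden_states[-1]            # (1, seq_len, hidden_dim)
print(fused_tokens.shape)
```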
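
The AudioNet-style hierarchical injection can be thought of as per-frame cross-attention from video tokens to audio features, applied at several depths of the network. Below is a toy single layer; the class name, dimensions, and residual wiring are illustrative assumptions rather than the module's real design.

```python
import torch
import torch.nn as nn

class AudioCrossAttention(nn.Module):
    """Toy per-frame audio-to-video cross-attention in the spirit of an
    AudioNet-style injection layer (names and shapes are assumptions)."""

    def __init__(self, dim: int, audio_dim: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, kdim=audio_dim,
                                          vdim=audio_dim, batch_first=True)

    def forward(self, frame_tokens, audio_tokens):
        # frame_tokens: (B*T, N, dim)       spatial tokens of one video frame
        # audio_tokens: (B*T, M, audio_dim) audio features aligned to that frame
        out, _ = self.attn(frame_tokens, audio_tokens, audio_tokens)
        return frame_tokens + out           # residual injection

# Toy usage
layer = AudioCrossAttention(dim=256, audio_dim=128)
frames = torch.randn(4, 64, 256)   # 4 frames, 64 spatial tokens each
audio = torch.randn(4, 10, 128)    # 10 audio tokens per frame
print(layer(frames, audio).shape)  # torch.Size([4, 64, 256])
```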
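
Video-driven condition injection via patchify-based alignment can be sketched as: encode the conditioning video to latents, patchify them with the same patch size the backbone uses, and pass the tokens through a small alignment network before merging them with the generated sequence. The module below is a conceptual stand-in with assumed channel counts and patch size, not the repository's feature-alignment network.

```python
import torch
import torch.nn as nn

class VideoConditionPatchify(nn.Module):
    """Toy patchify + alignment MLP for injecting a conditioning video's
    latents into the token sequence (a sketch, not the repo's module)."""

    def __init__(self, in_channels: int = 16, patch: int = 2, dim: int = 1024):
        super().__init__()
        # 3D patch embedding with the spatial patch size the DiT is assumed to use
        self.proj = nn.Conv3d(in_channels, dim,
                              kernel_size=(1, patch, patch),
                              stride=(1, patch, patch))
        self.align = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, dim),
                                   nn.GELU(), nn.Linear(dim, dim))

    def forward(self, cond_latents):
        # cond_latents: (B, C, T, H, W) latents of the conditioning video
        x = self.proj(cond_latents)           # (B, dim, T, H/p, W/p)
        x = x.flatten(2).transpose(1, 2)      # (B, tokens, dim)
        return self.align(x)                  # aligned condition tokens

cond = torch.randn(1, 16, 8, 32, 32)
tokens = VideoConditionPatchify()(cond)
print(tokens.shape)  # torch.Size([1, 2048, 1024])
```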