Recovering the Visual Space from Any Views
A general-purpose AI image generation framework that supports HF
Foundational Models for State-of-the-Art Speech and Text Translation
Multimodal embedding and reranking models built on Qwen3-VL
Code for running inference and finetuning with the SAM 3 model
Build Vision Agents quickly with any model or video provider
Open-source MCP server that gives your coding agent
Sharp Monocular Metric Depth in Less Than a Second
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
NVR with real-time local object detection for IP cameras
Inference script for Oasis 500M
NVIDIA Isaac GR00T N1.5 is the world's first open foundation model
Open-source text-to-speech tool that supports extra-long text
An open-source package that allows video game creators
Workflow and speech recognition app
Overcoming Data Limitations for High-Quality Video Diffusion Models
A Customizable Image-to-Video Model based on HunyuanVideo
Detect faces in an image
Standalone chat system developed in PHP, MySQL, and JavaScript
A free, complete, enterprise-grade, open-source exam management system
Learning to Act by Watching Unlabeled Online Videos
A deep learning library for video understanding research
The official PyTorch implementation of our paper
Playout Server that can act as a TV Station in a Box