A Unified Framework for Text-to-3D and Image-to-3D Generation
Tencent Hunyuan Multimodal diffusion transformer (MM-DiT) model
Multimodal Diffusion with Representation Alignment
Ling is a MoE LLM provided and open-sourced by InclusionAI
MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline
Implementation of "MobileCLIP" (CVPR 2024)
Pokee Deep Research Model Open Source Repo
Qwen2.5-VL is the multimodal large language model series
Chat & pretrained large audio language model proposed by Alibaba Cloud
A series of math-specific large language models of our Qwen2 series
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Video understanding codebase from FAIR for reproducing video models
Learning to Act by Watching Unlabeled Online Videos
Open-source, high-performance Mixture-of-Experts large language model
Powerful open source image generation model
Open-Source Financial Large Language Models!
Qwen2.5-Coder is the code version of Qwen2.5, the large language model
BlazeFace is a lightweight model that detects faces in images
Detect faces in an image
A CNN model that predicts human joints from RGB images of a person
Runtime extension of Proximus enabling deployment on AMD Ryzen™ AI
4M: Massively Multimodal Masked Modeling
A repository of trained models
ChatGPT integration with Unity Editor
Programmatic access to the AlphaGenome model