ICLR 2024 Spotlight: curation/training code, metadata, distribution
Repo for external large-scale work
Memory-efficient and performant finetuning of Mistral's models
Implementation of "MobileCLIP" (CVPR 2024)
NVIDIA Isaac GR00T N1.5 is the world's first open foundation model
Open, non-commercial SDXL model for quality image generation
Reasoning-powered OCR VLM for converting complex documents to Markdown
Production-tested AI infrastructure tools
React app for inspecting, building and debugging with the Realtime API
Instructions on how to use the Realtime API on Microcontrollers
Vision-language-action model for robot control via images and text
800,000 step-level correctness labels on LLM solutions to MATH problems
Phi-3.5 for Mac: Locally-run Vision and Language Models
Analyze computation-communication overlap in V3/R1
Generate embeddings from large-scale graph-structured data
QwQ-32B is a reasoning-focused language model for complex tasks
Advanced bilingual image editing model with semantic control
Instruction-tuned 7B language model for chat and complex tasks
Powerful 14B LLM with strong instruction and long-text handling
Qwen2.5-VL-3B-Instruct: Multimodal model for chat, vision & video
Multimodal 7B model for image, video, and text understanding tasks
Qwen3-Next: 80B instruct LLM with ultra-long context up to 1M tokens
Code for "Image Generation from Scene Graphs", Johnson et al., CVPR 201
Foundational Models for State-of-the-Art Speech and Text Translation
Official code for Style Aligned Image Generation via Shared Attention