Reasoning-powered OCR VLM for converting complex documents to Markdown
Production-tested AI infrastructure tools
Python example app from the OpenAI API quickstart tutorial
React app for inspecting, building and debugging with the Realtime API
Instructions on how to use the Realtime API on Microcontrollers
Vision-language-action model for robot control via images and text
800,000 step-level correctness labels on LLM solutions to MATH problems
A Production-ready Reinforcement Learning AI Agent Library
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
Phi-3.5 for Mac: Locally-run Vision and Language Models
Analyze computation-communication overlap in V3/R1
Latent Diffusion and Stable Diffusion Implementation
A mix of GAN implementations including progressive growing
Generate embeddings from large-scale graph-structured data
QwQ-32B is a reasoning-focused language model for complex tasks
Advanced bilingual image editing with semantic control
Chat and pretrained large vision-language model
Instruction-tuned 7B language model for chat and complex tasks
Powerful 14B LLM with strong instruction and long-text handling
Qwen2.5-VL-3B-Instruct: Multimodal model for chat, vision & video
Multimodal 7B model for image, video, and text understanding tasks
Qwen3-Next: 80B instruct LLM with ultra-long context up to 1M tokens
Ring is a reasoning MoE LLM provided and open-sourced by InclusionAI
Code for "Image Generation from Scene Graphs", Johnson et al, CVPR 201
Foundational Models for State-of-the-Art Speech and Text Translation