Qwen-Image is a powerful image generation foundation model
Foundation model for image generation
EPUB to audiobook converter, optimized for Audiobookshelf
Focus on prompting and generating
Qwen-Image-Layered: Layered Decomposition for Inherent Editability
A Powerful Native Multimodal Model for Image Generation
OCRmyPDF adds an OCR text layer to scanned PDF files
CLIP: predict the most relevant text snippet given an image
Comprehensive Markdown plugin built for Django
Official inference repo for FLUX.2 models
Implementation of Imagen, Google's Text-to-Image Neural Network
Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
Chat and pretrained large vision-language model
Awesome multilingual OCR toolkits based on PaddlePaddle
Label Studio is a multi-type data labeling and annotation tool
A Unified Framework for Text-to-3D and Image-to-3D Generation
Stable Diffusion web UI
Wan2.1: Open and Advanced Large-Scale Video Generative Model
Wan2.2: Open and Advanced Large-Scale Video Generative Model
Official MiniMax Model Context Protocol (MCP) server
Multimodal-Driven Architecture for Customized Video Generation
Generating Immersive, Explorable, and Interactive 3D Worlds
Collection of Gemma 3 variants that are trained for performance
Official inference repo for FLUX.1 models
Text- and image-to-video generation: CogVideoX (2024) and CogVideo