[go: up one dir, main page]

Browse free open source Python AI Models and projects below. Use the toggles on the left to filter open source Python AI Models by OS, license, language, programming language, and project status.

  • Gen AI apps are built with MongoDB Atlas Icon
    Gen AI apps are built with MongoDB Atlas

    Build gen AI apps with an all-in-one modern database: MongoDB Atlas

    MongoDB Atlas provides built-in vector search and a flexible document model so developers can build, scale, and run gen AI apps without stitching together multiple databases. From LLM integration to semantic search, Atlas simplifies your AI architecture—and it’s free to get started.
    Start Free
  • The All-in-One Commerce Platform for Businesses - Shopify Icon
    The All-in-One Commerce Platform for Businesses - Shopify

    Shopify offers plans for anyone that wants to sell products online and build an ecommerce store, small to mid-sized businesses as well as enterprise

    Shopify is a leading all-in-one commerce platform that enables businesses to start, build, and grow their online and physical stores. It offers tools to create customized websites, manage inventory, process payments, and sell across multiple channels including online, in-person, wholesale, and global markets. The platform includes integrated marketing tools, analytics, and customer engagement features to help merchants reach and retain customers. Shopify supports thousands of third-party apps and offers developer-friendly APIs for custom solutions. With world-class checkout technology, Shopify powers over 150 million high-intent shoppers worldwide. Its reliable, scalable infrastructure ensures fast performance and seamless operations at any business size.
    Learn More
  • 1
    Qwen3 Embedding

    Qwen3 Embedding

    Designed for text embedding and ranking tasks

    Qwen3-Embedding is a model series from the Qwen family designed specifically for text embedding and ranking tasks. It builds upon the Qwen3 base/dense models and offers several sizes (0.6B, 4B, 8B parameters), for both embedding and reranking, with high multilingual capability, long‐context understanding, and reasoning. It achieves state-of-the-art performance on benchmarks like MTEB (Multilingual Text Embedding Benchmark) and supports instruction-aware embedding (i.e. embedding task instructions along with queries) and flexible embedding/vector dimension definitions. It is meant for tasks such as text retrieval, classification, clustering, bitext mining, and code retrieval.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    Ring

    Ring

    Ring is a reasoning MoE LLM provided and open-sourced by InclusionAI

    Ring is a reasoning Mixture-of-Experts (MoE) large language model (LLM) developed by inclusionAI. It is built from or derived from Ling. Its design emphasizes reasoning, efficiency, and modular expert activation. In its “flash” variant (Ring-flash-2.0), it optimizes inference by activating only a subset of experts. It applies reinforcement learning/reasoning optimization techniques. Its architectures and training approaches are tuned to enable efficient and capable reasoning performance. Reasoning-optimized model with reinforcement learning enhancements. Efficient architecture and memory design for large-scale reasoning. If you are located in mainland China, we also provide the model on ModelScope.cn to speed up the download process.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    SG2Im

    SG2Im

    Code for "Image Generation from Scene Graphs", Johnson et al, CVPR 201

    sg2im is a research codebase that learns to synthesize images from scene graphs—structured descriptions of objects and their relationships. Instead of conditioning on free-form text alone, it leverages graph structure to control layout and interactions, generating scenes that respect constraints like “person left of dog” or “cup on table.” The pipeline typically predicts object layouts (bounding boxes and masks) from the graph, then renders a realistic image conditioned on those layouts. This separation lets the model reason about geometry and composition before committing to texture and color, improving spatial fidelity. The repository includes training code, datasets, and evaluation scripts so researchers can reproduce baselines and extend components such as the graph encoder or image generator. In practice, sg2im demonstrates how structured semantics can guide generative models to produce controllable, compositional imagery.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    Style Aligned

    Style Aligned

    Official code for Style Aligned Image Generation via Shared Attention

    StyleAligned is a diffusion-model editing technique and codebase that preserves the visual “style” of an original image while applying new semantic edits driven by text. Instead of fully re-generating an image—and risking changes to lighting, texture, or rendering choices—the method aligns internal features across denoising steps so the target edit inherits the source style. This alignment acts like a constraint on the model’s evolution, steering composition, palette, and brushwork even as objects or attributes change. The result is more consistent edits across a set, which is crucial for workflows like product variations, character sheets, or brand-coherent art. The repository provides reproducible scripts, reference prompts, and guidance for tuning strengths so users can dial in subtle retouches or bolder substitutions. Because it builds on widely used diffusion checkpoints, creators can integrate it without training or dataset collection.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Photo and Video Editing APIs and SDKs Icon
    Photo and Video Editing APIs and SDKs

    Trusted by 150 million+ creators and businesses globally

    Unlock Picsart's full editing suite by embedding our Editor SDK directly into your platform. Offer your users the power of a full design suite without leaving your site.
    Learn More
  • 5
    Surya

    Surya

    Implementation of the Surya Foundation Model for Heliophysics

    Surya is an open‑source, AI‑based foundation model for heliophysics developed collaboratively by NASA (via the IMPACT AI team) and IBM. Named after the Sanskrit word for “sun,” Surya is trained on nine years of high‑resolution solar imagery from NASA’s Solar Dynamics Observatory (SDO). It is designed to forecast solar phenomena—such as flares, solar wind, irradiance, and active region behavior—by predicting future solar images with a sophisticated long–short vision transformer architecture, thereby enabling improved space weather forecasting. Foresees solar flares, wind, EUV spectra, and active region formation in advance. Achieves approximately 16% improvement in forecasting accuracy over traditional methods. 366-million‑parameter foundation model capturing general-purpose solar representations.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    Tencent-Hunyuan-Large

    Tencent-Hunyuan-Large

    Open-source large language model family from Tencent Hunyuan

    Tencent-Hunyuan-Large is the flagship open-source large language model family from Tencent Hunyuan, offering both pre-trained and instruct (fine-tuned) variants. It is designed with long-context capabilities, quantization support, and high performance on benchmarks across general reasoning, mathematics, language understanding, and Chinese / multilingual tasks. It aims to provide competitive capability with efficient deployment and inference. FP8 quantization support to reduce memory usage (~50%) while maintaining precision. High benchmarking performance on tasks like MMLU, MATH, CMMLU, C-Eval, etc.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    TimeSformer

    TimeSformer

    The official pytorch implementation of our paper

    TimeSformer is a vision transformer architecture for video that extends the standard attention mechanism into spatiotemporal attention. The model alternates attention along spatial and temporal dimensions (or designs variants like divided attention) so that it can capture both appearance and motion cues in video. Because the attention is global across frames, TimeSformer can reason about dependencies across long time spans, not just local neighborhoods. The official implementation in PyTorch provides configurations, pretrained models, and training scripts that make it straightforward to evaluate or fine-tune on video datasets. TimeSformer was influential in showing that pure transformer architectures—without convolutional backbones—can perform strongly on video classification tasks. Its flexible attention design allows experimenting with different factoring (spatial-then-temporal, joint, etc.) to trade off compute, memory, and accuracy.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    ToMe (Token Merging)

    ToMe (Token Merging)

    A method to increase the speed and lower the memory footprint

    ToMe (Token Merging) is a PyTorch-based optimization framework designed to significantly accelerate Vision Transformer (ViT) architectures without retraining. Developed by researchers at Facebook (Meta AI), ToMe introduces an efficient technique that merges similar tokens within transformer layers, reducing redundant computation while preserving model accuracy. This approach differs from token pruning, which removes background tokens entirely; instead, ToMe merges tokens based on feature similarity, allowing it to compress both foreground and background information efficiently. ToMe integrates seamlessly into existing transformer models such as DeiT, MAE, SWAG, and timm ViTs, offering 2–3x speedups during inference and substantial efficiency gains during training. The method can be applied dynamically at inference time or incorporated into training for improved performance.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    Tracking Any Point (TAP)

    Tracking Any Point (TAP)

    DeepMind model for tracking arbitrary points across videos & robotics

    TAPNet is the official Google DeepMind repository for Tracking Any Point (TAP), bundling datasets, models, benchmarks, and demos for precise point tracking in videos. The project includes the TAP-Vid and TAPVid-3D benchmarks, which evaluate long-range tracking of arbitrary points in 2D and 3D across diverse real and synthetic videos. Its flagship models—TAPIR, BootsTAPIR, and the latest TAPNext—use matching plus temporal refinement or next-token style propagation to achieve state-of-the-art accuracy and speed on TAP-Vid. RoboTAP demonstrates how TAPIR-style tracks can drive real-world robot manipulation via efficient imitation, and ships with a dataset of annotated robotics videos. The repo provides JAX and PyTorch checkpoints, Colab demos, and a real-time live demo that runs on a GPU to let you select and track points interactively.
    Downloads: 0 This Week
    Last Update:
    See Project
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • 10
    UCO3D

    UCO3D

    Uncommon Objects in 3D dataset

    uCO3D is a large-scale 3D vision dataset and toolkit centered on turn-table videos of everyday objects drawn from the LVIS taxonomy. It provides about 170,000 full videos per object instance rather than still frames, along with per-video annotations including object masks, calibrated camera poses, and multiple flavors of point clouds. Each sequence also ships with a precomputed 3D Gaussian Splat reconstruction, enabling fast, differentiable rendering workflows and modern implicit/point-based modeling experiments. The repository includes automated downloaders with checksum verification, fine-grained controls to fetch only selected modalities or super-categories, and a lightweight Python API for loading frames, geometry, and splats on demand. Metadata is indexed in SQLite for quick queries at scale, and helper builders handle alignment, undistortion, frame extraction from videos, and cropping around the object.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    VMZ (Video Model Zoo)

    VMZ (Video Model Zoo)

    VMZ: Model Zoo for Video Modeling

    The codebase was designed to help researchers and practitioners quickly reproduce FAIR’s results and leverage robust pre-trained backbones for downstream tasks. It also integrates Gradient Blending, an audio-visual modeling method that fuses modalities effectively (available in the Caffe2 implementation). Although VMZ is now archived and no longer actively maintained, it remains a valuable reference for understanding early large-scale video model training, transfer learning, and multimodal integration strategies that influenced modern architectures like SlowFast and X3D.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    VisualGLM-6B

    VisualGLM-6B

    Chinese and English multimodal conversational language model

    VisualGLM-6B is an open-source multimodal conversational language model developed by ZhipuAI that supports both images and text in Chinese and English. It builds on the ChatGLM-6B backbone, with 6.2 billion language parameters, and incorporates a BLIP2-Qformer visual module to connect vision and language. In total, the model has 7.8 billion parameters. Trained on a large bilingual dataset — including 30 million high-quality Chinese image-text pairs from CogView and 300 million English pairs — VisualGLM-6B is designed for image understanding, description, and question answering. Fine-tuning on long visual QA datasets further aligns the model’s responses with human preferences. The repository provides inference APIs, command-line demos, web demos, and efficient fine-tuning options like LoRA, QLoRA, and P-tuning. It also supports quantization down to INT4, enabling local deployment on consumer GPUs with as little as 6.3 GB VRAM.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    Watermark Anything

    Watermark Anything

    Official implementation of Watermark Anything with Localized Messages

    Watermark Anything (WAM) is an advanced deep learning framework for embedding and detecting localized watermarks in digital images. Developed by Facebook Research, it provides a robust, flexible system that allows users to insert one or multiple watermarks within selected image regions while maintaining visual quality and recoverability. Unlike traditional watermarking methods that rely on uniform embedding, WAM supports spatially localized watermarks, enabling targeted protection of specific image regions or objects. The model is trained to balance imperceptibility, ensuring minimal visual distortion, with robustness against transformations and edits such as cropping or motion.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    fairseq2

    fairseq2

    FAIR Sequence Modeling Toolkit 2

    fairseq2 is a modern, modular sequence modeling framework developed by Meta AI Research as a complete redesign of the original fairseq library. Built from the ground up for scalability, composability, and research flexibility, fairseq2 supports a broad range of language, speech, and multimodal content generation tasks, including instruction fine-tuning, reinforcement learning from human feedback (RLHF), and large-scale multilingual modeling. Unlike the original fairseq—which evolved into a large, monolithic codebase—fairseq2 introduces a clean, plugin-oriented architecture designed for long-term maintainability and rapid experimentation. It supports multi-GPU and multi-node distributed training using DDP, FSDP, and tensor parallelism, capable of scaling up to 70B+ parameter models. The framework integrates seamlessly with PyTorch 2.x features such as torch.compile, Fully Sharded Data Parallel (FSDP), and modern configuration management.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    gpt-oss-20b

    gpt-oss-20b

    OpenAI’s compact 20B open model for fast, agentic, and local use

    GPT-OSS-20B is OpenAI’s smaller, open-weight language model optimized for low-latency, agentic tasks, and local deployment. With 21B total parameters and 3.6B active parameters (MoE), it fits within 16GB of memory thanks to native MXFP4 quantization. Designed for high-performance reasoning, it supports Harmony response format, function calling, web browsing, and code execution. Like its larger sibling (gpt-oss-120b), it offers adjustable reasoning depth and full chain-of-thought visibility for better interpretability. It’s released under a permissive Apache 2.0 license, allowing unrestricted commercial and research use. GPT-OSS-20B is compatible with Transformers, vLLM, Ollama, PyTorch, and other tools. It is ideal for developers building lightweight AI agents or experimenting with fine-tuning on consumer-grade hardware.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    minGPT

    minGPT

    A minimal PyTorch re-implementation of the OpenAI GPT

    minGPT is a minimalist, educational re-implementation of the GPT (Generative Pretrained Transformer) architecture built in PyTorch, designed by Andrej Karpathy to expose the core structure of a transformer-based language model in as few lines of code as possible. It strips away extraneous bells and whistles, aiming to show how a sequence of token indices is fed into a stack of transformer blocks and then decoded into the next token probabilities, with both training and inference supported. Because the whole model is around 300 lines of code, users can follow each step—from embedding lookup, positional encodings, multi-head attention, feed-forward layers, to output heads—and thus demystify how GPT-style models work beneath the surface. It provides a practical sandbox for experimentation, letting learners tweak the architecture, dataset, or training loop without being overwhelmed by framework abstraction.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    pyllama

    pyllama

    LLaMA: Open and Efficient Foundation Language Models

    📢 pyllama is a hacked version of LLaMA based on original Facebook's implementation but more convenient to run in a Single consumer grade GPU.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    xFormers

    xFormers

    Hackable and optimized Transformers building blocks

    xformers is a modular, performance-oriented library of transformer building blocks, designed to allow researchers and engineers to compose, experiment, and optimize transformer architectures more flexibly than monolithic frameworks. It abstracts components like attention layers, feedforward modules, normalization, and positional encoding, so you can mix and match or swap optimized kernels easily. One of its key goals is efficient attention: it supports dense, sparse, low-rank, and approximate attention mechanisms (e.g. FlashAttention, Linformer, Performer) via interchangeable modules. The library includes memory-efficient operator implementations in both Python and optimized C++/CUDA, ensuring that performance isn’t sacrificed for modularity. It also integrates with PyTorch seamlessly so you can drop in its blocks to existing models, replace default attention layers, or build new architectures from scratch. xformers includes training, deployment, and memory profiling tools.
    Downloads: 0 This Week
    Last Update:
    See Project