A library for accelerating Transformer models on NVIDIA GPUs
A real-time inference engine for temporal logic specifications
High-performance reactive message-passing-based Bayesian inference engine
A high-throughput and memory-efficient inference and serving engine
A 950-line, minimal, extensible LLM inference engine built from scratch
Lightweight, standalone C++ inference engine for Google's Gemma models
RGBD video generation model conditioned on camera input
Code for running inference and fine-tuning with the SAM 3 model
Offline inference engine for art and real-time voice conversations
Pruna is a model optimization framework built for developers
Inference Llama 2 in one file of pure C
Fast inference engine for Transformer models
User-friendly AI interface
Lightweight inference library for ONNX files, written in C++
Tensor search for humans
Fully private LLM chatbot that runs entirely in your browser
Diffusion model (SD, Flux, Wan, Qwen Image, Z-Image, ...) inference
Superduper: Integrate AI models and machine learning workflows
Extensible workflow development framework
A GPU-accelerated library containing highly optimized building blocks
Enables the best performance on NVIDIA RTX graphics cards
Multi-Agent daTa geneRation Infra and eXperimentation framework
TypeDB: a strongly-typed database
Open-source AI camera: empower any camera/CCTV
Hardware-accelerated video transcoding using Android MediaCodec APIs