[go: up one dir, main page]

Search Results for "video ai upscale" - Page 2

60 projects for "video ai upscale" with 1 filter applied:

  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • Budgyt Is The Highest Rated Business Budgeting Software In The Market. Icon
    Budgyt Is The Highest Rated Business Budgeting Software In The Market.

    Affordable budgeting software for companies with multiple users and multiple departments.

    Budgyt is an easy to use, intuitive platform with a clean simple interface that makes budgeting multiple P&L’s easy to do without needing Excel.
    Book a Demo
  • 1
    Depth Anything 3

    Depth Anything 3

    Recovering the Visual Space from Any Views

    Depth Anything 3 is a research-driven project that brings accurate and dense depth estimation to any input image or video, enabling foundational understanding of 3D structure from 2D visual content. Designed to work across diverse scenes, lighting conditions, and image types, it uses advanced neural networks trained on large, heterogeneous datasets, producing depth maps that reveal scene depth relationships and object surfaces with strong fidelity. The model can be applied to photography,...
    Downloads: 7 This Week
    Last Update:
    See Project
  • 2
    Peinture

    Peinture

    A general-purpose AI image generation framework that supports HF

    Peinture is a sleek, dark-themed web application that brings AI-powered image generation to artists, designers, and casual creators through a modern interface built with React, TypeScript, and Tailwind CSS. Instead of tying users to a single service, Peinture integrates multiple backend providers, including Hugging Face, Gitee AI, Model Scope, and others, so you can switch between models or extend support to custom endpoints — giving you flexibility over where and how your images are...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 3
    Seamless Communication

    Seamless Communication

    Foundational Models for State-of-the-Art Speech and Text Translation

    Seamless Communication is a research project focused on building more integrated, low-latency multimodal communication between humans and AI agents. The motivation is to move beyond “text in, text out” and enable direct, live, multi-turn exchange involving language, gesture, gaze, vision, and modality switching without user friction. The system architecture includes a real-time multimodal signal pipeline for audio, video, and sensor data, a dialog manager that can decide when to act (speak, gesture, point) or query, and a cross-modal reasoning layer that fuses perception with semantic context. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 4
    Qwen3-VL-Embedding

    Qwen3-VL-Embedding

    Multimodal embedding and reranking models built on Qwen3-VL

    Qwen3-VL-Embedding (with its companion Qwen3-VL-Reranker) is a state-of-the-art multimodal embedding and reranking model suite built on the open-sourced Qwen3-VL foundation, developed to handle diverse inputs including text, images, screenshots, and videos. The core embedding model maps such inputs into semantically rich vectors in a unified representation space, enabling similarity search, clustering, and cross-modal retrieval. The reranking model then precisely scores relevance between a...
    Downloads: 3 This Week
    Last Update:
    See Project
  • Effortlessly manage macOS, iOS, iPadOS and tvOS devices across multiple clients and locations. Icon
    Effortlessly manage macOS, iOS, iPadOS and tvOS devices across multiple clients and locations.

    The Most Powerful Apple Device Management Tool for MSPs and IT Teams

    Addigy solutions accelerate Apple adoption in any environment.
    Learn More
  • 5
    SAM 3

    SAM 3

    Code for running inference and finetuning with SAM 3 model

    SAM 3 (Segment Anything Model 3) is a unified foundation model for promptable segmentation in both images and videos, capable of detecting, segmenting, and tracking objects. It accepts both text prompts (open-vocabulary concepts like “red car” or “goalkeeper in white”) and visual prompts (points, boxes, masks) and returns high-quality masks, boxes, and scores for the requested concepts. Compared with SAM 2, SAM 3 introduces the ability to exhaustively segment all instances of an...
    Downloads: 69 This Week
    Last Update:
    See Project
  • 6
    Open Vision Agents by Stream

    Open Vision Agents by Stream

    Build Vision Agents quickly with any model or video provider

    Open Vision Agents by Stream is an open source framework from Stream for building real time, multimodal AI agents that watch, listen, and respond to live video streams. It focuses on combining video understanding models, such as YOLO and Roboflow based detectors, with real time large language models like OpenAI Realtime and Gemini Live to create interactive experiences. The framework uses Stream’s ultra low latency edge network so agents can join sessions quickly and maintain very low audio and video latency while processing frames and generating responses. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 7
    FlowLens MCP

    FlowLens MCP

    Open-source MCP server that gives your coding agent

    FlowLens MCP Server is an open-source tool designed to give AI-powered coding agents (like Claude Code, Cursor, GitHub Copilot / Codex, and others) full, replayable browser context to dramatically improve debugging, bug reporting, and regression testing for web applications. It works together with a companion browser extension: when a user reproduces a bug or a complicated UI interaction, the extension captures a rich session log, including screen/video recording, network traffic, console logs, DOM events, storage changes, and more, and exports it. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 8
    Depth Pro

    Depth Pro

    Sharp Monocular Metric Depth in Less Than a Second

    Depth Pro is a foundation model for zero-shot metric monocular depth estimation, producing sharp, high-frequency depth maps with absolute scale from a single image. Unlike many prior approaches, it does not require camera intrinsics or extra metadata, yet still outputs metric depth suitable for downstream 3D tasks. Apple highlights both accuracy and speed: the model can synthesize a ~2.25-megapixel depth map in around 0.3 seconds on a standard GPU, enabling near real-time applications. The...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 9
    GLM-4.6V

    GLM-4.6V

    GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning

    ...Its architecture supports a very large context window (on the order of 128K tokens during training), which lets it handle complex multimodal inputs like long documents, multi-page reports, or video transcripts, while maintaining coherence across extended content. In benchmarks and internal evaluations, GLM-4.6V achieves state-of-the-art (SoTA) performance among models of comparable parameter scale on multimodal reasoning.
    Downloads: 1 This Week
    Last Update:
    See Project
  • Build experiences that drive engagement and increase transactions Icon
    Build experiences that drive engagement and increase transactions

    Connect your users - doctors, gamers, shoppers, or lovers - wherever they are.

    Sendbird's chat, voice, and video APIs power conversations and communities in hundreds of the most innovative apps and products. Sendbird’s feature-rich platform, and pre-fab UI components make developers more productive. We take care of a ton of operational complexity under the hood, so you can power a rich chat service, and life-like voice, and video experiences, and not worry about features, edge cases, reliability, or scale.
    Learn More
  • 10
    GLM-4.5V

    GLM-4.5V

    GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning

    GLM-4.5V is the preceding iteration in the GLM-V series that laid much of the groundwork for general multimodal reasoning and vision-language understanding. It embodies the design philosophy of mixing visual and textual modalities into a unified model capable of general-purpose reasoning, content understanding, and generation, while already supporting a wide variety of tasks: from image captioning and visual question answering to content recognition, GUI-based agents, video understanding,...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 11
    Frigate

    Frigate

    NVR with realtime local object detection for IP cameras

    Frigate - NVR With Realtime Object Detection for IP Cameras A complete and local NVR designed for Home Assistant with AI object detection. Uses OpenCV and Tensorflow to perform realtime object detection locally for IP cameras. Use of a Google Coral Accelerator is optional, but highly recommended. The Coral will outperform even the best CPUs and can process 100+ FPS with very little overhead.
    Downloads: 32 This Week
    Last Update:
    See Project
  • 12
    Oasis

    Oasis

    Inference script for Oasis 500M

    Open-Oasis provides inference code and released weights for Oasis 500M, an interactive world model that generates gameplay frames conditioned on user keyboard input. Instead of rendering a pre-built game world, the system produces the next visual state via a diffusion-transformer approach, effectively “imagining” the world response to your actions in real time. The project focuses on enabling action-conditional frame generation so developers can experiment with interactive, model-generated...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 13
    NVIDIA Isaac GR00T

    NVIDIA Isaac GR00T

    NVIDIA Isaac GR00T N1.5 is the world's first open foundation model

    NVIDIA Isaac‑GR00T N1.5 is an open-source foundation model engineered for generalized humanoid robot reasoning and manipulation skills. It accepts multimodal inputs—such as language and images—and uses a diffusion transformer architecture built upon vision-language encoders, enabling adaptive robot behaviors across diverse environments. It is designed to be customizable via post-training with real or synthetic data. The vision-language model remains frozen during both pretraining and...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 14
    EasyVoice

    EasyVoice

    Open source text-to-speech tool, supports extra-long text

    easyVoice is an open-source text-to-speech platform aimed at turning long-form text and novels into high-quality audio, with a strong focus on usability and scalability. It provides a web interface where users can paste or upload large texts and generate speech and subtitles in a single workflow, even for works exceeding 100,000 characters. The system supports multi-role voice acting, letting users assign different neural voices to different characters or narrative roles and configure...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 15
    Godot RL Agents

    Godot RL Agents

    An Open Source package that allows video game creators

    godot_rl_agents is a reinforcement learning integration for the Godot game engine. It allows AI agents to learn how to interact with and play Godot-based games using RL algorithms. The toolkit bridges Godot with Python-based RL libraries like Stable-Baselines3, making it possible to create complex and visually rich RL environments natively in Godot.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 16
    comfyui-mixlab-nodes

    comfyui-mixlab-nodes

    Workflow and speech recognition app

    ...It introduces a “Workflow-to-APP” concept, where a ComfyUI graph can be transformed into a Web App through an AppInfo node, complete with categories, batch prompts, and editable configurations. The project also brings Real-time Design features like screen capture and floating video nodes, enabling creative pipelines that mix live screen content, generative models, and visual effects. For audio and speech, it provides nodes for SpeechRecognition and SpeechSynthesis, plus workflows that combine voice generation with real-time face swapping and other audio-visual effects. On the AI side, it integrates multiple LLM providers (cloud and local), supports OpenAI-compatible endpoints, Siliconflow models, and includes prompt-focused utilities for random prompt generation, Chinese prompts, clip interrogation.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 17
    VideoCrafter2

    VideoCrafter2

    Overcoming Data Limitations for High-Quality Video Diffusion Models

    VideoCrafter is an open-source video generation and editing toolbox designed to create high-quality video content. It features models for both text-to-video and image-to-video generation. The system is optimized for generating videos from textual descriptions or still images, leveraging advanced diffusion models. VideoCrafter2, an upgraded version, improves on its predecessor by enhancing motion dynamics and concept combinations, especially in low-data scenarios. Users can explore a wide...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 18
    HunyuanVideo-I2V

    HunyuanVideo-I2V

    A Customizable Image-to-Video Model based on HunyuanVideo

    HunyuanVideo-I2V is a customizable image-to-video generation framework developed by Tencent, extending the capabilities of HunyuanVideo. It allows for high-quality video creation from still images, using PyTorch and providing pre-trained model weights, inference code, and customizable training options. The system includes a LoRA training code for adding special effects and enhancing video realism, aiming to offer versatile and scalable solutions for generating videos from static image inputs.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 19
    MediaPipe Face Detection

    MediaPipe Face Detection

    Detect faces in an image

    The MediaPipe Face Detection model is a high-performance, real-time face detection solution that uses machine learning to identify faces in images and video streams. It is optimized for mobile and embedded platforms, offering fast and accurate face detection while maintaining a small memory footprint. This model supports multiple face detections and is highly efficient, making it suitable for a variety of applications such as augmented reality, user authentication, and facial expression analysis.
    Downloads: 9 This Week
    Last Update:
    See Project
  • 20
    phpMyChat

    phpMyChat

    Standalone chat system developed in php, mysql and javascript

    phpMyChat-Plus is an easy-to-install, easy-to-use multi-room PHP/DB chat. It is currently available for MySQLi, MariaDB, PostgreSQL, and ODBC. It supports IRC-like commands, moderators, is available in 24 languages and fully compatible with php7.
    Downloads: 29 This Week
    Last Update:
    See Project
  • 21
    Quiz/Survey/Test - QST

    Quiz/Survey/Test - QST

    A Free, complete, enterprise grade, open source exam management system

    QST, the worlds unparalleled open source online/lan assessment software. From a quick quiz on your phone to very large scale, high stakes, proctored desktop testing, we make it easy/secure/economical. Our intuitive design contains features (Immediate detailed results, Create/Export/Import/Convert Questions, WYSIWYG/Math-Chemistry/Basic Editors, Question/Item Bank, Multiple Question Types, Multiple Delivery Styles, Multiple Delivery/Results Options, Adaptive/Branching Questions, Randomly...
    Leader badge">
    Downloads: 43 This Week
    Last Update:
    See Project
  • 22
    Video Pre-Training

    Video Pre-Training

    Learning to Act by Watching Unlabeled Online Videos

    The Video PreTraining (VPT) repository provides code and model artifacts for a project where agents learn to act by watching human gameplay videos—specifically, gameplay of Minecraft—using behavioral cloning. The idea is to learn general priors of control from large-scale, unlabeled video data, and then optionally fine-tune those priors for more goal-directed behavior via environment interaction. The repository contains demonstration models of different widths, fine-tuned variants (e.g. for...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 23
    PyTorchVideo

    PyTorchVideo

    A deep learning library for video understanding research

    ...It supports video I/O pipelines, data augmentation, distributed training, and mixed precision computation for large-scale experiments. PyTorchVideo also connects seamlessly with other Meta AI tools such as Detectron2 and PyTorch3D for multimodal video analysis. Designed to accelerate research and deployment, it serves as a unified framework for reproducible, high-performance video AI development.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 24
    TimeSformer

    TimeSformer

    The official pytorch implementation of our paper

    TimeSformer is a vision transformer architecture for video that extends the standard attention mechanism into spatiotemporal attention. The model alternates attention along spatial and temporal dimensions (or designs variants like divided attention) so that it can capture both appearance and motion cues in video. Because the attention is global across frames, TimeSformer can reason about dependencies across long time spans, not just local neighborhoods. The official implementation in PyTorch...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 25
    cBroadcast

    cBroadcast

    Playout Server that can act as a TV Station in a Box

    *** this version is outdated email : software.chasapis@gmail.com to get the latest code Bug fixes, code optimization + compliance recorder, ingest recorders, stream renderer (udp, rtsp, hls) e.t.c. TV Station in a box server This is a TV Station scheduled playout video server with statics and animated graphics. (png sequences) Position/Alpha/Size of the PNGs are total configurable. Rundown with time calculations and program scheduling. The software acts like a video...
    Downloads: 0 This Week
    Last Update:
    See Project