UForm is a Multi-Modal Inference package, designed to encode Multi-Lingual Texts, Images, and, soon, Audio, Video, and Documents into a shared vector space! It comes with a set of pre-trained networks of the same name, available on the HuggingFace portal, and extends the transformers package to support Mid-fusion Models.

Late-fusion models encode each modality independently, but into one shared vector space. Because of this independent encoding, late-fusion models are good at capturing coarse-grained features but often neglect fine-grained ones. These models are well-suited for retrieval in large collections; the most famous example is CLIP by OpenAI.

Early-fusion models encode both modalities jointly, so they can take fine-grained features into account. Usually, these models are used for re-ranking relatively small sets of retrieval results.

Mid-fusion models are the golden midpoint between the previous two types. They consist of two parts: unimodal and multimodal.
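As a minimal sketch of the workflow described above: the unimodal encoders produce independent text and image embeddings in the shared space, while the multimodal part encodes both inputs jointly. The checkpoint name (`unum-cloud/uform-vl-english`) and the method names (`get_model`, `preprocess_text`, `preprocess_image`, `encode_text`, `encode_image`, `encode_multimodal`) are assumptions about a UForm-style API and may differ between releases.

```python
import uform
from PIL import Image

# Load a pre-trained vision-language checkpoint (name assumed for illustration).
model = uform.get_model('unum-cloud/uform-vl-english')

text = 'a small red panda sitting on a tree branch'
image = Image.open('red_panda.jpg')

# Preprocess raw inputs into model-ready tensors.
text_data = model.preprocess_text(text)
image_data = model.preprocess_image(image)

# Unimodal part: each modality is encoded independently,
# but into the same shared vector space (late-fusion behaviour).
text_embedding = model.encode_text(text_data)
image_embedding = model.encode_image(image_data)

# Multimodal part: both modalities are encoded jointly,
# capturing fine-grained cross-modal features (mid-fusion behaviour).
joint_embedding = model.encode_multimodal(image=image_data, text=text_data)
```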
Features
- Encodes Multi-Lingual Texts, Images, and, soon, Audio, Video, and Documents into a shared vector space
- Late-fusion models: each modality is encoded independently, which scales to retrieval in large collections
- Early-fusion models: both modalities are encoded jointly, capturing fine-grained features for re-ranking
- Mid-fusion models: the golden midpoint between the previous two, built from a unimodal part that encodes each modality separately, as late-fusion models do, and a multimodal part (see the sketch after this list)
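To make concrete why the two parts work well together, here is a hedged sketch of a retrieval-then-re-rank loop: the unimodal embeddings rank a large collection cheaply, and the joint multimodal encoder re-scores only the top candidates. It reuses the `model` and method names assumed in the previous sketch; the `get_matching_scores` helper and the tensor shapes are also assumptions, not guaranteed API.

```python
import torch
import torch.nn.functional as F

# Assume `model` from the previous sketch, a list of PIL `images`,
# and a query string `query`; all names here are illustrative.

# Stage 1 - late-fusion-style retrieval: embed everything independently.
image_data = [model.preprocess_image(img) for img in images]
image_embeddings = torch.cat([model.encode_image(d) for d in image_data])
query_data = model.preprocess_text(query)
query_embedding = model.encode_text(query_data)

# Cosine similarity in the shared space ranks the whole collection cheaply.
similarities = F.cosine_similarity(query_embedding, image_embeddings)
top_k = similarities.topk(min(10, len(images))).indices

# Stage 2 - mid-fusion re-ranking: jointly encode the query with each of the
# few surviving candidates to take fine-grained features into account.
rerank_scores = []
for idx in top_k.tolist():
    joint = model.encode_multimodal(image=image_data[idx], text=query_data)
    rerank_scores.append(model.get_matching_scores(joint))  # assumed helper
```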