Data processing for and with foundation models
Analyze computation-communication overlap in V3/R1
An end-to-end Data Scientist
Data annotator for machine learning
Self-learning data agent that grounds its answers in layers of content
Synthetic Data Generation for tabular, relational and time series data
Official DeiT repository
Your own personal AI assistant. Any OS. Any Platform.
Training data (data labeling, annotation, workflow) for all data types
Project aimed at extracting, exporting, and analyzing chat records
OCRmyPDF adds an OCR text layer to scanned PDF files
Machine learning in Python
Label Studio is a multi-type data labeling and annotation tool
Conditional GAN for generating synthetic tabular data
Free and source-available fair-code licensed workflow automation tool
LLM-based data scientist, AI-native data application
A framework for real-life data science
ExtractThinker is a Document Intelligence library for LLMs
Detecting silent model failure. NannyML estimates performance
The open-source tool for building high-quality datasets
AutoGluon: AutoML for Image, Text, and Tabular Data
A Model Context Protocol (MCP) server that enables AI assistants
airda (Air Data Agent)
Data science spreadsheet with Python & SQL
Data science on data without acquiring a copy