Open Source Python Data Management Systems - Page 3

Sort By:

Python Data Management Systems

Data Management Python Clear Filters

Browse free open source Python Data Management Systems and projects below. Use the toggles on the left to filter open source Python Data Management Systems by OS, license, language, programming language, and project status.

Gen AI apps are built with MongoDB Atlas
The database for AI-powered applications.

MongoDB Atlas is the developer-friendly database used to build, scale, and run gen AI and LLM-powered apps—without needing a separate vector database. Atlas offers built-in vector search, global availability across 115+ regions, and flexible document modeling. Start building AI apps faster, all in one place.

Start Free
Simple, Secure Domain Registration
Get your domain at wholesale price. Cloudflare offers simple, secure registration with no markups, plus free DNS, CDN, and SSL integration.

Register or renew your domain and pay only what we pay. No markups, hidden fees, or surprise add-ons. Choose from over 400 TLDs (.com, .ai, .dev). Every domain is integrated with Cloudflare's industry-leading DNS, CDN, and free SSL to make your site faster and more secure. Simple, secure, at-cost domain registration.

Sign up for free
1

Mantid

Mantid Project (www.mantidproject.org)

Downloads: 27 This Week

Last Update: 2025-10-28
See Project
2

AutoGluon

AutoGluon: AutoML for Image, Text, and Tabular Data

AutoGluon enables easy-to-use and easy-to-extend AutoML with a focus on automated stack ensembling, deep learning, and real-world applications spanning image, text, and tabular data. Intended for both ML beginners and experts, AutoGluon enables you to quickly prototype deep learning and classical ML solutions for your raw data with a few lines of code. Automatically utilize state-of-the-art techniques (where appropriate) without expert knowledge. Leverage automatic hyperparameter tuning, model selection/ensembling, architecture search, and data processing. Easily improve/tune your bespoke models and data pipelines, or customize AutoGluon for your use-case. AutoGluon is modularized into sub-modules specialized for tabular, text, or image data. You can reduce the number of dependencies required by solely installing a specific sub-module via: python3 -m pip install <submodule>.

Downloads: 1 This Week

Last Update: 2025-07-29
See Project
3

Bayesian Optimization

Python implementation of global optimization with gaussian processes

This is a constrained global optimization package built upon bayesian inference and gaussian process, that attempts to find the maximum value of an unknown function in as few iterations as possible. This technique is particularly suited for optimization of high cost functions, situations where the balance between exploration and exploitation is important. More detailed information, other advanced features, and tips on usage/implementation can be found in the examples folder. Follow the basic tour notebook to learn how to use the package's most important features. Take a look at the advanced tour notebook to learn how to make the package more flexible, how to deal with categorical parameters, how to use observers, and more. Explore the options exemplifying the balance between exploration and exploitation and how to control it. Explore the domain reduction notebook to learn more about how search can be sped up by dynamically changing parameters' bounds.

Downloads: 1 This Week

Last Update: 2025-07-24
See Project
4

BertViz

BertViz: Visualize Attention in NLP Models (BERT, GPT2, BART, etc.)

BertViz is an interactive tool for visualizing attention in Transformer language models such as BERT, GPT2, or T5. It can be run inside a Jupyter or Colab notebook through a simple Python API that supports most Huggingface models. BertViz extends the Tensor2Tensor visualization tool by Llion Jones, providing multiple views that each offer a unique lens into the attention mechanism. The head view visualizes attention for one or more attention heads in the same layer. It is based on the excellent Tensor2Tensor visualization tool. The model view shows a bird's-eye view of attention across all layers and heads. The neuron view visualizes individual neurons in the query and key vectors and shows how they are used to compute attention.

Downloads: 1 This Week

Last Update: 2025-06-01
See Project
The All-in-One Commerce Platform for Businesses - Shopify
Shopify offers plans for anyone that wants to sell products online and build an ecommerce store, small to mid-sized businesses as well as enterprise

Shopify is a leading all-in-one commerce platform that enables businesses to start, build, and grow their online and physical stores. It offers tools to create customized websites, manage inventory, process payments, and sell across multiple channels including online, in-person, wholesale, and global markets. The platform includes integrated marketing tools, analytics, and customer engagement features to help merchants reach and retain customers. Shopify supports thousands of third-party apps and offers developer-friendly APIs for custom solutions. With world-class checkout technology, Shopify powers over 150 million high-intent shoppers worldwide. Its reliable, scalable infrastructure ensures fast performance and seamless operations at any business size.

Learn More
5

Datasette

An open source multi-tool for exploring and publishing data

Datasette is a tool for exploring and publishing data. It helps people take data of any shape or size, analyze and explore it, and publish it as an interactive website and accompanying API. Datasette is aimed at data journalists, museum curators, archivists, local governments, scientists, researchers and anyone else who has data that they wish to share with the world. It is part of a wider ecosystem of tools and plugins dedicated to making working with structured data as productive as possible. Try a demo and explore 33,000 power plants around the world, then take a look at some other examples of Datasette in action. Then read how to get started with Datasette, subscribe to the monthly-ish newsletter and consider signing up for office hours for an in-person conversation about the project.

Downloads: 1 This Week

Last Update: 5 days ago
See Project
6

Diffgram

Training data (data labeling, annotation, workflow) for all data types

From ingesting data to exploring it, annotating it, and managing workflows. Diffgram is a single application that will improve your data labeling and bring all aspects of training data under a single roof. Diffgram is world’s first truly open source training data platform that focuses on giving its users an unlimited experience. This is aimed to reduce your data labeling bills and increase your Training Data Quality. Training Data is the art of supervising machines through data. This includes the activities of annotation, which produces structured data; ready to be consumed by a machine learning model. Annotation is required because raw media is considered to be unstructured and not usable without it. That’s why training data is required for many modern machine learning use cases including computer vision, natural language processing and speech recognition.

Downloads: 1 This Week

Last Update: 2024-10-14
See Project
7

Elementary

Open-source data observability for analytics engineers

Elementary is an open-source data observability solution for data & analytics engineers. Monitor your dbt project and data in minutes, and be the first to know of data issues. Gain immediate visibility, detect data issues, send actionable alerts, and understand the impact and root cause. Generate a data observability report, host it or share with your team. Monitoring of data quality metrics, freshness, volume and schema changes, including anomaly detection. Elementary data monitors are configured and executed like native tests in dbt your project. Uploading and modeling of dbt artifacts, run and test results to tables as part of your runs. Get informative notifications on data issues, schema changes, models and tests failures. Inspect upstream and downstream dependencies to understand impact and root cause of data issues.

Downloads: 1 This Week

Last Update: 2025-10-09
See Project
8

GemGIS

Spatial data processing for geomodeling

GemGIS is a Python-based, open-source geographic information processing library. It is capable of preprocessing spatial data such as vector data (shape files, geojson files, geopackages,…), raster data (tif, png,…), data obtained from online services (WCS, WMS, WFS) or XML/KML files (soon). Preprocessed data can be stored in a dedicated Data Class to be passed to the geomodeling package GemPy in order to accelerate the model-building process. Postprocessing of model results will allow export from GemPy to geoinformation systems such as QGIS and ArcGIS or to Google Earth for further use.

Downloads: 1 This Week

Last Update: 2023-12-22
See Project
9

Great Expectations

Always know what to expect from your data

Great Expectations helps data teams eliminate pipeline debt, through data testing, documentation, and profiling. Software developers have long known that testing and documentation are essential for managing complex codebases. Great Expectations brings the same confidence, integrity, and acceleration to data science and data engineering teams. Expectations are assertions for data. They are the workhorse abstraction in Great Expectations, covering all kinds of common data issues. Expectations are a great start, but it takes more to get to production-ready data validation. Where are Expectations stored? How do they get updated? How do you securely connect to production data systems? How do you notify team members and triage when data validation fails? Great Expectations supports all of these use cases out of the box. Instead of building these components for yourself over weeks or months, you will be able to add production-ready validation to your pipeline in a day.

Downloads: 1 This Week

Last Update: 3 days ago
See Project
Photo and Video Editing APIs and SDKs
Trusted by 150 million+ creators and businesses globally

Unlock Picsart's full editing suite by embedding our Editor SDK directly into your platform. Offer your users the power of a full design suite without leaving your site.

Learn More
10

JILL.py

A cross-platform installer for the Julia programming language

The enhanced Python fork of JILL, Julia Installer for Linux (and every other platform), Light.

Downloads: 1 This Week

Last Update: 2025-06-07
See Project
11

Mage.ai

Build, run, and manage data pipelines for integrating data

Open-source data pipeline tool for transforming and integrating data. The modern replacement for Airflow. Effortlessly integrate and synchronize data from 3rd party sources. Build real-time and batch pipelines to transform data using Python, SQL, and R. Run, monitor, and orchestrate thousands of pipelines without losing sleep. Have you met anyone who said they loved developing in Airflow? That’s why we designed an easy developer experience that you’ll enjoy. Each step in your pipeline is a standalone file containing modular code that’s reusable and testable with data validations. No more DAGs with spaghetti code. Start developing locally with a single command or launch a dev environment in your cloud using Terraform. Write code in Python, SQL, or R in the same data pipeline for ultimate flexibility.

Downloads: 1 This Week

Last Update: 2025-09-18
See Project
12

Mercury

Convert Python notebook to web app and share with non-technical users

Turn Python notebooks to web applications with open-source Mercury framework. Hide code and add interactive widgets. Non-technical users can tweak widgets and execute notebook with new parameters. The core of Mercury is Open Source under AGPLv3. We provide Mercury Pro with additional features, dedicated support and friendly commercial license. Mercury is a perfect tool to convert Python notebook to interactive web application and share with non-programmers. You define interactive widgets for your notebook with the YAML header. Your users can change the widgets values, execute the notebook and save result (as PDF or html file). You can hide your code to not scare your (non-coding) collaborators. Easily deploy to any server. Mercury is dual-licensed. Looking for dedicated support, a commercial-friendly license, and more features? The Mercury Pro is for you.

Downloads: 1 This Week

Last Update: 2024-05-10
See Project
13

NannyML

Detecting silent model failure. NannyML estimates performance

NannyML is an open-source python library that allows you to estimate post-deployment model performance (without access to targets), detect data drift, and intelligently link data drift alerts back to changes in model performance. Built for data scientists, NannyML has an easy-to-use interface, and interactive visualizations, is completely model-agnostic, and currently supports all tabular classification use cases. NannyML closes the loop with performance monitoring and post deployment data science, empowering data scientist to quickly understand and automatically detect silent model failure. By using NannyML, data scientists can finally maintain complete visibility and trust in their deployed machine learning models. When the actual outcome of your deployed prediction models is delayed, or even when post-deployment target labels are completely absent, you can use NannyML's CBPE-algorithm to estimate model performance.

Downloads: 1 This Week

Last Update: 2025-07-12
See Project
14

PipeRider

Code review for data in dbt

PipeRider automatically compares your data to highlight the difference in impacted downstream dbt models so you can merge your Pull Requests with confidence. PipeRider can profile your dbt models and obtain information such as basic data composition, quantiles, histograms, text length, top categories, and more. PipeRider can integrate with dbt metrics and present the time-series data of metrics in the report. PipeRider generates a static HTML report each time it runs, which can be viewed locally or shared. You can compare two previously generated reports or use a single command to compare the differences between the current branch and the main branch. The latter is designed specifically for code review scenarios. In our pull requests on GitHub, we not only want to know which files have been changed, but also the impact of these changes on the data. PipeRider can easily generate comparison reports with a single command to provide this information.

Downloads: 1 This Week

Last Update: 2023-11-22
See Project
15

Recommenders

Best practices on recommendation systems

The Recommenders repository provides examples and best practices for building recommendation systems, provided as Jupyter notebooks. The module reco_utils contains functions to simplify common tasks used when developing and evaluating recommender systems. Several utilities are provided in reco_utils to support common tasks such as loading datasets in the format expected by different algorithms, evaluating model outputs, and splitting training/test data. Implementations of several state-of-the-art algorithms are included for self-study and customization in your own applications. Please see the setup guide for more details on setting up your machine locally, on a data science virtual machine (DSVM) or on Azure Databricks. Independent or incubating algorithms and utilities are candidates for the contrib folder. This will house contributions which may not easily fit into the core repository or need time to refactor or mature the code and add necessary tests.

Downloads: 1 This Week

Last Update: 2024-12-23
See Project
16

Sweetviz

Visualize and compare datasets, target values and associations

Sweetviz is an open-source Python library that generates beautiful, high-density visualizations to kickstart EDA (Exploratory Data Analysis) with just two lines of code. Output is a fully self-contained HTML application. The system is built around quickly visualizing target values and comparing datasets. Its goal is to help quick analysis of target characteristics, training vs testing data, and other such data characterization tasks. Shows how a target value (e.g. "Survived" in the Titanic dataset) relates to other features. Sweetviz integrates associations for numerical (Pearson's correlation), categorical (uncertainty coefficient) and categorical-numerical (correlation ratio) datatypes seamlessly, to provide maximum information for all data types. Automatically detects numerical, categorical and text features, with optional manual overrides. min/max/range, quartiles, mean, mode, standard deviation, sum, median absolute deviation, coefficient of variation, kurtosis, skewness.

Downloads: 1 This Week

Last Update: 2023-11-29
See Project
17

data-diff

Efficiently diff rows across two different databases

We're excited to announce the launch of a new open-source product, data-diff that makes comparing datasets across databases fast at any scale. data-diff automates data quality checks for data replication and migration. In modern data platforms, data is constantly moving between systems, and at the modern data volume and complexity, systems go out of sync all the time. Until now, there has not been any tooling to ensure that when the data is correctly copied. Replicating data at scale, across hundreds of tables, with low latency and at a reasonable infrastructure cost is a hard problem, and most data teams we’ve talked to, have faced data quality issues in their replication processes. The hard truth is that the quality of the replication is the quality of the data. Since copying entire datasets in batch is often infeasible at the modern data scale, businesses rely on the Change Data Capture (CDC) approach of replicating data using a continuous stream of updates.

Downloads: 1 This Week

Last Update: 2024-02-20
See Project
18

gusty

Making DAG construction easier

gusty allows you to control your Airflow DAGs, Task Groups, and Tasks with greater ease. gusty manages collections of tasks, represented as any number of YAML, Python, SQL, Jupyter Notebook, or R Markdown files. A directory of task files is instantly rendered into a DAG by passing a file path to gusty's create_dag function. gusty also manages dependencies (within one DAG) and external dependencies (dependencies on tasks in other DAGs) for each task file you define. All you have to do is provide a list of dependencies or external_dependencies inside of a task file, and gusty will automatically set each task's dependencies and create external task sensors for any external dependencies listed. gusty works with both Airflow 1.x and Airflow 2.x, and has even more features, all of which aim to make the creation, management, and iteration of DAGs more fluid, so that you can intuitively design your DAG and build your tasks.

Downloads: 1 This Week

Last Update: 2025-05-14
See Project
19

miepython

Mie scattering of light by perfect spheres

miepython is a pure Python module to calculate light scattering for non-absorbing, partially-absorbing, or perfectly-conducting spheres. Mie theory is used, following the procedure described by Wiscombe. This code has been validated against his results. This code provides functions for calculating the extinction efficiency, scattering efficiency, backscattering, and scattering asymmetry. Moreover, a set of angles can be given to calculate the scattering for a sphere at each of those angles.

Downloads: 1 This Week

Last Update: 2025-05-25
See Project
20

missingno

Missing data visualization module for Python

Messy datasets? Missing values? missingno provides a small toolset of flexible and easy-to-use missing data visualizations and utilities that allows you to get a quick visual summary of the completeness (or lack thereof) of your dataset. Just pip install missingno to get started. This quickstart uses a sample of the NYPD Motor Vehicle Collisions Dataset dataset. The msno.matrix nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion. At a glance, date, time, the distribution of injuries, and the contribution factor of the first vehicle appear to be completely populated, while geographic information seems mostly complete, but spottier. The sparkline at right summarizes the general shape of the data completeness and points out the rows with the maximum and minimum nullity in the dataset. This visualization will comfortably accommodate up to 50 labelled variables.

Downloads: 1 This Week

Last Update: 2023-02-26
See Project
21

PanelCheck

PanelCheck is an easy-to-use software tool for visualization of sensory profiling data using different types of plots. The joint information from the implemented plots provide detailed insight into assessor and panel performance.

">

Downloads: 26 This Week

Last Update: 2016-04-02
See Project
22

pyNastran

Make sure to download from the link below and not the big giant button. I'm not sure how to fix that, so if you know!

">

Downloads: 25 This Week

Last Update: 2025-01-20
See Project
23

PyDSTool

This is a sophisticated & integrated simulation and analysis environment for dynamical systems models of physical systems (ODEs, DAEs, maps, and hybrid systems). It supports symbolic math, optimization, continuation, data analysis, biological apps...

4 Reviews

Downloads: 6 This Week

Last Update: 2015-07-07
See Project
24

XCSoar

... the open-source glide computer

XCSoar is a tactical glide computer for Android, Linux, macOS, and Windows.

4 Reviews

Downloads: 6 This Week

Last Update: 2025-09-26
See Project
25

relax

Molecular dynamics by NMR data analysis

The software package 'relax' is designed for the study of molecular dynamics through the analysis of experimental NMR data. Organic molecules, proteins, RNA, DNA, sugars, and other biomolecules are all supported. It supports exponential curve fitting for the calculation of the R1 and R2 relaxation rates, calculation of the NOE, reduced spectral density mapping, the Lipari and Szabo model-free analysis, study of domain motions via the N-state model and frame order dynamics theories using anisotropic NMR parameters such as RDCs and PCSs, the investigation of stereochemistry in dynamic ensembles, and the analysis of relaxation dispersion data.

">

Downloads: 20 This Week

Last Update: 2025-10-07
See Project