Twinify

twinify is a software package for the privacy-preserving generation of a synthetic twin of a given sensitive tabular data set. At a high level, twinify follows the differentially private data-sharing process introduced by Jälkö et al. Depending on the nature of your data, twinify either applies the NAPSU-MQ approach described by Räisä et al. or uses differentially private variational inference (DPVI) to find an approximate parameter posterior for any probabilistic model you formulate. For DPVI, twinify also offers automatic modeling to make it easy to build models that fit the data. If you have experience with NumPyro, you can also implement your own model directly.

Data that would be very useful to the scientific community is often subject to privacy regulations and concerns and cannot be shared. Differentially private data sharing makes it possible to generate synthetic data that is statistically similar to the original data.
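To give a feel for what makes variational inference differentially private, here is a minimal sketch of the gradient perturbation step commonly used in DPVI-style training: each per-example gradient is clipped to a maximum norm, the clipped gradients are summed, and Gaussian noise scaled to the clipping bound is added. This is an illustration in plain Python, not twinify's implementation; the function names `clip` and `private_gradient_sum` and the parameters `max_norm` and `noise_multiplier` are hypothetical.

```python
import math
import random


def clip(grad, max_norm):
    """Scale a per-example gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def private_gradient_sum(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip each per-example gradient, sum them, and add Gaussian noise.

    The noise standard deviation is noise_multiplier * max_norm, so it is
    calibrated to the largest change one example can make to the sum.
    """
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        for i, g in enumerate(clip(grad, max_norm)):
            total[i] += g
    sigma = noise_multiplier * max_norm
    return [t + rng.gauss(0.0, sigma) for t in total]


# Toy per-example gradients (made-up numbers, two parameters).
rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4], [-6.0, 8.0]]
noisy = private_gradient_sum(grads, max_norm=1.0, noise_multiplier=1.0, rng=rng)
print(noisy)
```

The privacy level that results from a given noise multiplier depends on the number of training steps and the subsampling rate, which this sketch does not account for.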

Features

NAPSU-MQ learns a maximum entropy distribution that best reproduces a user-chosen set of marginal queries on the data
DPVI can learn any probabilistic model you specify, for categorical, continuous, or mixed data
For either method, the main task is to define the probabilistic model to be learned
For NAPSU-MQ, this means specifying the marginal queries to preserve
For DPVI, twinify's automatic modeling feature builds a mixture model from user-specified feature distributions
Alternatively, you can provide twinify with a Python file containing NumPyro code
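As an illustration of what a marginal query is, the sketch below counts how often each combination of values occurs for a chosen set of columns and then perturbs those counts with Gaussian noise. It covers only the idea of releasing noisy marginal counts; it does not fit the maximum entropy distribution, and it is not twinify's API. The toy data, the column names, and the helpers `marginal_counts` and `noisy_marginal` are all hypothetical.

```python
import random
from collections import Counter
from itertools import product


def marginal_counts(rows, columns, domains):
    """Count how often each value combination of `columns` occurs in `rows`.

    Every cell of the marginal is listed, including those with zero count.
    """
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    cells = product(*(domains[c] for c in columns))
    return {cell: counts.get(cell, 0) for cell in cells}


def noisy_marginal(rows, columns, domains, sigma, rng):
    """Add independent Gaussian noise with std `sigma` to each marginal count."""
    exact = marginal_counts(rows, columns, domains)
    return {cell: n + rng.gauss(0.0, sigma) for cell, n in exact.items()}


# Toy categorical data set (made up for illustration).
rows = [
    {"smoker": "yes", "sex": "f"},
    {"smoker": "no", "sex": "f"},
    {"smoker": "no", "sex": "m"},
    {"smoker": "no", "sex": "f"},
]
domains = {"smoker": ["yes", "no"], "sex": ["f", "m"]}

exact = marginal_counts(rows, ("smoker", "sex"), domains)
noisy = noisy_marginal(rows, ("smoker", "sex"), domains, sigma=1.0, rng=random.Random(0))
print(exact)
print(noisy)
```

Choosing which column combinations to query is the modeling decision here: only the statistics captured by the selected marginals can be reproduced in the synthetic data.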

Additional Project Details

Programming Language

Python

Related Categories

Python Synthetic Data Generation Software

Registered

2023-05-22
