twinify is a software package for the privacy-preserving generation of a synthetic twin of a given sensitive tabular data set. At a high level, twinify follows the differentially private data-sharing process introduced by Jälkö et al. Depending on the nature of your data, twinify either applies the NAPSU-MQ approach described by Räisä et al. or uses differentially private variational inference (DPVI) to find an approximate parameter posterior for any probabilistic model you formulate. For DPVI, twinify also offers automatic modeling, which makes it easy to build a model that fits the data. If you have experience with NumPyro, you can also implement your own model directly.

Data that would be very useful to the scientific community is often subject to privacy regulations and concerns and cannot be shared. Differentially private data sharing makes it possible to generate synthetic data that is statistically similar to the original data.
Features
- NAPSU-MQ learns a maximum entropy distribution that best reproduces a user-chosen set of marginal queries on the data
- DPVI is capable of learning any probabilistic model you specify, for categorical, continuous or mixed data
- For either method, the main thing you need to do is define the probabilistic model to be learned
- For NAPSU-MQ, this means that you must specify the marginal queries to preserve
- twinify's automatic modeling feature for DPVI builds a mixture model from user-specified feature distributions
- Alternatively, you can provide twinify with a Python file containing your own NumPyro model code
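
To make the notion of a marginal query more concrete, the following is a minimal plain-Python sketch of what such a query computes on a small categorical data set: the count of every value combination over a chosen subset of columns. It does not use twinify's API and adds no differential privacy noise. The data, the column names, and the helper `marginal_counts` are hypothetical and serve only as illustration.

```python
from collections import Counter
from itertools import product

# Toy categorical data set; columns and values are invented for illustration.
records = [
    {"age_group": "young", "smoker": "no",  "diagnosis": "healthy"},
    {"age_group": "young", "smoker": "yes", "diagnosis": "healthy"},
    {"age_group": "old",   "smoker": "yes", "diagnosis": "ill"},
    {"age_group": "old",   "smoker": "no",  "diagnosis": "healthy"},
    {"age_group": "old",   "smoker": "yes", "diagnosis": "ill"},
]


def marginal_counts(rows, columns):
    """Count how often each combination of values occurs in `columns`.

    Hypothetical helper, not part of twinify.
    """
    # Observed values of each selected column, in a fixed (sorted) order.
    domains = [sorted({row[c] for row in rows}) for c in columns]
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    # Also report combinations that never occur, so the result has a fixed shape.
    return {combo: counts.get(combo, 0) for combo in product(*domains)}


# A two-way marginal over (smoker, diagnosis) and a one-way marginal over age_group.
two_way = marginal_counts(records, ["smoker", "diagnosis"])
one_way = marginal_counts(records, ["age_group"])

for combo, count in two_way.items():
    print(combo, count)
```

Choosing which column subsets to query determines which relationships in the data the learned distribution is asked to reproduce. In this sketch, the two-way marginal captures the association between `smoker` and `diagnosis`, while the one-way marginal only captures the distribution of `age_group`.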