Insecurity and Python pickles
Posted Mar 15, 2024 8:43 UTC (Fri) by aragilar (subscriber, #122569)
In reply to: Insecurity and Python pickles by Wol
Parent article: Insecurity and Python pickles
I think we're using the same words to mean different things. The data I've worked with has come in two different forms:
* arrays of records (and collections of these arrays): generally having a db makes it easier and faster to do more complex queries over these vs multiple files (or a single file with multiple arrays), and formats designed for efficient use of "tabular" data (e.g. parquet) are better than random CSV/TSV.
* n-dimensional arrays: these represent images/cubes/higher moments of physical data (vs metadata), and so are different in kind to the arrays of records. This is where HDF5, netCDF, FITS (if you're doing observational astronomy) come in.
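As a rough illustration of the difference between the two forms (a sketch only: the table name, field names and values are made up, and the nested lists stand in for what would normally be an array stored in something like HDF5, netCDF or FITS):

```python
import sqlite3

# Form 1: an array of records. Each row is one item with named fields.
# (Hypothetical field names and values, purely for illustration.)
records = [
    ("src_a", 10.5, 0.8),
    ("src_b", 11.2, 2.4),
    ("src_c", 10.9, 1.7),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sources (name TEXT, ra REAL, flux REAL)")
conn.executemany("INSERT INTO sources VALUES (?, ?, ?)", records)

# Queries over records select rows by the values of their fields.
bright = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sources WHERE flux > ? ORDER BY name", (1.0,)
    )
]
conn.close()

# Form 2: an n-dimensional array. A tiny 2 x 3 x 4 "cube" built from
# nested lists; each value encodes its own (plane, row, column) position.
n_planes, n_rows, n_cols = 2, 3, 4
cube = [
    [[p * 100 + r * 10 + c for c in range(n_cols)] for r in range(n_rows)]
    for p in range(n_planes)
]

# Access is by position along each axis, not by field value.
pixel = cube[1][2][3]
# Taking one index along the first axis gives a 2-D image.
plane = cube[0]
```

The first form is naturally asked "which rows match this condition?", while the second is asked "what is the value at this position, or along this slice?", which is why the same storage format rarely suits both.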
I think the data you're talking about is more graph-like, right? It feels like the kind of thing where you want to talk about the structure of how the data is related. That is different in kind to both of the above, so naturally tools designed for other types of data don't match.
My understanding of ML/AI is that the data generally gets pushed into one of the two bins above, but that may be a bias based on the data I encounter.