Data preparation
One of the first things you’ll generally want to do with a dataset is prepare it.
The visual data preparation component of DSS lets you build data cleansing, normalization, and enrichment scripts visually and interactively.
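As a rough mental model only, you can picture a preparation script as an ordered list of steps applied to each row in turn. A step may rewrite the row or drop it entirely. The sketch below is not how DSS implements or exposes preparation scripts. Every function name in it (`fill_empty`, `rename`, `round_number`, `remove_if_empty`, `run_script`) is hypothetical, and each one only loosely imitates a processor of a similar name from the reference below.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical model: a row maps column names to values; a step returns
# a (possibly new) row, or None to drop the row from the output.
Row = Dict[str, object]
Step = Callable[[Row], Optional[Row]]


def fill_empty(column: str, value: object) -> Step:
    """Loosely analogous to 'Fill empty cells with fixed value'."""
    def step(row: Row) -> Optional[Row]:
        if row.get(column) in (None, ""):
            return {**row, column: value}  # copy, never mutate the input
        return row
    return step


def rename(old: str, new: str) -> Step:
    """Loosely analogous to 'Rename columns' (one column only)."""
    def step(row: Row) -> Optional[Row]:
        row = dict(row)
        if old in row:
            row[new] = row.pop(old)
        return row
    return step


def round_number(column: str, digits: int = 0) -> Step:
    """Loosely analogous to 'Round numbers'; non-numeric cells are left alone."""
    def step(row: Row) -> Optional[Row]:
        v = row.get(column)
        if isinstance(v, (int, float)):
            return {**row, column: round(v, digits)}
        return row
    return step


def remove_if_empty(column: str) -> Step:
    """Loosely analogous to 'Remove rows where cell is empty'."""
    def step(row: Row) -> Optional[Row]:
        return None if row.get(column) in (None, "") else row
    return step


def run_script(rows: List[Row], steps: List[Step]) -> List[Row]:
    """Apply every step, in order, to every row; drop rows a step rejects."""
    out: List[Row] = []
    for row in rows:
        current: Optional[Row] = row
        for step in steps:
            current = step(current)
            if current is None:
                break
        if current is not None:
            out.append(current)
    return out


rows = [
    {"name": "alice", "amount": 12.34, "country": ""},
    {"name": "", "amount": 7.0, "country": "FR"},
    {"name": "bob", "amount": None, "country": "US"},
]
script = [
    fill_empty("country", "unknown"),
    remove_if_empty("name"),
    rename("amount", "total"),
    round_number("total", 1),
]
result = run_script(rows, script)
print(result)
```

The point of the model is that step order matters. In this sketch, `round_number` targets `total`, so it only works because it runs after `rename`. Reordering steps in a script changes its output in the same way.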
Note
For a step-by-step introduction to the data preparation component of Data Science Studio, we recommend that you follow our Tutorial 101. This section focuses on advanced and reference topics related to the data preparation component.
We also have a Data preparation quick start portal that will give you an overview of all our learning content related to visual data preparation.
- Processors reference
  - Extract from array
  - Fold an array
  - Sort array
  - Concatenate JSON arrays
  - Discretize (bin) numerical values
  - Change coordinates system
  - Copy column
  - Rename columns
  - Concatenate columns
  - Delete/Keep columns by name
  - Count occurrences
  - Convert currencies
  - Extract date elements
  - Compute difference between dates
  - Format date with custom format
  - Parse to standard date format
  - Split e-mail addresses
  - Enrich from French department
  - Enrich from French postcode
  - Extract ngrams
  - Extract numbers
  - Fill empty cells with fixed value
  - Filter rows/cells on date range
  - Filter rows/cells with formula
  - Filter invalid rows/cells
  - Filter rows/cells on numerical range
  - Filter rows/cells on value
  - Find and replace
  - Flag rows/cells on date range
  - Flag rows with formula
  - Flag invalid rows
  - Flag rows on numerical range
  - Flag rows on value
  - Fold multiple columns
  - Fold multiple columns by pattern
  - Fold object keys
  - Formula
  - Fuzzy join with other dataset (memory-based)
  - Generate Big Data
  - Compute distance between geopoints
  - Extract from geo column
  - Geo-join
  - Resolve GeoIP
  - Create GeoPoint from lat/lon
  - Extract lat/lon from GeoPoint
  - Flag holidays
  - Split invalid cells into another column
  - Join with other dataset (memory-based)
  - Extract with JSONPath
  - Group long-tail values
  - Translate values using meaning
  - Normalize measure
  - Negate boolean value
  - Force numerical range
  - Generate numerical combinations
  - Convert number formats
  - Nest columns
  - Unnest object (flatten JSON)
  - Extract with regular expression
  - Pivot
  - Python function
  - Split HTTP Query String
  - Remove rows where cell is empty
  - Round numbers
  - Simplify text
  - Split and fold
  - Split and unfold
  - Split column
  - Transform string
  - Tokenize text
  - Transpose rows to columns
  - Triggered unfold
  - Unfold
  - Unfold an array
  - Convert a UNIX timestamp to a date
  - Fill empty cells with previous/next value
  - Split URL (into protocol, host, port, …)
  - Classify User-Agent
  - Generate a best-effort visitor id
  - Zip JSON arrays
- Filtering and flagging rows
- Managing dates
- Reshaping
- Geographic processing
- Sampling
- Execution engines