Other Documentation
Older DSS versions
DSS 3.1
DSS 3.0
DSS 2.3
DSS 2.2
DSS 2.1
DSS 2.0
DSS 1.4
Other Dataiku products
WT1 Web Tracker