Installing DSS
Requirements
Installing a new DSS instance
Upgrading a DSS instance
Updating a DSS license
Other installation options
Install on macOS
Install on AWS
Install on Azure
Install a virtual machine
Running DSS as a Docker container
Setting up Hadoop and Spark integration
R integration
Customizing DSS installation
Installing database drivers
Java runtime environment
Python integration
Installing a DSS plugin
Configuring LDAP authentication
Working with proxies
Migration operations
DSS concepts
Connecting to data
Supported connections
Upload your files
Server filesystem
HDFS
Amazon S3
Google Cloud Storage
Azure Blob Storage
FTP
SCP / SFTP (aka SSH)
HTTP
SQL databases
MySQL
PostgreSQL
Vertica
Amazon Redshift
Pivotal Greenplum
Teradata
Oracle
Microsoft SQL Server
SAP HANA
IBM Netezza
Google BigQuery
IBM DB2
Snowflake
Cassandra
ElasticSearch
Managed folders
“Files in folder” dataset
Metrics dataset
Internal stats dataset
HTTP (with cache)
Dataset plugins
Data connectivity macros
Making relocatable managed datasets
Data ordering
Exploring your data
Sampling
Analyze
Schemas, storage types and meanings
Definitions
Basic usage
Schema for data preparation
Creating schemas of datasets
Handling of schemas by recipes
List of recognized meanings
User-defined meanings
Data preparation
Processors reference
Extract from array
Fold an array
Sort array
Concatenate JSON arrays
Discretize (bin) numerical values
Change coordinates system
Copy column
Rename columns
Concatenate columns
Delete/Keep columns by name
Count occurrences
Convert currencies
Extract date elements
Compute difference between dates
Format date with custom format
Parse to standard date format
Split e-mail addresses
Enrich from French department
Enrich from French postcode
Extract ngrams
Extract numbers
Fill empty cells with fixed value
Filter rows/cells on date range
Filter rows/cells with formula
Filter invalid rows/cells
Filter rows/cells on numerical range
Filter rows/cells on value
Find and replace
Flag rows/cells on date range
Flag rows with formula
Flag invalid rows
Flag rows on numerical range
Flag rows on value
Fold multiple columns
Fold multiple columns by pattern
Fold object keys
Formula
Fuzzy join with other dataset (memory-based)
Generate Big Data
Compute distance between geopoints
Extract from geo column
Geo-join
Resolve GeoIP
Create GeoPoint from lat/lon
Extract lat/lon from GeoPoint
Flag holidays
Split invalid cells into another column
Join with other dataset (memory-based)
Extract with JSONPath
Group long-tail values
Translate values using meaning
Normalize measure
Move columns
Negate boolean value
Force numerical range
Generate numerical combinations
Convert number formats
Nest columns
Unnest object (flatten JSON)
Extract with regular expression
Pivot
Python function
Split HTTP Query String
Remove rows where cell is empty
Round numbers
Simplify text
Split and fold
Split and unfold
Split column
Transform string
Tokenize text
Transpose rows to columns
Triggered unfold
Unfold
Unfold an array
Convert a UNIX timestamp to a date
Fill empty cells with previous/next value
Split URL (into protocol, host, port, …)
Classify User-Agent
Generate a best-effort visitor id
Zip JSON arrays
Filtering and flagging rows
Managing dates
Reshaping
Geographic processing
Sampling
Execution engines
Charts
The Charts Interface
Sampling & Engine
Basic Charts
Tables
Scatter Charts
Map Charts
Other Charts
Common Chart Elements
Color palettes
Machine learning
Prediction (Supervised ML)
Prediction settings
Prediction Results
Clustering (Unsupervised ML)
Clustering settings
Clustering results
Automated machine learning
Features handling
Features roles and types
Categorical variables
Numerical variables
Text variables
Vector variables
Image variables
Custom Preprocessing
Algorithms reference
In-memory Python (Scikit-learn / XGBoost)
MLLib (Spark) engine
H2O (Sparkling Water) engine
Vertica
Advanced models optimization
Models ensembling
Deep Learning
Introduction
Your first deep learning model
Model architecture
Training
Multiple inputs
Using image features
Using text features
Runtime and GPU support
Advanced topics
Troubleshooting
Models lifecycle
Scoring engines
Writing custom models
Exporting models
The Flow
Visual Grammar
Rebuilding Datasets
Limiting Concurrent Executions
Visual recipes
Sync: copying datasets
Grouping: aggregating data
Window: analytics functions
Distinct: get unique rows
Join: joining datasets
Splitting datasets
Top N: retrieve first N rows
Stacking datasets
Sampling datasets
Sort: order values
Pivot recipe
Download recipe
Recipes based on code
The common editor layout
Python recipes
R recipes
SQL recipes
Hive recipes
Pig recipes
Impala recipes
Spark-Scala recipes
PySpark recipes
Spark / R recipes
SparkSQL recipes
Shell recipes
Variables expansion in code recipes
Code notebooks
SQL notebook
Python notebooks
Predefined notebooks
Webapps
“Standard” web apps
Shiny web apps
Bokeh web apps
Publishing webapps on the dashboard
Code reports
R Markdown reports
Dashboards
Dashboard concepts
Display settings
Exporting dashboards to PDF or images
Insights reference
Chart
Dataset table
Model report
Managed folder
Jupyter Notebook
Webapp
Metric
Scenarios
Wiki article
Working with partitions
Partitioning files-based datasets
Partitioned SQL datasets
Specifying partition dependencies
Partition identifiers
Recipes for partitioned datasets
Partitioned Hive recipes
Partitioned SQL recipes
Partitioning variables substitutions
DSS and Hadoop
Setting up Hadoop integration
Connecting to secure clusters
Hadoop filesystems connections (HDFS, S3, EMRFS, WASB, ADLS, GS)
DSS and Hive
DSS and Impala
Hive datasets
Multiple Hadoop clusters
Dynamic AWS EMR clusters
Hadoop multi-user security
Distribution-specific notes
Cloudera CDH
Hortonworks HDP
MapR
Amazon Elastic MapReduce
Microsoft Azure HDInsight
Google Cloud Dataproc
Teradata Connector For Hadoop
DSS and Spark
Usage of Spark in DSS
Setting up Spark integration
Spark configurations
Interacting with DSS datasets
Spark pipelines
Limitations and attention points
DSS and Python
Installing Python packages
Reusing Python code
Using Matplotlib
Using Bokeh
Using Plot.ly
Using Ggplot
DSS and R
Installing R packages
Reusing R code
Using ggplot2
Using Dygraphs
Using googleVis
Using ggvis
Code environments
Operations (Python)
Operations (R)
Base packages
Using Conda
Automation nodes
Non-managed code environments
Plugins’ code environments
Custom options and environment
Troubleshooting
Code env permissions
Running in containers
Concepts
Setting up
Using code envs with container execution
Running on Google Kubernetes Engine
Running on Azure Kubernetes Service
Running on Amazon Elastic Kubernetes Service
Customization of base images
Remote Docker daemons
Collaboration
Wikis
Discussions
Project folders
Version control
Markdown
Plugins
Installing plugins
Installing plugins offline
Writing your own plugin
Plugin author reference guide
Plugins and components
Parameters
Writing recipes
Writing DSS macros
Writing DSS Filesystem providers
Custom chart elements
Other topics
Automation scenarios, metrics, and checks
Definitions
Scenario steps
Launching a scenario
Reporting on scenario runs
Custom scenarios
Variables in scenarios
Step-based execution control
Metrics
Checks
Custom probes and checks
Automation node and bundles
Installing the Automation node
Creating a bundle
Importing a bundle
API Node & API Deployer: Real-time APIs
Introduction
Concepts
Installing an API node
Installing the API Deployer
First API (without API Deployer)
First API (with API Deployer)
Types of Endpoints
Exposing a visual prediction model
Exposing a Python prediction model
Exposing an R prediction model
Exposing a Python function
Exposing an R function
Exposing a SQL query
Exposing a lookup in a dataset
Enriching prediction queries
Security
Managing versions of your endpoint
Deploying on Kubernetes
Setting up
Deployment on Google Kubernetes Engine
Deployment on Minikube
Managing SQL connections
Custom base images
APINode APIs reference
API node user API
API node administration API
Endpoint APIs
Operations reference
Using the apinode-admin tool
High availability and scalability
Logging and auditing
Health monitoring
Advanced topics
Sampling methods
Formula language
Custom variables expansion
File formats
Delimiter-separated values (CSV / TSV)
Fixed width
Parquet
Avro
Hive SequenceFile
Hive RCFile
Hive ORCFile
XML
JSON
Excel
ESRI Shapefiles
DSS internal APIs
The internal Python API
API for interacting with datasets
API for interacting with Pyspark
API for managed folders
API for interacting with saved models
API for scenarios
API for performing SQL, Hive and Impala queries
API for performing SQL, Hive and Impala queries like the recipes
API for metrics and checks
API For creating static insights
API for plugin recipes
API for plugin datasets
API for plugin formats
API for plugin FS providers
The Javascript API
The R API
Authentication information
Creating static insights
The Scala API
Public API
Features
Public API Keys
Public API Python client
The main client class
Managing projects
Managing datasets
Managed folders
Managing recipes
Machine learning
Managing jobs
Managing scenarios
API Designer & Deployer
Managing meanings
Authentication information
Managing users and groups
Managing connections
Other administration tasks
Metrics and checks
SQL queries through DSS
Utilities
Reference API documentation
The REST API
Security
Main permissions
Connections security
User profiles
Exposed objects
Dashboard authorizations
User secrets
Multi-user security
Comparing security modes
Concepts
Prerequisites and limitations
Setup
Operations
Interaction with Hive and Impala
Interaction with Spark
Advanced topics
Audit Trail
Advanced security options
Single Sign-On
Multi-Factor Authentication
Passwords security
Operating DSS
dsscli tool
The data directory
Backing up
Logging in DSS
DSS Macros
Managing DSS disk usage
Understanding and tracking DSS processes
Tuning and controlling memory usage
Using cgroups for resource control
Monitoring DSS
Troubleshooting
Diagnosing and debugging issues
Obtaining support
Common issues
DSS does not start / Cannot connect
Cannot login to DSS
DSS crashes / The “Disconnected” overlay appears
Websockets problems
Cannot connect to a SQL database
A job fails
A scenario fails
A ML model training fails
“Your user profile does not allow” issues
Error codes
ERR_CODEENV_CONTAINER_IMAGE_FAILED: Could not build container image for this code environment
ERR_CODEENV_CONTAINER_IMAGE_TAG_NOT_FOUND: Container image tag not found for this Code environment
ERR_CODEENV_CREATION_FAILED: Could not create this code environment
ERR_CODEENV_DELETION_FAILED: Could not delete this code environment
ERR_CODEENV_EXISTING_ENV: Code environment already exists
ERR_CODEENV_INCORRECT_ENV_TYPE: Wrong type of Code environment
ERR_CODEENV_INVALID_CODE_ENV_ARCHIVE: Invalid code environment archive
ERR_CODEENV_JUPYTER_SUPPORT_INSTALL_FAILED: Could not install Jupyter support in this code environment
ERR_CODEENV_JUPYTER_SUPPORT_REMOVAL_FAILED: Could not remove Jupyter support from this code environment
ERR_CODEENV_MISSING_ENV: Code environment does not exist
ERR_CODEENV_MISSING_ENV_VERSION: Code environment version does not exist
ERR_CODEENV_NO_CREATION_PERMISSION: User not allowed to create Code environments
ERR_CODEENV_NO_USAGE_PERMISSION: User not allowed to use this Code environment
ERR_CODEENV_UNSUPPORTED_OPERATION_FOR_ENV_TYPE: Operation not supported for this type of Code environment
ERR_CODEENV_UPDATE_FAILED: Could not update this code environment
ERR_CONNECTION_ALATION_REGISTRATION_FAILED: Failed to register Alation integration
ERR_CONNECTION_API_BAD_CONFIG: Bad configuration for connection
ERR_CONNECTION_AZURE_INVALID_CONFIG: Invalid Azure connection configuration
ERR_CONNECTION_DUMP_FAILED: Failed to dump connection tables
ERR_CONNECTION_INVALID_CONFIG: Invalid connection configuration
ERR_CONNECTION_LIST_HIVE_FAILED: Failed to list indexable Hive connections
ERR_CONNECTION_S3_INVALID_CONFIG: Invalid S3 connection configuration
ERR_CONNECTION_SQL_INVALID_CONFIG: Invalid SQL connection configuration
ERR_CONNECTION_SSH_INVALID_CONFIG: Invalid SSH connection configuration
ERR_CONTAINER_CONF_NO_USAGE_PERMISSION: User not allowed to use this container execution configuration
ERR_CONTAINER_CONF_NOT_FOUND: The selected container configuration was not found
ERR_CONTAINER_IMAGE_PUSH_FAILED: Container image push failed
ERR_DATASET_ACTION_NOT_SUPPORTED: Action not supported for this kind of dataset
ERR_DATASET_CSV_UNTERMINATED_QUOTE: Error in CSV file: Unterminated quote
ERR_DATASET_HIVE_INCOMPATIBLE_SCHEMA: Dataset schema not compatible with Hive
ERR_DATASET_INVALID_CONFIG: Invalid dataset configuration
ERR_DATASET_INVALID_FORMAT_CONFIG: Invalid format configuration for this dataset
ERR_DATASET_INVALID_METRIC_IDENTIFIER: Invalid metric identifier
ERR_DATASET_INVALID_PARTITIONING_CONFIG: Invalid dataset partitioning configuration
ERR_DATASET_PARTITION_EMPTY: Input partition is empty
ERR_DATASET_TRUNCATED_COMPRESSED_DATA: Error in compressed file: Unexpected end of file
ERR_ENDPOINT_INVALID_CONFIG: Invalid configuration for API Endpoint
ERR_FOLDER_INVALID_PARTITIONING_CONFIG: Invalid folder partitioning configuration
ERR_FSPROVIDER_CANNOT_CREATE_FOLDER_ON_DIRECTORY_UNAWARE_FS: Cannot create a folder on this type of file system
ERR_FSPROVIDER_DEST_PATH_ALREADY_EXISTS: Destination path already exists
ERR_FSPROVIDER_FSLIKE_REACH_OUT_OF_ROOT: Illegal attempt to access data out of connection root path
ERR_FSPROVIDER_HTTP_CONNECTION_FAILED: HTTP connection failed
ERR_FSPROVIDER_HTTP_INVALID_URI: Invalid HTTP URI
ERR_FSPROVIDER_HTTP_REQUEST_FAILED: HTTP request failed
ERR_FSPROVIDER_ILLEGAL_PATH: Illegal path for that file system
ERR_FSPROVIDER_INVALID_CONFIG: Invalid configuration
ERR_FSPROVIDER_INVALID_FILE_NAME: Invalid file name
ERR_FSPROVIDER_LOCAL_LIST_FAILED: Could not list local directory
ERR_FSPROVIDER_PATH_DOES_NOT_EXIST: Path in dataset or folder does not exist
ERR_FSPROVIDER_ROOT_PATH_DOES_NOT_EXIST: Root path of the dataset or folder does not exist
ERR_FSPROVIDER_SSH_CONNECTION_FAILED: Failed to establish SSH connection
ERR_HIVE_HS2_CONNECTION_FAILED: Failed to establish HiveServer2 connection
ERR_HIVE_LEGACY_UNION_SUPPORT: Your current Hive version doesn’t support the UNION clause; it only supports UNION ALL, which does not remove duplicates
ERR_METRIC_DATASET_COMPUTATION_FAILED: Metrics computation completely failed
ERR_METRIC_ENGINE_RUN_FAILED: One of the metrics engines failed to run
ERR_MISC_ENOSPC: Out of disk space
ERR_MISC_EOPENF: Too many open files
ERR_NOT_USABLE_FOR_USER: You may not use this connection
ERR_OBJECT_OPERATION_NOT_AVAILABLE_FOR_TYPE: Operation not supported for this kind of object
ERR_PLUGIN_CANNOT_LOAD: Plugin cannot be loaded
ERR_PLUGIN_COMPONENT_NOT_INSTALLED: Plugin component not installed or removed
ERR_PLUGIN_DEV_INVALID_COMPONENT_PARAMETER: Invalid parameter for plugin component creation
ERR_PLUGIN_DEV_INVALID_DEFINITION: The descriptor of the plugin is invalid
ERR_PLUGIN_INVALID_DEFINITION: The plugin’s definition is invalid
ERR_PLUGIN_NOT_INSTALLED: Plugin not installed or removed
ERR_PLUGIN_WITHOUT_CODEENV: The plugin has no code env specification
ERR_PLUGIN_WRONG_TYPE: Unexpected type of plugin
ERR_PROJECT_INVALID_ARCHIVE: Invalid project archive
ERR_PROJECT_INVALID_PROJECT_KEY: Invalid project key
ERR_PROJECT_UNKNOWN_PROJECT_KEY: Unknown project key
ERR_RECIPE_CANNOT_CHANGE_ENGINE: Cannot change engine
ERR_RECIPE_CANNOT_CHECK_SCHEMA_CONSISTENCY: Cannot check schema consistency
ERR_RECIPE_CANNOT_CHECK_SCHEMA_CONSISTENCY_EXPENSIVE: Cannot check schema consistency: expensive checks disabled
ERR_RECIPE_CANNOT_CHECK_SCHEMA_CONSISTENCY_NEEDS_BUILD: Cannot compute output schema with an empty input dataset. Build the input dataset first.
ERR_RECIPE_CANNOT_CHECK_SCHEMA_CONSISTENCY_ON_RECIPE_TYPE: Cannot check schema consistency on this kind of recipe
ERR_RECIPE_CANNOT_CHECK_SCHEMA_CONSISTENCY_WITH_RECIPE_CONFIG: Cannot check schema consistency because of recipe configuration
ERR_RECIPE_CANNOT_CHANGE_ENGINE: Not compatible with Spark
ERR_RECIPE_CANNOT_USE_ENGINE: Cannot use the selected engine for this recipe
ERR_RECIPE_ENGINE_NOT_DWH: Error in recipe engine: SQLServer is not Data Warehouse edition
ERR_RECIPE_INCONSISTENT_I_O: Inconsistent recipe input or output
ERR_RECIPE_SYNC_AWS_DIFFERENT_REGIONS: Error in recipe engine: Redshift and S3 are in different AWS regions
ERR_RECIPE_PDEP_UPDATE_REQUIRED: Partition dependency update required
ERR_RECIPE_SPLIT_INVALID_COMPUTED_COLUMNS: Invalid computed column
ERR_SCENARIO_INVALID_STEP_CONFIG: Invalid scenario step configuration
ERR_SECURITY_CRUD_INVALID_SETTINGS: The user attributes submitted for a change are invalid
ERR_SECURITY_GROUP_EXISTS: The new requested group already exists
ERR_SECURITY_INVALID_NEW_PASSWORD: The new password is invalid
ERR_SECURITY_INVALID_PASSWORD: The password hash from the database is invalid
ERR_SECURITY_MUS_USER_UNMATCHED: The DSS user is not configured to be matched onto a system user
ERR_SECURITY_PATH_ESCAPE: The requested file is not within any allowed directory
ERR_SECURITY_USER_EXISTS: The requested user for creation already exists
ERR_SECURITY_WRONG_PASSWORD: The old password provided for password change is invalid
ERR_SPARK_FAILED_DRIVER_OOM: Spark failure: out of memory in driver
ERR_SPARK_FAILED_TASK_OOM: Spark failure: out of memory in task
ERR_SPARK_FAILED_YARN_KILLED_MEMORY: Spark failure: killed by YARN (excessive memory usage)
ERR_SPARK_PYSPARK_CODE_FAILED_UNSPECIFIED: Pyspark code failed
ERR_SPARK_SQL_LEGACY_UNION_SUPPORT: Your current Spark version doesn’t support the UNION clause; it only supports UNION ALL, which does not remove duplicates
ERR_SQL_CANNOT_LOAD_DRIVER: Failed to load database driver
ERR_SQL_DB_UNREACHABLE: Failed to reach database
ERR_SQL_IMPALA_MEMORYLIMIT: Impala memory limit exceeded
ERR_SQL_POSTGRESQL_TOOMANYSESSIONS: Too many sessions open concurrently
ERR_SQL_TABLE_NOT_FOUND: SQL Table not found
ERR_SQL_VERTICA_TOOMANYROS: Error in Vertica: too many ROS
ERR_SQL_VERTICA_TOOMANYSESSIONS: Error in Vertica: too many sessions open concurrently
ERR_TRANSACTION_FAILED_ENOSPC: Out of disk space
ERR_TRANSACTION_GIT_COMMMIT_FAILED: Failed committing changes
ERR_USER_ACTION_FORBIDDEN_BY_PROFILE: Your user profile does not allow you to perform this action
WARN_RECIPE_SPARK_INDIRECT_HDFS: No direct access to read/write HDFS dataset
WARN_RECIPE_SPARK_INDIRECT_S3: No direct access to read/write S3 dataset
Undocumented error
Known issues
Release notes
DSS 5.0 Release notes
DSS 4.3 Release notes
DSS 4.2 Release notes
DSS 4.1 Release notes
DSS 4.0 Release notes
DSS 3.1 Release notes
DSS 3.0 Release notes
DSS 2.3 Release notes
DSS 2.2 Release notes
DSS 2.1 Release notes
DSS 2.0 Release notes
DSS 1.4 Release notes
DSS 1.3 Release notes
DSS 1.2 Release notes
DSS 1.1 Release notes
DSS 1.0 Release Notes
Pre versions
Other Documentation
Third-party acknowledgements
Dataiku DSS
You are viewing the documentation for version 5.0 of DSS. Documentation for a more recent version may be available.
Security
Main permissions
Per-project group permissions
Project owner
Global group permissions
Multiple group membership
Connections security
Securing access to connections
Reading details of a connection
Per-user credentials for connections
Personal connections
User profiles
Exposed objects
Exposing objects between projects
Permissions on exposed objects
Dashboard authorizations
Scope
Adding objects to dashboard authorizations
Details by object type
User secrets
Entering user secrets
Using user secrets
Multi-user security
Comparing security modes
Concepts
Prerequisites and limitations
Setup
Operations
Interaction with Hive and Impala
Interaction with Spark
Advanced topics
Audit Trail
Viewing the audit trail in DSS
Audit trail log files
Auditing to external systems
Advanced security options
Hiding error stacks
Hiding version info
Using secure cookies
Expiring sessions
Forcing a single session per user
Restricting visibility of groups and users
Example general-settings.json file
Disabling exports
Setting security-related HTTP headers
Single Sign-On
Users database
SAML
SPNEGO / Kerberos
Multi-Factor Authentication
Passwords security
Local passwords database or not
Passwords complexity
Encryption of the local passwords database
3rd party system credentials