Reference Doc.
Installing DSS
Requirements
Installing a new DSS instance
Upgrading a DSS instance
Updating a DSS license
Other installation options
Install on macOS
Install on AWS
Install on Azure
Install a virtual machine
Setting up Hadoop and Spark integration
R integration
Customizing DSS installation
Installing database drivers
Java runtime environment
Python integration
Installing a DSS plugin
Configuring LDAP authentication
Working with proxies
Migration operations
DSS concepts
Connecting to data
Supported connections
Upload your files
Server filesystem
HDFS
Amazon S3
Google Cloud Storage
Azure Blob Storage
FTP
SCP / SFTP (aka SSH)
HTTP
SQL databases
MySQL
PostgreSQL
HP Vertica
Amazon Redshift
EMC Greenplum
Teradata
Oracle
Microsoft SQL Server
SAP HANA
IBM Netezza
Google BigQuery
IBM DB2
Snowflake
Cassandra
Elasticsearch
Managed folders
“Files in folder” dataset
HTTP (with cache)
Dataset plugins
Data connectivity macros
Making relocatable managed datasets
Data ordering
Exploring your data
Sampling
Analyze
Schemas, storage types and meanings
Definitions
Basic usage
Schema for data preparation
Creating schemas of datasets
Handling of schemas by recipes
List of recognized meanings
User-defined meanings
Data preparation
Processors reference
Extract from array
Fold an array
Sort array
Concatenate JSON arrays
Discretize (bin) numerical values
Change coordinates system
Copy column
Rename columns
Concatenate columns
Delete/Keep columns by name
Count occurrences
Convert currencies
Extract date elements
Compute difference between dates
Format date with custom format
Parse to standard date format
Split e-mail addresses
Enrich from French department
Enrich from French postcode
Extract ngrams
Extract numbers
Fill empty cells with fixed value
Filter rows/cells on date range
Filter rows/cells with formula
Filter invalid rows/cells
Filter rows/cells on numerical range
Filter rows/cells on value
Find and replace
Flag rows/cells on date range
Flag rows with formula
Flag invalid rows
Flag rows on numerical range
Flag rows on value
Fold multiple columns
Fold multiple columns by pattern
Fold object keys
Formula
Fuzzy join with other dataset (memory-based)
Generate Big Data
Compute distance between geopoints
Extract from geo column
Geo-join
Resolve GeoIP
Create GeoPoint from lat/lon
Extract lat/lon from GeoPoint
Flag holidays
Split invalid cells into another column
Join with other dataset (memory-based)
Extract with JSONPath
Group long-tail values
Translate values using meaning
Normalize measure
Negate boolean value
Force numerical range
Generate numerical combinations
Convert number formats
Nest columns
Unnest object (flatten JSON)
Extract with regular expression
Pivot
Python function
Split HTTP Query String
Remove rows where cell is empty
Round numbers
Simplify text
Split and fold
Split and unfold
Split column
Transform string
Tokenize text
Transpose rows to columns
Triggered unfold
Unfold
Unfold an array
Convert a UNIX timestamp to a date
Fill empty cells with previous/next value
Split URL (into protocol, host, port, …)
Classify User-Agent
Generate a best-effort visitor id
Zip JSON arrays
Filtering and flagging rows
Managing dates
Reshaping
Geographic processing
Sampling
Execution engines
Data Visualization
Sampling and charts engines
Standard chart types
Geographic charts (Beta)
Color palettes
Machine learning
Prediction (Supervised ML)
Clustering (Unsupervised ML)
Features handling
Machine learning training engines
Scikit-learn / XGBoost engine
MLLib (Spark) engine
H2O (Sparkling Water) engine
Vertica
Scoring engines
The Flow
Limiting concurrent executions
Visual recipes
Sync: copying datasets
Grouping: aggregating data
Window: analytics functions
Distinct: get unique rows
Join: joining datasets
Splitting datasets
Top N: retrieve first N rows
Stacking datasets
Sampling datasets
Sort: order values
Pivot recipe
Download recipe
Recipes based on code
The common editor layout
Python recipes
R recipes
SQL recipes
Hive recipes
Pig recipes
Impala recipes
Spark-Scala recipes
PySpark recipes
Spark / R recipes
SparkSQL recipes
Shell recipes
Variables expansion in code recipes
Code notebooks
SQL notebook
Python notebooks
Predefined notebooks
Webapps
“Standard” web apps
Shiny web apps
Bokeh web apps
Publishing webapps on the dashboard
Code reports
R Markdown reports
Dashboards
Dashboard concepts
Display settings
Insights reference
Chart
Dataset table
Model report
Managed folder
Jupyter Notebook
Webapp
Metric
Scenarios
Working with partitions
Partitioning files-based datasets
Partitioned SQL datasets
Specifying partition dependencies
Partition identifiers
Recipes for partitioned datasets
Partitioned Hive recipes
Partitioned SQL recipes
Partitioning variables substitutions
DSS and Hadoop
Setting up Hadoop integration
Connecting to secure clusters
Set up a new HDFS connection
DSS and Hive
DSS and Impala
Hadoop multi-user security
Distribution-specific notes
Cloudera CDH
Hortonworks HDP
MapR
Amazon Elastic MapReduce
Microsoft Azure HDInsight
Google Cloud Dataproc
Using multiple Hadoop filesystems
Teradata Connector For Hadoop
DSS and Spark
Usage of Spark in DSS
Setting up Spark integration
Spark configurations
Usage notes per dataset type
Spark pipelines
Limitations and attention points
DSS and Python
Installing Python packages
Reusing Python code
Using Matplotlib
Using Bokeh
Using Plot.ly
Using Ggplot
DSS and R
Installing R packages
Reusing R code
Using ggplot2
Using Dygraphs
Using googleVis
Using ggvis
Code environments
Operations (Python)
Operations (R)
Base packages
Using Conda
Automation nodes
Non-managed code environments
Plugins’ code environments
Custom options and environment
Troubleshooting
Collaboration
Version control
Plugins
Installing plugins
Installing plugins offline
Writing your own plugin
Plugin author reference guide
Plugins and components
Parameters
Writing recipes
Writing DSS macros
Writing DSS Filesystem providers
Custom chart elements
Other topics
Automation scenarios, metrics, and checks
Definitions
Scenario steps
Launching a scenario
Reporting on scenario runs
Custom scenarios
Variables in scenarios
Metrics
Checks
Custom probes and checks
Automation node and bundles
Installing the Automation node
Creating a bundle
Importing a bundle
API Node: Real-time service
Introduction
Concepts
Installing the API node
Your first API service
Exposing a visual prediction model
Exposing a Python prediction model
Exposing an R prediction model
Exposing a Python function
Exposing an R function
Exposing a SQL query
Exposing a lookup in a dataset
Enriching prediction queries
API node user API
Using the apinode-admin tool
API node administration API
High availability and scalability
Managing versions of your endpoint
Logging and auditing
Health monitoring
Advanced topics
Sampling methods
Formula language
Custom variables expansion
File formats
Delimiter-separated values (DSV)
Fixed width
Parquet
Avro
Hive SequenceFile
Hive RCFile
Hive ORCFile
XML
JSON
Excel
ESRI Shapefiles
DSS APIs
The DSS public API
Features
Public API Keys
Public API Python client
The REST API
The internal Python API
Interacting with datasets
Performing SQL, Hive and Impala queries
Executing partial recipes
Interacting with Pyspark
Managed folders in Python API
Interacting with saved models
Interacting with metrics
API for custom recipes
API for custom datasets
API for custom formats
API for custom FS providers
Custom scenarios API
Creating static insights
The Javascript API
The R API
Creating static insights
The Scala API
Security
Main permissions
Connections security
User profiles
Exposed objects
Dashboard authorizations
Multi-user security
Comparing security modes
Concepts
Prerequisites and limitations
Setup
Operations
Interaction with Hive and Impala
Interaction with Spark
Advanced topics
Audit Trail
Advanced security options
Single Sign-On
Operating DSS
dsscli tool
The data directory
Backing up
Logging in DSS
DSS Macros
Managing DSS disk usage
Troubleshooting
Diagnosing and debugging issues
Obtaining support
Common issues
DSS does not start / Cannot connect
Cannot login to DSS
DSS crashes / The “Disconnected” overlay appears
Websockets problems
Cannot connect to a SQL database
A job fails
A scenario fails
A ML model training fails
“Your user profile does not allow” issues
Error codes
ERR_CODEENV_EXISTING_ENV: Code environment already exists
ERR_CODEENV_INCORRECT_ENV_TYPE: Wrong type of Code environment
ERR_CODEENV_INVALID_CODE_ENV_ARCHIVE: Invalid code environment archive
ERR_CODEENV_MISSING_ENV: Code environment does not exist
ERR_CODEENV_MISSING_ENV_VERSION: Code environment version does not exist
ERR_CODEENV_NO_CREATION_PERMISSION: User not allowed to create Code environments
ERR_CODEENV_NO_USAGE_PERMISSION: User not allowed to use this Code environment
ERR_CODEENV_UNSUPPORTED_OPERATION_FOR_ENV_TYPE: Operation not supported for this type of Code environment
ERR_CONNECTION_API_BAD_CONFIG: Bad configuration for connection
ERR_CONNECTION_AZURE_INVALID_CONFIG: Invalid Azure connection configuration
ERR_CONNECTION_S3_INVALID_CONFIG: Invalid S3 connection configuration
ERR_CONNECTION_SQL_INVALID_CONFIG: Invalid SQL connection configuration
ERR_CONNECTION_SSH_INVALID_CONFIG: Invalid SSH connection configuration
ERR_DATASET_ACTION_NOT_SUPPORTED: Action not supported for this kind of dataset
ERR_DATASET_HIVE_INCOMPATIBLE_SCHEMA: Dataset schema not compatible with Hive
ERR_DATASET_INVALID_CONFIG: Invalid dataset configuration
ERR_DATASET_INVALID_FORMAT_CONFIG: Invalid format configuration for this dataset
ERR_DATASET_INVALID_METRIC_IDENTIFIER: Invalid metric identifier
ERR_DATASET_INVALID_PARTITIONING_CONFIG: Invalid dataset partitioning configuration
ERR_DATASET_PARTITION_EMPTY: Input partition is empty
ERR_ENDPOINT_INVALID_CONFIG: Invalid configuration for API Endpoint
ERR_FOLDER_INVALID_PARTITIONING_CONFIG: Invalid folder partitioning configuration
ERR_FSPROVIDER_CANNOT_CREATE_FOLDER_ON_DIRECTORY_UNAWARE_FS: Cannot create a folder on this type of file system
ERR_FSPROVIDER_DEST_PATH_ALREADY_EXISTS: Destination path already exists
ERR_FSPROVIDER_FSLIKE_REACH_OUT_OF_ROOT: Illegal attempt to access data out of connection root path
ERR_FSPROVIDER_HTTP_CONNECTION_FAILED: HTTP connection failed
ERR_FSPROVIDER_HTTP_INVALID_URI: Invalid HTTP URI
ERR_FSPROVIDER_HTTP_REQUEST_FAILED: HTTP request failed
ERR_FSPROVIDER_ILLEGAL_PATH: Illegal path for that file system
ERR_FSPROVIDER_INVALID_CONFIG: Invalid configuration
ERR_FSPROVIDER_INVALID_FILE_NAME: Invalid file name
ERR_FSPROVIDER_LOCAL_LIST_FAILED: Could not list local directory
ERR_FSPROVIDER_PATH_DOES_NOT_EXIST: Path in dataset or folder does not exist
ERR_FSPROVIDER_ROOT_PATH_DOES_NOT_EXIST: Root path of the dataset or folder does not exist
ERR_FSPROVIDER_SSH_CONNECTION_FAILED: Failed to establish SSH connection
ERR_HIVE_HS2_CONNECTION_FAILED: Failed to establish HiveServer2 connection
ERR_METRIC_DATASET_COMPUTATION_FAILED: Metrics computation completely failed
ERR_METRIC_ENGINE_RUN_FAILED: One of the metrics engines failed to run
ERR_MISC_ENOSPC: Out of disk space
ERR_MISC_EOPENF: Too many open files
ERR_OBJECT_OPERATION_NOT_AVAILABLE_FOR_TYPE: Operation not supported for this kind of object
ERR_PLUGIN_CANNOT_LOAD: Plugin cannot be loaded
ERR_PLUGIN_COMPONENT_NOT_INSTALLED: Plugin component not installed or removed
ERR_PLUGIN_DEV_INVALID_COMPONENT_PARAMETER: Invalid parameter for plugin component creation
ERR_PLUGIN_DEV_INVALID_DEFINITION: The descriptor of the plugin is invalid
ERR_PLUGIN_INVALID_DEFINITION: The plugin’s definition is invalid
ERR_PLUGIN_NOT_INSTALLED: Plugin not installed or removed
ERR_PLUGIN_WITHOUT_CODEENV: The plugin has no code env specification
ERR_PLUGIN_WRONG_TYPE: Unexpected type of plugin
ERR_PROJECT_INVALID_ARCHIVE: Invalid project archive
ERR_PROJECT_INVALID_PROJECT_KEY: Invalid project key
ERR_RECIPE_CANNOT_CHANGE_ENGINE: Cannot change engine
ERR_RECIPE_CANNOT_CHECK_SCHEMA_CONSISTENCY: Cannot check schema consistency
ERR_RECIPE_CANNOT_CHECK_SCHEMA_CONSISTENCY_EXPENSIVE: Cannot check schema consistency: expensive checks disabled
ERR_RECIPE_CANNOT_CHECK_SCHEMA_CONSISTENCY_NEEDS_BUILD: Cannot compute output schema with an empty input dataset. Build the input dataset first.
ERR_RECIPE_CANNOT_CHECK_SCHEMA_CONSISTENCY_ON_RECIPE_TYPE: Cannot check schema consistency on this kind of recipe
ERR_RECIPE_CANNOT_CHECK_SCHEMA_CONSISTENCY_WITH_RECIPE_CONFIG: Cannot check schema consistency because of recipe configuration
ERR_RECIPE_CANNOT_CHANGE_ENGINE: Not compatible with Spark
ERR_RECIPE_CANNOT_USE_ENGINE: Cannot use the selected engine for this recipe
ERR_RECIPE_INCONSISTENT_I_O: Inconsistent recipe input or output
ERR_RECIPE_PDEP_UPDATE_REQUIRED: Partition dependency update required
ERR_RECIPE_SPLIT_INVALID_COMPUTED_COLUMNS: Invalid computed column
ERR_SCENARIO_INVALID_STEP_CONFIG: Invalid scenario step configuration
ERR_SECURITY_CRUD_INVALID_SETTINGS: The user attributes submitted for a change are invalid
ERR_SECURITY_GROUP_EXISTS: The new requested group already exists
ERR_SECURITY_INVALID_NEW_PASSWORD: The new password is invalid
ERR_SECURITY_INVALID_PASSWORD: The password hash from the database is invalid
ERR_SECURITY_MUS_USER_UNMATCHED: The DSS user is not configured to be matched onto a system user
ERR_SECURITY_PATH_ESCAPE: The requested file is not within any allowed directory
ERR_SECURITY_USER_EXISTS: The requested user for creation already exists
ERR_SECURITY_WRONG_PASSWORD: The old password provided for password change is invalid
ERR_SPARK_FAILED_DRIVER_OOM: Spark failure: out of memory in driver
ERR_SPARK_FAILED_TASK_OOM: Spark failure: out of memory in task
ERR_SPARK_FAILED_YARN_KILLED_MEMORY: Spark failure: killed by YARN (excessive memory usage)
ERR_SPARK_PYSPARK_CODE_FAILED_UNSPECIFIED: Pyspark code failed
ERR_SQL_CANNOT_LOAD_DRIVER: Failed to load database driver
ERR_SQL_DB_UNREACHABLE: Failed to reach database
ERR_SQL_IMPALA_MEMORYLIMIT: Impala memory limit exceeded
ERR_SQL_POSTGRESQL_TOOMANYSESSIONS: Too many sessions open concurrently
ERR_SQL_TABLE_NOT_FOUND: SQL Table not found
ERR_SQL_VERTICA_TOOMANYROS: Error in Vertica: too many ROS
ERR_SQL_VERTICA_TOOMANYSESSIONS: Error in Vertica: too many sessions open concurrently
ERR_TRANSACTION_FAILED_ENOSPC: Out of disk space
ERR_TRANSACTION_GIT_COMMMIT_FAILED: Failed committing changes
Known issues
Release notes
DSS 4.1 Release notes
DSS 4.0 Release notes
DSS 3.1 Release notes
DSS 3.0 Release notes
DSS 2.3 Release notes
DSS 2.2 Release notes
DSS 2.1 Release notes
DSS 2.0 Release notes
DSS 1.4 Release notes
DSS 1.3 Release notes
DSS 1.2 Release notes
DSS 1.1 Release notes
DSS 1.0 Release Notes
Pre versions
Other Documentation
Third-party acknowledgements
Using Matplotlib
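
Matplotlib can be used from Python code in DSS, most directly in Jupyter notebooks. Below is a minimal sketch of typical notebook usage; the dataset name "mydataset" and the column "price" are hypothetical placeholders, while dataiku.Dataset(...).get_dataframe() is the standard DSS internal Python API for reading a dataset into a pandas DataFrame:

    # In a DSS Jupyter notebook, enable inline rendering of figures first.
    %matplotlib inline

    import dataiku
    import matplotlib.pyplot as plt

    # Read a DSS dataset into a pandas DataFrame
    # ("mydataset" and "price" are placeholder names).
    df = dataiku.Dataset("mydataset").get_dataframe()

    # Regular Matplotlib code then works as usual; the figure is
    # displayed directly below the notebook cell.
    plt.hist(df["price"].dropna(), bins=30)
    plt.xlabel("price")
    plt.ylabel("count")
    plt.title("Distribution of price")
    plt.show()

Outside a notebook (for example in a Python recipe), no display is attached to the process, so select a non-interactive backend such as Agg before importing pyplot (import matplotlib; matplotlib.use("Agg")) and write figures to files with savefig() instead of calling plt.show().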