Dataiku DSS

Installing DSS

  • Requirements
  • Installing a new DSS instance
  • Upgrading a DSS instance
  • Other installation options
  • Setting up Hadoop and Spark integration
  • Setting up R integration
  • Customizing DSS installation
  • Installing database drivers
  • Java runtime environment
  • The Python environment
  • Installing a DSS plugin
  • Configuring LDAP authentication
  • Working with proxies
  • Migration operations
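For orientation, a new DSS instance installation typically follows the pattern sketched below. This is a hedged outline only: the tarball name, version, data directory, and port number are placeholders, not values taken from this page; the "Installing a new DSS instance" page is the authoritative procedure.

```shell
# Unpack the DSS installation kit (tarball name and VERSION are placeholders).
tar xzf dataiku-dss-VERSION.tar.gz

# Run the installer: -d selects the data directory, -p the base TCP port.
# Both values below are illustrative examples, not documented defaults.
dataiku-dss-VERSION/installer.sh -d /home/dataiku/dss_data -p 11000

# Start DSS from its data directory, then connect with a browser
# on the port chosen above.
/home/dataiku/dss_data/bin/dss start
```

Upgrades and the other installation options (macOS, AWS, Azure, virtual machine) each have their own pages linked above and may diverge from this sketch.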

© Copyright 2017, Dataiku.
