Setting up Hadoop and Spark integration
Data Science Studio can connect to a Hadoop cluster in order to:
- Read and write HDFS datasets
- Run Hive queries and scripts
- Run Impala queries
- Run Pig scripts
- Run preparation recipes on Hadoop
In addition, if you set up Spark integration, you can:
- Run SparkSQL queries
- Run preparation, join, stack and group recipes on Spark
- Run PySpark and SparkR scripts
- Train and use Spark MLlib models
For setup instructions, see Setting up Hadoop integration and Setting up Spark integration.