Amazon Redshift
DSS supports the full range of features on Redshift:
- Reading and writing datasets
- Executing SQL recipes
- Performing visual recipes in-database
- Using the live engine for charts
Note
We have a detailed how-to for your first steps with SQL databases in DSS.
You might want to start with that how-to. The rest of this page is reference information for Redshift.
Installing the JDBC driver
The Redshift driver is pre-installed in DSS. You don’t need any further installation.
Writing data into Redshift
Loading data into a Redshift database with regular SQL “INSERT” or “COPY” statements is extremely inefficient (a few dozen records per second) and should only be used for extremely small datasets.
The recommended way to load data into a Redshift table is a bulk COPY from files stored in Amazon S3. DSS automatically uses this optimal S3-to-Redshift copy mechanism when you use a Sync recipe. For more information, see other_recipes/sync.
In other words:
- You should never have a Flow with a recipe that writes from a source that is neither Redshift nor S3 to a Redshift dataset.
- S3-to-Redshift recipes should only be “Sync” recipes.
- Redshift-to-Redshift recipes are fast if and only if the “In-database (SQL)” engine is selected.
For example, if you have a table in Redshift and want to use a Prepare recipe, which has no “In-database (SQL)” engine, you should instead use two steps:
- A first Redshift-to-S3 Prepare recipe
- An S3-to-Redshift Sync recipe
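The rules in this section can be summarized as a small decision function. This is only a sketch of the guidance above, not part of DSS: the function `recommended_path` and the storage labels `"redshift"`, `"s3"` and `"filesystem"` are hypothetical, and routing other sources through S3 is our reading of the rule that nothing else should write directly to Redshift.

```python
def recommended_path(source, target, recipe_has_sql_engine=False):
    """Suggest how to move data for a given source/target storage pair."""
    if target != "redshift":
        # The guidance in this section only constrains writes into Redshift.
        return "no Redshift-specific constraint"
    if source == "s3":
        return "Sync recipe (bulk COPY from S3)"
    if source == "redshift":
        if recipe_has_sql_engine:
            return "In-database (SQL) engine"
        # Recipes without an SQL engine (such as Prepare) go through S3.
        return "write to S3 first, then Sync recipe to Redshift"
    # Any other source should not write directly into Redshift.
    return "stage the data in S3 first, then Sync recipe to Redshift"


# A Prepare recipe on a Redshift table has no In-database (SQL) engine.
print(recommended_path("redshift", "redshift", recipe_has_sql_engine=False))
print(recommended_path("s3", "redshift"))
print(recommended_path("filesystem", "redshift"))
```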
Setting distribute and sort clauses
DSS does not have built-in support for setting Redshift “DISTRIBUTE BY” and “SORT BY” clauses (written as DISTSTYLE, DISTKEY and SORTKEY in Redshift's CREATE TABLE syntax). If you want or need to set them on a managed dataset written by DSS, go to the settings of the dataset and, in the “Advanced” tab, override the “Table creation SQL statement”.
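As an illustration of what an overridden table creation statement might contain, the sketch below assembles a Redshift `CREATE TABLE` statement with `DISTKEY` and `SORTKEY` attributes. The helper `build_create_table`, the table name and the columns are hypothetical; in practice, you would start from the statement DSS proposes for your dataset and add only the distribution and sort attributes.

```python
def build_create_table(table, columns, dist_key=None, sort_keys=()):
    """Return a Redshift CREATE TABLE statement with optional DISTKEY/SORTKEY.

    columns is a list of (name, sql_type) pairs.
    """
    names = {name for name, _ in columns}
    for key in ([dist_key] if dist_key else []) + list(sort_keys):
        if key not in names:
            raise ValueError(f"unknown column in key clause: {key}")

    column_lines = ",\n".join(f"  {name} {sql_type}" for name, sql_type in columns)
    statement = f"CREATE TABLE {table} (\n{column_lines}\n)"
    if dist_key:
        # Rows with the same dist_key value are stored on the same slice.
        statement += f"\nDISTSTYLE KEY\nDISTKEY ({dist_key})"
    if sort_keys:
        # Rows are stored on disk sorted by these columns, in order.
        statement += f"\nSORTKEY ({', '.join(sort_keys)})"
    return statement + ";"


# Hypothetical table and columns, for illustration only.
ddl = build_create_table(
    table="analytics.orders",
    columns=[
        ("order_id", "BIGINT"),
        ("customer_id", "BIGINT"),
        ("order_date", "DATE"),
    ],
    dist_key="customer_id",
    sort_keys=("order_date", "order_id"),
)
print(ddl)
```

The column names and types in the overridden statement must still match the schema of the DSS dataset, since DSS writes to the table using that schema.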
Limitations
- DSS uses the PostgreSQL driver to connect to Redshift. This driver limits the size of result sets to 2 billion records, so you cannot read or write more than 2 billion records from or to a Redshift dataset, except when using the In-database (SQL) engine.
- SSL support is not tested by Dataiku.
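The record limit in the first item above can be turned into a simple pre-flight check. The sketch below assumes that "2 billion" corresponds to the largest signed 32-bit integer (2,147,483,647); that exact value is our assumption and is not stated on this page. The names `ASSUMED_DRIVER_ROW_LIMIT` and `can_stream_through_driver` are hypothetical.

```python
# Assumption: "2 billion" is the largest signed 32-bit integer.
ASSUMED_DRIVER_ROW_LIMIT = 2**31 - 1


def can_stream_through_driver(row_count):
    """Return True if row_count fits under the assumed driver limit."""
    if row_count < 0:
        raise ValueError("row_count must be non-negative")
    return row_count <= ASSUMED_DRIVER_ROW_LIMIT


for rows in (1_500_000_000, 3_000_000_000):
    if can_stream_through_driver(rows):
        engine = "any engine"
    else:
        engine = "In-database (SQL) engine only"
    print(f"{rows:,} rows: {engine}")
```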