ElasticSearch¶

Data Science Studio can both read and write datasets on ElasticSearch versions 1.1 to 5.6.

Append mode (appending to an ElasticSearch dataset instead of replacing its contents) is not supported.

Define an ElasticSearch connection¶

Go to Administration > Connections
Click the “New connection” button and pick ElasticSearch
Enter a name for the new connection and the required connection parameters, then test and save the connection

Note

The port parameter should be ElasticSearch’s HTTP API port (9200 by default), not the Java API port.
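A quick way to confirm that a host and port expose the HTTP API, and that the server falls in the supported range, is to read the version from the root endpoint. The following is a minimal sketch, not something DSS runs itself: the function names are hypothetical, and it assumes plain HTTP without authentication.

```python
import json
from urllib.request import urlopen

# Supported range as stated on this page: ElasticSearch 1.1 to 5.6.
MIN_VERSION = (1, 1)
MAX_VERSION = (5, 6)


def http_api_url(host, port=9200):
    """Build the base URL of the HTTP API (9200 is the ElasticSearch default)."""
    return "http://{}:{}/".format(host, port)


def is_supported(version_number):
    """Compare the major.minor part of a version string with the supported range."""
    major, minor = (int(part) for part in version_number.split(".")[:2])
    return MIN_VERSION <= (major, minor) <= MAX_VERSION


def fetch_version(host, port=9200, timeout=5):
    """Read version.number from the root endpoint of the HTTP API.

    Assumption: the server is reachable over plain HTTP with no credentials.
    """
    with urlopen(http_api_url(host, port), timeout=timeout) as response:
        info = json.loads(response.read().decode("utf-8"))
    return info["version"]["number"]
```

For example, is_supported(fetch_version("localhost")) should succeed against the HTTP port. The Java API port does not speak HTTP, so the same request against it is expected to fail.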

Managed ElasticSearch datasets¶

If you allow DSS to write managed datasets into the ElasticSearch connection, you can use this connection to create output datasets for recipes.

Creating such a dataset creates a new index on your ElasticSearch server, named after the dataset by default, and stores its data in a type that is also named after the dataset by default. For example, if your ElasticSearch server is hosted on localhost:9200, a managed dataset named Articles stores its data in localhost:9200/articles/articles. Renaming the dataset does not change these names, in case you are relying on their presence. If you rename the dataset and want the names to stay aligned, edit the index and type names after renaming the dataset, then rebuild the dataset and manually delete the previous index.
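The default naming and the manual cleanup after a rename can be sketched as follows. This is an illustration only: the helper names are hypothetical, and the lowercasing rule is inferred from the Articles example above (any other name sanitization DSS may apply is not covered). Deleting an index through an HTTP DELETE on its name is standard ElasticSearch behaviour.

```python
from urllib.request import Request


def managed_dataset_location(dataset_name, host="localhost", port=9200):
    """Return host:port/index/type for a managed dataset with default names.

    Assumption: the index and type are the lowercased dataset name,
    as in the Articles example.
    """
    name = dataset_name.lower()
    return "{}:{}/{}/{}".format(host, port, name, name)


def delete_index_request(index, host="localhost", port=9200):
    """Prepare (without sending) the request that deletes a leftover index."""
    url = "http://{}:{}/{}".format(host, port, index)
    return Request(url, method="DELETE")


# Location of the dataset from the example above.
location = managed_dataset_location("Articles")

# After renaming the dataset and rebuilding it under new names, the old
# index can be removed by passing this request to urllib.request.urlopen.
cleanup = delete_index_request("articles")
```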

Warning

You should not create other types in indices that are managed by DSS: they might be deleted or altered.

By default, fields get the default ElasticSearch mapping, e.g. strings are analyzed and indexed (mapped to text in ElasticSearch 5+). If you want access to a non-analyzed version (mapped to keyword in ElasticSearch 5+) of some or all of your columns, you can list those columns (comma-separated, or * for all string columns) in the dataset settings. You can also specify your own complete type mapping.
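To make the analyzed versus non-analyzed distinction concrete, here is a minimal sketch of an ElasticSearch 5+ type mapping in which selected string columns carry a keyword sub-field next to the analyzed text field. The function is hypothetical, and the sub-field name raw is an assumption: this page does not say how DSS names the non-analyzed version. A dictionary of this shape is also what a complete custom type mapping looks like.

```python
def build_type_mapping(type_name, string_columns, non_analyzed=""):
    """Build an ElasticSearch 5+ mapping for the given string columns.

    non_analyzed is a comma-separated list of column names, or "*" for
    all string columns, mirroring the dataset setting described above.
    """
    wanted = {name.strip() for name in non_analyzed.split(",") if name.strip()}
    properties = {}
    for column in string_columns:
        field = {"type": "text"}  # analyzed and indexed
        if "*" in wanted or column in wanted:
            # Hypothetical sub-field name; exact-value queries would use "<column>.raw".
            field["fields"] = {"raw": {"type": "keyword"}}
        properties[column] = field
    return {"mappings": {type_name: {"properties": properties}}}


mapping = build_type_mapping("articles", ["title", "body"], non_analyzed="title")
```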

If your dataset is partitioned, one index is created per partition (each prefixed by the index name), and the index name itself is an ElasticSearch alias that points to all of the partition indices. You can still search or delete from the alias normally.
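The alias layout can be pictured with the body of an ElasticSearch _aliases request that attaches every partition index to one alias. This is a sketch of the resulting structure, not of what DSS sends: the helper names are hypothetical, and the underscore separator between the index name and the partition identifier is an assumption (the page only says the partition indices are prefixed by the index name).

```python
def partition_index(index_name, partition_id, separator="_"):
    """Name of the index holding one partition (separator is an assumption)."""
    return "{}{}{}".format(index_name, separator, partition_id)


def alias_actions(index_name, partition_ids):
    """Body for POST /_aliases pointing the alias at every partition index."""
    return {
        "actions": [
            {"add": {"index": partition_index(index_name, pid), "alias": index_name}}
            for pid in partition_ids
        ]
    }


body = alias_actions("articles", ["2017-01-01", "2017-01-02"])
```

A search against articles then fans out to both partition indices, which is why the alias can be queried like a regular index.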

External ElasticSearch datasets¶

You can also import existing data from ElasticSearch into DSS. Simply create an ElasticSearch dataset and specify the index and type name of the data. If the connection is writable, DSS can also overwrite that data, but DSS will not modify the type mapping, and will not create the index or type if they don’t already exist.

Your index may be an alias if it is only used for reading. It may also be an alias used for writing, provided the alias points to a single index (otherwise ElasticSearch refuses the write operation).
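The alias rule can be checked before pointing a dataset at an alias. The sketch below works on the JSON returned by ElasticSearch for GET /_alias/<name>, which is an object keyed by the names of the indices behind the alias. The function names are hypothetical, and the sample responses are made up for illustration.

```python
def indices_behind_alias(alias_response):
    """List the index names found in a GET /_alias/<name> response."""
    return sorted(alias_response)


def alias_usable(alias_response, for_writing):
    """Reading needs at least one target index; writing needs exactly one."""
    count = len(indices_behind_alias(alias_response))
    return count == 1 if for_writing else count >= 1


# Illustrative responses, not taken from a real server.
single = {"logs_2017": {"aliases": {"logs": {}}}}
multiple = {
    "logs_2016": {"aliases": {"logs": {}}},
    "logs_2017": {"aliases": {"logs": {}}},
}
```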

You can partition your external dataset in DSS: simply specify the partitioning column and the type of partitioning (value or time-based). You can only partition on one column for external datasets.

Note

The partitioning column must have fielddata enabled, which is the case by default for keyword fields in ElasticSearch 5+, but not for text fields.
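If the partitioning column is a text field, fielddata can be switched on through the ElasticSearch 5.x put-mapping API. The following sketch only prepares the request: the function name is hypothetical, the index, type, and column names are placeholders, and it assumes plain HTTP without authentication. Note that fielddata on text fields can use a significant amount of heap memory, so a keyword field is usually the lighter choice when one is available.

```python
import json
from urllib.request import Request


def enable_fielddata_request(base_url, index, type_name, column):
    """Prepare a PUT <index>/_mapping/<type> request enabling fielddata.

    Applies to ElasticSearch 5.x text fields; the request is not sent here.
    """
    body = {"properties": {column: {"type": "text", "fielddata": True}}}
    url = "{}/{}/_mapping/{}".format(base_url.rstrip("/"), index, type_name)
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Placeholder names; pass the result to urllib.request.urlopen to apply it.
request = enable_fielddata_request(
    "http://localhost:9200/", "articles", "articles", "category"
)
```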