Java runtime environment¶
Customizing Java runtime options¶
The main backend for Data Science Studio is a Java application. Runtime options for this Java process can be customized.
The different Java processes¶
DSS is made up of four main kinds of Java processes:
- The “backend” is the main server, which handles all interaction with users, the configuration, and the visual data preparation. There is only one backend.
- The “jek” is a process which runs the jobs (i.e., what happens when you use “Build”). There are multiple jeks (one per running job).
- The “fek” handles long-running background tasks. It is also responsible for building the data samples. There are multiple feks (one per running background task).
- The “hproxy” handles interactions with Hive and Pig. There is only one hproxy.
What can be customized¶
All Java options of these four kinds of processes can be customized.
For each of these, DSS provides an easy way to:
- Configure the amount of memory allocated to each process
- Configure the “permgen” (a specific kind of memory for Java processes)
- Add custom options
These three kinds of customizations can be done by editing the install.ini file.
More advanced customization (taking precedence over default DSS options) can be done via environment files.
Customizing memory (xmx) and permgen¶
Most often, you will want to customize “xmx”, the maximum amount of memory allocated to the Java process.
Xmx is configured with the <processtype>.xmx setting in the javaopts section of the install.ini file (where <processtype> is one of backend, jek, fek or hproxy).
By default, Xmx is set to 2GB. This might not be enough for DSS instances with a large number of users. If that amount of memory is not sufficient, the DSS backend may crash, and all users may get disconnected.
Example: Set Xmx of backend to 3g¶
- Go to the DSS data directory
  Note
  On Mac OS X, the DATA_DIR is always: $HOME/Library/DataScienceStudio/dss_home
- Stop DSS
  ./bin/dss stop
- Edit the install.ini file. If it does not exist, add a [javaopts] section. Add a line:
  backend.xmx = 3g
- Regenerate the runtime config:
  ./bin/dssadmin regenerate-config
- Start DSS
  ./bin/dss start
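The edit to install.ini in the steps above can also be scripted. The following is a minimal sketch, not a DSS tool: set_javaopt is a hypothetical helper name, and the sketch assumes that install.ini uses plain "key = value" lines under "[section]" headers, as in the examples on this page.

```shell
# Hypothetical helper (not part of DSS): set "KEY = VALUE" in the [javaopts]
# section of an ini file, creating the section at the end if it is missing.
# Assumes plain "key = value" lines under "[section]" headers.
set_javaopt() {
    ini_file=$1
    key=$2
    value=$3
    tmp_file=$(mktemp) || return 1
    awk -v key="$key" -v value="$value" '
        # Write the key at the end of [javaopts] if it was not found there.
        function emit_pending() {
            if (in_section && !done) { print key " = " value; done = 1 }
        }
        # Any section header: close the previous section, track the new one.
        /^[ \t]*\[/ {
            emit_pending()
            in_section = ($0 ~ /^[ \t]*\[javaopts\][ \t]*$/)
            if (in_section) seen = 1
            print
            next
        }
        # Inside [javaopts]: replace an existing line for the same key.
        in_section {
            name = $0
            sub(/[ \t]*=.*$/, "", name)
            sub(/^[ \t]+/, "", name)
            if (name == key) {
                if (!done) { print key " = " value; done = 1 }
                next
            }
        }
        { print }
        END {
            emit_pending()
            if (!seen) { print "[javaopts]"; print key " = " value }
        }
    ' "$ini_file" > "$tmp_file" && cat "$tmp_file" > "$ini_file"
    status=$?
    rm -f "$tmp_file"
    return $status
}
```

For example, from the data directory and with DSS stopped, `set_javaopt install.ini backend.xmx 3g` would be followed by `./bin/dssadmin regenerate-config` and `./bin/dss start`. Calling the helper again with another value replaces the existing line rather than adding a second one.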
Example install.ini¶
Here is an example of an install.ini file that configures the Xmx for backend and jek:
[javaopts]
backend.xmx = 3g
jek.xmx = 2g
Memory amounts can be suffixed with “m” or “g” for megabytes and gigabytes.
Setting the “permgen”¶
Permgen (the “permanent generation”) is the area of JVM memory that holds class metadata on Java 7; Java 8 replaced it with Metaspace. You will need to increase it if you encounter DSS restarts with “OutOfMemoryError: PermGen space” messages.
To set the permgen, use the <processtype>.permgen setting (where <processtype> is one of backend, jek, fek or hproxy).
Memory amounts can be suffixed with “m” or “g” for megabytes and gigabytes.
The same stop / regenerate-config / start logic applies.
Adding additional options¶
Use the same procedure as above, but add a line like the following:
[javaopts]
backend.additional.opts=-Dmy.option=value
Advanced customization¶
The full Java runtime options can be configured by setting environment variables in the DATA_DIR/bin/env-site.sh file.
Warning
You should only use this section if you could not obtain the desired set of options using the options above.
The default runtime options are stored in several environment variables:
- DKU_BACKEND_JAVA_OPTS
- DKU_JEK_JAVA_OPTS
- DKU_FEK_JAVA_OPTS
- DKU_HPROXY_JAVA_OPTS
The default values for these variables (computed from install.ini) are stored in the DATA_DIR/bin/env-default.sh file.
Warning
Do not modify DATA_DIR/bin/env-default.sh: it is overwritten at the next Data Science Studio upgrade and after each call to ./bin/dssadmin regenerate-config.
To configure these options:
- Stop DSS
  ./bin/dss stop
- Open the bin/env-default.sh file
- Copy the line you want to change. The lines look like export DKU_BACKEND_JAVA_OPTS, export DKU_JEK_JAVA_OPTS, …
- Open the DATA_DIR/bin/env-site.sh file
- Paste the line and modify it to your needs
- Start DSS
  ./bin/dss start
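The copy-and-paste step above can be sketched in shell. This is only an illustration: copy_opts_line is a hypothetical helper name, and the sketch assumes that each default is written on a single line starting with `export VARIABLE=`, which should be checked against the actual contents of env-default.sh.

```shell
# Hypothetical helper (not part of DSS): copy the "export VAR=..." line for
# one variable from bin/env-default.sh to bin/env-site.sh, so that it can
# then be edited there. Assumes the definition fits on a single line.
copy_opts_line() {
    data_dir=$1
    var_name=$2
    default_file="$data_dir/bin/env-default.sh"
    site_file="$data_dir/bin/env-site.sh"
    # Refuse to add a second copy if env-site.sh already sets the variable.
    if [ -f "$site_file" ] && grep -q "^export $var_name=" "$site_file"; then
        echo "$var_name is already set in $site_file" >&2
        return 1
    fi
    # Append the matching line; grep fails if the variable is not found.
    grep "^export $var_name=" "$default_file" >> "$site_file"
}
```

With DSS stopped, `copy_opts_line "$DATA_DIR" DKU_BACKEND_JAVA_OPTS` (where DATA_DIR is assumed to hold the path of the data directory) would leave a copy of the default line in env-site.sh, ready to be modified before DSS is started again.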
Customizing the JVM¶
Data Science Studio requires an installation of Java Development Kit version 7 or 8. Supported versions are OpenJDK (http://openjdk.java.net) and Oracle JDK (http://www.oracle.com/technetwork/java/javase/downloads/index.html).
The standard Data Science Studio installation looks for a suitable version of Java in standard locations. If none is found, the dependency installation phase installs the OpenJDK 7 system package appropriate for the distribution.
You can force Data Science Studio to use a specific version of Java (for example, when there are several versions installed on the server, or when you manually installed Java in a non-standard place) by setting the DKUJAVABIN environment variable while running the DSS installer script. This variable should point to the java binary to use. For example:
$ DKUJAVABIN=/usr/local/bin/java dataiku-dss-VERSION/installer.sh <INSTALLER_OPTIONS>
Note that the installer script stores this value in the file DATA_DIR/bin/env-default.sh. You do not need to define it permanently for the Linux user account running the Studio.
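Before pointing DKUJAVABIN at a binary, it can be useful to confirm that the path is an executable Java launcher and to see which version it reports. The following is a minimal sketch: check_java_bin is a hypothetical helper name, not part of the DSS installer, and it only relies on the standard `java -version` option.

```shell
# Hypothetical helper (not part of DSS): check that a candidate java binary
# is an executable file and print what it reports for "-version".
check_java_bin() {
    java_bin=$1
    if [ ! -f "$java_bin" ] || [ ! -x "$java_bin" ]; then
        echo "not an executable file: $java_bin" >&2
        return 1
    fi
    # "java -version" writes its report to stderr; merge it into stdout.
    "$java_bin" -version 2>&1
}
```

For instance, `check_java_bin /usr/local/bin/java` prints the version banner of that binary, which can be compared with the supported versions listed above before running the installer with DKUJAVABIN set to the same path.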