Running on Amazon Elastic Kubernetes Service¶
You can use container execution on Amazon Elastic Kubernetes Service (EKS) as a fully managed Kubernetes solution.
Setup¶
Create your EKS cluster¶
Follow AWS documentation on how to create your EKS cluster. We recommend that you allocate at least 15 GB of memory for each cluster node. More memory may be required if you plan on running very large in-memory recipes.
You’ll be able to configure the memory allocation for each container and per-namespace using multiple container execution configurations.
Prepare your local `aws`, `docker` and `kubectl` commands¶
Follow AWS documentation to make sure that:

- Your local (on the DSS machine) `aws ecr` command can list and create Docker image repositories, and authenticate `docker` for image push.
- Your local (on the DSS machine) `kubectl` command can interact with the cluster.
- Your local (on the DSS machine) `docker` command can successfully push images to the ECR repository.
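As a quick sanity check, you can verify that the three commands are present on the DSS machine. This minimal sketch only checks that the tools are installed; actual access to ECR and the cluster still needs to be verified with the commands shown in the comments:

```shell
# Report which of the required CLI tools are installed on the DSS machine.
# This only checks presence; it does not verify ECR or cluster access.
for cmd in aws docker kubectl; do
  if command -v "$cmd" >/dev/null 2>&1; then
    echo "found: $cmd"
  else
    echo "MISSING: $cmd"
  fi
done
# Access itself can then be verified with, e.g.:
#   aws ecr describe-repositories
#   kubectl get nodes
```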
Create the execution configuration¶
Build the base image as indicated in Setting up.
In Administration > Settings > Container exec, add a new execution config of type “Kubernetes”.
The image registry URL is the one given by `aws ecr describe-repositories`, without the image name. It typically looks like `XXXXXXXXXXXX.dkr.ecr.us-east-1.amazonaws.com/PREFIX`, where `XXXXXXXXXXXX` is your AWS account ID, `us-east-1` is the AWS region for the repository and `PREFIX` is an optional prefix to triage your repositories.

The image pre-push hook should be a script that takes care of the Amazon ECR requirements when pushing images. ECR mandates that:
- an `aws ecr create-repository` call is performed when pushing to a new repository (in Docker parlance, when using an image `example.com/prefix/image-name:image-tag`, the repository is `example.com/prefix/image-name`)
- the `docker` client must be authenticated to push to this repository, which one can do using the `aws ecr get-login` command

DSS comes with a sample script for simple EKS/ECR deployments; you can set the pre-push hook to `INSTALL_DIR/resources/container-exec/kubernetes/aws-ecr-prepush.sh`, where `INSTALL_DIR` is the full path of the DSS installation directory (containing the `installer.sh` script).
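The repository-name requirement above can be illustrated with a small sketch. The variable names are hypothetical; DSS's bundled `aws-ecr-prepush.sh` implements the real logic:

```shell
# Given a full image name, derive the ECR repository name that an
# `aws ecr create-repository` call would need: drop the tag, then
# drop the registry host.
IMAGE="example.com/prefix/image-name:image-tag"
NO_TAG="${IMAGE%:*}"        # example.com/prefix/image-name
REPO="${NO_TAG#*/}"         # prefix/image-name
echo "$REPO"
# A pre-push hook would then roughly do (not executed here):
#   aws ecr describe-repositories --repository-names "$REPO" \
#     || aws ecr create-repository --repository-name "$REPO"
#   eval "$(aws ecr get-login --no-include-email)"
```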
You’re now ready to run recipes and models on EKS.
Using GPUs¶
AWS provides GPU-enabled instances with NVIDIA GPUs. Several steps are required in order to use them for container execution.
Build a CUDA-enabled base image¶
The base image that is built by default (see Setting up) does not have CUDA support and cannot use NVIDIA GPUs.
You need to build a CUDA-enabled base image.
Then create a new container configuration dedicated to running GPU workloads. If you specified a tag for the base image, report it in the “Base image tag” field.
Enable GPU support on the cluster¶
To execute containers leveraging GPUs, your worker nodes as well as the control plane need to support them. The following instructions describe a simplified way to achieve this. It is subject to variations depending on the underlying hardware and software version requirements for your projects.
To make a worker node able to leverage its GPUs:
- Install the NVIDIA driver that matches the model of GPU the instance is provisioned with.
- Install the CUDA driver. We recommend the runfile installation method. Installing the CUDA toolkit is not necessary; the driver alone is sufficient.
- Install the NVIDIA Docker runtime and set it as the default Docker runtime.
Finally, enable the cluster GPU support with the NVIDIA device plugin. Be careful to select the version that matches your Kubernetes version (v1.10 as of July 2018).
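As a sketch, deploying the device plugin amounts to applying its DaemonSet manifest with `kubectl`. The manifest URL and version below are assumptions based on the NVIDIA/k8s-device-plugin repository; check its README for the manifest matching your cluster:

```shell
# Assumed manifest location in the NVIDIA/k8s-device-plugin repository;
# adjust PLUGIN_VERSION to match your Kubernetes version.
PLUGIN_VERSION="v1.10"
MANIFEST="https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/${PLUGIN_VERSION}/nvidia-device-plugin.yml"
echo "would apply: $MANIFEST"
# kubectl create -f "$MANIFEST"
```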
Add a custom reservation¶
In order for your container execution to be located on nodes with GPU accelerators, and for EKS to configure the CUDA driver on your containers, the corresponding EKS pods must be created with a custom “limit” (in Kubernetes parlance) to indicate that you need a specific type of resource (the standard resource types are CPU and memory).
You must configure this limit in the container execution configuration:

- In the “Custom limits” section, add a new entry with key `nvidia.com/gpu` and value `1` (to request 1 GPU).
- Don’t forget to add the new entry and save the settings.
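For reference, this entry translates into the following resource limit in the pod specifications that get submitted to the cluster (an illustrative fragment, not a file you edit by hand):

```yaml
# Pod spec fragment corresponding to the custom limit above
resources:
  limits:
    nvidia.com/gpu: 1
```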
Deploy¶
You can now deploy your GPU-requiring recipes and models.