Running on Amazon Elastic Kubernetes Service¶
You can use container execution on Amazon Elastic Kubernetes Service (EKS) as a fully managed Kubernetes solution.
Setup¶
Create your EKS cluster¶
Follow AWS documentation on how to create your EKS cluster. We recommend that you allocate at least 15 GB of memory for each cluster node. More memory may be required if you plan on running very large in-memory recipes.
You’ll be able to configure the memory allocation for each container and per-namespace using multiple container execution configurations.
Prepare your local `aws`, `docker` and `kubectl` commands¶
Follow AWS documentation to make sure that:

- Your local (on the DSS machine) `aws ecr` command can list and create Docker image repositories, and authenticate `docker` for image push.
- Your local (on the DSS machine) `kubectl` command can interact with the cluster.
- Your local (on the DSS machine) `docker` command can successfully push images to the ECR repository.
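As a quick sanity check, you can verify that the three commands are present on the DSS machine. This minimal sketch only checks that the tools are installed; actual access to ECR and the cluster still needs to be verified with the commands shown in the comments:

```shell
# Report which of the required CLI tools are installed on the DSS machine.
# This only checks presence; it does not verify ECR or cluster access.
for cmd in aws docker kubectl; do
  if command -v "$cmd" >/dev/null 2>&1; then
    echo "found: $cmd"
  else
    echo "MISSING: $cmd"
  fi
done
# Access itself can then be verified with, e.g.:
#   aws ecr describe-repositories
#   kubectl get nodes
```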
Create the execution configuration¶
Build the base image as indicated in Setting up.
In Administration > Settings > Container exec, add a new execution config of type “Kubernetes”.
The image registry URL is the one given by `aws ecr describe-repositories`, without the image name. It typically looks like `XXXXXXXXXXXX.dkr.ecr.us-east-1.amazonaws.com/PREFIX`, where `XXXXXXXXXXXX` is your AWS account ID, `us-east-1` is the AWS region for the repository and `PREFIX` is an optional prefix to triage your repositories.

The image pre-push hook should be a script that takes care of the Amazon ECR requirements when pushing images. ECR mandates that:
- an `aws ecr create-repository` call is performed when pushing to a new repository (in Docker parlance, when using an image `example.com/prefix/image-name:image-tag`, the repository is `example.com/prefix/image-name`)
- the `docker` client must be authenticated to push to this repository, which one can do using the `aws ecr get-login` command

DSS comes with a sample script for simple EKS/ECR deployments; you can set the pre-push hook to `INSTALL_DIR/resources/container-exec/kubernetes/aws-ecr-prepush.sh`, where `INSTALL_DIR` is the full path of the DSS installation directory (containing the `installer.sh` script).
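The repository-name requirement above can be illustrated with a small sketch. The variable names are hypothetical; DSS's bundled `aws-ecr-prepush.sh` implements the real logic:

```shell
# Given a full image name, derive the ECR repository name that an
# `aws ecr create-repository` call would need: drop the tag, then
# drop the registry host.
IMAGE="example.com/prefix/image-name:image-tag"
NO_TAG="${IMAGE%:*}"        # example.com/prefix/image-name
REPO="${NO_TAG#*/}"         # prefix/image-name
echo "$REPO"
# A pre-push hook would then roughly do (not executed here):
#   aws ecr describe-repositories --repository-names "$REPO" \
#     || aws ecr create-repository --repository-name "$REPO"
#   eval "$(aws ecr get-login --no-include-email)"
```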
You’re now ready to run recipes and models on EKS.
Using GPUs¶
AWS provides GPU-enabled instances with NVIDIA GPUs. Several steps are required in order to use them for container execution.
Build a CUDA-enabled base image¶
The base image that is built by default (see Setting up) does not have CUDA support and cannot use NVIDIA GPUs.
You need to build a CUDA-enabled base image.
Then create a new container configuration dedicated to running GPU workloads. If you specified a tag for the base image, report it in the “Base image tag” field.
Enable GPU support on the cluster¶
To execute containers leveraging GPUs, your worker nodes as well as the control plane need to support them. The following instructions describe a simplified way to achieve this. It is subject to variations depending on the underlying hardware and software version requirements for your projects.
To make a worker node able to leverage its GPUs:
- Install the NVIDIA driver that matches the model of GPU the instance is provisioned with.
- Install the CUDA driver. We recommend the runfile installation method. Installing the CUDA toolkit is not necessary; the driver alone is sufficient.
- Install the NVIDIA Docker runtime and set it as the default Docker runtime.
Finally, enable the cluster GPU support with the NVIDIA device plugin. Be careful to select the version that matches your Kubernetes version (v1.10 as of July 2018).
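As a sketch, deploying the device plugin amounts to applying its DaemonSet manifest with `kubectl`. The manifest URL and version below are assumptions based on the NVIDIA/k8s-device-plugin repository; check its README for the manifest matching your cluster:

```shell
# Assumed manifest location in the NVIDIA/k8s-device-plugin repository;
# adjust PLUGIN_VERSION to match your Kubernetes version.
PLUGIN_VERSION="v1.10"
MANIFEST="https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/${PLUGIN_VERSION}/nvidia-device-plugin.yml"
echo "would apply: $MANIFEST"
# kubectl create -f "$MANIFEST"
```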
Add a custom reservation¶
In order for your container execution to be located on nodes with GPU accelerators, and for EKS to configure the CUDA driver on your containers, the corresponding EKS pods must be created with a custom “limit” (in Kubernetes parlance) to indicate that you need a specific type of resource (the standard resource types are CPU and memory).
You must configure this limit in the container execution configuration:

- In the “Custom limits” section, add a new entry with key `nvidia.com/gpu` and value `1` (to request 1 GPU).
- Don’t forget to add the new entry and save the settings.
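For reference, this entry translates into the following resource limit in the pod specifications that get submitted to the cluster (an illustrative fragment, not a file you edit by hand):

```yaml
# Pod spec fragment corresponding to the custom limit above
resources:
  limits:
    nvidia.com/gpu: 1
```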
Deploy¶
You can now deploy your GPU-requiring recipes and models.