Running on Azure Kubernetes Service¶
You can use container execution on Azure Kubernetes Service as a fully managed Kubernetes solution.
Setup¶
Create your ACR registry¶
Follow the Azure documentation on how to create your ACR registry. We recommend that you pay extra attention to the pricing plan since it is directly related to the registry storage capacity.
Create your AKS cluster¶
Follow Azure documentation on how to create your AKS cluster. We recommend that you allocate at least 15 GB of memory for each cluster node. More memory may be required if you plan on running very large in-memory recipes. Once the cluster is created, you must modify the registry IAM credentials to grant AKS access to ACR (Kubernetes secret mode is not supported). This is required for the worker nodes to pull the images from the registry.
You’ll be able to configure the memory allocation for each container and per-namespace using multiple container execution configurations.
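As a rough sketch of the registry access step described above, the function below grants the cluster's service principal pull rights on the registry with the Azure CLI. It assumes the cluster was created with a service principal (not a managed identity) and that you are already logged in with `az`; `resource_group`, `cluster_name` and `registry_name` are placeholders, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Placeholder values: replace with your own resource names.
RESOURCE_GROUP="resource_group"
CLUSTER_NAME="cluster_name"
REGISTRY_NAME="registry_name"

# Grant the AKS service principal pull access to the ACR registry.
# Assumption: the cluster uses a service principal, so
# servicePrincipalProfile.clientId holds a real application ID.
grant_acr_pull() {
  local client_id acr_id
  # Application ID of the service principal used by the cluster
  client_id="$(az aks show --resource-group "$RESOURCE_GROUP" \
    --name "$CLUSTER_NAME" \
    --query "servicePrincipalProfile.clientId" --output tsv)"
  # Full Azure resource ID of the registry, used as the role scope
  acr_id="$(az acr show --name "$REGISTRY_NAME" --query "id" --output tsv)"
  # AcrPull is the built-in role that allows pulling images
  az role assignment create --assignee "$client_id" \
    --role AcrPull --scope "$acr_id"
}
```

Running `grant_acr_pull` once after the cluster is created should be enough; worker nodes can then pull images without a Kubernetes secret.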
Prepare your local docker and kubectl commands¶
Follow Azure documentation to make sure that:
- Your local (on the DSS machine) kubectl command can interact with the cluster. As of July 2018, this implies adding to the KUBECONFIG path the kubeconfig file obtained with the az aks get-credentials --resource-group resource_group --name cluster_name command.
- Your local (on the DSS machine) docker command can successfully push images to the ACR repository. As of July 2018, this implies logging into ACR with az login --service-principal -p client_secret -u service_principal --tenant tenant_id, then az acr login --name registry_name. If you use the same principal as the cluster principal, it must also have write credentials on the registry.
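The two checks above can be sketched as shell functions to run on the DSS machine. This is only an illustration: the kubeconfig file location and the function names are hypothetical, and `SERVICE_PRINCIPAL`, `CLIENT_SECRET` and `TENANT_ID` are assumed to be set in the environment beforehand.

```shell
#!/usr/bin/env bash
# Placeholder values, named as in the commands above.
RESOURCE_GROUP="resource_group"
CLUSTER_NAME="cluster_name"
REGISTRY_NAME="registry_name"
# Hypothetical location for the cluster's kubeconfig file
KUBECONFIG_FILE="$HOME/.kube/aks-${CLUSTER_NAME}.config"

# Put a kubeconfig file first in the KUBECONFIG search path,
# keeping any entries that are already there.
prepend_kubeconfig() {
  export KUBECONFIG="$1${KUBECONFIG:+:$KUBECONFIG}"
}

# Fetch the cluster credentials and check that kubectl reaches the cluster.
setup_kubectl() {
  az aks get-credentials --resource-group "$RESOURCE_GROUP" \
    --name "$CLUSTER_NAME" --file "$KUBECONFIG_FILE"
  prepend_kubeconfig "$KUBECONFIG_FILE"
  kubectl get nodes
}

# Log in with the service principal, then log docker into the registry.
setup_docker_push() {
  az login --service-principal -u "$SERVICE_PRINCIPAL" \
    -p "$CLIENT_SECRET" --tenant "$TENANT_ID"
  az acr login --name "$REGISTRY_NAME"
}
```

Call `setup_kubectl` and `setup_docker_push` as the user that runs DSS, so that the KUBECONFIG path and the docker login apply to that user.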
Create the execution configuration¶
Build the base image as indicated in Setting up.
In Administration > Settings > Container exec, add a new execution config of type “Kubernetes”.
The image registry URL is registry_name.azurecr.io/PREFIX, without the image name, where PREFIX is an optional prefix to organize your repositories.
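To make the URL format concrete, here is a small helper that assembles the value from a registry name and an optional prefix. The function name is hypothetical, and the no-prefix form is an assumption based on the prefix being optional.

```shell
#!/usr/bin/env bash
# Build the image registry URL for the execution configuration.
# $1: ACR registry name; $2: optional repository prefix.
registry_url() {
  local registry_name="$1" prefix="${2:-}"
  if [ -n "$prefix" ]; then
    echo "${registry_name}.azurecr.io/${prefix}"
  else
    echo "${registry_name}.azurecr.io"
  fi
}
```

For example, `registry_url registry_name dss` prints `registry_name.azurecr.io/dss`. The image name and tag are not part of this value.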
You’re now ready to run recipes and models on AKS.
Using GPUs¶
Azure provides GPU-enabled instances with NVIDIA GPUs. Several steps are required in order to use them for container execution.
Build a CUDA-enabled base image¶
The base image that is built by default (see Setting up) does not have CUDA support and cannot use NVIDIA GPUs.
You need to build a CUDA-enabled base image.
Then create a new container configuration dedicated to running GPU workloads. If you specified a tag for the base image, enter it in the “Base image tag” field.
Create a cluster with GPUs¶
Follow Azure documentation for how to create a cluster with GPU accelerators.
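For orientation only, creating a GPU cluster with the Azure CLI might look like the sketch below. The VM size, node count and function name are assumptions chosen for illustration; check the Azure documentation for the GPU sizes available in your region and subscription.

```shell
#!/usr/bin/env bash
# Placeholder values: replace with your own resource names.
RESOURCE_GROUP="resource_group"
GPU_CLUSTER_NAME="gpu_cluster_name"
# Assumption: an NVIDIA GPU VM size available to your subscription
GPU_VM_SIZE="Standard_NC6s_v3"

# Create a one-node AKS cluster whose nodes use a GPU-enabled VM size.
create_gpu_cluster() {
  az aks create --resource-group "$RESOURCE_GROUP" \
    --name "$GPU_CLUSTER_NAME" \
    --node-vm-size "$GPU_VM_SIZE" \
    --node-count 1 \
    --generate-ssh-keys
}
```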
Add a custom reservation¶
In order for your container execution to be located on nodes with GPU accelerators, and for AKS to configure the CUDA driver on your containers, the corresponding AKS pods must be created with a custom “limit” (in Kubernetes parlance) to indicate that you need a specific type of resource (standard resource types are CPU and memory). The NVIDIA drivers should also be mounted in the containers.
You must configure these in the container execution configuration:
- In the “Custom limits” section, add a new entry with key: alpha.kubernetes.io/nvidia-gpu and value: 1 (to request 1 GPU)
- Don’t forget to add the new entry
- In HostPath volume configuration, mount /usr/local/nvidia as /usr/local/nvidia
- Don’t forget to add the new entry, save settings
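In plain Kubernetes terms, the two settings above roughly correspond to a resource limit and a hostPath volume on the pod. The sketch below writes a standalone test pod manifest with those two elements. It is not the pod spec that DSS generates; the function name, pod name and image argument are hypothetical.

```shell
#!/usr/bin/env bash
# Write a minimal pod manifest that requests one GPU and mounts the
# NVIDIA driver directory from the host.
# $1: image to run; $2: output file for the manifest.
write_gpu_test_pod() {
  local image="$1" out="$2"
  cat > "$out" <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
spec:
  restartPolicy: Never
  containers:
  - name: gpu-test
    image: ${image}
    resources:
      limits:
        alpha.kubernetes.io/nvidia-gpu: 1
    volumeMounts:
    - name: nvidia
      mountPath: /usr/local/nvidia
  volumes:
  - name: nvidia
    hostPath:
      path: /usr/local/nvidia
EOF
}
```

After generating the file, `kubectl apply -f` on it is one way to check that the cluster can schedule a pod with a GPU limit before running recipes.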
Deploy¶
You can now deploy your GPU-requiring recipes and models.