Kubernetes and OpenShift allow you to schedule Pods that have access to GPU accelerator resources, and to request a specific kind of accelerator.
This can be done on any resource type that defines a container specification, such as Jobs, Pods, and Deployments.
You specify the number of GPUs requested in the resource limits section and the specific GPU type in the node selector section.
The following is an example of a Job requesting one NVIDIA V100 GPU.
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: job-with-gpu-access
spec:
  template:
    spec:
      containers:
      - name: job-with-gpu-access
        image: <image>:<tag>
        resources:
          limits:
            # GPUs must be requested under limits; Kubernetes does not
            # allow a GPU request without a matching limit.
            nvidia.com/gpu: "1"
      restartPolicy: Never
      nodeSelector:
        nvidia.com/gpu.product: Tesla-V100-PCIE-32GB
```
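To find the exact value to use for the `nvidia.com/gpu.product` selector on your cluster, you can list the label on all nodes. This is a sketch assuming GPU Feature Discovery (part of the NVIDIA GPU Operator) is installed and labeling the nodes; if it is not, the column will simply be empty.

```shell
# Show all nodes with an extra column containing the value of the
# nvidia.com/gpu.product label (empty for nodes without the label).
kubectl get nodes -L nvidia.com/gpu.product
```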
More information on more complex scheduling, such as requesting only GPUs with a specific amount of VRAM available (useful when you know the size of the model you want to run inference on), can be found in the Kubernetes documentation on GPU scheduling.
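As a minimal sketch of VRAM-based scheduling, assuming GPU Feature Discovery labels your nodes with `nvidia.com/gpu.memory` (the GPU's total memory in MiB; label names and values depend on your cluster setup), a node selector can pin the workload to nodes with a specific GPU memory size:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: job-with-gpu-vram-selector
spec:
  template:
    spec:
      containers:
      - name: job-with-gpu-vram-selector
        image: <image>:<tag>
        resources:
          limits:
            nvidia.com/gpu: "1"
      restartPolicy: Never
      nodeSelector:
        # 32768 MiB = 32 GiB of GPU memory; the exact label value
        # on your nodes may differ, so check with kubectl first.
        nvidia.com/gpu.memory: "32768"
```

Note that a node selector is an exact string match; to express "at least this much VRAM" you would instead use node affinity with the `Gt` operator on the same label.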