Mastering Deep Learning VMs: Expert Answers to Your Key Questions

My last post was the ultimate introduction to VMware Private AI. If you have not read that post, I highly recommend you do. As a natural extension, in this post I have put together FAQs around one of the key components of VMware Private AI Foundation (vPAIF): Deep Learning VMs.

Note: Although I am starting with 4-5 key FAQs, I plan to keep this blog post updated over the coming days with new questions until we have covered all the key aspects of Deep Learning VMs.

What is preconfigured in a DL VM?

  1. Deep Learning VMs (DL VMs): These are specialized VMs (Ubuntu guest OS) preconfigured with AI/ML libraries, frameworks, tools, and drivers, all validated and optimized by NVIDIA and VMware for deployment within the VCF environment. Since these VMs are preconfigured, data scientists or AI developers can immediately focus on their AI app development, including LLM fine-tuning and inference, without the need to spend significant time deploying and installing compatible tools and frameworks, thus saving considerable time.
Source: VMware official blog

How are they delivered/released?

These validated Deep Learning VM images are available for consumption from the content delivery network (CDN) URL https://packages.vmware.com/dl-vm/lib.json (similar to how TKG/guest cluster images are delivered). As a vSphere admin, you need to create a subscribed content library using the above URL to make the images available for consumption. In an air-gapped environment, the admin needs to manually download the images from https://packages.vmware.com/dl-vm/ (latest version recommended) and create a local content library with the downloaded Deep Learning VM images. While each VCF release will have an associated Deep Learning VM image, async releases are also expected periodically. Refer to the Deep Learning VM release notes for more details.
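For air-gapped workflows, it can be handy to inspect the image index before mirroring it into a local content library. Below is a minimal Python sketch; note that the JSON structure shown (an `items` list with `name`/`version` fields) is a hypothetical assumption for illustration only, and the real lib.json schema may differ.

```python
import json

# Hypothetical sample mimicking a content-library index like lib.json.
# The actual schema published at packages.vmware.com may differ; this
# structure is an assumption for illustration only.
sample_index = json.loads("""
{
  "items": [
    {"name": "dl-vm-ubuntu-2004-v1", "version": "1.0"},
    {"name": "dl-vm-ubuntu-2204-v2", "version": "2.0"}
  ]
}
""")

def list_image_names(index: dict) -> list[str]:
    """Return the names of all VM images listed in the index."""
    return [item["name"] for item in index.get("items", [])]

print(list_image_names(sample_index))
```

In a real environment, you would fetch the index from the CDN URL and mirror only the latest image into your local content library.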

What are the different ways to deploy a Deep Learning VM?

These Deep Learning VMs can be deployed in one of the ways mentioned below. It is assumed that the content library is configured as described in the previous question.

  1. Directly from the vSphere client UI using the standard “Deploy from content library template” workflow. This does not require you to have the vSphere IaaS control plane (aka WCP or supervisor cluster) enabled on the cluster.
  2. If you have the vSphere IaaS control plane enabled, you can use the standard kubectl Kubernetes interface to deploy a VM Service-based DL VM.
  3. One of the compelling values of the vPAIF VCF add-on is the ability for users to deploy DL VMs as AI workstations using the Aria Automation (formerly known as vRealize Automation) self-service catalog. This drastically simplifies the consumption of AI workstations to just a few clicks.
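To illustrate option 2, a VM Service manifest applied with kubectl might look like the sketch below. This is a hedged example, not a definitive spec: the image, class, namespace, and storage class names are placeholders, and the `vmoperator.vmware.com` API version can vary with your vSphere IaaS control plane release.

```yaml
apiVersion: vmoperator.vmware.com/v1alpha1
kind: VirtualMachine
metadata:
  name: dl-vm-demo            # placeholder VM name
  namespace: ai-namespace     # placeholder vSphere namespace
spec:
  imageName: dl-vm-image      # DL VM image from the content library (placeholder)
  className: best-effort-gpu  # VM class with GPU resources (placeholder)
  storageClass: vsan-default  # placeholder storage policy
  powerState: poweredOn
```

You would apply this with `kubectl apply -f` against the supervisor cluster, in a namespace where the DL VM content library and a GPU-enabled VM class have been assigned.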

Can we customize DL VMs?

Yes. As shown in the diagram above, while the DL VM comes preconfigured, users can customize it as per their requirements using the industry-standard cloud-init mechanism.
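As a hedged sketch, a cloud-init user-data snippet supplied at deployment time might install an extra package and run a command on first boot. The package name and log path here are purely illustrative assumptions, not part of the DL VM image:

```yaml
#cloud-config
# Illustrative customization; adjust to your own requirements.
packages:
  - jq   # example extra package (placeholder)
runcmd:
  - [ sh, -c, "echo 'DL VM customized via cloud-init' >> /var/log/dlvm-custom.log" ]
```

The `packages` and `runcmd` modules shown are standard cloud-init features; consult the DL VM release notes for the exact mechanism (e.g., OVF properties) your deployment method uses to pass user data.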

I hope you got quick value out of this post. Please stay tuned for new questions that will deep dive into the various key aspects of Deep Learning VMs.