The Ultimate Introduction to VMware Private AI Foundation with NVIDIA

Earlier, I discussed my involvement with private AI at VMware Explore Singapore 2023 here, and later the general availability of the first release of ‘VMware Private AI Foundation with NVIDIA’ here. Today, I am starting a series of blog posts on various aspects of VMware Private AI Foundation with NVIDIA. I will begin with an easy-to-understand introduction to the platform.

Basics

VMware Private AI Foundation with NVIDIA is also known as vPAIF-N, PAIF-N, PAIF, or simply VMware Private AI. I will refer to it as PAIF-N going forward.
As the name indicates, it is a solution jointly engineered by VMware by Broadcom (a private cloud leader) and NVIDIA (a leader in GPU-accelerated computing).
PAIF-N primarily focuses on Generative AI (GenAI) use cases. It enables enterprises to quickly start building Generative AI capabilities around their business. It has been generally available since the VCF 5.1.1 release.
It is sold as an add-on (one of the advanced services) on top of the flagship VMware Cloud Foundation (VCF).
- In fact, customers need to have a PAIF-N add-on license from VMware by Broadcom, as well as a separate license from NVIDIA for the NVIDIA AI Enterprise (NVAIE) suite.
- Most importantly, customers must separately purchase supported GPU cards (such as H100 or A100) for certified servers/hosts.
PAIF-N caters to key user personas such as cloud administrators, data scientists, and DevOps/developers.
While this post focuses on PAIF with NVIDIA, there are similar partnerships with Intel and AMD as well, and there may be more in the future.

High-level architecture

Source/Credit: VMware Explore 2024 session

What are the key components from VMware by Broadcom?

Since PAIF-N is an add-on to VCF, customers benefit from all the strengths of the underlying platform (compute, storage, networking, and management—each of which has been a leader in its respective category for many years).
VCF can be deployed as a private cloud in your own data centers, through VMware cloud providers, or via certified hyperscalers with solutions such as Google Cloud VMware Engine. This flexibility allows customers to deploy PAIF-N wherever they choose, close to their domain-specific data.
With each release, new capabilities are continuously being developed within VCF to maximize the value of PAIF-N. Examples include:
- Virtualizing GPUs in much the same way that compute, storage, and networking are virtualized.
- Bringing all the goodness of DRS (Distributed Resource Scheduler) to effectively manage GPU resources.
- Fine-tuning the amazing VMware vMotion capability specifically for GPU workloads.
- Extending the goodness of the vSphere IaaS platform to GPU workloads using constructs such as
  - VM-class with GPU,
  - VM-service with GPU, TKG/Guest clusters with GPU,
  - Harbor as an LLM store (Harbor capability announced at VMware Explore 2024, session link at the end) and so on…
- Unified vCenter UI for standing up the entire GenAI infrastructure on VCF
- VMware Data Services Manager (DSM), which supports vector databases for Retrieval-Augmented Generation (RAG), a popular GenAI use case. DSM, available as an advanced service within VCF, also caters to non-PAIF-N use cases.
- and so on… hopefully you get the idea.
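To make the RAG use case mentioned above a little more concrete, here is a minimal sketch of the retrieval step in plain Python. It is only an illustration under loud assumptions: the `embed` function is a toy bag-of-words stand-in for a real embedding model, the in-memory list stands in for a vector database, and the sample documents and function names are hypothetical. It does not use any DSM or NVIDIA API.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': word counts. A real setup would call an embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query (the 'R' in RAG)."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Augment the user question with the retrieved context before calling an LLM."""
    joined = "\n".join(context)
    return f"Context:\n{joined}\n\nQuestion: {query}"

# Hypothetical domain-specific documents that never leave the private environment.
DOCS = [
    "Wire transfers above the daily limit need branch approval",
    "Credit card disputes must be filed within 60 days",
    "Savings accounts earn interest monthly",
]

if __name__ == "__main__":
    question = "how do I file a credit card dispute"
    print(build_prompt(question, retrieve(question, DOCS)))
```

In a real deployment the documents would be embedded once and stored in a vector database, and only the query would be embedded at request time. The idea stays the same: find the closest private documents and hand them to the LLM as context.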
Deep Learning VMs (DL VMs): These are specialized VMs (Ubuntu guest OS) preconfigured with AI/ML libraries, frameworks, tools, and drivers, all validated and optimized by NVIDIA and VMware for deployment within the VCF environment. These validated VM images are released by VMware by Broadcom through a content delivery network (CDN), similar to TKG/Guest cluster images. This allows customers to subscribe to the CDN via the well-known content library construct in vSphere.
These Deep Learning VMs can be deployed either directly from the vSphere UI using the standard “Deploy from content library template” workflow or through the kubectl interface as part of the VM service construct on top of the vSphere IaaS platform (also known as WCP or Supervisor cluster). Since these VMs are preconfigured, data scientists or AI developers can immediately focus on their AI app development, including LLM fine-tuning and inference, without spending significant time deploying and installing compatible tools and frameworks.
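As a rough illustration of the kubectl path described above, the sketch below builds a VirtualMachine manifest in Python and prints it as JSON. It assumes the VM service's `vmoperator.vmware.com/v1alpha1` API with its `imageName`, `className`, `storageClass`, and `powerState` fields; newer releases may expose a different API version, so please check the documentation for your environment. Every value passed in (namespace, image, VM class, storage class) is a hypothetical placeholder, not a real name from the DL VM content library.

```python
import json

def dl_vm_manifest(name: str, namespace: str, image_name: str,
                   vm_class: str, storage_class: str) -> dict:
    """Build a VM service manifest for a Deep Learning VM (assumed v1alpha1 schema)."""
    return {
        "apiVersion": "vmoperator.vmware.com/v1alpha1",
        "kind": "VirtualMachine",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "imageName": image_name,        # DL VM image synced from the content library
            "className": vm_class,          # a VM class that carries a GPU/vGPU device
            "storageClass": storage_class,  # storage policy exposed to the namespace
            "powerState": "poweredOn",
        },
    }

if __name__ == "__main__":
    # All values below are placeholders; replace them with names from your environment.
    manifest = dl_vm_manifest(
        name="dl-vm-01",
        namespace="ai-team-namespace",
        image_name="example-dl-vm-image",
        vm_class="example-gpu-vm-class",
        storage_class="example-storage-policy",
    )
    print(json.dumps(manifest, indent=2))
```

Since kubectl accepts JSON as well as YAML, the printed output could be saved to a file and applied with `kubectl apply -f` against the Supervisor namespace.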
If a user prefers a Kubernetes cluster (instead of DL VMs) with preconfigured tools and libraries, it is easily achievable by deploying TKG clusters (also known as Guest clusters) and installing NVIDIA-specific operators such as the GPU-operator and RAG operator. Of course, these Kubernetes clusters can be customized with a choice of AI tools and libraries.
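Once the GPU operator is running in such a cluster, workloads typically ask for GPUs through the `nvidia.com/gpu` extended resource. The sketch below builds a small test Pod manifest in Python to show what that request looks like. The helper name, pod name, and container image are hypothetical placeholders, and the `nvidia-smi` command assumes the NVIDIA container runtime makes that tool available inside the container.

```python
import json

def gpu_pod_manifest(name: str, image: str, gpus: int = 1) -> dict:
    """Build a Pod manifest that requests whole GPUs via the nvidia.com/gpu resource."""
    if gpus < 1:
        raise ValueError("gpus must be at least 1")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "command": ["nvidia-smi"],  # prints the GPUs visible to the container
                    # Extended resources such as GPUs are requested under limits.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }

if __name__ == "__main__":
    # The image below is a placeholder; use a CUDA-capable image from your registry.
    pod = gpu_pod_manifest("gpu-check", "registry.example.com/cuda-base:latest")
    print(json.dumps(pod, indent=2))
```

If the pod completes and its log lists a GPU, the cluster's GPU plumbing (drivers, device plugin, runtime) is working end to end.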
Importantly, data scientists or DevOps users can deploy DL VMs or Kubernetes clusters (TKG/Guest clusters) as AI workstations using the Aria Automation (formerly known as vRealize Automation) self-service catalog. This drastically simplifies the consumption of AI workstations with just a few clicks. Even though these catalog items (including RAG-based AI workstations) are preconfigured, users can still customize them to meet their specific needs.
From a GPU monitoring perspective, the vSphere Client (H5C) in vCenter Server offers basic performance charts, while Aria Operations provides advanced dashboards specifically designed for GPU monitoring.
Although obvious, it’s important to note that all NVIDIA components (more on them below) are seamlessly integrated with PAIF-N.
As per VMware Explore 2024 announcements, vRA/Aria Automation will be called VCF Automation and vROps/Aria Operations will be called VCF Operations. I am hoping this is the last naming ceremony (lol).
Disclaimer: VMware Explore 2024 also announced new capabilities in VCF 5.2.1 and VCF 9.0. The list above is not exhaustive, as new capabilities are being rapidly added with each release to maximize the value of this jointly engineered solution.

Why the term “Private-AI”?

Private AI is an architectural approach that balances the business gains from AI with the privacy and compliance needs of the organisation.

There are multiple challenges in the GenAI space today, but the most critical ones are privacy, compliance, and security.
Privacy: Customers are concerned about the privacy of their domain-specific data and IP assets: how proprietary LLMs process or handle them, who gets access to them, and so on. There is fear about how their data is being used by LLMs for inferencing or training.
Compliance: Many enterprise customers operate in highly regulated industries, so they must be 100% compliant with GDPR, HIPAA, and other regional rules and laws.
Security: With the advent of GenAI, new security threats are emerging. Data leaks and unauthorized access can lead to significant breaches unless proper guardrails, secure APIs, encrypted data sources, and secure AI infrastructure are in place.
PAIF-N is designed to address all of these challenges. In fact, it not only addresses these critical issues but also tackles other challenges such as the choice of open LLMs, cost, and even performance (equal to or better than bare metal).

What are the key components from NVIDIA?

Credit/source: NVIDIA’s official website

In short, NVIDIA defines the NVIDIA AI Enterprise (NVAIE) suite as the “Operating System” for enterprise AI.
It brings a lot of goodness for data scientists and AI developers, with various tools, frameworks, libraries, and drivers for GenAI application development, model inferencing, model fine-tuning, pretrained models, etc.
Examples:
- NVIDIA vGPU technology (joint engineering between VMware and NVIDIA to virtualize GPUs)
- vGPU drivers for guest OS (DL VM or K8s worker node) compatible with VMware ESXi GPU host drivers
- NIM microservices with AI frameworks like PyTorch, TensorFlow, CUDA, sample chatbots and so on. NVIDIA NIM microservices are the fastest way to AI inference
- Validated and optimized pre-trained LLMs (community, i.e. open source, LLMs, NVIDIA’s custom models, etc.)
- Recently announced, NVIDIA NIM agent blueprints for various enterprise GenAI use cases
- GPU operator and Network operator for K8s clusters, making the GPU a first-class citizen in the K8s world.
- RAG operator for the RAG use case
- and so on…I hope you get the idea
Note: Reading the names above might feel overwhelming, but don’t worry! You don’t need to set up or deploy any of this individually. The VMware capabilities I mentioned earlier make everything seamlessly integrated out of the box—how cool is that?
Disclaimer: Just like VMware, NVIDIA keeps improving and adding capabilities as we speak, so the above list goes on.

How it fits together: an example

Imagine you are a multinational bank named ABC Inc. You are already a VMware customer (or have decided to become one), running at scale on an industry-leading VCF stack in your own data center, though you could also have chosen to run this VCF stack on hyperscalers or through providers.
As a bank, you offer a wide range of products and services to millions of customers. All these services are developed, deployed, and managed on top of your VCF infrastructure.
With the rise of amazing GenAI technology, and to stay ahead in your industry, you have developed a GenAI strategy to integrate AI into every aspect of your operations. A couple of simple examples might include improving customer service or deploying a code assistant for your internal developer teams.
As a multinational bank operating in a highly regulated industry, you are concerned about critical challenges like privacy, compliance, and security. You want to move quickly but still retain full control over your data, intellectual property (IP), and costs.
Given your focus on the banking sector, building your own proprietary LLM (like OpenAI’s GPT) is not feasible due to the high cost and complexity. Instead, your strategy is to embrace open LLMs that are pre-trained for specific use cases (such as text, code, or video) to accelerate your AI journey and solve your business-specific AI challenges.
You want to leverage your existing VMware investments and skill set, and reduce the learning curve around GenAI for this new set of workloads.
With the above constraints and concerns, you are wondering how to enable your internal teams to quickly integrate GenAI into all facets of your business.
By now, it should be clear that PAIF-N is the perfect solution for you! 🙂

Further learning

If you want to see what it looks like in action with cool demos, this VMware Explore 2024 session is a must-watch.
For a glimpse into the future state of VMware Private AI and PAIF-N with VCF 9.0, see another session from VMware Explore 2024.
Official VMware Private-AI documentation
Please stay tuned for further deep dives.

If you think this blog post added value to your time, please share it with others as appropriate. Please feel free to connect with me on LinkedIn or Twitter for all such posts.

Vikas Shitole

Vikas Shitole is a Staff Engineer 2 at VMware (by Broadcom) India R&D. He currently contributes to core VMware products such as vSphere, VMware Private AI Foundation and, in part, VCF. He is an AI and Kubernetes enthusiast. He is passionate about helping VMware customers and enjoys exploring automation opportunities around core VMware technologies. He has been a vExpert for 11 years in a row (2014-24) for his significant contributions to the VMware communities. He is the author of 2 VMware flings and holds multiple technology certifications. He is one of the lead contributors to VMware API Sample Exchange, with more than 35,000 downloads of his API scripts. He has been a speaker at international conferences such as VMworld Europe, USA, and Singapore, and was designated a VMworld 2018 blogger as well. He was the lead technical reviewer of the two books “vSphere design” and “VMware virtual SAN essentials” by Packt Publishing.

In addition, he is a passionate cricketer, enjoys bicycle riding and learning about fitness/nutrition, and aspires to one day complete an Ironman 70.3.