Category Archives: DRS, SDRS

DRS, DPM, SDRS and SIOC related posts/scripts

One of the cool vSphere features: Tutorial: How Storage DRS works with storage policies in SPBM

SDRS integration with storage profiles is one of the cool features I had been waiting for since vSphere 5.5. In vSphere 6.0 GA there were some public references to its support, but no detailed documentation was available. As per this KB, I am happy to see that in vCenter Server 6.0.0b and above, Storage DRS fully supports storage profile enforcement: SDRS is now aware of storage policies in SPBM (Storage Policy Based Management). I recently got the opportunity to play with this feature and decided to write a detailed post that can help everyone. Here we go.

As part of this SDRS integration with storage profiles/policies, one SDRS cluster-level advanced option is introduced: “EnforceStorageProfiles”. It takes one of three integer values, 0, 1 or 2, where the default is 0.

When the option is set to 0, there is NO storage profile enforcement on the SDRS cluster.

When the option is set to 1, there is SOFT storage profile enforcement on the SDRS cluster. SDRS will do its best to comply with the storage profile/policy; however, if required, SDRS will violate storage profile compliance.

When the option is set to 2, there is HARD storage profile enforcement on the SDRS cluster. SDRS will not violate the storage profile in any case.

Refer to KB 2142765 to learn how to configure the SDRS advanced option “EnforceStorageProfiles” using the vSphere Web Client and the vSphere Client.
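If you prefer automation over the UI, below is a minimal pyVmomi sketch of the same setting. The vCenter hostname, credentials and the datastore cluster name "SDRS-POD" are my own placeholders, and SSL certificate handling is omitted for brevity; treat this as an illustration rather than copy-paste-ready code.

```python
from pyVim.connect import SmartConnect
from pyVmomi import vim

# Connect to vCenter (placeholder credentials; certificate handling omitted).
si = SmartConnect(host='vcenter.example.com',
                  user='administrator@vsphere.local', pwd='VMware1!')

# Find the datastore cluster (StoragePod) by name.
content = si.content
view = content.viewManager.CreateContainerView(content.rootFolder,
                                               [vim.StoragePod], True)
pod = next(p for p in view.view if p.name == 'SDRS-POD')
view.Destroy()

# Push the advanced option: 0 = no enforcement, 1 = soft, 2 = hard.
spec = vim.storageDrs.ConfigSpec()
spec.podConfigSpec = vim.storageDrs.PodConfigSpec()
spec.podConfigSpec.option = [
    vim.option.OptionValue(key='EnforceStorageProfiles', value='1')
]
content.storageResourceManager.ConfigureStorageDrsForPod_Task(pod=pod,
                                                              spec=spec,
                                                              modify=True)
```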

Now I will walk you through the vSphere Web Client workflows so you can play with this cool feature yourself. Treat this as a tutorial.

1. Configure the SDRS advanced option to enable SOFT (1) storage profile enforcement with Storage DRS.

-Create an SDRS cluster (aka POD) with 3 datastores, namely DS1, DS2 and DS3.
-Go to SDRS Cluster >> Manage >> Settings >> Storage DRS in the Web Client. Click Edit and configure the option “EnforceStorageProfiles” as shown in the screenshot below.

Adding SDRS option

You can see I have set this option to 1, i.e. SOFT enforcement.

2. Create two tags named “Gold” and “Silver”.

The tags will be attached to all the datastores in the datastore cluster (as per datastore capability) and will also be used to create tag-based VM storage policies.

– Go to Home >> Tags >> click New Tag and create a “Gold” tag with the “FC SAN” category as shown in the screenshot below.

Create Tag

Similarly, create a “Silver” tag with the “iSCSI SAN” category. At this point make sure you have 2 tags: Gold and Silver.

3. Assign the tags to the datastores in the SDRS POD.

-Go to SDRS POD >> DS1 >> Manage >> Tags >> click Assign Tags as shown in the screenshot below.

Assign tags to Datastore

Assign the “Gold” tag to DS1 as well as DS2. Finally, assign the “Silver” tag to datastore DS3. You can see the assigned tags for each datastore in the screenshot below.

Assigned tag on DS3

As you can see in the screenshot above, DS3 has the “Silver” tag assigned.

4. Create a VM storage policy

– Go to Home >> Policies and Profiles >> VM Storage Policies >> click Create a new VM storage policy as shown in the screenshot below.

Create VM storage policy

-Once you click “Create a new VM storage policy”, specify the policy name as “Gold VM Storage Policy” and click Next.
– On the Rule-Sets page, click Add tag-based rule and select the “Gold” tag under the “FC SAN” category as shown in the screenshot below.
Adding tag based rule
– On the Storage compatibility page, keep the defaults and click Next. Finally, click Finish as shown in the screenshot below.
VM storage policy finish

– Repeat the above steps to create a “Silver VM Storage Policy” based on the “Silver” tag. At this point you will have 2 VM storage policies: “Gold VM Storage Policy” and “Silver VM Storage Policy”.

Now we are ready to play with the SDRS storage profile integration feature. Let's try out some workflows from the SDRS perspective to verify whether profiles are really being considered by SDRS. Note that in the very first step we set the “EnforceStorageProfiles” SDRS option to 1, i.e. SOFT profile enforcement.

1. Create VM workflow, i.e. SDRS initial placement.

– From the Web Client, start the Create VM workflow >> give the VM name as “GoldVM” >> select a compute resource >> under Select storage, choose the “Gold VM Storage Policy” we created in the last section as the VM storage policy and the SDRS POD we created initially as the storage, as shown in the screenshot below.

Warning on selecting policy create VM

In the screenshot above, the SDRS POD is listed under “incompatible” storage. This is expected, as the SDRS POD has 3 datastores, 2 of which have the “Gold” tag attached while the 3rd has “Silver”. The warning in the screenshot says the same: “Datastore does not satisfy compatibility since it does not support one or more required properties. Tags with name ‘Gold’ not found on datastore”. As I said, this is expected because we selected the “Gold VM Storage Policy” and NOT all the datastores in the SDRS POD have the “Gold” tag attached. Overall, no panic, just click Next.

-Finally, on the finish page of VM creation, we can see the SDRS initial placement recommendations as shown in the screenshot below.

SDRS recommendations

From the screenshot above you can see that the SDRS placement recommendation on DS2 is absolutely spot on, as the DS2 datastore has the “Gold” tag attached.

Now you can check whether SDRS placed the created “GoldVM” on a datastore compatible with the Gold VM storage policy. As you can see in the screenshot below, SDRS has placed the GoldVM on the right datastore and the VM storage policy is compliant.

VM-compliant

Below is a screenshot of the “GoldVM” VM files. You can see all the VM files are placed on datastore DS2, which has the Gold tag attached. Isn't that cool?

Datastore files on DS2

Based on the above screenshots we can say that SDRS is aware of VM storage policies. Now let's try another SDRS workflow: putting a datastore into maintenance mode.

2. Putting a datastore into maintenance mode.

We know that all the “GoldVM” files are on datastore DS2 as expected. Now we will put DS2 into maintenance mode, and we expect SDRS to Storage vMotion the VM to DS1, as DS1 is the only other datastore with the “Gold” tag attached.

From the Web Client, go to SDRS POD >> DS2 >> right-click DS2 and select “Enter Maintenance Mode”. As soon as we do, we get SDRS migration recommendations to evacuate the DS2 datastore, as shown in the screenshot below.

After putting into MM

As you can see in the screenshot above, SDRS has recommended migrating the VM files to DS1 as expected. How cool is that?

To see whether SOFT profile enforcement really works, I went ahead and put DS1 into maintenance mode as well (DS2 was still in maintenance mode) and observed that the VM files were moved to DS3. This is expected, since we set the “EnforceStorageProfiles” SDRS option to 1, i.e. SOFT profile enforcement.
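Incidentally, the same "enter maintenance mode" step can be driven through the API. Below is a hedged pyVmomi sketch (the service instance `si` is assumed to be connected as in the earlier snippet, and the function and datastore names are my own): DatastoreEnterMaintenanceMode() hands back the Storage DRS placement result, whose recommendations are then applied much like the web client does.

```python
from pyVmomi import vim

def enter_ds_maintenance(si, ds_name):
    """Put an SDRS-managed datastore into maintenance mode and apply
    the resulting Storage DRS migration recommendations."""
    content = si.content
    view = content.viewManager.CreateContainerView(content.rootFolder,
                                                   [vim.Datastore], True)
    ds = next(d for d in view.view if d.name == ds_name)
    view.Destroy()

    result = ds.DatastoreEnterMaintenanceMode()   # StoragePlacementResult
    keys = [rec.key for rec in (result.recommendations or [])]
    if keys:
        content.storageResourceManager.ApplyStorageDrsRecommendation_Task(key=keys)

# Example usage, mirroring the walkthrough above:
# enter_ds_maintenance(si, 'DS2')
```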

3. Configuring SDRS affinity rules.

I tested the first case with VMDK anti-affinity (intra-VM) and the second case with VM anti-affinity (inter-VM). In both cases I observed that affinity rules take precedence over the storage profiles attached to the VMs/datastores: SDRS obeys the affinity rules first.

Finally, I also tested some SDRS workflows where I filled the datastores beyond the SDRS space threshold and then invoked SDRS to see whether it really considers storage profiles while generating space migration recommendations. I observed that SDRS does consider storage profiles as expected.

Overall, the tutorial above should help you get started. Testing SDRS with HARD storage profile enforcement I leave to you. Enjoy!

One caveat: as noted in this KB, vCloud Director (vCD) backed by an SDRS cluster does NOT support Soft (1) or Hard (2) storage profile enforcement. vCloud Director works well only with the Default (0) option.

Let me know if you have comments.

PART 2: VMware DPM vs ESXi memory ballooning

In PART 1 of this series we learned how DPM works and what its memory demand metric is. If you have not read PART 1, I strongly recommend reading it first.

In PART 2 we will touch upon 2 important points, with examples:
1. How was the earlier DPM behavior more aggressive from a memory perspective?
2. How can we now control DPM behavior with the new memory demand metric and avoid memory ballooning?

Let’s start with the first point: how was the earlier DPM behavior more aggressive?
The OLD DPM memory demand metric considered just active memory as memory demand. To understand this clearly, let's take an example: say we have 2 ESXi hosts (H1 and H2) with 2 VMs on each host (VM1, VM2 on H1 and VM3, VM4 on H2) in a DPM-enabled cluster. The memory configuration is as follows:
H1-4GB
H2-4GB
VM1-3GB
VM2-3GB
VM3-3GB
VM4-3GB
Clearly the environment is memory over-committed. All VMs are already powered on, host consumed memory is 3 GB on H1 and 1 GB on H2, and the current active memory usage of each VM is just 256 MB. You may wonder why consumed memory on H1 is 3 GB when the current active memory on H1 is just 512 MB (256 MB x 2 VMs). The reason is that active memory usage of the VMs on H1 was initially 3 GB but has since dropped to 512 MB, and as per ESXi memory management, memory a VM no longer uses is not freed back to the host until ESXi applies a reclamation technique such as memory ballooning. Memory ballooning reclaims VM memory only when host memory usage crosses 96% of the total host memory.

DPM evaluates the cluster based on the Target Resource Utilization Range = 63 ± 18, i.e. the default range is 45% to 81%. Based on the active memory usage on both hosts, only 1 GB (256 MB x 4 VMs) of memory is being actively used in the cluster, which is just around 12.5% of the total cluster memory (4 GB per host x 2 hosts). Each ESXi host's resource utilization demand is calculated as the aggregate memory required by the VMs running on that host. Based on the target utilization range, DPM identifies one of the hosts (in our case H2) as a candidate to put into standby mode. DPM then runs a DRS simulation on the remaining host, H1 (the simulation does not consider candidate hosts that are to be powered off, in our case H2). The simulation uses the DPM demand metric, i.e. just active memory, to analyze whether the VMs on the candidate host H2 (VM3, VM4) can be accommodated on H1 without impacting the existing VMs (VM1, VM2). The overall active memory across all the VMs is 1 GB (256 MB x 4 VMs), which corresponds to just 25% memory utilization on H1, well below the 81% upper end of the target utilization range. (Before putting a host into standby mode, DPM also makes sure the remaining hosts' memory utilization will not cross the upper end of the range, i.e. 81%.) Hence DPM vMotions the VMs (with the help of DRS) from the candidate host H2 to H1 and puts H2 into standby mode to save power.

Note that by this time the consumed memory on H1 is at least 3 GB (earlier) + 512 MB (the VMs migrated from H2) = 3.5 GB. If the memory demand of the recently migrated VMs suddenly increases by even 200 MB each, consumed memory on H1 crosses 96% of the total available memory. At that point H1 is very short on memory and immediately starts memory ballooning to reclaim unused VM memory and satisfy the VMs' demand. Clearly, the old DPM memory demand metric did not account for future memory demand growth of the VMs now running on H1. Ballooning by itself does not cause a performance impact, because the balloon driver first reclaims guest memory that is unused by the guest OS, which is perfectly safe (more on ballooning in the next post). However, if memory is excessively over-committed and the memory reclaimed by ballooning is not enough, it can lead to host swapping, which severely impacts VM performance.
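To make the numbers easy to follow, here is the same back-of-the-envelope arithmetic as a small Python sketch (values in MB; virtualization overhead ignored, as in the example):

```python
host_capacity  = 4096
active_per_vm  = 256

# Old DPM metric: demand == active memory only.
demand_all_vms = active_per_vm * 4                  # 1024 MB across the cluster
utilization_h1 = demand_all_vms / host_capacity     # 0.25 -> 25%, well under 81%
print(f"simulated H1 utilization: {utilization_h1:.0%}")   # so DPM consolidates onto H1

# What the simulation ignores: consumed memory on H1 after the vMotions.
consumed_h1       = 3 * 1024 + 2 * 256              # 3584 MB already consumed
balloon_threshold = 0.96 * host_capacity            # ~3932 MB (96% of host memory)
headroom          = balloon_threshold - consumed_h1 # ~348 MB before ballooning starts
print(f"headroom before ballooning: {headroom:.0f} MB")
```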

Note: For the sake of simplicity, the examples do not consider the memory required by the virtualization layer.

Now you may be wondering: doesn't DPM re-evaluate the cluster (in the above case) and bring the standby host H2 back to meet the memory demand? Of course it does, but DPM evaluates hosts for power-on recommendations only every 5 minutes, so DPM waits for that interval to complete. As soon as DPM is invoked, it evaluates the cluster to bring back the standby host; once the standby host is powered on, DRS rebalances the memory load in the cluster and ballooning stops. In the meantime, however, hosts in the cluster may experience memory ballooning for a period of at least: the 5-minute DPM invocation interval for power-on recommendations + the time DPM takes to evaluate the hosts + the time the host needs to boot from standby + the time required to vMotion VMs while rebalancing the memory load.

Overall, with DPM’s old memory demand metric, DPM could lead to memory ballooning whenever active memory was low but host consumed memory was high. Host consumed memory can be high while active memory is low when the allocated VM memory is over-committed, and even when VM memory is fully backed by physical memory.

At this point I assume you clearly understand how the earlier memory demand metric (active memory only) was very aggressive.

Now it is time to see how we can control DPM behavior with the new memory demand metric.
In order to control DPM's aggressiveness, from vCenter 5.1 U2c onwards and in all versions of vCenter 5.5, DPM can be tuned to also consider idle consumed memory in its memory demand metric, i.e. new DPM memory demand metric = active memory + X% of idle consumed memory. The default value of X is 25. The X value can be modified using the cluster-level DRS advanced option “PercentIdleMBInMemDemand”, which accepts values from 0 to 100. Refer to this KB for how to configure the PercentIdleMBInMemDemand advanced option.
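If you prefer to set the option programmatically rather than through the KB's UI steps, here is a minimal pyVmomi sketch; the cluster name, the connected service instance `si` and the helper name are my own, so treat it as an illustration:

```python
from pyVmomi import vim

def set_percent_idle_mb(si, cluster_name, value=100):
    """Set the DRS advanced option PercentIdleMBInMemDemand on a cluster."""
    content = si.content
    view = content.viewManager.CreateContainerView(content.rootFolder,
                                                   [vim.ClusterComputeResource], True)
    cluster = next(c for c in view.view if c.name == cluster_name)
    view.Destroy()

    spec = vim.cluster.ConfigSpecEx()
    spec.drsConfig = vim.cluster.DrsConfigInfo()
    spec.drsConfig.option = [
        vim.option.OptionValue(key='PercentIdleMBInMemDemand', value=str(value))
    ]
    # modify=True merges this change into the existing cluster configuration.
    return cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)
```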

Let's continue with the same example as above.
We set the “PercentIdleMBInMemDemand” option to 100, i.e. X is 100. Initially, consumed memory on H1 is 3 GB with 512 MB (256 MB x 2 VMs) active, and consumed memory on H2 is 1 GB with 512 MB (256 MB x 2 VMs) active. When DPM evaluates the cluster, host H2 has less than 45% memory utilization, so DPM picks H2 as the candidate host to put into standby mode. DPM runs the DRS simulation without considering host H2, as it is the identified candidate to be put into standby. The DRS simulation uses the new DPM memory demand metric, i.e. active memory + X% of idle consumed memory. Active memory on H1 is 512 MB, so idle consumed memory on H1 is 3 GB - 512 MB = 2.5 GB, which means the memory demand on H1 is 3 GB since X is 100. Active memory of the VMs on H2 that would be migrated to H1 (if DPM decided to put H2 into standby) is 512 MB, so idle consumed memory on H2 is 1 GB - 512 MB = 512 MB, i.e. the memory demand of the VMs on H2 is 1 GB since X is 100. The total memory demand seen by DPM is therefore 4 GB (3 GB from the VMs on H1 and 1 GB from the VMs on H2), which is well beyond the 81% upper end of the utilization target range. Because the total demand of all VMs would exceed the target range (default 45%-81%), DPM sees no value in putting H2 into standby mode: for DPM, performance takes preference over power savings. Hence DPM does not put any host into standby mode and consequently avoids memory ballooning. This example shows that when your environment has high consumed memory and low active memory usage, it is better to set the DRS advanced option PercentIdleMBInMemDemand to 100.
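The same evaluation expressed as a few lines of Python (values in MB; a toy calculation, not DPM's actual code):

```python
x = 100 / 100.0                                    # PercentIdleMBInMemDemand = 100
capacity     = 4096
upper_target = 0.81 * capacity                     # ~3318 MB usable per host

def demand(active, consumed, x):
    """New DPM metric: active memory + X% of idle consumed memory."""
    return active + x * (consumed - active)

demand_h1 = demand(active=512, consumed=3072, x=x)  # 3072 MB
demand_h2 = demand(active=512, consumed=1024, x=x)  # 1024 MB
total     = demand_h1 + demand_h2                   # 4096 MB > 3318 MB upper target

print(total, ">", upper_target, "-> DPM keeps both hosts powered on")
```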

Note: The DRS memory demand metric also uses the same formula, but in this post I have focused only on DPM.

Now I am sure you understand how DPM's new memory demand metric can be used to fine-tune DPM behavior, which in turn helps avoid memory ballooning.

Is there a direct relationship between the new DPM memory demand metric and ESXi host memory ballooning? The answer is NO: there is no direct relationship between the new DPM memory demand metric and VM memory ballooning. The new metric just gives us a configurable knob to make DPM treat more of the consumed memory as future demand while making host power on/off decisions. This keeps more memory resources available in the cluster and therefore indirectly helps avoid ballooning.

It is also important to note that DPM will not power on a standby host merely because ballooning is happening on another host in the cluster, precisely because there is no direct relationship between ballooning and DPM. Here too, when DPM is invoked it checks the target utilization range, and only if host memory utilization exceeds that range does it start evaluating standby hosts, based on the memory demand formula (active memory + X% of idle consumed memory), in order to take them out of standby mode.

However, memory ballooning on VMs while hosts are in standby should happen very rarely, since DPM will already have applied the conservative X% value before putting hosts into standby, i.e. DPM will have kept enough memory resources available in the cluster (depending, of course, on the value of X, i.e. PercentIdleMBInMemDemand). If ballooning still happens, it means there is excessive memory over-commitment and/or the actual memory demand of the powered-on VMs is higher than DPM anticipated (using active + X% of idle consumed memory).

Can the actual memory demand of powered-on VMs exceed what DPM anticipated with the new demand metric? Yes, in very rare cases, due to a highly unpredictable increase in memory workload/usage. For example, say a cluster has 2 hosts (H1 and H2) with 1 VM on each. The VM on H1 has 8 GB of memory allocated but only 3 GB is consumed at the moment, and X is set to 100. With X at 100, DPM considers the entire consumed memory as memory demand. Based on a 3 GB memory demand, DPM puts H1 into standby mode (assuming 3 GB is available on the other host, H2). Unfortunately, the moment DPM puts the host into standby, the VM's memory demand increases and its consumed memory reaches, say, 7 GB (a real corner case), which is 4 GB more than DPM had just anticipated. If host H2 does not have the memory to satisfy this demand, it can lead to ballooning. However, once DPM notices that memory utilization has exceeded the target utilization range, it re-evaluates the cluster to bring the standby host back. It is also worth noting that if VM shares, limits and reservations are misconfigured, ballooning can occur even when there is plenty of memory available on the host (more on this in the next post).

I hope you enjoyed this look at DPM memory behavior. Please leave a comment if you have any query, and stay tuned for the PART 3 post on ESXi memory ballooning and memory best practices.

If you want an even deeper understanding of DPM, please refer to the resources below:
1. White Paper on DPM by VMware
2. Great book by “Duncan & Frank”: VMware vSphere 5.1 Clustering Deepdive

PART 1: VMware DPM vs ESXi memory ballooning

One question keeps popping into my inbox: can VMware DPM lead to ESXi host memory ballooning, and if yes, is there any way to fine-tune DPM to avoid it? I have explained it whenever possible, but since this question keeps coming up again and again, I thought it would be better to write a post giving an overview of DPM, its memory demand metric, and how memory ballooning relates to DPM.

I have divided this topic into 3 parts as follows:
Part 1: DPM basic overview and its memory demand metric calculations.
Part 2: DPM vs. memory ballooning and memory best practices.
Part 3: What are the ways we can fine-tune DPM?

Today I cover Part 1: “DPM basic overview and its memory demand metric calculations”.

DPM basic overview:
As we already know, consolidating physical servers into virtual machines significantly reduces power consumption. VMware DPM (Distributed Power Management) takes this reduction to the next level.
DPM is a feature of VMware DRS (Distributed Resource Scheduler); once DRS is enabled on a cluster from the vSphere Client or Web Client, enabling DPM is just a click away. DRS performs dynamic CPU and memory load balancing across all ESXi hosts in the cluster, while DPM evaluates each ESXi host so that it can put one or more hosts into standby mode (power off) to save power, or bring one or more hosts back from standby to meet the resource (CPU, memory) demand of the virtual machines in the cluster. You might be wondering how DPM evaluates an ESXi host. The Target Resource Utilization Range plays the crucial role here. DPM calculates it as follows:

Target Resource Utilization Range = DemandCapacityRatioTarget ± DemandCapacityRatioToleranceHost

DemandCapacityRatioTarget is the target utilization of each ESXi host in the cluster. By default this is set to 63%.
DemandCapacityRatioToleranceHost sets the tolerance around the target utilization of each ESXi host; by default this is set to 18%.
Hence, by default, Target Resource Utilization Range = 63 ± 18, i.e. the range is 45% to 81%.

This means DPM tries its best to keep each ESXi host's resource utilization in the range between 45 and 81 percent. If the CPU or memory utilization of an ESXi host falls below 45%, DPM evaluates that host for standby mode (power off). If CPU or memory utilization exceeds 81%, DPM evaluates standby hosts to bring them back out of standby mode.
Note: DPM considers both CPU and memory as resources for evaluation; in this blog post we focus only on memory.
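As a toy illustration of the default range (plain Python, not DPM's actual algorithm):

```python
demand_capacity_ratio_target = 63      # DemandCapacityRatioTarget (%)
tolerance                    = 18      # DemandCapacityRatioToleranceHost (%)

lower = demand_capacity_ratio_target - tolerance    # 45%
upper = demand_capacity_ratio_target + tolerance    # 81%

def dpm_view(host_utilization_pct):
    """Rough classification of a host's utilization against the default range."""
    if host_utilization_pct < lower:
        return "candidate for standby (power-off evaluation)"
    if host_utilization_pct > upper:
        return "triggers power-on evaluation of standby hosts"
    return "within target range"

print(dpm_view(30))   # candidate for standby (power-off evaluation)
print(dpm_view(90))   # triggers power-on evaluation of standby hosts
```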

Basic terms:
Active memory: the memory actively being used by a VM at any given moment. This keeps changing as the VM's load increases or decreases.

Consumed memory: the memory the VM has consumed since it was booted. Note that consumed memory is not the same as the memory allocated (configured) for the VM: consumed memory equals allocated memory only if the VM touches its entire allocation, and the ESXi host never allocates memory to a VM until the VM touches/requests it. When a VM is powered off, its consumed memory is zero. It is also good to note that a VM can only ever consume min(configured memory, specified limit); if no limit is set on the VM, the configured memory itself is the default limit.
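That last rule as a trivial Python helper (names are my own):

```python
def max_consumable_mb(configured_mb, limit_mb=None):
    """A VM can consume at most min(configured memory, limit);
    with no explicit limit, the configured memory is the limit."""
    return min(configured_mb, limit_mb) if limit_mb is not None else configured_mb

print(max_consumable_mb(8192))         # 8192 -> no limit set
print(max_consumable_mb(8192, 4096))   # 4096 -> limit caps consumption
```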

DPM memory demand metric.

In earlier releases, DPM considered just active memory as the memory demand of each VM on the host, i.e. “DPM memory demand metric = active memory”, which is aggressive. In order to control DPM's aggressiveness, from vCenter 5.1 U2c onwards and in all versions of vCenter 5.5, DPM can be tuned to also consider idle consumed memory, i.e. DPM memory demand metric = active memory + X% of idle consumed memory.

1. The default value of X is 25. The X value can be modified using the cluster-level DRS advanced option “PercentIdleMBInMemDemand”, which accepts values from 0 to 100. Setting it to 0 means DPM is as aggressive as it was in earlier releases; as X increases, DPM becomes less aggressive. If X is 100, DPM considers the entire consumed memory as memory demand (consumed memory = active memory + idle consumed memory).

2. Example: say we have one VM with 8192 MB (8 GB) of configured memory. Since boot, the VM has consumed 6144 MB (6 GB) of host memory, but only 20% of that is actively used, so active memory is 20% of 6144 MB = 1228.8 MB and idle consumed memory is 6144 - 1228.8 = 4915.2 MB. With X = 25, the DPM memory demand is 1228.8 + 25% of 4915.2 = 2457.6 MB + overhead. Setting X to 25 means DPM treats 25% of the idle consumed memory as demand by the VM in order to avoid performance impact. As we increase X, DPM becomes more conservative, so set the X value according to your environment and requirements. (See the short sketch after this list for the same arithmetic in code.)

3. DPM power-off recommendations: based on the target resource utilization range, DPM identifies candidate hosts to put into standby mode (i.e. when utilization is under 45%) and then asks DRS to run simulations that assume the candidate hosts are powered off. These DRS simulations internally use the DPM memory demand metric (active memory + X% of idle consumed memory) to calculate the memory demand of each VM in the cluster. DPM uses these simulations to see whether the target resource utilization range improves when the candidate host(s) are powered off. If the resource utilization of all non-candidate hosts stays within the target range (45%-81%), DPM puts the candidate hosts into standby mode and saves power.

4. DPM power-on recommendations: DPM evaluates each standby host when the resource utilization of a powered-on host exceeds 81%, and then asks DRS to run simulations that assume the standby host(s) are powered on. These DRS simulations internally use the DPM memory demand metric (active memory + X% of idle consumed memory) to calculate the memory demand of each VM and distribute the VMs across all hosts. DPM uses these simulations to see whether the target resource utilization range improves when the standby host(s) are powered on. If the resource utilization of all hosts comes within the target range, DPM generates host power-on recommendations.
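Here is the arithmetic from example 2 above as a small Python helper, which also makes the effect of different X values easy to see (a toy calculation; overhead is ignored):

```python
configured = 8192                            # MB configured for the VM
consumed   = 6144                            # MB consumed since boot
active     = 0.20 * consumed                 # 1228.8 MB actively used
idle       = consumed - active               # 4915.2 MB idle consumed memory

def dpm_mem_demand(active, idle, x=25):
    """DPM memory demand metric = active + X% of idle consumed memory."""
    return active + (x / 100.0) * idle

print(dpm_mem_demand(active, idle, x=25))    # 2457.6 MB (+ overhead), the default
print(dpm_mem_demand(active, idle, x=0))     # 1228.8 MB -> old, aggressive behaviour
print(dpm_mem_demand(active, idle, x=100))   # 6144.0 MB -> full consumed memory
```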

I hope you enjoyed this overview of how DPM works in general. Please leave a comment for any clarification, and stay tuned for the exciting PART 2: “DPM vs. memory ballooning and memory best practices”.

If you want an even deeper understanding of DPM, please refer to the resources below:
1. White Paper on DPM by VMware
2. Great book by “Duncan & Frank”: VMware vSphere 5.1 Clustering Deepdive

Schedule DRS & DPM in Off-Hours by using vSphere Web Client

In the desktop client (aka VI Client) we do not have an option to schedule DRS changes, and for DPM the only scheduling option is to turn DPM ON or OFF via the “Change cluster power settings” scheduled task. Recently I was exploring the DRS and DPM scheduled-task workflows in the Web Client, and I must say they have been greatly enhanced. We can now schedule a task for both DRS and DPM not just to turn them ON or OFF but also to configure their automation level and threshold. This will simplify the admin effort of maintaining DRS and DPM. Isn't that great? Since everybody probably knows the basics of DRS and DPM, I am not going to explain how they work. This post is about how to configure DRS/DPM scheduled tasks using the Web Client; as DPM is part of DRS, we can configure both DRS and DPM scheduled tasks in a single pane. Here is where you can find the Schedule DRS configuration tab.

DRS schedule configuration tab

Once you click the Schedule DRS tab, you get the page where both DRS and DPM can be configured.

DRS configuration: you can set the automation level, migration threshold and VM automation level as required, as shown below.

DRS configuration

DPM configuration: in off-hours you can now keep DPM in fully automated mode and make the DPM threshold aggressive; these settings lead DPM to put as many hosts as possible into standby mode and to generate host power-on recommendations only when absolutely required.

DPM_Configuration

Once the configuration is done as required, the next important setting is the time at which the schedule should execute so that the configuration takes effect. It is better to choose a schedule that executes daily during off-hours, as shown in the UI below; this avoids schedule management effort.

Schedule settings

As you can see in the UI above, you have various options for setting the time at which the schedule is executed.

Note: you need to create one more scheduled task to change the DRS/DPM settings back before peak hours start.
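For reference, the cluster settings such a scheduled task flips can also be expressed in code. Below is a hedged pyVmomi sketch of an "off-hours" configuration spec (the function name and the specific rate values are my own illustrations; the scheduling itself stays in the Web Client as described above):

```python
from pyVmomi import vim

def off_hours_cluster_spec():
    """Cluster reconfiguration a nightly scheduled task might apply:
    fully automated DRS plus fully automated DPM."""
    spec = vim.cluster.ConfigSpecEx()
    spec.drsConfig = vim.cluster.DrsConfigInfo(
        enabled=True,
        defaultVmBehavior='fullyAutomated',   # DRS automation level
        vmotionRate=3)                        # DRS migration threshold (1-5)
    spec.dpmConfig = vim.cluster.DpmConfigInfo(
        enabled=True,
        defaultDpmBehavior='automated',       # DPM fully automated
        hostPowerActionRate=1)                # DPM threshold (1-5), illustrative value
    return spec

# Applied against a cluster object obtained elsewhere, e.g.:
# cluster.ReconfigureComputeResource_Task(spec=off_hours_cluster_spec(), modify=True)
```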

Here is one more useful blog post by Frank on DPM scheduled tasks using the desktop client.