I was recently asked why the total storage capacity figures shown in vCenter Server do not match reality. Specifically, the Free and Capacity numbers are much higher than what is actually available. Allow me to explain.
The short answer is that vCenter Server and ESXi do not natively understand thin-provisioned block storage or NFS storage. They simply take the figures presented by the storage as-is. Whether you consider that right or wrong, you need to understand how the technologies integrate to make full sense of things.
The long answer…
On the vCenter Server Cluster Summary page, the storage numbers presented are simply the sum of all datastores visible to each host in the vSphere cluster. Specifically, both local (non-shared) and shared datastores are counted.
The screen capture above only shows the Capacity of each datastore, so I will use that as the example. If you sum 9.16TB, 9.34TB, 52GB, 52GB, 52GB and 52GB, you get the 18.71TB shown as Capacity in the Cluster Summary. The Used and Free figures are calculated the same way.
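The arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not anything vCenter Server runs: it assumes the "TB" and "GB" labels in vCenter are binary units (1 TB = 1024 GB, as the units side note below suggests), and the helper name cluster_summary_total_tb is hypothetical. Like the widget, it adds every datastore with no de-duplication and no filtering of local datastores.

```python
# Assumption: vCenter's "TB"/"GB" labels are binary units, so 1 TB = 1024 GB.
GB_PER_TB = 1024


def cluster_summary_total_tb(capacities_tb):
    """Mimic the Cluster Summary widget: a plain sum over every datastore.

    Local and shared datastores are treated alike; nothing is de-duplicated.
    """
    return sum(capacities_tb)


# Two shared datastores plus four 52GB local datastores, as in the capture.
capacities_tb = [9.16, 9.34] + [52 / GB_PER_TB] * 4

print(f"{cluster_summary_total_tb(capacities_tb):.2f} TB")
```

It prints 18.70 TB, within rounding of the 18.71TB shown, which is expected when summing figures that were already rounded for display.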
This behaviour is by design; it is simply how the widget works. However, it can be misleading, as the total may not reflect real physical capacity. The total also includes local datastores, which are often not meant to host VMs. Adding these irrelevant numbers can skew the overall picture.
Therefore, be careful when using the Cluster Summary figures for capacity management. This applies to most advanced enterprise storage that presents thin-provisioned block storage to ESXi, and to NFS-based storage such as what Nutanix presents to ESXi.
Before I provide a recommendation, allow me to relate the datastore view to Containers in Prism. We shall focus on the shared datastores, apjpoc004ctr01 and another-ctr01.
Side note: it is important to keep Datastore names identical to Container names. Although renaming a Datastore works, it causes a lot of confusion you don't want. Do keep the names identical.
In this example, the actual amount of Free (Logical) space remaining on the cluster is 9.16 TiB. This is evident from the list above, as both containers reflect that number. These numbers can change if advanced settings such as Advertised Capacity and Reserved Capacity have been changed from their default (blank) for these containers. (I'll explain the effect of these settings in another post.)
Relating this back to vCenter Server, the values in the Max Capacity column in Prism (Fig 3) match the Capacity column in the vCenter Server Datastore list (Fig 2). Similarly (though not illustrated), the Free and Used figures match as well.
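We cannot, however, rely on the Free and Capacity numbers shown in the vCenter Server Cluster Summary (Fig 1). Because the shared containers draw on the same pool of free space, that free space is counted once for every datastore, and the local datastores are added on top. As a result, the summary can unintentionally overstate the numbers.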
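To illustrate the double counting, here is a minimal Python sketch. It assumes default container settings, so both shared datastores report the same 9.16 TiB of free space from one pool. The Datastore class, the pool tag and the pool name "nutanix-storage-pool" are hypothetical labels for illustration only; neither vCenter Server nor Prism exposes them in this form.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Datastore:
    name: str
    free_tib: float
    # Hypothetical tag naming the shared free-space pool behind the datastore.
    # None means a local (non-shared) datastore.
    pool: Optional[str] = None


def summary_free_tib(datastores: List[Datastore]) -> float:
    """Mimic the Cluster Summary: add up every datastore's Free figure."""
    return sum(ds.free_tib for ds in datastores)


def deduplicated_free_tib(datastores: List[Datastore], include_local: bool = False) -> float:
    """Count each shared pool's free space once; optionally add local datastores."""
    seen_pools = set()
    total = 0.0
    for ds in datastores:
        if ds.pool is None:
            if include_local:
                total += ds.free_tib
        elif ds.pool not in seen_pools:
            seen_pools.add(ds.pool)
            total += ds.free_tib
    return total


shared = [
    Datastore("apjpoc004ctr01", 9.16, pool="nutanix-storage-pool"),
    Datastore("another-ctr01", 9.16, pool="nutanix-storage-pool"),
]
print(f"Summary-style free: {summary_free_tib(shared):.2f} TiB")       # 18.32
print(f"De-duplicated free: {deduplicated_free_tib(shared):.2f} TiB")  # 9.16
```

The summary-style total doubles the real free space because the same pool is reported twice; counting the pool once gives back the 9.16 TiB seen in Prism.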
Side note: notice that the storage units in Prism are labelled TiB and GiB, whereas vCenter Server uses TB and GB. To be true to the computer science behind the numbers presented, the correct units ought to be TiB and GiB. These are read as "Tebibytes" and "Gibibytes". Read more about these units here. I suppose vCenter Server is simply following the common convention, as most people are used to seeing MB, GB, TB, etc.
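The gap between the two unit systems grows with each prefix. A small Python sketch of the conversion, assuming (as the side note suggests) that a figure labelled TB in vCenter Server is really in TiB; the function names are illustrative only:

```python
TIB = 1024 ** 4  # bytes in one tebibyte
TB = 1000 ** 4   # bytes in one (decimal) terabyte
GIB = 1024 ** 3  # bytes in one gibibyte
GB = 1000 ** 3   # bytes in one (decimal) gigabyte


def tib_to_tb(value_tib: float) -> float:
    """Convert binary tebibytes to decimal terabytes."""
    return value_tib * TIB / TB


def gib_to_gb(value_gib: float) -> float:
    """Convert binary gibibytes to decimal gigabytes."""
    return value_gib * GIB / GB


print(f"{tib_to_tb(9.16):.2f} TB")  # 10.07 TB
print(f"{gib_to_gb(52):.2f} GB")    # 55.83 GB
```

At the tera scale the difference is roughly 10%, so 9.16 TiB of free space is about 10.07 decimal TB.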
Recommendation for Storage Capacity Planning
For capacity management, specifically to determine when additional physical capacity needs to be added, my recommendation is to look primarily at the physical space utilisation of the Nutanix cluster in Prism. The same applies to any environment that presents thin-provisioned LUNs to ESXi.
Some may say the above is insufficient for capacity management. It really depends on what you are after. To know when more physical storage needs to be purchased or added, it is enough. What it does not cover is an individual datastore running out of space before the cluster does.
How does that happen?!?
With default settings, all Nutanix storage containers share a common pool of free space, so they should all report the same free space figure; if one runs low, they all do. The exception is a container whose Advertised Capacity is set to a value smaller than what the cluster provides. Such a container can run out of space before the cluster does, which means its datastore can reach 100% utilisation sooner than the cluster.
To be thorough, operationally we need to monitor both the Datastore utilisation in vCenter Server and the physical utilisation of the cluster in Prism.
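The effect of Advertised Capacity can be approximated with a simple model, sketched below in Python. This is an assumption for illustration, not Nutanix's actual accounting: a container with no Advertised Capacity reports the cluster's shared free space, while a capped container reports whichever is smaller, the cluster's free space or its own headroom under the cap. Reserved Capacity is ignored, and all figures apart from the 9.16 TiB cluster free space are hypothetical.

```python
from typing import Optional


def container_free_tib(
    cluster_free_tib: float,
    container_used_tib: float,
    advertised_capacity_tib: Optional[float] = None,
) -> float:
    """Free space a container (and its datastore) would report.

    Simplified model: with no Advertised Capacity, the container sees the
    cluster's shared free space. With Advertised Capacity set, it cannot
    report more than its remaining headroom under that cap.
    Reserved Capacity is not modelled.
    """
    if advertised_capacity_tib is None:
        return cluster_free_tib
    headroom = max(advertised_capacity_tib - container_used_tib, 0.0)
    return min(cluster_free_tib, headroom)


# Cluster has 9.16 TiB free; container usage and cap are hypothetical.
print(container_free_tib(9.16, 3.0))       # default settings: 9.16
print(container_free_tib(9.16, 4.5, 5.0))  # capped container: 0.5
```

In the second case the datastore is at 90% of its advertised 5 TiB while the cluster still has 9.16 TiB free, which is exactly the situation that watching only cluster-level utilisation would miss.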
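A minimal Python sketch of the two-layer check follows. How the utilisation percentages are collected from vCenter Server and Prism is deliberately left out; the function name, the 80% threshold and the sample percentages are hypothetical.

```python
from typing import Dict, List


def capacity_alerts(
    datastore_util_pct: Dict[str, float],
    cluster_physical_util_pct: float,
    threshold_pct: float = 80.0,  # hypothetical alert threshold
) -> List[str]:
    """Flag datastores (vCenter view) and the cluster (Prism view) at or above threshold."""
    alerts = []
    # Per-datastore check catches containers capped by Advertised Capacity.
    for name, pct in sorted(datastore_util_pct.items()):
        if pct >= threshold_pct:
            alerts.append(f"datastore {name}: {pct:.0f}% used (vCenter)")
    # Cluster check tells you when more physical capacity is needed.
    if cluster_physical_util_pct >= threshold_pct:
        alerts.append(f"cluster physical: {cluster_physical_util_pct:.0f}% used (Prism)")
    return alerts


print(capacity_alerts({"apjpoc004ctr01": 45.0, "another-ctr01": 91.0}, 62.0))
```

Here only another-ctr01 is flagged, even though the cluster itself is well below the threshold.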
Always fully understand the solution you are working on. As more layers are abstracted, complexity increases and things may not be what they seem. Keep an open mind so that you can recognise when complexity is becoming too much, and always look for better ways of doing things.
Specific to this topic, one option to consider is keeping a single container in the cluster. This is recommended in many scenarios, as it helps simplify many things. With a single mounted container, the Cluster Summary figures are also closer to the actual figures. #KeepItSimple