One of the biggest advantages of virtualisation is that Virtual Machines (VMs) are easy to provision and deploy, as one does not have to order a physical machine, wait for it, find space for it and then deploy it.
The problem is that the very advantages VMs offer – cheap, quick, easy and effective – make it all too simple for a company to deploy too many of them without disposing of those that have become obsolete, resulting in virtual server sprawl.
Server sprawl describes an environment in which a company has numerous stale, under-utilised servers that take up more space and consume more resources than their assigned workloads can justify. This can ultimately inflate a company’s operational costs, as maintaining these VMs requires time, money and IT personnel.
If, on average, 30-40% of VMs in a typical enterprise eventually become under-utilised and part of the sprawl, as much as 30-40% of production resources, including storage space, processing power and vRAM, could be wasted.
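Turning a percentage of VMs into a percentage of resources assumes the under-utilised VMs are allocated roughly the same vCPU, vRAM and storage as the rest of the estate. A minimal Python sketch of that arithmetic, over a hypothetical inventory (the `VM` fields, names and sizes are illustrative assumptions, not data from any real environment), computes the share of each allocated resource held by under-utilised machines:

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    vcpus: int
    vram_gb: int
    storage_gb: int
    under_utilised: bool


def wasted_share(inventory: list[VM]) -> dict[str, float]:
    """Fraction of each allocated resource that is held by under-utilised VMs."""
    shares = {}
    for resource in ("vcpus", "vram_gb", "storage_gb"):
        total = sum(getattr(vm, resource) for vm in inventory)
        idle = sum(getattr(vm, resource) for vm in inventory if vm.under_utilised)
        shares[resource] = idle / total if total else 0.0
    return shares


# Hypothetical estate: 10 identically sized VMs, the first 3 (30%) under-utilised.
inventory = [VM(f"vm{i:02d}", 4, 16, 100, under_utilised=(i < 3)) for i in range(10)]
print(wasted_share(inventory))
```

With identically sized VMs, 3 idle machines out of 10 hold 30% of every resource. If the idle VMs are larger or smaller than average, the wasted share shifts accordingly, which is why measuring actual allocations is more reliable than assuming a flat percentage.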
The causes of virtual server sprawl
Virtual server sprawl results from a series of small, seemingly legitimate decisions by IT that eventually add up to something that hurts the company’s operations. Some of them are:
Lack of adherence to policy – Sometimes the policy set in place to govern the deployment of VMs is not adhered to. As a result, servers may be created and then not maintained according to protocol, or the protocols for cleaning up stale VMs, including those that have not yet been activated, may not be followed.
The one-off virtual servers that fall outside normal parameters – Some organisations create, use and abandon hundreds of VMs as a normal part of their operations, and they usually have well-defined processes to manage the issue. However, when they occasionally create VMs that fall outside their regular parameters, these may fall through the cracks. Such VMs may also not be directly “owned” by administrators, who are then wary of deleting a machine they know little about, as deletion is permanent and results in data loss.
The fear of disrupting a perfectly good system – When in doubt about the purpose of a system or its workload, an administrator may also be reluctant to disrupt a seemingly well-functioning system, only to have virtual server sprawl creep up on them.
The impact of virtual server sprawl
Virtual server sprawl can have a devastating effect on IT resources and the company’s bottom-line:
– IT staff waste time supporting servers that no longer add value to the business.
– In an environment with VM sprawl, stale VMs continue to occupy tier 1 storage, which is typically of higher quality and more expensive, instead of being moved to a less expensive storage tier.
– Each VM consumes network ports, CPU and RAM; for stale VMs this is a wasted cost and a lost opportunity to use those resources elsewhere in the organisation.
– Unchecked server sprawl results in longer back-up windows due to unnecessary replication of VMs, ultimately affecting disaster recovery operations.
– The organisation risks losing sight of which workloads are mission-critical, and may inadvertently implement back-up and recovery processes that, while efficient, do not meet the Service Level Agreement (SLA) requirements of individual applications.
Resolving virtual server sprawl
One of the most effective approaches to managing virtual assets is to implement an automated system that continuously monitors the environment and deals with server sprawl automatically.
The solution should provide a policy-based, lifecycle-driven framework that integrates with the vCenter console and gives organisations the best of both worlds. It should feature:
– The ability to retain VMs for the long term without adversely affecting the overall performance of the environment or consuming production resources.
– Ongoing access to all VMs, so that they can be returned to production at a moment’s notice should the need arise.
It is also important to bring virtual server management into the organisation’s back-up and recovery cycle, so that sprawl control becomes part of the overall data protection lifecycle. The framework should also allow administrators to define different rules for different classes of machines.
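Per-class rules can be pictured as a table of thresholds keyed by machine class. The minimal Python sketch below assumes three illustrative metrics (average CPU, average network throughput and days since last login); the class names, metrics and threshold values are hypothetical and are not part of any vCenter or Commvault API. A real deployment would pull these figures from its monitoring tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnusedPolicy:
    """Hypothetical per-class thresholds; a VM counts as unused only if it meets all of them."""
    max_avg_cpu_pct: float     # average CPU utilisation over the observation window
    max_avg_net_kbps: float    # average network throughput over the window
    min_days_since_login: int  # days since anyone last logged in


# Illustrative rules for different classes of machine (names and values are assumptions).
POLICIES = {
    "test": UnusedPolicy(max_avg_cpu_pct=5.0, max_avg_net_kbps=50.0, min_days_since_login=14),
    "production": UnusedPolicy(max_avg_cpu_pct=1.0, max_avg_net_kbps=5.0, min_days_since_login=90),
}
# Fallback for classes without their own rule.
DEFAULT_POLICY = UnusedPolicy(max_avg_cpu_pct=2.0, max_avg_net_kbps=20.0, min_days_since_login=30)


def is_unused(vm_class: str, avg_cpu_pct: float, avg_net_kbps: float, days_since_login: int) -> bool:
    """Apply the thresholds for the VM's class and report whether it qualifies as unused."""
    policy = POLICIES.get(vm_class, DEFAULT_POLICY)
    return (
        avg_cpu_pct <= policy.max_avg_cpu_pct
        and avg_net_kbps <= policy.max_avg_net_kbps
        and days_since_login >= policy.min_days_since_login
    )


# The same idle-looking metrics flag a test VM but not a production VM.
print(is_unused("test", 3.0, 10.0, 30))        # True
print(is_unused("production", 3.0, 10.0, 30))  # False
```

Requiring every threshold to be met is a deliberately conservative choice: a machine that is busy by any one measure stays in production, which fits the reluctance to disrupt working systems described earlier.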
The protection cycle summarised:
1. Identify target VMs – The first step in managing “unused” VMs is to identify them, then shut them down and leave them in place. The criteria for declaring a machine unused differ between types of organisation, so VM monitoring tools should be easy to configure with one or more metrics and thresholds for classifying VMs as “unused.”
2. Relocate VMs – The monitoring software should let the administrator fully configure their archiving options, including the interim storage destination for powered-down VMs and whether to store each VM using thin provisioning, which can conserve additional storage resources.
3. Archive – Once dormant VMs have been powered down and offloaded to secondary storage, they can be moved to longer-term archival storage.
4. Ensure real-time VM recovery – The system must ensure that archived VMs can easily be accessed on demand, in real time. VM users should be able to do this through a self-service portal, where they can browse the contents of an archived VM, restore single files as needed, and bring the retired VM back into production as quickly as possible.
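The protection cycle can be modelled as a small state machine in which a VM moves from active, to powered off in place, to interim storage, to the archive, and can always be brought back into production. The Python sketch below is a hypothetical model, not a product API; allowing recovery straight from the powered-off and interim stages, as well as from the archive, is an assumption.

```python
from enum import Enum


class VMState(Enum):
    ACTIVE = "active"
    POWERED_OFF = "powered off in place"
    RELOCATED = "on interim storage"
    ARCHIVED = "in long-term archive"


# Allowed moves through the protection cycle (hypothetical model).
TRANSITIONS = {
    VMState.ACTIVE: {VMState.POWERED_OFF},                     # 1. identified as unused
    VMState.POWERED_OFF: {VMState.RELOCATED, VMState.ACTIVE},  # 2. relocate, or power back on
    VMState.RELOCATED: {VMState.ARCHIVED, VMState.ACTIVE},     # 3. archive, or restore
    VMState.ARCHIVED: {VMState.ACTIVE},                        # 4. recover into production
}


def move(current: VMState, target: VMState) -> VMState:
    """Return the new state, refusing transitions that would skip a stage of the cycle."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot go from {current.name} to {target.name}")
    return target


# Walk one VM through the full cycle and back into production.
state = VMState.ACTIVE
for step in (VMState.POWERED_OFF, VMState.RELOCATED, VMState.ARCHIVED, VMState.ACTIVE):
    state = move(state, step)
    print(state.name)
```

Rejecting a jump such as `ACTIVE` to `ARCHIVED` keeps the safety net the cycle is built around: a VM sits powered off in place before it is moved anywhere, so a wrongly classified machine can be switched back on with nothing lost.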
By helping administrators take safe, proactive steps to eliminate dormant VMs, organisations can reduce their IT costs, which can have a positive impact on their bottom-line.
By Johan Scheepers, Commvault Systems Engineering Director, MESAT