Understanding Excess Capacity: Amazon EC2 Spot vs. Azure Low-Priority VM vs. Google Preemptible VM vs. IBM Transient Servers

Introduction

Spot instances are transforming how people consume public cloud services. They are short-lived instances that cloud providers offer at a very low cost compared to on-demand or reserved instances. Cloud providers use the spot market as a way to monetize their excess capacity. The price of spot instances varies with supply and demand but, on average, users can save up to 80% compared to on-demand instances. The top four cloud providers, Amazon Web Services, Azure, Google Cloud and IBM, offer spot instances to their users. In this post, we will compare the spot instances offered by these four cloud providers to help users understand their strengths and weaknesses, the kinds of workloads they support, their pricing models, and so on.

Mentions in Academia

The idea of cloud providers using excess capacity to sell low-cost spot instances has been studied extensively in academic institutions around the world. Tsafrir et al. from Technion – Israel Institute of Technology analyzed AWS spot instance pricing, including historical pricing, and concluded that Amazon sets spot prices using a random reserve price. Rajiv Ranjan et al. carried out an extensive analysis and concluded that most research favors a spot market-based pricing model over a fixed pricing model. Their study also found a big gap between the sophisticated pricing models proposed for spot markets and the reality of how these services are consumed. They could not advocate a specific model that gives cloud providers suitable margins while also giving users the relevant pricing advantage. However, Rajkumar Buyya et al. from The University of Melbourne, Australia, have proposed a spot pricing framework called Spot instance pricing as a Service (SipaaS). It is based on a pricing mechanism called the Online Extended Consensus Revenue Estimate (online Ex-CORE) auction, which provides an optimal profit for the cloud provider. They tested their model on top of OpenStack, found it to be useful for spot instance pricing, and released the framework under an open source license. Rajkumar Buyya’s team has also analyzed pricing data from AWS spot instances and found that it fits a mixture of Gaussian distributions.

While the academic community has been studying the spot pricing model extensively, we have also seen cloud providers and startups offer various services based on spot instance pricing. In this blog post, we will compare the offerings of some of these providers.

Amazon Web Services

AWS has offered Amazon EC2 Spot Instances since 2009 (!), selling its excess capacity in a spot market at discounts of up to 90%, depending on supply and demand. Although AWS offered spot instances through a bidding model in the early days, it has since changed to making them available from the excess capacity in its inventory on a first-come, first-served basis. The bidding model is now optional. EC2 Spot Instances can be used along with other AWS services such as EMR, Auto Scaling, Elastic Container Service (ECS), Data Pipeline and AWS Batch.

Strengths
  • Offers 2-minute advance notice on spot instance removal, giving time for users to gracefully shut down or fall back to other instances
  • A major advantage is that, unlike Google Preemptible VMs, there is no time limit on the life of a spot instance
  • Offers EC2 Fleet as a way to orchestrate and manage spot instances along with on-demand instances based on a target price or target distribution across pools
  • Spot Instance Advisor allows users to determine where there will be the least disruption across regions
Weakness
  • Spot Instances are still not “application aware”, which limits their support to certain transient use cases
  • AWS cannot commit to consistent availability of this type of EC2 capacity
Considerations

AWS EC2 Spot instances are useful for various fault-tolerant and flexible applications, such as big data, containerized workloads, high-performance computing (HPC), stateless web servers, rendering, CI/CD and other test & development workloads. With EC2 Spot Fleet, you could use automation scripts to move workloads to other available instances (including on-demand instances) so that long-running workloads can outlive the spot instances they started on. However, this adds operational overhead and a risk of failures. But with the right automation and analytics, and by combining spot instances with on-demand and reserved instances, it is possible to run varied and mission-critical workloads.
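To make the 2-minute notice a little more concrete, here is a minimal Python sketch of how an automation script might work out how much time is left once a notice arrives. It assumes the notice is the small JSON document that AWS documents for the instance metadata path shown in the comment, with an "action" field and a UTC "time" field; check the current AWS documentation before relying on that format. Fetching the document is deliberately left out (instances that enforce IMDSv2 also need a session token), and the function name is our own, not part of any AWS SDK.

```python
import json
from datetime import datetime, timezone
from typing import Optional

# Metadata path AWS documents for Spot interruption notices (assumption: verify
# against current docs). It returns HTTP 404 until an interruption is scheduled.
INSTANCE_ACTION_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"


def seconds_until_interruption(notice_body: str, now: datetime) -> Optional[float]:
    """Return the seconds left before the scheduled action.

    Returns None when the body is empty, meaning no notice has been issued.
    The function name and signature are hypothetical, for illustration only.
    """
    if not notice_body.strip():
        return None
    notice = json.loads(notice_body)
    # Assumed format: ISO 8601 UTC timestamp, e.g. "2024-01-01T12:02:00Z".
    deadline = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ")
    deadline = deadline.replace(tzinfo=timezone.utc)
    return (deadline - now).total_seconds()


# Example with a hand-written notice body rather than a live metadata call.
example_now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
example_body = '{"action": "terminate", "time": "2024-01-01T12:02:00Z"}'
print(seconds_until_interruption(example_body, example_now))  # prints 120.0
```

A script could poll the metadata path every few seconds and, once this function returns a number, stop accepting new work and drain or hand off whatever is in flight.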

Google Cloud Preemptible VMs

Preemptible VMs offered by Google Cloud are short-lived, low-cost virtual machines that can help users run fault-tolerant workloads or other short-lived workloads. They are similar to Amazon EC2 Spot Instances but with some important differences. With the fixed-price Preemptible instances, you can save up to 80% compared to on-demand instances. Google also offers Preemptible GPU instances and Preemptible VMs with Google Kubernetes Engine.

Strengths
  • Fixed discount and pricing with no uncertainties on the cost
  • No limitation on the instance type
  • Provisioning Preemptible instances is easy and involves just adding a single flag on the command line
Weakness
  • Major weakness: instances have a maximum lifetime of 24 hours
  • Only 30 seconds of notification before removal of the instance. Though this is enough for a graceful shutdown, it may be too short for certain failover use cases
  • According to internal research and statistical information, certain instance types are interrupted after less than 6 hours. This is a major concern for running production workloads on top of PVMs, even when they are stateless web servers or containers.
Considerations

Google Preemptible Instances are suitable for batch jobs and other fault-tolerant workloads, and not for any other kind of workload. Organizations must be careful before considering Preemptible VMs for production or mission-critical environments. The instances are short-lived, and handling production and mission-critical workloads on them adds operational overhead. In order to leverage Preemptible instances for such workloads, organizations should consider using platforms that use automation and analytics to provide SLAs consistent with these workloads.
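One common way to make a batch job tolerate a 24-hour limit and a short shutdown notice is to checkpoint progress so that a replacement VM can pick up where the old one stopped. The Python sketch below is an illustration under our own assumptions, not a Google API: `run_with_checkpoint`, the `preempted` callback and the JSON checkpoint layout are all hypothetical names, and on a real VM the callback would be wired to whatever preemption signal you rely on.

```python
import json
import os
from typing import Callable, List


def run_with_checkpoint(
    items: List[str],
    checkpoint_path: str,
    process: Callable[[str], None],
    preempted: Callable[[], bool],
) -> bool:
    """Process items in order, resuming from the index saved in checkpoint_path.

    Returns True when every item is done, False if preemption came first.
    All names here are hypothetical; this is a sketch, not a provider API.
    """
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            start = json.load(f)["next_index"]

    for index in range(start, len(items)):
        if preempted():
            return False
        process(items[index])
        # Write to a temporary file and rename it, so that being killed
        # mid-write cannot leave a corrupt checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump({"next_index": index + 1}, f)
        os.replace(tmp_path, checkpoint_path)
    return True
```

Note that this gives at-least-once behavior: if the VM disappears after an item is processed but before the checkpoint is written, that item is processed again on resume, so `process` should be safe to repeat. The checkpoint file also needs to live on storage that outlasts the VM.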

Azure Low Priority VMs

Launched in 2017, Azure Low Priority VMs are offered from the excess capacity in the Azure data centers. Unlike AWS and Google Cloud, Azure offers an 80% discount on Linux Low Priority VMs and a 60% discount on Windows instances. The Low Priority VMs are available as a part of Azure Batch and VM Scale Sets. These instances are useful for batch processing workloads like media rendering and other large processing jobs, certain dev/test workloads, demos, etc.

Strengths
  • Fixed pricing offering some predictability in costs
  • Can work natively inside Azure Batch pools alongside on-demand instances
Weakness
  • There are issues with scaling to hundreds of VMs. Often, some VMs are not available.
  • Low visibility into resource availability, which makes porting and planning of VMs difficult
  • Since these VMs are available only in Azure Batch and VM Scale Sets, you cannot launch a single instance of Low Priority VMs
  • Limited integration to other Azure services, limiting developers from using these instances along with other Azure services
Considerations

Azure Low Priority VMs are useful in very limited use cases of batch processing and they are not well integrated with other Azure services. Since Azure Low Priority VMs are limited in their features and supported use cases, it is important to consider how these VMs can be leveraged to run other types of workloads using automation.

IBM Transient Virtual Servers

IBM Transient Virtual Servers are low-cost, short-lived instances that IBM offers from the excess capacity in all of its data centers across the globe. They are multi-tenant virtual instances offering cost savings of 75% over standard on-demand instances. These instances support dev/test and batch workloads.

Weakness
  • Instances are deprovisioned without notice and can lead to disruption
  • Instances cannot be upgraded or downgraded, and there is no support for local storage
  • Limited integration with other IBM cloud services
Considerations

IBM Transient Virtual Servers provide a cost-effective way to run some short-lived workloads like batch processing and dev/test workloads.

Conclusion

Every major cloud provider offers spot instances with a varying set of features. These short-lived instances have limited applicability and require operational overhead to extend their use beyond these use cases. As a result, enterprises are left using on-demand and reserved instances for many of their workloads, which is a suboptimal way to use cloud infrastructure. With automation and analytics, organizations can use a mix of spot instances, on-demand instances and reserved instances to run many workloads with guaranteed SLAs.
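To get a rough feel for what such a mix can save, the Python sketch below computes a blended hourly cost for a fleet that is part spot and part on-demand. The discount figures are the headline numbers quoted in this post (the AWS figure is an "up to" maximum, not a typical value). The $1.00 on-demand price and the 70% spot share are illustrative assumptions, and reserved instances are left out to keep the arithmetic simple.

```python
def blended_hourly_cost(
    on_demand_price: float, spot_discount: float, spot_fraction: float
) -> float:
    """Average hourly cost per instance for a mixed spot/on-demand fleet.

    spot_discount and spot_fraction are fractions between 0 and 1.
    """
    if not 0.0 <= spot_discount <= 1.0:
        raise ValueError("spot_discount must be between 0 and 1")
    if not 0.0 <= spot_fraction <= 1.0:
        raise ValueError("spot_fraction must be between 0 and 1")
    spot_price = on_demand_price * (1.0 - spot_discount)
    return spot_fraction * spot_price + (1.0 - spot_fraction) * on_demand_price


# Headline discounts quoted in this post; real savings vary by instance and region.
HEADLINE_DISCOUNTS = {
    "AWS EC2 Spot (up to)": 0.90,
    "Google Preemptible VM": 0.80,
    "Azure Low Priority (Linux)": 0.80,
    "Azure Low Priority (Windows)": 0.60,
    "IBM Transient Virtual Server": 0.75,
}

# Illustrative assumptions: $1.00/hour on-demand, 70% of the fleet on spot.
for name, discount in HEADLINE_DISCOUNTS.items():
    cost = blended_hourly_cost(1.00, discount, 0.70)
    print(f"{name}: ${cost:.2f} per instance-hour")
```

For example, with an 80% discount and 70% of the fleet on spot, the blended cost works out to 0.7 × $0.20 + 0.3 × $1.00 = $0.44 per instance-hour, against $1.00 for an all on-demand fleet.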

The following table summarizes various spot instance features offered by the four major cloud providers.