Spot instances are a great way for organizations to cut cloud computing costs by taking advantage of Amazon’s unused computing capacity. They are attractive because they are steeply discounted compared with on-demand or reserved instances. The downside of running workloads on spot instances is that AWS can reclaim them with only a two-minute warning.
That risk makes system administrators apprehensive about running databases like Apache Cassandra on spot instances, where a lost node can mean lost or inconsistent data. In this post, I will show how to use Elastigroup to safely deploy and manage Cassandra with persistent storage on top of spot instances, and show the resulting cost savings.
What is Apache Cassandra?
Apache Cassandra is a powerful open-source NoSQL database. A Cassandra cluster is made up of multiple nodes and scales linearly: adding nodes adds both computing power and data capacity. Cassandra is highly fault-tolerant because it replicates data across cluster nodes and offers configurable consistency levels for data access. Later in this post, we will spin up a two-node Cassandra cluster and show how Elastigroup can manage it and scale it up.
What is Elastigroup?
Elastigroup is an application scaling service by Spotinst designed to optimize performance and costs. Using predictive algorithms, Elastigroup runs workloads on Spot Instances across the major cloud providers (AWS, Microsoft Azure, and Google Cloud) while reducing the risk and complexity, and it provides simple orchestration and management at scale.
Spotinst Elastigroup predicts EC2 Spot market behavior, capacity trends, pricing, and interruption rates. Whenever there is a risk of interruption, Elastigroup acts up to 15 minutes ahead of time to keep the workload available. To follow along with this blog post, you will need access to the Spotinst Console. If you do not have access yet, please sign up for a 14-day free trial.
Before we continue, here are the prerequisites you need in order to follow along with the technical instructions:
- A Spotinst Console account
- An AWS account
- A VPC in AWS with a 10.0.0.0/16 subnet (the configuration does not work with a different subnet)
- A security group with SSH access
- The Elastigroup Cassandra configuration file downloaded to your computer (save the contents of the GitHub Gist to cassandra-elastigroup.json)
- An EC2 key pair for SSH access to the instances
With the prerequisites out of the way, let’s log in to the Spotinst Console and create an Elastigroup for Cassandra.
Deploying the Cassandra Cluster with Elastigroup
Once logged in to the Spotinst Console, click Elastigroups and then click the CREATE button.
From here we can choose a particular use case for the new Elastigroup. Under the Big Data and DB section, choose Cassandra.
Now we can import the Cassandra configuration for Elastigroup by clicking the IMPORT button and choosing JSON File. Click Choose File and select the cassandra-elastigroup.json file saved in the prerequisites. When the upload is complete, you will be redirected back to the previous screen with the configuration imported. Review and fill in the following values:
- Enter a name for the group.
- Choose the appropriate region for your AWS account.
- Under Capacity, the target field defines the number of instances that Elastigroup will launch once the group is created. The minimum and maximum values set the scaling limits for the Elastigroup. Setting a value of 2 in each field creates a static Cassandra cluster with two nodes.
- Click Next to proceed.
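To make the relationship between the three Capacity fields concrete, here is a minimal Python sketch. The Capacity class and its field names are hypothetical and only mirror the labels in the console; this is not the Elastigroup API or the format of the imported JSON file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capacity:
    """Hypothetical model of the console's Capacity fields (not the Elastigroup API)."""

    minimum: int
    target: int
    maximum: int

    def __post_init__(self) -> None:
        # The target has to sit inside the scaling limits.
        if not 0 <= self.minimum <= self.target <= self.maximum:
            raise ValueError("expected 0 <= minimum <= target <= maximum")

    @property
    def is_static(self) -> bool:
        # With identical limits, the group has no room to scale in or out.
        return self.minimum == self.maximum


# The values used in this post: 2 in every field.
cassandra = Capacity(minimum=2, target=2, maximum=2)
print(cassandra.is_static)
```

With 2 in every field the group is static, which is why the cluster starts with exactly two nodes.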
Change the VPC and Availability Zone to match the VPC listed in the prerequisites. An important section to pay attention to is the SPOT MARKET SCORING section.
Spot Market Scoring is a unique feature that helps you choose the best spot markets. The scoring section provides a visual aid showing the number of separate spot markets available based on the number of Availability Zones and Spot Types selected. The scale goes from 0 to 100, where 0 is a non-functional market and 100 indicates the best price and longevity for the spot instance. With Spot Market Scoring explained, let’s proceed with the configuration.
Scroll down the page and choose a Key Pair and Security Group under Launch Specification. When finished, click on ADDITIONAL CONFIGURATIONS to expand the section.
Under ADDITIONAL CONFIGURATIONS, we can see that Public IP is checked and this will allow us to use SSH to access the nodes later. The USER DATA section contains a startup script for the instances that will download, compile, configure, and start Cassandra as well as handle the EBS mounts. Scroll down and click on the STATEFUL section.
This section is important because it handles disk and network persistence for the instances. In a Cassandra cluster, each node is identified by an IP address and owns a certain range of data in the cluster. If a new instance were added to the Cassandra cluster without disk and network persistence, it would join as a new node with no data. With that in mind, it is important to persist the root and data volumes so that Elastigroup can reattach them when an instance is replaced. With the original private IP and volumes attached, the replacement instance will join the cluster in the exact place it should be.
Under Maintain Private IP, there are two static IPs set to match the VPC subnet from the prerequisites. In a Cassandra cluster, seed nodes are the contact points that other nodes use to discover and join the cluster. In this Elastigroup configuration, the first two nodes deployed will be seed nodes. Nodes added to the cluster later will reach out to them to join the cluster.
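Because the static private IPs only work if they belong to the 10.0.0.0/16 range from the prerequisites, it can help to check them before creating the group. The following is a minimal sketch using Python's standard ipaddress module. The two seed addresses are hypothetical placeholders; the real values come from the imported configuration.

```python
import ipaddress

# The VPC range required by the prerequisites.
VPC_CIDR = ipaddress.ip_network("10.0.0.0/16")

# Hypothetical static IPs, used here only for illustration.
SEED_IPS = ["10.0.0.10", "10.0.0.11"]


def outside_network(ips, network=VPC_CIDR):
    """Return the addresses that do not belong to the given network."""
    return [ip for ip in ips if ipaddress.ip_address(ip) not in network]


# An empty list means every address fits inside the VPC range.
print(outside_network(SEED_IPS))
# An address from another range is reported back.
print(outside_network(["10.0.0.10", "172.31.5.4"]))
```

Any address the function returns would need to be changed, or the VPC recreated with the expected range, before the nodes could keep their private IPs.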
To conclude the setup, click the Next button twice followed by CREATE.
When both instances are in a running state, we should have a 2-node Cassandra cluster ready to go. However, it is a general rule of thumb in the tech world that you should have three copies of your data or it does not exist. Let’s see how Elastigroup can help with that.
Scaling Up Cassandra
Scaling instances up or down is very easy in Elastigroup. Let’s add a third node to the Cassandra cluster. From the Actions menu, select Manage Capacity. In the dialog window that appears, enter a value of 1 in the “Increment group instances by *” box and click the Launch button.
When all three instances are in a Running state, we can check that the cluster is running properly.
Verifying the Cassandra Cluster Setup
In this section, we can get our hands dirty on the command line and connect to the Cassandra cluster to make sure it is set up properly. First, we need the public IP address of one of the first two nodes that were created. Select one of them in the Spotinst Console, copy its public IP address, and connect over SSH:
ssh -i ~/.ssh/my-ec2-key.pem ubuntu@public-ip
To check the status of the Cassandra cluster, use the nodetool command:
nodetool status
In the output, a UN status means Up/Normal. When all of the nodes have a UN status, the cluster is healthy. If a node does not join the cluster after several minutes, select that node in the Spotinst Console and choose Recycle. A new instance will be created and should join the cluster within a few minutes.
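The health check described above (every node reporting UN in the nodetool status output) can also be scripted. The sketch below is one possible approach, not part of Cassandra or Elastigroup: it assumes each node row starts with a two-letter code (U or D for Up/Down, then N, L, J, or M for Normal/Leaving/Joining/Moving) followed by the node address. The sample text is a trimmed, made-up illustration with hypothetical IPs, not output captured from a real cluster.

```python
import re

# A node row starts with a status letter (U/D) and a state letter (N/L/J/M),
# followed by whitespace and the node address.
NODE_LINE = re.compile(r"^([UD][NLJM])\s+(\S+)")


def node_states(status_output: str) -> dict:
    """Map each node address to its two-letter status/state code."""
    states = {}
    for line in status_output.splitlines():
        match = NODE_LINE.match(line.strip())
        if match:
            code, address = match.groups()
            states[address] = code
    return states


def cluster_healthy(status_output: str) -> bool:
    """True only if at least one node is listed and every node is UN."""
    states = node_states(status_output)
    return bool(states) and all(code == "UN" for code in states.values())


# Illustrative sample only: hypothetical addresses, with the load, tokens,
# ownership, host ID, and rack columns omitted.
SAMPLE = """\
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address
UN  10.0.0.10
UN  10.0.0.11
DN  10.0.0.12
"""

print(node_states(SAMPLE))
print(cluster_healthy(SAMPLE))
```

In practice the text would come from running nodetool status on a node; a DN row like the last one in the sample is the signal to wait a few minutes or recycle the instance.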
Now that we have seen how easily Elastigroup can scale up instances, let’s see how it handles a failed instance.
This is the fun part of the blog post, where we sabotage the Cassandra cluster by failing a node. To get started, log in to the AWS EC2 console and terminate the newly added instance.
After several minutes, the Spotinst Console will detect that the instance has failed and recycle it.
In the meantime, Cassandra should report the node as down (DN).
It is important to understand how the recycling process works. The node resources such as the root and data volumes and IP address are detached from the old instance and transferred to the new one. With those resources available on the new instance, the node will join the Cassandra cluster without any issues.
When the recycling operation is complete, the node should be part of the cluster again and report a UN status.
Thanks to Elastigroup’s stateful capabilities, the node rejoined the cluster as the existing node without any issues or data loss. If AWS decides to terminate the spot instances later, Elastigroup will create new instances using the existing resources and downtime will be minimal.
Now that the Cassandra cluster has been running for a few hours, let’s take a look at how much money was saved by using Spot instances.
Cost and Savings
The whole point of using Spotinst to manage your Spot instances is to save money. We can see the cost savings from the Spotinst Console by clicking on the Elastigroup for the Cassandra cluster.
This 3-node cluster was up for only 5 hours and would have cost $1.39 on on-demand instances. Because we are using Spotinst to manage the workload on spot instances, the cost is only $0.42, a savings of 69.99% in just 5 hours. Here is some food for thought: imagine the savings from running this Cassandra cluster in production for a longer period of time with even more nodes.
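The savings figure is simply the difference between the two costs divided by the on-demand cost. Here is a minimal Python sketch of that arithmetic, together with a rough projection to a longer run. The function names are my own, and the projection assumes a constant hourly price, which spot markets do not guarantee. Plugging in the rounded dollar amounts above gives about 69.8%, slightly below the 69.99% quoted, presumably because that figure was calculated from unrounded costs.

```python
def savings_percent(on_demand_cost: float, actual_cost: float) -> float:
    """Percentage saved relative to the on-demand cost."""
    if on_demand_cost <= 0:
        raise ValueError("on_demand_cost must be positive")
    return (on_demand_cost - actual_cost) / on_demand_cost * 100


def projected_cost(cost: float, hours_observed: float, hours_target: float) -> float:
    """Linear projection of a cost to a longer period.

    Assumes the hourly price stays constant, which is a simplification
    because spot prices fluctuate.
    """
    return cost / hours_observed * hours_target


ON_DEMAND = 1.39  # dollars for 5 hours, from this post
SPOT = 0.42       # dollars for 5 hours, from this post

print(f"{savings_percent(ON_DEMAND, SPOT):.2f}% saved")

# Rough 30-day (720-hour) projection for the same 3-node cluster.
print(f"on-demand: ${projected_cost(ON_DEMAND, 5, 720):.2f}")
print(f"spot:      ${projected_cost(SPOT, 5, 720):.2f}")
```

Under that constant-price assumption, 30 days of the same cluster would come to roughly $200 on on-demand instances versus roughly $60 on spot instances.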
In this post, I went over Spot instances and explained how they are a great way for organizations to save money on cloud computing. Because spot instances can be terminated with little warning, organizations need an efficient way to manage them, and that is where Spotinst Elastigroup comes in to save the day. Using Elastigroup, I showed how to set up a 3-node Cassandra cluster on spot instances, how to scale up compute capacity, and how to recover from a node failure. Thanks to Spot Market Scoring, I was able to choose instances that worked best for my budget and saw the savings right away in the Spotinst Console.