Clockwork Solutions Saves 86% on Non-GovCloud EC2 Costs by Running Kubernetes + Rancher with Elastigroup

Clockwork Solutions enables industrial and defense MRO professionals to reduce spare parts and maintenance expenditures while extending the life of, and meeting availability goals for, their mission-critical assets. Clockwork’s consultative and data-driven approach helps customers significantly decrease operational and capital expenditures while increasing organizational and asset performance.

The Challenge

Organizations that run their workloads in the cloud soon realize that controlling costs becomes crucial as workloads and infrastructure scale. Scaling is important, but it adds complexities such as scheduling workloads properly across compute resources and scaling down so that you use only the resources you need. Solving these problems on your own takes considerable thought and effort. Clockwork Solutions, like many organizations, was looking for ways to reduce cloud computing costs for its development environment on Amazon Web Services (AWS) and to manage its infrastructure resources properly. At the time, the team was running Rancher 1.6 and planning to upgrade to version 2.0. They heard by word of mouth, from a former colleague at SpotHero, how Spotinst Elastigroup could reduce costs and scale their Kubernetes Pods and infrastructure automatically and efficiently.

Why Spotinst

An efficient way to reduce costs on AWS is to use excess-capacity instances, known as Spot Instances. Users can reduce costs by up to 90% compared to the default On-Demand pricing. One of the challenges of running workloads on Spot is availability. Because AWS can terminate a Spot Instance with little warning, it can be difficult to migrate the workload in time, especially during peak periods.

Clockwork Solutions chose Spotinst Elastigroup because it integrates with Rancher, which let them run their Kubernetes Pods on Spot Instances to reduce costs. There is more to the story than cost reduction, however: Spotinst Elastigroup also managed the scaling of their Kubernetes Pods and infrastructure. Let’s dive into how Elastigroup works to better understand how Clockwork Solutions is using it.

Communication between Kubernetes and Spotinst is handled by the Spotinst Controller, a pod that resides within the Kubernetes cluster. The controller is responsible for collecting metrics and events. The events are pushed via a one-way secure link to the Ocean SaaS, which applies the business logic and carries out capacity scale-up and scale-down activities.

The Ocean SaaS is responsible for aggregating the metrics from the Spotinst Controller and building the cluster topology. Using the aggregated metrics, the SaaS component applies further business logic, such as Spot Instance availability prediction and instance size/type recommendations, to increase performance and optimize costs through workload density and instance pricing models (across On-Demand, Reserved and Spot Instances). This is all done autonomously, without the administrator having to worry about sizing and scaling.

“A significant part of my job is vetting and integrating new technologies into our projects and workflows; Spotinst, by far, was the easiest to set up in terms of complexity and time while providing some of the biggest impacts.” – Sean Brooks, Senior DevOps Engineer.

There are two key ways in which Elastigroup helps create the most efficient container scaling possible:

  • Tetris Scaling – To reduce wasted capacity in Kubernetes environments, Elastigroup analyzes the event messages emitted when pods fail to start (such as insufficient memory or insufficient CPU). Based on these messages, Elastigroup launches an additional instance of the required size and type, so each scale-up is matched to what the pending pods actually need.
  • Smart Scale Down – Elastigroup automatically detects and scales down idle instances, meaning instances where utilization below 40% (for both memory and CPU) has been recorded for a specified number of consecutive periods. When an idle instance is detected, Elastigroup locates enough spare capacity on the other instances in the cluster, drains the instance’s pods, reschedules them on the other instances and terminates the idle instance. This means that Clockwork Solutions’ workloads were constantly and automatically self-optimizing with the help of Elastigroup.
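The two behaviors above can be illustrated with a minimal Python sketch. This is not Spotinst's implementation or API: every name here (InstanceType, Node, pick_instance_type, next_scale_down_candidate), the cheapest-fit selection rule, and the choice of three consecutive periods are assumptions made for illustration. Only the 40% threshold comes from the description above.

```python
from dataclasses import dataclass, field

IDLE_THRESHOLD = 0.40  # from the text: below 40% for both CPU and memory
IDLE_PERIODS = 3       # assumption: the "specified number of consecutive periods"


@dataclass(frozen=True)
class InstanceType:
    """Hypothetical catalog entry for an instance type."""
    name: str
    cpu: float           # vCPUs
    mem_gib: float
    hourly_price: float


@dataclass
class Node:
    """Hypothetical view of one cluster instance."""
    name: str
    cpu: float
    mem_gib: float
    cpu_used: float
    mem_used: float
    # (cpu_utilization, mem_utilization) samples as fractions, oldest first
    history: list = field(default_factory=list)


def pick_instance_type(pod_cpu, pod_mem_gib, catalog):
    """Tetris-style scale up: cheapest type that fits the pending pod.

    Returns None when no type in the catalog is large enough.
    """
    fitting = [t for t in catalog if t.cpu >= pod_cpu and t.mem_gib >= pod_mem_gib]
    return min(fitting, key=lambda t: t.hourly_price) if fitting else None


def is_idle(node):
    """True if the last IDLE_PERIODS samples are all below the threshold."""
    recent = node.history[-IDLE_PERIODS:]
    return len(recent) == IDLE_PERIODS and all(
        cpu < IDLE_THRESHOLD and mem < IDLE_THRESHOLD for cpu, mem in recent
    )


def can_drain(node, others):
    """True if the node's usage fits in the total spare capacity of the others.

    Simplification: a real scheduler must place each pod on a single node,
    so an aggregate check like this one is optimistic.
    """
    spare_cpu = sum(n.cpu - n.cpu_used for n in others)
    spare_mem = sum(n.mem_gib - n.mem_used for n in others)
    return node.cpu_used <= spare_cpu and node.mem_used <= spare_mem


def next_scale_down_candidate(nodes):
    """Return the name of the first idle node that can be drained, or None.

    Only one node is returned per call because removing it changes the
    spare capacity available to the remaining nodes.
    """
    for node in nodes:
        others = [n for n in nodes if n is not node]
        if is_idle(node) and can_drain(node, others):
            return node.name
    return None
```

In this sketch, a pending pod that needs 3 vCPUs and 8 GiB would be matched to the cheapest catalog entry that is at least that large, and an instance would be drained only after it has stayed under 40% on both dimensions for three samples in a row and the rest of the cluster has room for its pods.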

The Results

Clockwork Solutions worked with Spotinst staff to get familiar with the product, and within a few days the Spotinst integration was part of their Rancher 2.0 rollout and they were able to start maximizing the efficiency of their deployment. Once the integration was complete, Clockwork Solutions moved their entire development workload to Kubernetes. With Elastigroup in place, they reduced their AWS EC2 development costs outside of GovCloud by 86%.

“So far we’re loving the Spotinst platform and it has really made a big difference in our operations. We haven’t had any problems with the service and Spotinst live support is phenomenal so it’s pretty much been smooth sailing. It’s unfortunate that GovCloud doesn’t offer Spot instances.” – Sean Brooks, Senior DevOps Engineer.