ECS Autoscaling On Spot Saves $40K Monthly
ClearCare migrated ECS workloads to Spot and saw 80% savings after a month of activity
Cost Optimization is a priority, but time and complex architecture prove limiting
ClearCare was founded in 2010 and has grown steadily over the past seven years. After a $60M growth equity funding round in 2016, usage spiked quickly. Because ClearCare is a health-tech company, reliability is essential to its customers, so the DevOps team focused heavily on SRE (systems reliability engineering). As scale increased, cost containment quickly became a priority.
“Our VP of Engineering got a huge bill from AWS. But the complexity of our architecture made it difficult to optimize costs without dedicating significant internal resources to that effort,” said Glenn Poston, ClearCare’s Manager of Systems Reliability Engineering and DevOps.
While following AWS best practices for reliability, the team found it difficult to quickly home in on a practical cost optimization strategy, given the size of ClearCare’s customer base and competing internal priorities.
“The complexity of our architecture – immutable infrastructure rollouts, multi-AZ deployment, auto scaling groups, Amazon ECS – made it difficult to optimize costs without dedicating significant internal resources to that effort.” The team knew they needed to proactively prevent high costs, but with reliability as the primary focus, any investment in cost optimization had to be highly efficient.
The path to cost optimization
The first step towards optimizing costs was getting a clear picture of their cloud spend. After exploring a few alternatives, they identified a cost monitoring solution that presents simple data while still allowing the team to drill into specific areas.
With a monitoring tool in place, the team identified a few obvious ways to cut costs: rightsizing instances, reducing their DB snapshots from daily to weekly or monthly, and setting S3 lifecycle rules. Though these measures were necessary and helpful, there was still a lot of room for improvement.
ECS optimization was a major challenge
They had previously explored implementing Spot Instances to reduce costs, but given customer uptime demands combined with ClearCare’s complex architecture, “EC2 Spot just wasn’t an option,” said Poston. When Poston stumbled upon Spotinst, he “thought it was too easy to be true.” Even so, the team decided to try it out.
“It was really an out-of-the-box solution,” said Poston. “It worked exactly as pitched in the demo.” The team quickly used the Spotinst platform to migrate their ECS workloads onto Spot, seeing 80% savings within a month of their initial POC.
“It was really an out-of-the-box solution. All we had to do was turn it on and we were done.”
Perhaps more importantly, “the Customer Success team are incredibly timely and helped us adopt the platform for our specific needs.” For ClearCare, these needs centered on ECS autoscaling.
Large-scale compute clusters are expensive. Ensuring full utilization is essential to achieving 100% cost-efficiency.
One week after ClearCare pointed out the need to the Spotinst team, a personalized ECS autoscaler was fully fleshed out. Within a day, ClearCare was saving an additional 60% on compute costs.
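To illustrate why utilization matters for cost-efficiency, here is a minimal Python sketch. It is not ClearCare's or Spotinst's actual calculation: the function names, the price, and the utilization figures are hypothetical, and it assumes cost scales linearly with provisioned vCPU-hours.

```python
def effective_cost_per_used_vcpu_hour(price_per_vcpu_hour: float,
                                      utilization: float) -> float:
    """Cost of each vCPU-hour that actually does work.

    utilization is the fraction of provisioned capacity in use (0 < u <= 1).
    Idle capacity is still billed, so lower utilization raises the
    effective price of the capacity that is used.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in the range (0, 1]")
    return price_per_vcpu_hour / utilization


def wasted_spend(total_spend: float, utilization: float) -> float:
    """Portion of spend that pays for idle capacity."""
    if not 0 <= utilization <= 1:
        raise ValueError("utilization must be in the range [0, 1]")
    return total_spend * (1 - utilization)


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    price = 0.04  # dollars per provisioned vCPU-hour (assumed)
    for u in (0.5, 0.75, 1.0):
        cost = effective_cost_per_used_vcpu_hour(price, u)
        print(f"utilization {u:.0%}: ${cost:.4f} per used vCPU-hour")
```

Under these assumptions, a cluster running at 50% utilization pays twice as much per unit of useful work as a fully utilized one, which is the gap an autoscaler that packs containers tightly onto instances aims to close.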
What’s next for ClearCare
Security and reliability are ClearCare’s primary focus from a DevOps standpoint. With Spotinst implemented, the ClearCare team was able to blow away their cost optimization goals without shifting much focus from the primary goal of systems reliability.
Though the team is still eyeing more cost optimization projects, Poston is confident that Spotinst delivered most of the cost-related returns they expected. Even after a few lengthy rightsizing projects and S3 optimization, the impact of those efforts on savings was minimal compared to running on Spot. And with Spotinst handling Spot management, implementing Spot was actually easier and less risky, considering their customers' demands for high availability. It was a win-win for ClearCare and their customers.
As Poston said, “all we had to do was turn it on and we were done.”