How Lenskart Orchestrated Their Infrastructure And Saved 80% On EC2 Costs
"With Terraform and Spotinst, we created 4 additional production environments and still managed to save 81%"
~ Nirbhab Barat, AVP DevOps at Lenskart
Lenskart is an online optical store with a mission: selling eyeglasses across India to meet the massive need for vision correction in what the company calls "the blind capital of the world".
In pursuing this mission, Lenskart has become one of India's largest and fastest-growing online optical stores. http://www.lenskart.com/
Fast growth creates difficult architectural challenges
As a growing e-commerce company serving nearly 5 million visitors every month, Lenskart's technical team faced many architectural challenges. Scaling infrastructure is never easy, but when you grow as fast as Lenskart has, the challenges tend to pile up faster than expected.
For any e-commerce company, lowering COGS (cost of goods sold) is always essential, but with a long list of challenges on their plate, CTO Pankaj Kankar and his engineering team simply didn't have the capacity to focus on their high cloud costs. At such a young company, Pankaj's main focus was stabilizing processes and internal architecture. Technical debt was growing along with the company's revenue, and the engineering team was working overtime just to keep the site stable.
Pankaj recognized that these kinds of architectural issues only grow with scale, so building a scalable system for provisioning and orchestration became the team's main focus in 2017.
The challenge of scaling architecture
Pankaj's first step was to assemble a SWAT team to put out the most urgent fires, but he knew this wouldn't be a viable long-term solution. As good as the rapid growth was for the bottom line, technical debt and cloud costs were rising quickly.
To keep technical debt from piling up in the future, Pankaj set his sights on optimizing DevOps processes. Knowing he couldn't build everything alone, he looked for external platforms that would establish a clear, largely self-managing system.
Because this would require several tools, the team adopted Terraform to work with the various cloud vendors they planned to use without managing each one individually. Similarly, they adopted Opscode Chef to streamline configuration management and Cloudflare to lower latency for their public DNS.
Little time for managing costs
Overhauling the entire infrastructure, however, meant that the team had no capacity left to focus on reducing their high cloud costs. Still, because cost control is essential for any e-commerce company, Pankaj knew they needed to find a way to reduce costs before they spiraled out of control.
The team evaluated several ways to rein in skyrocketing EC2 costs, including optimizing CPU utilization, purchasing Reserved Instances, and starting to leverage Spot Instances, but each required a lot of work. Needless to say, the last thing Lenskart wanted when looking for help reducing costs was another platform to manage. The DevOps team already had enough on their plates, so a turnkey solution was a necessity.
Terraform integration was a must
Orchestration quickly became a necessity for Lenskart's DevOps team. Managing their various tools through Terraform was how they cut down the time and effort that running several projects and tools concurrently would otherwise require.
After reviewing Spotinst, the team was quickly impressed by the savings potential, but it was Spotinst's Terraform integration that made it a natural fit. Spotinst took little effort to install, and its long list of integrations with the other tools they had adopted made implementation simple for Pankaj and his team.
By managing Spot Instances for Lenskart, Spotinst played a vital role in reducing the company's high cloud costs, freeing Pankaj's team to focus on their core work. Beyond cutting costs, Spotinst also handled routine DevOps tasks along the way, such as autoscaling based on CPU utilization, taking continuous AMI backups, and shutting down Spot machines based on the time of day.
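To make two of those tasks concrete, here is a minimal sketch of how a CPU-based autoscaling rule and a time-of-day shutdown window might combine to produce a target instance count. It is not Spotinst's implementation or API. It assumes a generic target-tracking rule, where capacity is scaled in proportion to observed CPU over target CPU, and a scheduled off-hours window that overrides it. `ScalingPolicy`, `in_off_hours`, `desired_capacity`, and all threshold values are hypothetical.

```python
import math
from dataclasses import dataclass
from datetime import time


@dataclass
class ScalingPolicy:
    # Hypothetical policy shape, for illustration only (not a Spotinst API).
    target_cpu: float       # desired average CPU utilization, in percent
    min_instances: int      # floor during working hours
    max_instances: int      # ceiling during working hours
    off_hours_start: time   # start of the window when Spot machines are shut down
    off_hours_end: time     # end of that window (exclusive)


def in_off_hours(now: time, start: time, end: time) -> bool:
    """True if `now` falls in [start, end); handles windows that wrap past midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def desired_capacity(current: int, avg_cpu: float, now: time, policy: ScalingPolicy) -> int:
    # A scheduled shutdown takes precedence over CPU-based scaling.
    if in_off_hours(now, policy.off_hours_start, policy.off_hours_end):
        return 0
    if current <= 0:
        # Nothing is running to measure, so start from the configured floor.
        return policy.min_instances
    # Target tracking: grow or shrink the fleet in proportion to observed/target CPU.
    raw = math.ceil(current * avg_cpu / policy.target_cpu)
    return max(policy.min_instances, min(policy.max_instances, raw))


if __name__ == "__main__":
    policy = ScalingPolicy(60.0, 2, 10, time(1, 0), time(6, 0))
    print(desired_capacity(4, 90.0, time(12, 0), policy))  # busy midday: scale out
    print(desired_capacity(4, 90.0, time(3, 0), policy))   # inside the off-hours window
```

With a 60% CPU target, four instances running at 90% would be scaled out to six. The same fleet at 3 a.m. would be shut down entirely, because the assumed schedule wins over the utilization rule.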
With Terraform managing it all, immutable deployments became a breeze.