Running TensorFlow on EKS with Spotinst Ocean

Machine learning is widely used in the tech industry to improve processes, enhance the user experience or existing applications, solve complex problems, and more. Developers can choose from many machine learning tools and algorithms. Platforms such as TensorFlow have become very popular because they give users an easy way to learn machine learning and start applying it. As an organization’s use of machine learning grows, it will need to start thinking about the underlying infrastructure: compute, deployment, and scaling.

To simplify infrastructure management, organizations began moving their infrastructure and workloads to the public cloud. Containers came along later and became a popular way to build, ship, and run applications, but there was no efficient way to manage and scale them. Then Kubernetes was born and became the de facto way to manage and orchestrate container-based workloads. As organizations use cloud services more and more, they start to realize how expensive those services can be.

Spotinst has been helping customers reduce their cloud computing costs for years now and recently came out with a new product called Ocean which manages Kubernetes clusters. In this post, I will show how organizations can run TensorFlow on a Kubernetes cluster managed by Ocean and explain how it can simplify Pod and infrastructure scaling as calculations become more complex.

Prerequisites

  1. A Spotinst Console account
  2. An AWS account
  3. AWS CLI
  4. TensorFlow Kubernetes Manifest

With the prerequisites out of the way, let’s learn about TensorFlow and Spotinst Ocean.

What is TensorFlow?

TensorFlow is an open-source software library developed by the Google Brain team for high-performance numerical computation. It comes with strong support for machine learning and deep learning, and its flexible numerical computation core is used across many other scientific domains.

What is Ocean?

Ocean is the Serverless Kubernetes Engine. It takes away the pains of scaling and managing containers and nodes in a Kubernetes cluster. With Ocean, you can forget about mixing and matching instance types and trying to figure out when and how to scale nodes in the cluster. Ocean automatically ensures your containers are placed on the best possible mix of Spot, RIs, and On-Demand instances, optimizing your clusters for cost, availability, and performance.

Now that you know more about TensorFlow and Ocean, let’s begin the tutorial by creating a new Ocean Cluster running on Amazon EKS.

Creating an EKS Cluster with Ocean

To get started, click on Cloud Clusters under the Ocean section of the Spotinst Console, then click Create Cluster.

Choose EKS.

Complete Step 1 to create and use a Token that will provide connectivity from EKS to Elastigroup. For Step 2, set the desired Cluster Name, Region, and Key Pair.

When you are ready to proceed, click on Launch CloudFormation Stack. A new browser window will open where you can configure the CloudFormation stack. Enter the Stack name and EKS ClusterName to use.

Click Create. This step may take up to 10 minutes to complete. When the cluster is finally created, please return to the Ocean window in your browser.

For Step 4, we need to use the AWS CLI to configure connectivity to the EKS cluster with kubectl.

# aws eks update-kubeconfig --name MyKubernetesCluster

Follow the instructions under Step 5 to configure the AWS Authentication and click Done.

It will take a few minutes for the data to populate and for nodes to register. After Ocean is set up, we can proceed to run TensorFlow on Kubernetes.
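To confirm that kubectl can reach the new cluster and that the nodes have registered, a quick check along these lines may help. This is a minimal sketch: the function name `check_cluster` is hypothetical, and it assumes the kubeconfig was already updated with the `aws eks update-kubeconfig` command from Step 4.

```shell
# Sketch only: check_cluster is a hypothetical helper name.
# Assumes the kubeconfig already points at the EKS cluster (Step 4).
check_cluster() {
  # Print the address of the cluster's control plane.
  kubectl cluster-info
  # List the worker nodes and their status.
  kubectl get nodes -o wide
}

# Run the check only when kubectl is installed.
if command -v kubectl >/dev/null 2>&1; then
  check_cluster
else
  echo "kubectl not found on PATH; install it before continuing" >&2
fi
```

Nodes that have joined the cluster should eventually report a Ready status. If the list is empty, wait a few minutes and run the check again.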

Launching TensorFlow in Kubernetes

Before TensorFlow can be run on Kubernetes, we will need to download the following manifest to our local computer.

With the manifest downloaded, we can use kubectl to run TensorFlow:
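As a rough sketch, the commands might look like the following. The file name `tensorflow.yaml` and the helper name `deploy_tensorflow` are hypothetical, and the sketch assumes the manifest defines a service named `tensorflow`; adjust these to match the manifest you downloaded.

```shell
# Hypothetical file name: replace with the path of the manifest you downloaded.
MANIFEST="${MANIFEST:-tensorflow.yaml}"

# Sketch only: deploy_tensorflow is a hypothetical helper name.
deploy_tensorflow() {
  # Create (or update) the resources defined in the manifest.
  kubectl apply -f "$MANIFEST"
  # Show the service; assumes the manifest names it "tensorflow".
  kubectl get service tensorflow
  # List the pods so you can find the one running TensorFlow.
  kubectl get pods
}

# Deploy only when kubectl is installed and the manifest file exists.
if command -v kubectl >/dev/null 2>&1 && [ -f "$MANIFEST" ]; then
  deploy_tensorflow
else
  echo "Need kubectl on PATH and a manifest at $MANIFEST" >&2
fi
```

If the container runs a Jupyter notebook server, which is an assumption here, the login token is typically printed in the pod's logs. You can read them with `kubectl logs` followed by a pod name taken from the `kubectl get pods` output.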

To access the TensorFlow GUI, copy the URL for the tensorflow service from the output above and paste it into your web browser.

The value for “Password or token” can be found in the output from the previous kubectl commands. Copy and paste it here and click “Log in”.

Now you should be on the main screen of the TensorFlow GUI.

Running a Sample App

Now for the fun part: let’s run a sample machine learning application that uses the MNIST database of handwritten digits to train an image classifier with roughly 90% accuracy. To proceed, click on New followed by Python 2, then copy and paste the code below into the window that opens:

To run the code, click on the “Run” button. The final results will take a few minutes to complete.

Examining the TensorFlow Costs in Spotinst Ocean

Let’s return to the Spotinst Console and see how much we spent running TensorFlow on Kubernetes by using Ocean’s Showback feature. Showback breaks down the infrastructure costs of the containerized cluster and provides insights into each layer and application, which can later be used to analyze application costs and perform chargebacks.

Looking at the output above, running TensorFlow cost only $1.46.

Conclusion

In this post, I explained how to run TensorFlow on Kubernetes managed by Ocean. TensorFlow is a great machine learning solution for the masses and running it on Kubernetes makes perfect sense as organizations scale in production. Spotinst Ocean is the Serverless Kubernetes Engine that takes away the pains of scaling and managing containers and nodes in a Kubernetes cluster. TensorFlow users that use Ocean will no longer have to worry about the underlying infrastructure and can spend more time working with their machine learning algorithms.

To learn more about Spotinst Ocean, please check out our Product page.