How to deploy a scalable Scikit-Learn machine learning model in Production with Kubernetes (Part 3)

Soumyadip Dutta
May 20, 2021

In the previous chapter we successfully deployed a model so that it can be consumed by end users.

However, in real-life scenarios an endpoint may be hit by hundreds to millions of requests per second, so we also need to think about scalability.

Creating the Docker image:

  • In this section we are going to create a Docker image of the serving application and later deploy it to a Kubernetes cluster.
  1. To run the application we are going to use the Gunicorn server, so first install the gunicorn Python package:
$ pip install gunicorn

2. Create a requirements.txt file.

$ pip freeze > requirements.txt

3. Create a Dockerfile.
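A minimal sketch of such a Dockerfile, assuming the serving application from the previous part lives in app.py and exposes a WSGI object named app (adjust the app:app module path and Python version to match your project):

```dockerfile
FROM python:3.8-slim

WORKDIR /app

# Install dependencies first so this layer is cached between builds
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

EXPOSE 8000

# Serve the application with Gunicorn on port 8000
CMD ["gunicorn", "--bind", "0.0.0.0:8000", "--workers", "2", "app:app"]
```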

4. Now build the Docker image:

$ docker build -t soumyadipdutta2007/scikit-learn-demo-api -f Dockerfile .

5. Before testing the newly created image, make sure GOOGLE_APPLICATION_CREDENTIALS is set in your environment.

6. Run the image:

$ docker run -e GOOGLE_APPLICATION_CREDENTIALS=/tmp/keys/service_account_file.json -v $GOOGLE_APPLICATION_CREDENTIALS:/tmp/keys/service_account_file.json:ro --network host soumyadipdutta2007/scikit-learn-demo-api

7. Now make a POST request with data to http://127.0.0.1:8000/predict/. If you get a response, it's working.
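A minimal Python sketch of such a request using only the standard library. The {"data": [...]} payload shape and the example feature values are assumptions — match whatever schema your serving app from the previous part expects:

```python
import json
import urllib.request

def build_request(url, features):
    """Build a POST request carrying the feature vector as a JSON body."""
    payload = json.dumps({"data": features}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def predict(url, features):
    """Send the features to the endpoint and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(url, features)) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Example (requires the container from the previous step to be running):
# predict("http://127.0.0.1:8000/predict/", [5.1, 3.5, 1.4, 0.2])
```

The same helper works later against the cluster: just swap 127.0.0.1 for the service's EXTERNAL-IP.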

8. If you do not have a DockerHub account, please create one here: https://hub.docker.com/ If you are not logged in to the Docker CLI tool, please log in first:

$ docker login

9. Push the image to DockerHub:

$ docker push soumyadipdutta2007/scikit-learn-demo-api

Create a GKE (Google Kubernetes Engine) Cluster:

  • Now we have to deploy the Docker image to a Kubernetes cluster. You can choose an alternative Kubernetes offering (AWS EKS, DigitalOcean Kubernetes, or on-premise).
  1. Go to the Google Cloud console. Then Kubernetes Engine -> Clusters -> CREATE
  2. Select Autopilot mode. (You can choose Standard mode also, but it requires more configuration parameters, and you pay for the provisioned resources even when you are not using the cluster. In Autopilot mode you pay only for the resources you actually use.)
  3. Click on CONFIGURE.

4. Now use these values for the cluster settings:

Name: As per your choice
Region: As per your choice
Networking: Public Cluster

5. Click on CREATE. It may take 10–15 minutes to complete the setup process of the cluster.

6. Now click on the cluster that you have created. Then click on CONNECT -> RUN IN CLOUD SHELL

7. A GCP Cloud Shell will open with an auto-generated command to connect to the Kubernetes cluster. Press Enter. If it asks for authorization, click on Authorize.

  • Now we are connected to the Kubernetes cluster.
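If you prefer the CLI over the console, the same Autopilot cluster can be created and connected to with gcloud; the cluster name and region below are placeholders, substitute your own:

```shell
# Create an Autopilot cluster (equivalent to the console steps above)
gcloud container clusters create-auto scikit-learn-demo-cluster --region us-central1

# Fetch credentials so kubectl talks to the new cluster
gcloud container clusters get-credentials scikit-learn-demo-cluster --region us-central1
```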

Create a deployment file:

  • Now we have to create a deployment YAML file containing the Kubernetes Deployment and Service configuration.

1. First create the Kubernetes Deployment. Create a file called scikit-learn-deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: scikit-learn-deployment-demo-api
  labels:
    app: scikit-learn-deployment-demo-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: scikit-learn-deployment-demo-api
  template:
    metadata:
      labels:
        app: scikit-learn-deployment-demo-api
    spec:
      containers:
      - name: backend
        image: soumyadipdutta2007/scikit-learn-demo-api:latest
        imagePullPolicy: Always
        resources:
          requests:
            memory: "2048Mi"
            cpu: "2000m"
          limits:
            memory: "4096Mi"
            cpu: "4000m"
        volumeMounts:
        - name: google-cloud-key
          mountPath: /var/secrets/google
        env:
        - name: GOOGLE_APPLICATION_CREDENTIALS
          value: /var/secrets/google/key.json
        ports:
        - containerPort: 8000
      volumes:
      - name: google-cloud-key
        secret:
          secretName: service-account-key

2. In the same file, after a --- separator, add a Kubernetes LoadBalancer Service configuration to access the application from outside the cluster network:

apiVersion: v1
kind: Service
metadata:
  name: scikit-learn-deployment-demo-api-svc
  labels:
    app: scikit-learn-deployment-demo-api-svc
spec:
  type: LoadBalancer
  ports:
  - port: 8000
    targetPort: 8000
  selector:
    app: scikit-learn-deployment-demo-api

3. Save the file and upload it to Google Cloud Shell. If you want to know how to upload files to Cloud Shell, check here: https://cloud.google.com/shell/docs/uploading-and-downloading-files

4. Before deploying we have to create a Kubernetes secret holding the service account key. Upload the service account file to Cloud Shell, then run this command:

$ kubectl create secret generic service-account-key --from-file=key.json=/path/to/service_account_json_file.json

5. Deploy to the Kubernetes cluster:

$ kubectl apply -f scikit-learn-deployment.yaml

6. Check whether the pods are being created:

$ kubectl get pods

7. Wait a few minutes, then check the public IP of the service:

$ kubectl get services
  • Take a note of EXTERNAL-IP value.
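The external IP can take a minute or two to be assigned (it shows <pending> at first). You can also pull just the IP with a jsonpath query, using the Service name defined above:

```shell
kubectl get service scikit-learn-deployment-demo-api-svc \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
```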

8. Now send a POST request to http://EXTERNAL-IP:8000/predict/ to check whether everything is working.

9. Now send some feature values to get a prediction.

  • Now we have successfully deployed a SCALABLE machine learning prediction API service. In the next chapter we will discuss some deployment best practices.

To get the source code of this tutorial please visit this link: https://github.com/dsoumyadip/scikit-learn-deployment-demo
