How to Serve TensorFlow Models in Amazon SageMaker

Amazon SageMaker is an end-to-end machine learning platform that simplifies building, training, and deploying machine learning models, giving data scientists and developers a secure, flexible, and scalable environment to work in. This article walks through serving TensorFlow models with Amazon SageMaker, step by step.

Table of Contents

  1. What is TensorFlow Serving?
  2. What is Amazon SageMaker?
  3. Step-by-Step Guide to Serve TensorFlow Models in Amazon SageMaker
  4. Conclusion

What is TensorFlow Serving?

TensorFlow Serving is a powerful, flexible serving system designed for production environments, making it easy to deploy new algorithms and experiments while keeping the same server architecture and APIs. It provides out-of-the-box integration with TensorFlow models but can be easily extended to serve other types of models.
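
As a quick illustration of the serving API (separate from the SageMaker workflow below), here is a minimal sketch of querying a TensorFlow Serving REST endpoint directly. It assumes a serving instance is already running locally on the default REST port 8501 with a model named iris; both are illustrative assumptions, not part of this tutorial's setup:

import requests

# TensorFlow Serving exposes a REST predict API at
# /v1/models/<model_name>:predict (default HTTP port 8501)
url = "http://localhost:8501/v1/models/iris:predict"
payload = {"instances": [[6.4, 3.2, 4.5, 1.5]]}

response = requests.post(url, json=payload)
print(response.json())  # e.g. {"predictions": [[...]]}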

What is Amazon SageMaker?

Amazon SageMaker is a fully managed machine learning service from Amazon Web Services (AWS). With SageMaker, data scientists and developers can quickly and easily build, train, and deploy machine learning models at any scale.

Step-by-Step Guide to Serve TensorFlow Models in Amazon SageMaker

Step 1: Building a TensorFlow Model

For this example, we’ll build a very simple network consisting of two densely connected layers and train it on the famous Iris dataset.
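
The training code below assumes the Iris data has already been split into NumPy arrays named train_np, train_labels, test_np, and test_labels. A minimal sketch of that preparation, assuming scikit-learn is available:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load the Iris dataset: 150 samples, 4 features, 3 classes
iris = load_iris()
train_np, test_np, train_labels, test_labels = train_test_split(
    iris.data.astype(np.float32), iris.target, test_size=0.2, random_state=42
)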

# Import libraries
import tensorflow as tf
from tensorflow import keras

# Create a simple model: the Iris dataset has 4 features and
# 3 classes, so the output layer uses 3 softmax units
model = keras.Sequential([
    keras.layers.Dense(10, activation='relu', input_shape=(4,)),
    keras.layers.Dense(3, activation='softmax')
])


# Compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train the model

EPOCHS = 50
BATCH_SIZE = 32

# Stop early when validation loss stops improving, keeping the best weights
EARLY_STOPPING = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", mode="auto", restore_best_weights=True
)

history = model.fit(
    x=train_np,
    y=train_labels,
    validation_data=(test_np, test_labels),
    callbacks=[EARLY_STOPPING],
    batch_size=BATCH_SIZE,
    epochs=EPOCHS,
)

Step 2: Saving the TensorFlow Model

To set up hosting, the first step is to import the model from training into hosting, which begins by exporting the model from TensorFlow and saving it to the file system. The model also needs to be packaged in the format expected by sagemaker.tensorflow.model.TensorFlowModel, which differs slightly from a standard TensorFlow export. The conversion is straightforward: move the exported model into a directory named export/Servo/ and compress the entire directory into a tar file. SageMaker recognizes this archive as a loadable TensorFlow model.

import tarfile

# The trailing "1" is the model version number TensorFlow Serving expects
model.save("export/Servo/1")
with tarfile.open("model.tar.gz", "w:gz") as tar:
    tar.add("export")
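
As a quick sanity check before uploading, you can list the archive's contents; everything should be nested under export/Servo/1/:

with tarfile.open("model.tar.gz", "r:gz") as tar:
    # Expect paths like export/Servo/1/saved_model.pb and
    # export/Servo/1/variables/...
    for name in tar.getnames():
        print(name)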

Step 3: Uploading the Model to Amazon S3

Next, open a new SageMaker session and upload the model to the session’s default S3 bucket using the sagemaker.Session.upload_data method, which takes the location of the exported model and the desired key prefix within the bucket (e.g., “model”). The name of the default S3 bucket can be obtained with the sagemaker.Session.default_bucket method.
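
The upload below assumes an active SageMaker session and a resolved bucket name; a minimal setup, typically run once at the top of the notebook:

import sagemaker

# Open a SageMaker session and resolve the default S3 bucket
sm_session = sagemaker.Session()
bucket_name = sm_session.default_bucket()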

s3_response = sm_session.upload_data("model.tar.gz", bucket=bucket_name, key_prefix="model")

After uploading the model to S3, it can be imported into SageMaker using sagemaker.tensorflow.model.TensorFlowModel for deployment. This step requires the S3 location of the model archive and an IAM role for authentication.

from sagemaker import get_execution_role
from sagemaker.tensorflow import TensorFlowModel

# The execution role grants SageMaker access to the model in S3
role = get_execution_role()
sagemaker_model = TensorFlowModel(
    model_data=f"s3://{bucket_name}/model/model.tar.gz",
    role=role,
    framework_version="2.3",
)

Step 4: Deploying the Model

At this stage, the model is ready to be deployed to a SageMaker endpoint using the sagemaker.tensorflow.model.TensorFlowModel.deploy method. For this example, a single ml.m5.2xlarge instance is sufficient, though you can substitute any instance type you prefer.

%%time
predictor = sagemaker_model.deploy(initial_instance_count=1, instance_type="ml.m5.2xlarge")

With the endpoint established, we can validate it by classifying an example. The output of the predict function is an array of probabilities, one for each of the 3 Iris classes.

sample = [6.4, 3.2, 4.5, 1.5]
predictor.predict(sample)

Output:

# note: the exact probabilities will differ from run to run
{'predictions': [[0.01628883, 0.716617942, 0.267093182]]}
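
To turn these probabilities into a class label, take the index of the largest value and map it to the Iris class names (the list below follows the standard Iris label encoding of 0, 1, 2):

import numpy as np

# Standard Iris label encoding
class_names = ["setosa", "versicolor", "virginica"]

probs = predictor.predict(sample)["predictions"][0]
print(class_names[int(np.argmax(probs))])  # "versicolor" for the output above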

Step 5: Delete Temporary Folders, Files and Endpoint (optional)

Remove the temporary files and directories so they do not interfere with subsequent runs, and delete the endpoint if it is no longer needed. Keep in mind that a running endpoint incurs charges for as long as it is up; if this was just a test or practice run, it is advisable to remove it.

import os
import shutil

# Remove the model archive, the exported model directory, and any
# Iris data files downloaded locally (skip files you did not create)
os.remove("model.tar.gz")
os.remove("iris_test.csv")
os.remove("iris_train.csv")
os.remove("iris.data")
shutil.rmtree("export")

# Optionally delete the endpoint to stop incurring charges
predictor.delete_endpoint()

Conclusion

Amazon SageMaker and TensorFlow Serving provide a powerful combination for serving machine learning models. With these tools, you can easily build, train, and serve TensorFlow models in a production environment. Remember that TensorFlow Serving isn’t limited to TensorFlow models. It can be extended to serve different types of models, making it a versatile tool for your machine learning needs.


About Saturn Cloud

Saturn Cloud is your all-in-one solution for data science & ML development, deployment, and data pipelines in the cloud. Spin up a notebook with 4TB of RAM, add a GPU, connect to a distributed cluster of workers, and more. Join today and get 150 hours of free compute per month.