Sharing a Directory from Your Local System to a Kubernetes Container: A Guide

As data scientists, we often find ourselves working with large datasets and complex computations. Kubernetes, an open-source platform for managing containerized workloads and services, has become a go-to solution for handling these tasks. However, sharing data between your local system and a Kubernetes container can be a bit tricky. This blog post will guide you through the process of sharing a directory from your local system to a Kubernetes container.

Prerequisites

Before we start, make sure you have the following installed on your system:

  • Docker
  • Kubernetes (Minikube for local development)
  • kubectl command-line tool
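
Once those are in place, it's worth a quick sanity check before proceeding. These are standard Minikube and kubectl commands:

# Start a local single-node cluster and confirm kubectl can reach it
minikube start
kubectl cluster-info
kubectl get nodes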

Step 1: Create a Docker Image

First, we need to create a Docker image with the necessary software. In this case, we’ll use Python as an example. Create a Dockerfile with the following content:

FROM python:3.8-slim-buster

WORKDIR /app

COPY . /app

RUN pip install --no-cache-dir -r requirements.txt

CMD ["python", "app.py"]

This Dockerfile creates a Docker image based on Python 3.8, sets the working directory to /app, copies the current directory into the image, installs the Python dependencies, and sets the container's default command. Note that app.py is a placeholder for your application's entry point; without a CMD, the base image drops into an interactive Python shell and the pod would exit immediately when run non-interactively. The pip step also assumes a requirements.txt file exists in the build context.
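
If your project doesn't have a requirements.txt yet, create one next to the Dockerfile. The packages below are placeholders; list whatever your code actually imports:

pandas
numpy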

Build the Docker image with the following command:

docker build -t my-python-app .
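
One Minikube-specific caveat: the cluster cannot see images that exist only in your host's Docker daemon. A common workaround (assuming the Docker or a VM-based driver) is to point your shell at Minikube's internal Docker daemon before building, so the image is immediately available to the cluster:

# Build inside Minikube's Docker daemon instead of the host's
eval $(minikube docker-env)
docker build -t my-python-app .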

Step 2: Create a Persistent Volume

Next, we need to create a Persistent Volume (PV) in Kubernetes. A PV is a piece of storage in the cluster that has been provisioned by an administrator. It is a resource in the cluster just like a node is a cluster resource.

Create a pv.yaml file with the following content:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-volume
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/data"

This YAML file creates a PV named pv-volume with 10Gi of capacity and the ReadWriteOnce access mode. One important subtlety: hostPath mounts a path on the Kubernetes node's filesystem, not directly on your local machine. On a Minikube cluster the node is the Minikube VM or container, so your local directory must first be made visible inside it, as shown below.
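
With Minikube, you can expose a local directory at /data on the node using the minikube mount command. The source path below is a placeholder for wherever your data actually lives; the command blocks, so run it in a separate terminal and leave it running:

minikube mount /path/to/your/data:/data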

Apply the pv.yaml file with the following command:

kubectl apply -f pv.yaml
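
You can verify the volume was created with:

kubectl get pv pv-volume

The STATUS column should read Available until a claim binds to it.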

Step 3: Create a Persistent Volume Claim

Now, we need to create a Persistent Volume Claim (PVC). A PVC is a request for storage by a user, analogous to how a Pod is a request for compute: Pods consume node resources, while PVCs consume PV resources.

Create a pvc.yaml file with the following content:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pv-claim
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi

This YAML file creates a PVC named pv-claim that requests 3Gi of storage. Because our PV offers 10Gi with a matching storageClassName and access mode, Kubernetes will bind this claim to it; a claim can bind to a volume larger than its request.

Apply the pvc.yaml file with the following command:

kubectl apply -f pvc.yaml
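
Check that the claim bound to the volume:

kubectl get pvc pv-claim

The STATUS column should now read Bound, and kubectl get pv will show pv-volume claimed by default/pv-claim.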

Step 4: Create a Kubernetes Deployment

Finally, we need to create a Kubernetes Deployment that uses the PVC.

Create a deployment.yaml file with the following content:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-python-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-python-app
  template:
    metadata:
      labels:
        app: my-python-app
    spec:
      containers:
      - name: my-python-app
        image: my-python-app
        imagePullPolicy: Never
        volumeMounts:
        - mountPath: "/app/data"
          name: volume
      volumes:
      - name: volume
        persistentVolumeClaim:
          claimName: pv-claim

This YAML file creates a Deployment named my-python-app that runs the my-python-app Docker image and mounts the PVC at /app/data inside the container. The imagePullPolicy: Never setting tells Kubernetes to use the locally built image rather than trying to pull it from a registry; this works because we built the image inside Minikube's Docker daemon in Step 1.

Apply the deployment.yaml file with the following command:

kubectl apply -f deployment.yaml
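
Once the rollout finishes, you can confirm the mount works by listing the directory from inside the container:

kubectl rollout status deployment/my-python-app
kubectl exec deploy/my-python-app -- ls /app/data

You should see the files from your local directory.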

Conclusion

And that’s it! You’ve successfully shared a directory from your local system to a Kubernetes container. The setup involves a few moving parts (a Docker image, a Persistent Volume, a Persistent Volume Claim, and a Deployment), but once they’re wired together your pods can read and write local data directly. With this knowledge, you can now leverage the full power of Kubernetes for your data science projects.

Remember, Kubernetes is a powerful tool, but with great power comes great responsibility. Always ensure that your data is secure and that you’re following best practices for data management.

Happy coding!


About Saturn Cloud

Saturn Cloud is your all-in-one solution for data science & ML development, deployment, and data pipelines in the cloud. Spin up a notebook with 4TB of RAM, add a GPU, connect to a distributed cluster of workers, and more. Join today and get 150 hours of free compute per month.