Sharing a Directory from Your Local System to a Kubernetes Container: A Guide

As data scientists, we often find ourselves working with large datasets and complex computations. Kubernetes, an open-source platform for managing containerized workloads and services, has become a go-to solution for handling these tasks. However, sharing data between your local system and a Kubernetes container can be a bit tricky. This blog post will guide you through the process of sharing a directory from your local system to a Kubernetes container.
Prerequisites
Before we start, make sure you have the following installed on your system:
- Docker
- Kubernetes (Minikube for local development)
- kubectl command-line tool
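If you are starting from a clean machine, you can bring up a local cluster and confirm that kubectl can talk to it before going further. This is a minimal sketch assuming a default Minikube installation:
minikube start
kubectl cluster-info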
Step 1: Create a Docker Image
First, we need to create a Docker image with the necessary software. In this case, we’ll use Python as an example. Create a Dockerfile with the following content:
FROM python:3.8-slim-buster
WORKDIR /app
COPY . /app
RUN pip install --no-cache-dir -r requirements.txt
This Dockerfile creates a Docker image based on Python 3.8, sets the working directory to /app, copies the current directory into the image, and installs the Python dependencies listed in requirements.txt.
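The pip install step assumes a requirements.txt file sits next to the Dockerfile. As a purely illustrative example, a minimal one for a data science project might list something like:
pandas
numpy
scikit-learn
The exact packages are up to your project; any valid requirements.txt will work.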
Build the Docker image with the following command:
docker build -t my-python-app .
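One caveat when working with Minikube: an image built against your local Docker daemon is not automatically visible inside the cluster. A simple way to make it available (assuming the tag used above) is to load it into Minikube:
minikube image load my-python-app
Alternatively, run eval $(minikube docker-env) before docker build so the image is built directly inside Minikube’s Docker daemon.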
Step 2: Create a Persistent Volume
Next, we need to create a Persistent Volume (PV) in Kubernetes. A PV is a piece of storage in the cluster that has been provisioned by an administrator. It is a resource in the cluster just like a node is a cluster resource.
Create a pv.yaml file with the following content:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-volume
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/data"
This YAML file creates a PV named pv-volume with a storage capacity of 10Gi and a ReadWriteOnce access mode. The hostPath field exposes the /data directory on the cluster node’s filesystem. Note that with Minikube the node is the Minikube VM, not your laptop, so to truly share a local directory you first need to mount it into the VM (see the command below).
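To make a directory from your local machine appear at /data inside the Minikube VM, you can use minikube mount. This is a sketch assuming your data lives in ~/my-project/data (a hypothetical path; substitute your own) and that your Minikube driver supports mounts; leave the command running in a separate terminal:
minikube mount ~/my-project/data:/data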
Apply the pv.yaml file with the following command:
kubectl apply -f pv.yaml
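You can check that the PV was registered with:
kubectl get pv pv-volume
Its STATUS should read Available until a claim binds to it.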
Step 3: Create a Persistent Volume Claim
Now, we need to create a Persistent Volume Claim (PVC). A PVC is a request for storage by a user. It is similar to a pod. Pods consume node resources and PVCs consume PV resources.
Create a pvc.yaml file with the following content:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pv-claim
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi
This YAML file creates a PVC named pv-claim that requests 3Gi of storage. Because its storageClassName and access mode match the PV we just created, Kubernetes will bind the claim to pv-volume.
Apply the pvc.yaml file with the following command:
kubectl apply -f pvc.yaml
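To verify that the claim bound to the volume from Step 2, run:
kubectl get pvc pv-claim
The STATUS column should read Bound once Kubernetes has matched pv-claim to pv-volume.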
Step 4: Create a Kubernetes Deployment
Finally, we need to create a Kubernetes Deployment that uses the PVC.
Create a deployment.yaml file with the following content:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-python-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-python-app
  template:
    metadata:
      labels:
        app: my-python-app
    spec:
      containers:
        - name: my-python-app
          image: my-python-app
          imagePullPolicy: Never  # use the locally built/loaded image; don't pull from a registry
          volumeMounts:
            - mountPath: "/app/data"
              name: volume
      volumes:
        - name: volume
          persistentVolumeClaim:
            claimName: pv-claim
This YAML file creates a Deployment named my-python-app that runs the my-python-app Docker image and mounts the PVC at /app/data inside the container. Setting imagePullPolicy to Never tells Kubernetes to use the image you built and loaded locally rather than trying to pull it from a registry.
Apply the deployment.yaml file with the following command:
kubectl apply -f deployment.yaml
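To confirm everything is wired together, check that the Pod is running and that the shared directory is visible inside the container. This assumes your image defines a long-running command (for example, a CMD in the Dockerfile); otherwise the Pod will exit as soon as it starts:
kubectl get pods -l app=my-python-app
kubectl exec deployment/my-python-app -- ls /app/data
Any files you place in the mounted local directory should show up in the output of the second command.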
Conclusion
And that’s it! You’ve successfully shared a directory from your local system to a Kubernetes container. The process involves a few moving parts, but once the volume, claim, and Deployment are in place, any files you drop into the shared directory are available inside the container. With this knowledge, you can now leverage the full power of Kubernetes for your data science projects.
Remember, Kubernetes is a powerful tool, but with great power comes great responsibility. Always ensure that your data is secure and that you’re following best practices for data management.
Happy coding!
About Saturn Cloud
Saturn Cloud is your all-in-one solution for data science & ML development, deployment, and data pipelines in the cloud. Spin up a notebook with 4TB of RAM, add a GPU, connect to a distributed cluster of workers, and more. Join today and get 150 hours of free compute per month.