Managing Persistent Storage in Kubernetes with PersistentVolume and PersistentVolumeClaim

DevOps tutorial - IT technology blog

Six Months of Kubernetes Storage in Production — Here’s What Actually Matters

When I first moved a stateful workload to Kubernetes, storage was the part that tripped me up the most. Deployments, Services, ConfigMaps — those felt intuitive. Persistent storage? The abstraction layers made no sense until I sat down and traced exactly what happens when a pod needs a disk that survives restarts.

After running PersistentVolumes and PersistentVolumeClaims across several clusters for about six months — including a PostgreSQL setup and an Elasticsearch stack — I have a much clearer picture of what works, what breaks, and which gotchas cost me the most hours early on.

The Two Storage Models: Static vs Dynamic Provisioning

Choosing the right provisioning model early saves a lot of painful refactoring. There are two options, and they suit very different environments.

Static Provisioning

With static provisioning, you manually create a PersistentVolume (PV) object pointing to an existing storage resource — an NFS share, a pre-provisioned cloud disk, or a local path. A PersistentVolumeClaim (PVC) then requests that storage. Kubernetes binds them based on capacity, access mode, and storage class.

# Static PV pointing to an NFS share
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv-data
spec:
  capacity:
    storage: 20Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: 192.168.1.50
    path: /exports/k8s-data
  persistentVolumeReclaimPolicy: Retain
---
# PVC that binds to the above PV
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data-claim
  namespace: production
spec:
  # empty storageClassName disables dynamic provisioning, so the claim
  # binds only to a matching static PV
  storageClassName: ""
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 20Gi
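Once both manifests are applied, you can confirm the bind from either side (names here follow the manifests above):

```shell
# Verify the claim bound to the PV
kubectl get pv nfs-pv-data
kubectl get pvc app-data-claim -n production
# Both should report STATUS Bound once matching succeeds
```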

Dynamic Provisioning

Dynamic provisioning flips the model. You define a StorageClass that knows how to create storage on demand. When a PVC lands in the cluster, Kubernetes calls the provisioner — AWS EBS CSI driver, GCE PD, Longhorn, whatever you’ve configured — and the backing storage appears automatically.

# StorageClass using AWS EBS CSI driver
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "3000"
  throughput: "125"
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
---
# PVC using dynamic provisioning
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data
  namespace: production
spec:
  storageClassName: fast-ssd
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi

Pros and Cons of Each Approach

Static Provisioning

  • Pros: Full control over where data lives. Works well with on-prem NFS or existing SAN infrastructure. No external provisioner required.
  • Cons: Manual overhead at scale. You pre-create volumes and track them yourself. Binding failures happen when capacity or access modes don’t match exactly — even one field off and the PVC stays Pending. Not practical once you’re spinning up dozens of stateful apps.

Dynamic Provisioning

  • Pros: Scales without ceremony. Developers submit a PVC and storage appears. Mature CSI drivers support snapshots, resizing, and encryption. On cloud clusters, this eliminates most storage ops work.
  • Cons: Requires a working CSI driver and a cooperating storage provider. Costs can escalate fast if developers request 500Gi volumes by habit. And the default Delete reclaim policy will silently destroy your cloud disk the moment a PVC is removed — more on that below.

On a home lab cluster with a single NFS server, static provisioning was fine. The moment I moved to AWS EKS, dynamic provisioning with EBS gp3 was the obvious call. Six months in — through node replacements, a cluster upgrade from 1.27 to 1.30, and one accidental pod deletion cascade — zero data loss.

Recommended Setup for Most Teams

Cloud-based clusters should default to dynamic provisioning with a CSI driver. Create at least two StorageClasses:

  • standard — general purpose, gp2 or gp3, for logs and non-critical data
  • fast-ssd — high IOPS (gp3 with 3000 IOPS), for databases and message queues
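As a sketch, the standard class for the EBS CSI driver could look like this (omitting the iops and throughput parameters leaves gp3 at its 3000 IOPS / 125 MiB/s baseline):

```yaml
# General-purpose StorageClass for logs and non-critical data
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
```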

Set volumeBindingMode: WaitForFirstConsumer on both. Without it, Kubernetes creates the volume before scheduling the pod — and if the volume lands in us-east-1a while the pod schedules to us-east-1b, the pod never starts. Skipping this cost me a two-hour debugging session early on.

One more: always set reclaimPolicy: Retain for volumes holding real data. The default Delete policy on dynamically provisioned PVs will permanently destroy your cloud disk when the PVC is deleted. No warning, no recovery.

# Production-safe StorageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
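For PVs that already exist with the default Delete policy, the policy can be flipped in place without recreating anything; the PV name below is a placeholder:

```shell
# Switch an existing PV from Delete to Retain
kubectl patch pv <pv-name> \
  -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
```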

Implementation Guide

Step 1: Check Available StorageClasses

kubectl get storageclass
# Look for the (default) marker — this is used when no storageClassName is specified
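If no class carries the marker, the is-default-class annotation is what sets it; for example, to make fast-ssd the default:

```shell
# Mark fast-ssd as the cluster default StorageClass
kubectl patch storageclass fast-ssd -p \
  '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
```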

Step 2: Create a PVC

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data
  namespace: production
spec:
  storageClassName: fast-ssd
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi

kubectl apply -f postgres-pvc.yaml
kubectl get pvc -n production
# STATUS should be Bound after a pod references it (WaitForFirstConsumer)
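With WaitForFirstConsumer, the claim stays Pending until something mounts it. A throwaway pod like this sketch (names are illustrative) is enough to trigger provisioning:

```yaml
# Minimal pod that mounts the claim, forcing volume creation and binding
apiVersion: v1
kind: Pod
metadata:
  name: pvc-bind-test
  namespace: production
spec:
  containers:
    - name: shell
      image: busybox:1.36
      command: ["sleep", "3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: postgres-data
```

Delete the pod afterwards; the claim stays Bound.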

Step 3: Mount the PVC in a Pod or StatefulSet

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
  namespace: production
spec:
  selector:
    matchLabels:
      app: postgres
  serviceName: postgres
  replicas: 1
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:16
          env:
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-secret
                  key: password
            - name: PGDATA
              # point initdb at a subdirectory: freshly formatted block
              # volumes contain a lost+found directory at the mount root,
              # which makes initdb refuse to run
              value: /var/lib/postgresql/data/pgdata
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        storageClassName: fast-ssd
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 50Gi

volumeClaimTemplates inside a StatefulSet is the cleanest pattern for databases. Each replica gets its own PVC automatically — named data-postgres-0, data-postgres-1, and so on. No manual PVC management needed.
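One thing the manifest above assumes: serviceName: postgres must refer to an existing headless Service, which the StatefulSet uses for stable per-pod DNS. A minimal sketch:

```yaml
# Headless Service backing the StatefulSet's serviceName
apiVersion: v1
kind: Service
metadata:
  name: postgres
  namespace: production
spec:
  clusterIP: None      # headless: per-pod DNS instead of a virtual IP
  selector:
    app: postgres
  ports:
    - name: postgres
      port: 5432
```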

Step 4: Verify and Troubleshoot

# Check PVC status
kubectl describe pvc postgres-data -n production

# Check events if stuck in Pending
kubectl get events -n production --sort-by=.lastTimestamp

# Inspect the bound PV
kubectl get pv
kubectl describe pv <pv-name>

A PVC stuck in Pending usually means one of two things: no StorageClass matches the request, or WaitForFirstConsumer is set and no pod has been scheduled yet. Run kubectl describe pvc — the Events section almost always tells you exactly what’s wrong within seconds.

Step 5: Expand a Volume Without Downtime

If your StorageClass has allowVolumeExpansion: true, resizing is a one-liner:

kubectl patch pvc postgres-data -n production \
  -p '{"spec":{"resources":{"requests":{"storage":"100Gi"}}}}'

EBS volumes resize online — no pod restart needed. Some other backends (like certain NFS setups) require a pod restart to trigger the filesystem resize after the underlying volume expands.
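To confirm the expansion actually landed, check the capacity recorded on the claim and the filesystem inside the pod (pod and claim names follow the examples above):

```shell
# Capacity as recorded on the claim after the resize completes
kubectl get pvc postgres-data -n production \
  -o jsonpath='{.status.capacity.storage}'

# Filesystem size as seen by the workload
kubectl exec -n production postgres-0 -- df -h /var/lib/postgresql/data
```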

Access Modes — A Quick Reference

  • ReadWriteOnce (RWO): One node mounts the volume as read-write. Standard for databases. Most cloud block storage (EBS, Azure Disk) supports only this mode.
  • ReadOnlyMany (ROX): Many nodes mount read-only. Good for shared config files or static assets.
  • ReadWriteMany (RWX): Many nodes mount read-write simultaneously. Requires NFS, CephFS, or a distributed filesystem. Not available with standard cloud block storage like EBS.

The most common mistake I see: requesting RWX on an EBS-backed StorageClass. The PVC sits in Pending indefinitely, with a cryptic error about unsupported access modes. If you genuinely need RWX on AWS, use EFS with the EFS CSI driver instead.
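For reference, here is a sketch of an RWX-capable class using the EFS CSI driver’s dynamic provisioning; the fileSystemId is a placeholder for your own filesystem:

```yaml
# RWX StorageClass backed by EFS access points (fileSystemId is a placeholder)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: shared-efs
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap
  fileSystemId: fs-0123456789abcdef0
  directoryPerms: "700"
```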

Final Thoughts

Kubernetes storage clicked for me once I stopped thinking of it as “attaching a disk.” PVs and PVCs are an abstraction layer — your app declares what it needs, and the cluster figures out where that storage comes from.

Three decisions get you most of the way there: use dynamic provisioning, set reclaim policies to Retain for anything that matters, and reach for StatefulSets with volumeClaimTemplates for databases. Get those right and the rest is just tuning.
