Six Months of Kubernetes Storage in Production — Here’s What Actually Matters
When I first moved a stateful workload to Kubernetes, storage was the part that tripped me up the most. Deployments, Services, ConfigMaps — those felt intuitive. Persistent storage? The abstraction layers made no sense until I sat down and traced exactly what happens when a pod needs a disk that survives restarts.
After running PersistentVolumes and PersistentVolumeClaims across several clusters for about six months — including a PostgreSQL setup and an Elasticsearch stack — I have a much clearer picture of what works, what breaks, and which gotchas cost me the most hours early on.
The Two Storage Models: Static vs Dynamic Provisioning
Choosing the right provisioning model early saves a lot of painful refactoring. There are two options, and they suit very different environments.
Static Provisioning
With static provisioning, you manually create a PersistentVolume (PV) object pointing to an existing storage resource — an NFS share, a pre-provisioned cloud disk, or a local path. A PersistentVolumeClaim (PVC) then requests that storage. Kubernetes binds them based on capacity, access mode, and storage class.
# Static PV pointing to an NFS share
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv-data
spec:
  capacity:
    storage: 20Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: 192.168.1.50
    path: /exports/k8s-data
  persistentVolumeReclaimPolicy: Retain
# PVC that binds to the above PV
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data-claim
  namespace: production
spec:
  # Empty string disables dynamic provisioning, so the claim can only
  # bind to a statically created PV with no storage class; without this,
  # a cluster's default StorageClass would provision a new volume instead
  storageClassName: ""
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 20Gi
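A quick way to confirm the bind worked, assuming kubectl access to the cluster:

# Verify the claim bound to the intended PV (STATUS should read Bound)
kubectl get pvc app-data-claim -n production
kubectl get pv nfs-pv-data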
Dynamic Provisioning
Dynamic provisioning flips the model: you define a StorageClass that knows how to create storage on demand. When a PVC lands in the cluster, Kubernetes calls the provisioner — AWS EBS CSI driver, GCE PD, Longhorn, whatever you’ve configured — and the backing storage appears automatically.
# StorageClass using AWS EBS CSI driver
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "3000"
  throughput: "125"
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
# PVC using dynamic provisioning
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data
  namespace: production
spec:
  storageClassName: fast-ssd
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
Pros and Cons of Each Approach
Static Provisioning
- Pros: Full control over where data lives. Works well with on-prem NFS or existing SAN infrastructure. No external provisioner required.
- Cons: Manual overhead at scale. You pre-create volumes and track them yourself. Binding failures happen when capacity or access modes don’t match exactly — even one field off and the PVC stays Pending. Not practical once you’re spinning up dozens of stateful apps.
Dynamic Provisioning
- Pros: Scales without ceremony. Developers submit a PVC and storage appears. CSI drivers handle snapshots, resizing, and encryption out of the box (see the snapshot sketch after this list). On cloud clusters, this eliminates most storage ops work.
- Cons: Requires a working CSI driver and a cooperating storage provider. Costs can escalate fast if developers request 500Gi volumes by habit. And the default Delete reclaim policy will silently destroy your cloud disk the moment a PVC is removed (more on that below).
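As a concrete example of the snapshot support mentioned above, here is a minimal sketch of a CSI snapshot. It assumes the external-snapshotter CRDs are installed and that a VolumeSnapshotClass exists; csi-snapclass is a placeholder name.

# On-demand snapshot of an existing PVC
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: postgres-data-snap
  namespace: production
spec:
  volumeSnapshotClassName: csi-snapclass  # placeholder, cluster-specific
  source:
    persistentVolumeClaimName: postgres-data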
On a home lab cluster with a single NFS server, static provisioning was fine. The moment I moved to AWS EKS, dynamic provisioning with EBS gp3 was the obvious call. Six months in — through node replacements, a cluster upgrade from 1.27 to 1.30, and one accidental pod deletion cascade — zero data loss.
Recommended Setup for Most Teams
Cloud-based clusters should default to dynamic provisioning with a CSI driver. Create at least two StorageClasses:
- standard — general purpose, gp2 or gp3, for logs and non-critical data (see the sketch after this list)
- fast-ssd — high IOPS (gp3 with 3000 IOPS), for databases and message queues
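A minimal sketch of the standard class, assuming the EBS CSI driver (gp3's baseline of 3000 IOPS and 125 MiB/s means no extra parameters are needed):

# General-purpose class for logs and non-critical data
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer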
Set volumeBindingMode: WaitForFirstConsumer on both. Without it, Kubernetes creates the volume before scheduling the pod — and if the volume lands in us-east-1a while the pod schedules to us-east-1b, the pod never starts. Skipping this cost me a two-hour debugging session early on.
One more: always set reclaimPolicy: Retain for volumes holding real data. The default Delete policy on dynamically provisioned PVs will permanently destroy your cloud disk when the PVC is deleted. No warning, no recovery.
# Production-safe StorageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
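Both settings are easy to audit across an existing cluster, and a PV that was created with Delete can be flipped to Retain in place. A quick sketch, assuming kubectl access; <pv-name> is a placeholder:

# Audit binding mode and reclaim policy for every StorageClass
kubectl get storageclass -o custom-columns=NAME:.metadata.name,BINDING:.volumeBindingMode,RECLAIM:.reclaimPolicy

# Flip an existing PV to Retain without recreating it
kubectl patch pv <pv-name> \
  -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'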
Implementation Guide
Step 1: Check Available StorageClasses
kubectl get storageclass
# Look for the (default) marker — this is used when no storageClassName is specified
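If no class carries the (default) marker, you can promote one with the standard is-default-class annotation. A one-liner sketch, assuming you want fast-ssd as the default:

# Mark fast-ssd as the cluster default StorageClass
kubectl annotate storageclass fast-ssd \
  storageclass.kubernetes.io/is-default-class="true" --overwrite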
Step 2: Create a PVC
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data
  namespace: production
spec:
  storageClassName: fast-ssd
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
kubectl apply -f postgres-pvc.yaml
kubectl get pvc -n production
# STATUS should be Bound after a pod references it (WaitForFirstConsumer)
Step 3: Mount the PVC in a Pod or StatefulSet
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
  namespace: production
spec:
  selector:
    matchLabels:
      app: postgres
  serviceName: postgres
  replicas: 1
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:16
          env:
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-secret
                  key: password
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        storageClassName: fast-ssd
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 50Gi
volumeClaimTemplates inside a StatefulSet is the cleanest pattern for databases. Each replica gets its own PVC automatically — named data-postgres-0, data-postgres-1, and so on. No manual PVC management needed.
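One gotcha with this exact mount path: a freshly formatted ext4 volume contains a lost+found directory, which makes initdb refuse to start because the data directory isn't empty. The official postgres image documents pointing PGDATA at a subdirectory as the fix; a minimal sketch of the extra env entry, added alongside POSTGRES_PASSWORD:

# Keep the database files one level below the mount point
- name: PGDATA
  value: /var/lib/postgresql/data/pgdata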
Step 4: Verify and Troubleshoot
# Check PVC status
kubectl describe pvc postgres-data -n production
# Check events if stuck in Pending
kubectl get events -n production --sort-by=.lastTimestamp
# Inspect the bound PV
kubectl get pv
kubectl describe pv <pv-name>
A PVC stuck in Pending usually means one of two things: no StorageClass matches the request, or WaitForFirstConsumer is set and no pod has been scheduled yet. Run kubectl describe pvc — the Events section almost always tells you exactly what’s wrong within seconds.
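Events can also be filtered to the claim itself, which cuts the noise in busy namespaces:

# Only events involving this specific PVC
kubectl get events -n production \
  --field-selector involvedObject.name=postgres-data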
Step 5: Expand a Volume Without Downtime
If your StorageClass has allowVolumeExpansion: true, resizing is a one-liner:
kubectl patch pvc postgres-data -n production \
-p '{"spec":{"resources":{"requests":{"storage":"100Gi"}}}}'
EBS volumes resize online — no pod restart needed. Some other backends (like certain NFS setups) require a pod restart to trigger the filesystem resize after the underlying volume expands.
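Worth verifying either way: the capacity reported by the PVC only updates once the filesystem has actually grown, and a FileSystemResizePending condition on the claim means the node-side expansion is still outstanding (which is what a pod restart triggers on those backends):

# Watch the reported capacity update as the resize completes
kubectl get pvc postgres-data -n production -w
# Conditions show FileSystemResizePending while node-side expansion is pending
kubectl describe pvc postgres-data -n production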
Access Modes — A Quick Reference
- ReadWriteOnce (RWO): One node mounts the volume as read-write. Standard for databases. Most cloud block storage (EBS, Azure Disk) supports only this mode.
- ReadOnlyMany (ROX): Many nodes mount read-only. Good for shared config files or static assets.
- ReadWriteMany (RWX): Many nodes mount read-write simultaneously. Requires NFS, CephFS, or a distributed filesystem. Not available with standard cloud block storage like EBS.
The most common mistake I see: requesting RWX on an EBS-backed StorageClass. The PVC sits in Pending indefinitely, with a cryptic error about unsupported access modes. If you genuinely need RWX on AWS, use EFS with the EFS CSI driver instead.
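For completeness, a hedged sketch of what that looks like with dynamic provisioning through EFS access points. It assumes the aws-efs-csi-driver is installed and an EFS filesystem already exists; fs-12345678 is a placeholder ID, and provisioningMode/directoryPerms follow the driver's documented parameters:

# RWX StorageClass backed by EFS access points
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: shared-rwx
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap
  fileSystemId: fs-12345678  # placeholder, use your EFS filesystem ID
  directoryPerms: "700"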
Final Thoughts
Kubernetes storage clicked for me once I stopped thinking of it as “attaching a disk.” PVs and PVCs are an abstraction layer — your app declares what it needs, and the cluster figures out where that storage comes from.
Three decisions get you most of the way there: use dynamic provisioning, set reclaim policies to Retain for anything that matters, and reach for StatefulSets with volumeClaimTemplates for databases. Get those right and the rest is just tuning.

