Back Up and Restore Object Storage
Overview
Ilum does not run automated backups of object storage by default.
The bundled providers persist their data on PersistentVolumeClaims
managed by the underlying CSI driver, and the chart preserves those
PVCs across helm upgrade and helm rollback. Disaster recovery
beyond PVC retention is operator-driven.
This page describes three layers of data protection, ordered from infrastructure-level to application-level:
- PV snapshots via the Kubernetes VolumeSnapshot API. Point-in-time copies of the underlying volume; CSI-driver-dependent.
- Off-cluster mc mirror copies to an external S3 backend. Logical-object-level mirrors that survive cluster loss.
- Application-level table snapshots in Iceberg, Delta, and DuckLake. Time-travel semantics inside the table format; no infrastructure involvement.
The recipes below cover the active provider's bucket data. For recovering from misconfiguration without data loss, refer to Troubleshoot Object Storage.
Backup layers compared
| Layer | RPO | RTO | Coverage | Survives cluster loss? |
|---|---|---|---|---|
| PV snapshot | Snapshot interval (typically hourly) | Minutes (restore + provider restart) | All buckets, including metadata indices | No (snapshot lives on the same storage backend) |
| Off-cluster mc mirror | Mirror interval (typically hourly) | Minutes (re-mirror to new cluster) | All buckets at the S3 layer | Yes |
| Iceberg / Delta snapshot | Per-commit | Seconds (VERSION AS OF) | One table at a time | Only if the table's underlying objects survive |
For production deployments, combine an off-cluster mc mirror job for
disaster recovery with the table-format snapshots that Iceberg, Delta,
and DuckLake already provide.
Layer 1: PV snapshots via the VolumeSnapshot API
The Kubernetes VolumeSnapshot API has been generally available since
Kubernetes 1.20 (December 2020). Snapshot support is provided by the
CSI driver and must be advertised by the driver itself; not every CSI
driver implements snapshotting. For the upstream reference, see
Volume Snapshots.
Verify CSI snapshot support
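# List the installed CSI drivers. CSIDriver objects do not expose a snapshot capability field.
kubectl get csidriver -o custom-columns=NAME:.metadata.name,ATTACH_REQUIRED:.spec.attachRequired
# A VolumeSnapshotClass whose driver matches is the practical sign that snapshots are available.
kubectl get volumesnapshotclass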
If volumesnapshotclass returns nothing, install the
external-snapshotter controller and a VolumeSnapshotClass for the
cluster's CSI driver before proceeding.
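Before installing anything, it can help to confirm whether the snapshot API group is registered at all. The sketch below is one way to do that, assuming kubectl is on the PATH and pointed at the target cluster. The function name snapshot_api_ready and the KUBECTL override variable are illustrative and not part of the chart.

```shell
# Sketch: succeed only if the cluster serves the volumesnapshots resource.
# KUBECTL can be overridden (for example in tests); it defaults to kubectl.
KUBECTL="${KUBECTL:-kubectl}"

snapshot_api_ready() {
  # "-o name" prints one "<resource>.<group>" entry per line.
  $KUBECTL api-resources --api-group=snapshot.storage.k8s.io -o name \
    | grep -q '^volumesnapshots\.snapshot\.storage\.k8s\.io$'
}
```

If snapshot_api_ready fails, the snapshot CRDs are missing, which usually means the external-snapshotter components have not been installed yet.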
Create a VolumeSnapshotClass (one-time)
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: ilum-objectstorage-snapshots
driver: <your-csi-driver>
deletionPolicy: Retain
deletionPolicy: Retain ensures that the underlying snapshot and its
VolumeSnapshotContent survive deletion of the VolumeSnapshot resource.
Delete is appropriate when snapshots should be removed automatically
with their parent resource.
Snapshot the active provider's PVC
# RustFS: the chart names the PVC after the StatefulSet.
PVC=$(kubectl -n ilum get pvc -l app.kubernetes.io/name=rustfs \
  -o jsonpath='{.items[0].metadata.name}')
# The heredoc delimiter is unquoted so that $(date ...) and $PVC expand.
cat <<EOF | kubectl -n ilum apply -f -
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: ilum-objectstorage-$(date +%Y%m%d-%H%M%S)
spec:
  volumeSnapshotClassName: ilum-objectstorage-snapshots
  source:
    persistentVolumeClaimName: $PVC
EOF
kubectl -n ilum get volumesnapshot
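Because the chart does not schedule or expire snapshots, an operator who takes them on a timer will also want a retention rule. The following is a minimal sketch of a keep-the-newest-N filter. It relies on the YYYYmmdd-HHMMSS suffix used above sorting lexicographically in chronological order; the function name snapshots_to_prune is hypothetical.

```shell
# Read snapshot names from stdin (one per line) and print those that fall
# outside the newest KEEP entries. The timestamp suffix makes a reverse
# lexicographic sort equivalent to newest-first.
snapshots_to_prune() {
  keep="$1"
  sort -r | awk -v keep="$keep" 'NR > keep'
}
```

One possible use is to feed it the output of kubectl -n ilum get volumesnapshot -o name and pass the result to kubectl -n ilum delete. With deletionPolicy: Retain, deleting a VolumeSnapshot object leaves the underlying snapshot in place, so storage-side cleanup remains a separate step.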
Restore from a snapshot
Create a new PersistentVolumeClaim whose dataSource references the
VolumeSnapshot:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rustfs-restore
  namespace: ilum
spec:
  storageClassName: <your-storage-class>
  dataSource:
    name: ilum-objectstorage-<timestamp>
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: <same-as-source>
To swap the restored PVC into the running provider, scale the
provider's StatefulSet to zero, repoint the active PVC at the
restored volume, and scale back. The exact procedure depends on the
CSI driver's reconciliation behavior; refer to the driver's
documentation.
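The scale-down and scale-up halves of that procedure can be sketched as two small helpers. This is an illustration under assumptions, not the chart's own tooling: the StatefulSet name rustfs is a guess based on the label used earlier, and the repointing step in the middle is deliberately left out because it is driver-specific. Setting KUBECTL=echo prints the commands instead of running them.

```shell
# Dry-run friendly: KUBECTL=echo prints each command instead of executing it.
KUBECTL="${KUBECTL:-kubectl}"
NS="${NS:-ilum}"
STS="${STS:-rustfs}"  # assumed StatefulSet name; confirm with: kubectl -n ilum get statefulset

# Scale the provider StatefulSet to the given replica count.
scale_provider() {
  $KUBECTL scale statefulset "$STS" -n "$NS" --replicas="$1"
}

# Block until the StatefulSet reports its pods rolled out, or time out.
wait_provider() {
  $KUBECTL rollout status statefulset "$STS" -n "$NS" --timeout=300s
}

# Intended order:
#   scale_provider 0
#   ...repoint the claim at the restored volume (driver-specific)...
#   scale_provider <original-replica-count> && wait_provider
```

Record the original replica count before scaling to zero so that the final step restores the same topology.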
Limitations
- VolumeSnapshot is CSI-driver-dependent. Not every cloud provider implements snapshot support in their CSI driver, and on-prem drivers vary in maturity. Verify before relying on this layer.
- Snapshots typically live on the same underlying storage backend. A failure of the backend (region outage, hardware loss) takes the snapshots with it. Layer this with off-cluster mirrors for true DR.
- ReadWriteOnce PVCs (the bundled-provider default) can be snapshotted without quiescing the provider, but the resulting snapshot is crash-consistent rather than application-consistent. For application-consistent snapshots, quiesce writes through the alias before triggering the snapshot.
Layer 2: Off-cluster mc mirror to external S3
mc mirror from the active provider to an external S3 backend produces
a logical-object-level copy that survives full cluster loss. The
typical pattern is a CronJob that mirrors every default bucket once
per hour.
Provision an external destination
Provision an external S3 backend (AWS S3, Wasabi, Backblaze B2, or any
S3-compatible service) with a bucket per source bucket. The destination
bucket names should match objectStorage.defaultBuckets. For
provider-specific endpoint shapes, refer to
Provider Reference: External S3.
Store the destination credentials in a separate Secret so they do
not conflict with ilum-objectstorage-credentials:
kubectl -n ilum create secret generic ilum-backup-credentials \
--from-literal=access-key=<external-access-key> \
--from-literal=secret-key=<external-secret-key>
Run the mirror as a CronJob
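apiVersion: batch/v1
kind: CronJob
metadata:
  name: ilum-objectstorage-backup
  namespace: ilum
spec:
  schedule: "0 * * * *"
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      backoffLimit: 1
      ttlSecondsAfterFinished: 86400
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: mc
              image: minio/mc:RELEASE.2025-04-16T18-13-26Z
              # Map each Secret key to a distinct, shell-safe variable so the
              # source and destination credentials cannot shadow each other.
              # The source key names are assumed to match the backup Secret;
              # adjust them to the keys in ilum-objectstorage-credentials.
              env:
                - name: SRC_ACCESS_KEY
                  valueFrom:
                    secretKeyRef: {name: ilum-objectstorage-credentials, key: access-key}
                - name: SRC_SECRET_KEY
                  valueFrom:
                    secretKeyRef: {name: ilum-objectstorage-credentials, key: secret-key}
                - name: DST_ACCESS_KEY
                  valueFrom:
                    secretKeyRef: {name: ilum-backup-credentials, key: access-key}
                - name: DST_SECRET_KEY
                  valueFrom:
                    secretKeyRef: {name: ilum-backup-credentials, key: secret-key}
              command: [sh, -c]
              args:
                - |
                  set -eu
                  mc alias set src http://ilum-objectstorage:9000 "$SRC_ACCESS_KEY" "$SRC_SECRET_KEY"
                  mc alias set dst https://<external-endpoint> "$DST_ACCESS_KEY" "$DST_SECRET_KEY"
                  for bucket in ilum-files ilum-data ilum-tables ilum-mlflow ilum-kestra ilum-ducklake ilum-langfuse; do
                    mc mb --ignore-existing "dst/$bucket"
                    mc mirror --preserve --remove "src/$bucket" "dst/$bucket"
                  done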
The --remove flag mirrors deletions from source to destination. Omit
it when an append-only archive is preferred. The bundled minio/mc
image tag is pinned to the same release used by the in-cluster
migration Job, ensuring behavior parity.
Restore from the external mirror
The restore procedure is mc mirror in reverse:
- Stand up a clean Ilum install with the active provider enabled but the bucket-init Job disabled (to avoid overwriting the restored objects).
- Configure the same external destination as a source alias, so that src now points at the external backend and dst at the in-cluster provider.
- Mirror back to the in-cluster provider:
  for bucket in ilum-files ilum-data ilum-tables ilum-mlflow ilum-kestra ilum-ducklake ilum-langfuse; do
    mc mirror --preserve "src/$bucket" "dst/$bucket"
  done
- Re-enable the bundled consumers. The shared Secret and the ilum-objectstorage alias point them at the restored data automatically.
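Before re-enabling consumers, it may be worth confirming that each restored bucket matches its source. The sketch below uses mc diff for that, assuming the src and dst aliases from the restore steps are already configured. The function name verify_bucket and the MC override variable are illustrative.

```shell
# Sketch: succeed only when mc diff runs cleanly and reports no differences
# between the same bucket on the src and dst aliases.
MC="${MC:-mc}"

verify_bucket() {
  # mc diff prints one line per differing object; a failed run returns 2 here
  # so that an error is never mistaken for parity.
  diff_out=$($MC diff "src/$1" "dst/$1") || return 2
  [ -z "$diff_out" ]
}
```

Calling verify_bucket once per bucket in the restore loop gives a simple gate: a non-zero result for any bucket is a reason to investigate before traffic resumes.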
Layer 3: Application-level table snapshots
For Iceberg and Delta tables managed by Ilum, the table format itself provides point-in-time snapshots through its commit history. These cover one table at a time, not full buckets, but offer fine-grained recovery without infrastructure involvement.
Iceberg
-- List snapshots.
SELECT snapshot_id, committed_at, operation
FROM iceberg.<catalog>.<table>.snapshots
ORDER BY committed_at DESC;
-- Time-travel read.
SELECT *
FROM iceberg.<catalog>.<table>
VERSION AS OF <snapshot_id>;
-- Roll the table back to a snapshot.
CALL iceberg.system.rollback_to_snapshot('<catalog>.<table>', <snapshot_id>);
Delta Lake
DESCRIBE HISTORY <catalog>.<table>;
SELECT * FROM <catalog>.<table> VERSION AS OF <version>;
RESTORE TABLE <catalog>.<table> TO VERSION AS OF <version>;
DuckLake
DuckLake snapshots are recorded in the DuckLake catalog. Refer to the DuckLake documentation for the time-travel and rollback syntax that matches the catalog version in use.
The retention window of these snapshots is governed by the table
format's expiration policy (Iceberg's expire_snapshots procedure,
Delta's VACUUM). Tune the retention to match the operator's recovery
objectives before relying on this layer for DR.
What is not backed up by default
- PVC snapshots are not taken automatically. The chart does not create VolumeSnapshot resources; the operator must schedule them.
- Off-cluster mirrors are not configured by default. The CronJob recipe above is operator-installed.
- Bucket policies, lifecycle rules, and IAM users that the operator configures directly against the provider are not part of any layer above. Capture them separately (typically with a GitOps pipeline).
- Hydra OIDC client registrations and Kubernetes Secrets are not covered by object-storage backups. Use cluster-wide tooling (Velero, Kasten K10, similar) for those.
Reference
- Kubernetes VolumeSnapshot reference: kubernetes.io/docs/concepts/storage/volume-snapshots/
- external-snapshotter controller: github.com/kubernetes-csi/external-snapshotter
- mc client reference: min.io/docs/minio/linux/reference/minio-mc.html
- Iceberg maintenance procedures: iceberg.apache.org/docs/latest/maintenance/
- Delta Lake time travel: docs.delta.io/latest/quick-start.html#read-older-versions-of-data-using-time-travel
- Migration playbook: Migrate Between Providers
- Provider reference: External S3