
Back Up and Restore Object Storage

Overview

Ilum does not run automated backups of object storage by default. The bundled providers persist their data on PersistentVolumeClaims managed by the underlying CSI driver, and the chart preserves those PVCs across helm upgrade and helm rollback. Disaster recovery beyond PVC retention is operator-driven.

This page describes three layers of data protection, ordered from infrastructure-level to application-level:

  • PV snapshots via the Kubernetes VolumeSnapshot API. Point-in-time copies of the underlying volume; CSI-driver-dependent.
  • Off-cluster mc mirror copies to an external S3 backend. Logical-object-level mirrors that survive cluster loss.
  • Application-level table snapshots in Iceberg, Delta, and DuckLake. Time-travel semantics inside the table format; no infrastructure involvement.

The recipes below cover the active provider's bucket data. For recovering from misconfiguration without data loss, refer to Troubleshoot Object Storage.

Backup layers compared

Layer | RPO | RTO | Coverage | Survives cluster loss?
PV snapshot | Snapshot interval (typically hourly) | Minutes (restore + provider restart) | All buckets, including metadata indices | No (snapshot lives on the same storage backend)
Off-cluster mc mirror | Mirror interval (typically hourly) | Minutes (re-mirror to new cluster) | All buckets at the S3 layer | Yes
Iceberg / Delta snapshot | Per-commit | Seconds (VERSION AS OF) | One table at a time | Only if the table's underlying objects survive

For production deployments, combine an off-cluster mc mirror job for disaster recovery with the table-format snapshots that Iceberg, Delta, and DuckLake already provide.

Layer 1: PV snapshots via the VolumeSnapshot API

The Kubernetes VolumeSnapshot API has been generally available since Kubernetes 1.20 (December 2020). Snapshot support is provided by the CSI driver and must be advertised by the driver itself; not every CSI driver implements snapshotting. For the upstream reference, see Volume Snapshots.

Verify CSI snapshot support

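# The CSIDriver object does not advertise snapshot capability; this only lists the installed drivers.
kubectl get csidriver -o custom-columns=NAME:.metadata.name,ATTACH_REQUIRED:.spec.attachRequired
# A VolumeSnapshotClass whose driver matches one of the names above indicates that snapshotting is configured.
kubectl get volumesnapshotclass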

If volumesnapshotclass returns nothing, install the external-snapshotter controller and a VolumeSnapshotClass for the cluster's CSI driver before proceeding.
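
Whether the snapshot API itself is installed can be checked before looking for a class. The sketch below assumes the three CRD names that the external-snapshotter project ships for snapshot.storage.k8s.io; the function name snapshot_crds_present is illustrative and not part of Ilum.

```shell
# Return 0 only if all three snapshot CRDs are installed in the cluster.
snapshot_crds_present() {
  for crd in volumesnapshotclasses volumesnapshotcontents volumesnapshots; do
    kubectl get crd "${crd}.snapshot.storage.k8s.io" >/dev/null 2>&1 || {
      echo "missing CRD: ${crd}.snapshot.storage.k8s.io" >&2
      return 1
    }
  done
}
```

Calling snapshot_crds_present && kubectl get volumesnapshotclass then distinguishes "API not installed" from "API installed but no class defined".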

Create a VolumeSnapshotClass (one-time)

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: ilum-objectstorage-snapshots
driver: <your-csi-driver>
deletionPolicy: Retain

deletionPolicy: Retain ensures snapshots survive the deletion of the VolumeSnapshot resource. Delete is appropriate when snapshots should be removed automatically with their parent resource.

Snapshot the active provider's PVC

# RustFS: the chart names the PVC after the StatefulSet.
PVC=$(kubectl -n ilum get pvc -l app.kubernetes.io/name=rustfs \
  -o jsonpath='{.items[0].metadata.name}')

cat <<EOF | kubectl -n ilum apply -f -
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: ilum-objectstorage-$(date +%Y%m%d-%H%M%S)
spec:
  volumeSnapshotClassName: ilum-objectstorage-snapshots
  source:
    persistentVolumeClaimName: $PVC
EOF

kubectl -n ilum get volumesnapshot
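
A snapshot is only usable once the CSI driver reports it ready. The following sketch waits on status.readyToUse; it assumes a kubectl version that supports --for=jsonpath (1.23 or later), and the helper name wait_snapshot_ready is hypothetical.

```shell
# Block until the named VolumeSnapshot reports status.readyToUse=true.
# $1 = snapshot name, $2 = optional timeout (defaults to 300s).
wait_snapshot_ready() {
  kubectl -n ilum wait "volumesnapshot/$1" \
    --for=jsonpath='{.status.readyToUse}'=true \
    --timeout="${2:-300s}"
}
```

For example, wait_snapshot_ready ilum-objectstorage-<timestamp> 10m returns non-zero if the snapshot is not ready within ten minutes.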

Restore from a snapshot

Create a new PersistentVolumeClaim whose dataSource references the VolumeSnapshot:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rustfs-restore
  namespace: ilum
spec:
  storageClassName: <your-storage-class>
  dataSource:
    name: ilum-objectstorage-<timestamp>
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: <same-as-source>

To swap the restored PVC into the running provider, scale the provider's StatefulSet to zero, repoint the active PVC at the restored volume, and scale back. The exact procedure depends on the CSI driver's reconciliation behavior; refer to the driver's documentation.
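
The scale-down and scale-up halves of that procedure might look like the sketch below. It assumes the provider's StatefulSet and pods carry the same app.kubernetes.io/name=rustfs label used for the PVC lookup earlier; the function names are hypothetical, and the repointing step in between is deliberately left out because it is driver-specific.

```shell
# Scale the provider's StatefulSet (selected by label) to $1 replicas.
provider_scale() {
  kubectl -n ilum scale statefulset \
    -l app.kubernetes.io/name=rustfs --replicas="$1"
}

# Scale to zero and wait until the provider pods are gone,
# so nothing is writing to the volume during the swap.
provider_stop() {
  provider_scale 0 &&
    kubectl -n ilum wait --for=delete pod \
      -l app.kubernetes.io/name=rustfs --timeout=300s
}
```

Run provider_stop, perform the driver-specific repointing, then run provider_scale with the original replica count.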

Limitations

  • VolumeSnapshot is CSI-driver-dependent. Not every cloud provider implements snapshot support in their CSI driver, and on-prem drivers vary in maturity. Verify before relying on this layer.
  • Snapshots typically live on the same underlying storage backend. A failure of the backend (region outage, hardware loss) takes the snapshots with it. Layer this with off-cluster mirrors for true DR.
  • ReadWriteOnce PVCs (the bundled-provider default) can be snapshotted without quiescing the provider, but the resulting snapshot is crash-consistent rather than application-consistent. For application-consistent snapshots, quiesce writes through the alias before triggering the snapshot.

Layer 2: Off-cluster mc mirror to external S3

mc mirror from the active provider to an external S3 backend produces a logical-object-level copy that survives full cluster loss. The typical pattern is a CronJob that mirrors every default bucket once per hour.

Provision an external destination

Provision an external S3 backend (AWS S3, Wasabi, Backblaze B2, or any S3-compatible service) with a bucket per source bucket. The destination bucket names should match objectStorage.defaultBuckets. For provider-specific endpoint shapes, refer to Provider Reference: External S3.

Store the destination credentials in a separate Secret so they do not conflict with ilum-objectstorage-credentials:

kubectl -n ilum create secret generic ilum-backup-credentials \
  --from-literal=access-key=<external-access-key> \
  --from-literal=secret-key=<external-secret-key>

Run the mirror as a CronJob

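apiVersion: batch/v1
kind: CronJob
metadata:
  name: ilum-objectstorage-backup
  namespace: ilum
spec:
  schedule: "0 * * * *"
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      backoffLimit: 1
      ttlSecondsAfterFinished: 86400
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: mc
              image: minio/mc:RELEASE.2025-04-16T18-13-26Z
              # Map each Secret key to a distinct, shell-safe variable name.
              # A hyphenated key cannot be read as "$access-key" in sh, and the
              # two Secrets would otherwise collide on the same variable names.
              # Assumption: ilum-objectstorage-credentials uses the same key names
              # as ilum-backup-credentials; adjust if the chart uses different keys.
              env:
                - name: SRC_ACCESS_KEY
                  valueFrom:
                    secretKeyRef: {name: ilum-objectstorage-credentials, key: access-key}
                - name: SRC_SECRET_KEY
                  valueFrom:
                    secretKeyRef: {name: ilum-objectstorage-credentials, key: secret-key}
                - name: DST_ACCESS_KEY
                  valueFrom:
                    secretKeyRef: {name: ilum-backup-credentials, key: access-key}
                - name: DST_SECRET_KEY
                  valueFrom:
                    secretKeyRef: {name: ilum-backup-credentials, key: secret-key}
              command: [sh, -c]
              args:
                - |
                  set -eu
                  mc alias set src http://ilum-objectstorage:9000 "$SRC_ACCESS_KEY" "$SRC_SECRET_KEY"
                  mc alias set dst https://<external-endpoint> "$DST_ACCESS_KEY" "$DST_SECRET_KEY"
                  for bucket in ilum-files ilum-data ilum-tables ilum-mlflow ilum-kestra ilum-ducklake ilum-langfuse; do
                    mc mb --ignore-existing "dst/$bucket"
                    mc mirror --preserve --remove "src/$bucket" "dst/$bucket"
                  done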

The --remove flag mirrors deletions from source to destination. Omit it when an append-only archive is preferred. The bundled minio/mc image tag is pinned to the same release used by the in-cluster migration Job, ensuring behavior parity.
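
Rather than waiting for the next scheduled run, the CronJob can be exercised once by hand. The sketch below uses kubectl create job --from; the helper name run_backup_now and the manual job-name suffix are illustrative choices, not chart conventions.

```shell
# Create a one-off Job from the CronJob's template and print the Job name.
run_backup_now() {
  job="ilum-objectstorage-backup-manual-$(date +%s)"
  kubectl -n ilum create job "$job" \
    --from=cronjob/ilum-objectstorage-backup >/dev/null &&
    echo "$job"
}

# Usage:
#   job=$(run_backup_now)
#   kubectl -n ilum wait --for=condition=complete "job/$job" --timeout=30m
#   kubectl -n ilum logs "job/$job"
```

The manual Job inherits ttlSecondsAfterFinished from the template, so it is cleaned up on the same schedule as the scheduled runs.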

Restore from the external mirror

The restore procedure is mc mirror in reverse:

  1. Stand up a clean Ilum install with the active provider enabled but the bucket-init Job disabled (to avoid overwriting the restored objects).

  2. Configure the same external destination as a source alias.

  3. Mirror back to the in-cluster provider:

    # Here src is the external backend and dst is the in-cluster provider.
    for bucket in ilum-files ilum-data ilum-tables ilum-mlflow ilum-kestra ilum-ducklake ilum-langfuse; do
      mc mirror --preserve "src/$bucket" "dst/$bucket"
    done
  4. Re-enable the bundled consumers. The shared Secret and the ilum-objectstorage alias point them at the restored data automatically.
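
Before re-enabling consumers, it can be worth confirming that the restore is complete. The sketch below compares each bucket with mc diff, reusing the src and dst aliases from the steps above. It assumes mc diff prints one line per differing object and nothing when both sides match; the function name check_buckets is hypothetical.

```shell
# Report every bucket whose contents differ between the src and dst aliases.
# Returns non-zero if any bucket differs or if mc itself fails.
check_buckets() {
  rc=0
  for bucket in "$@"; do
    if ! out=$(mc diff "src/$bucket" "dst/$bucket"); then
      echo "error: $bucket" >&2
      rc=1
    elif [ -n "$out" ]; then
      echo "differs: $bucket"
      rc=1
    fi
  done
  return "$rc"
}
```

For example, check_buckets ilum-files ilum-data ilum-tables exits with status 0 only when all three buckets match.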

Layer 3: Application-level table snapshots

For Iceberg and Delta tables managed by Ilum, the table format itself provides point-in-time snapshots through its commit history. These cover one table at a time, not full buckets, but offer fine-grained recovery without infrastructure involvement.

Iceberg

-- List snapshots.
SELECT snapshot_id, committed_at, operation
FROM iceberg.<catalog>.<table>.snapshots
ORDER BY committed_at DESC;

-- Time-travel read.
SELECT *
FROM iceberg.<catalog>.<table>
VERSION AS OF <snapshot_id>;

-- Roll the table back to a snapshot.
CALL iceberg.system.rollback_to_snapshot('<catalog>.<table>', <snapshot_id>);

Delta Lake

DESCRIBE HISTORY <catalog>.<table>;
SELECT * FROM <catalog>.<table> VERSION AS OF <version>;
RESTORE TABLE <catalog>.<table> TO VERSION AS OF <version>;

DuckLake

DuckLake snapshots are recorded in the DuckLake catalog. Refer to the DuckLake documentation for the time-travel and rollback syntax that matches the catalog version in use.

The retention window of these snapshots is governed by the table format's expiration policy (Iceberg's expire_snapshots procedure, Delta's VACUUM). Tune the retention to match the operator's recovery objectives before relying on this layer for DR.

What is not backed up by default

  • PVC snapshots are not taken automatically. The chart does not create VolumeSnapshot resources; the operator must schedule them.
  • Off-cluster mirrors are not configured by default. The CronJob recipe above is operator-installed.
  • Bucket policies, lifecycle rules, and IAM users that the operator configures directly against the provider are not part of any layer above. Capture them separately (typically with a GitOps pipeline).
  • Hydra OIDC client registrations and Kubernetes Secrets are not covered by object-storage backups. Use cluster-wide tooling (Velero, Kasten K10, similar) for those.
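
Because PVC snapshots are operator-scheduled, retention is the operator's responsibility as well. A minimal pruning sketch follows, assuming snapshots are named ilum-objectstorage-<timestamp> as in the Layer 1 recipe; both function names are hypothetical.

```shell
# Read snapshot names on stdin, oldest first; print all but the newest $1.
select_expired() {
  awk -v keep="$1" '{ name[NR] = $0 }
    END { for (i = 1; i <= NR - keep; i++) print name[i] }'
}

# Delete VolumeSnapshots named ilum-objectstorage-*, keeping the newest $1.
prune_snapshots() {
  kubectl -n ilum get volumesnapshot \
      --sort-by=.metadata.creationTimestamp -o name |
    grep '/ilum-objectstorage-' |
    select_expired "$1" |
    while read -r snap; do
      kubectl -n ilum delete "$snap"
    done
}
```

With deletionPolicy: Retain, deleting a VolumeSnapshot leaves its VolumeSnapshotContent and the backing storage snapshot in place, so those need separate cleanup if the goal is to reclaim space.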

Reference