Troubleshoot Object Storage

Overview

This page catalogs the symptoms operators most commonly encounter when the object-storage layer misbehaves, along with the underlying cause and the recovery procedure for each. Each recipe ends in one or two concrete kubectl or helm commands.

502 Bad Gateway from `/external/object-storage/` or `/external/minio/`

Symptom

Loading http://<ingress>/external/object-storage/ or http://<ingress>/external/minio/ returns 502 Bad Gateway from nginx. The Object Storage view in the Ilum UI shows the gateway error inside the iframe.

Likely cause

The ilum-objectstorage Service alias has no endpoints. The selector points at a label that no pod carries.

Diagnosis

Inspect the alias annotation, selector, and endpoints:

kubectl -n ilum get svc ilum-objectstorage \
  -o jsonpath='active-provider: {.metadata.annotations.ilum\.cloud/object-storage-active-provider}{"\n"}selector: {.spec.selector}{"\n"}'
kubectl -n ilum get endpoints ilum-objectstorage

If the ENDPOINTS column shows <none>, the selector does not match any pod. Common causes:

objectStorage.activeProvider was set to a name that does not match any running provider's app.kubernetes.io/name label.
The provider's chart was disabled (<provider>.enabled=false) without flipping activeProvider to a still-running provider.
A pre-upgrade override left the alias selector in an inconsistent state.
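The endpoint check above can be wrapped in a small helper for scripts or runbooks. This is a minimal sketch, not part of the chart: the function name `alias_endpoint_state` is hypothetical, and it assumes the `ilum` namespace and the `ilum-objectstorage` Service name used on this page.

```shell
# Hypothetical helper: prints "empty" when the alias Service has no ready
# endpoint addresses, "ok" otherwise. A kubectl error also yields "empty".
alias_endpoint_state() {
  ns="${1:-ilum}"
  svc="${2:-ilum-objectstorage}"
  # Collect the IPs of all ready addresses across all endpoint subsets.
  ips="$(kubectl -n "$ns" get endpoints "$svc" \
    -o jsonpath='{.subsets[*].addresses[*].ip}')"
  if [ -z "$ips" ]; then
    echo "empty"
  else
    echo "ok"
  fi
}
```

Calling `alias_endpoint_state` with no arguments checks the default namespace and Service; pass a namespace and Service name to check another alias.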

Recovery

Roll back to the last release revision whose values are known to be correct:

helm history ilum -n ilum
helm rollback ilum <revision> -n ilum
kubectl -n ilum rollout restart deploy/ilum-ui

Alternatively, reset activeProvider (to auto, as shown here, or to the name of a still-running provider) and re-upgrade:

helm upgrade ilum ilum/helm_aio -n ilum --reuse-values \
  --set objectStorage.activeProvider=auto
kubectl -n ilum rollout restart deploy/ilum-ui

`/external/object-storage/` redirects in a loop

Symptom

The browser keeps bouncing between /external/object-storage/ and the provider-specific console path; the page never renders.

Likely cause

The active provider's consoleMode is nginx-rewrite and its consolePath is /external/object-storage/ itself, so the redirect sends the browser back to where it came from.
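The loop condition described above comes down to a string comparison between the provider's consolePath and the alias path. A minimal sketch of that check follows; the function name `console_path_loops` is hypothetical, and the comparison ignores a single trailing slash as an assumption about how the paths are normalized.

```shell
# Hypothetical check: succeeds (exit 0) when the given consolePath is the
# alias path itself, which would make the redirect point back at its origin.
console_path_loops() {
  alias_path="/external/object-storage"
  # Drop one trailing slash so "/external/object-storage/" also matches.
  candidate="${1%/}"
  [ "$candidate" = "$alias_path" ]
}
```

For example, `console_path_loops /external/object-storage/` succeeds, while `console_path_loops /external/minio/` fails, meaning the second path is a safe redirect target.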

Recovery

Set the provider's consolePath to a provider-specific path so the redirect target is distinct:

helm upgrade ilum ilum/helm_aio -n ilum --reuse-values \
  --set objectStorage.providers.<provider>.consolePath=/external/<provider>/

Object Storage nav button does not load

Symptom

Clicking the Object Storage entry in the Ilum UI loads a blank iframe or shows a "file not found" message.

Likely cause

ILUM_OBJECT_STORAGE_PATH in the ilum-ui ConfigMap resolves to a path that the nginx proxy does not route. Alternatively, no provider is active, so the path falls back to the chart-wide default /external/object-storage/, which returns 404 because no upstream is configured.

Diagnosis

Inspect the runtime path the UI uses:

kubectl -n ilum get configmap ilum-ui \
  -o jsonpath='ILUM_OBJECT_STORAGE_PATH={.data.ILUM_OBJECT_STORAGE_PATH}{"\n"}'

Cross-check against the nginx configuration for the matching location block:

kubectl -n ilum exec deploy/ilum-ui -c ilum-ui -- \
  grep -A5 'location /external/' /etc/nginx/conf.d/server.conf

Recovery

Ensure an in-cluster provider is enabled, and either rely on the resolved default or override objectStorage.providers.<provider>.consolePath explicitly. Then restart the Ilum UI to pick up the new ConfigMap:

helm upgrade ilum ilum/helm_aio -n ilum --reuse-values \
  --set <provider>.enabled=true
kubectl -n ilum rollout restart deploy/ilum-ui

`helm template` fails with "3 providers enabled"

Symptom

A helm install or helm upgrade fails at render time with a message similar to:

Error: ... objectStorage: 3 providers enabled ([minio rustfs seaweedfs]);
set objectStorage.activeProvider= to pick which one user traffic
routes through

Likely cause

More than two providers are enabled simultaneously, and objectStorage.activeProvider is left at auto. The chart refuses to guess.

Recovery

Set the active provider explicitly:

helm upgrade ilum ilum/helm_aio -n ilum --reuse-values \
  --set objectStorage.activeProvider=<provider>

Alternatively, disable the providers that are not relevant to user traffic by setting their enabled flags to false.

Alias has no endpoints despite a running provider

Symptom

A provider pod is running and ready, but kubectl get endpoints ilum-objectstorage shows <none>.

Likely cause

The pod's labels do not match the alias Service selector. The selector requires both the app.kubernetes.io/name and app.kubernetes.io/instance labels.

Diagnosis

kubectl -n ilum get pod -l app.kubernetes.io/name=<provider> \
  -o jsonpath='{.items[*].metadata.labels}'
kubectl -n ilum get svc ilum-objectstorage -o jsonpath='{.spec.selector}'
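Instead of comparing the two outputs by eye, the Service selector can be fed straight back into a pod query. The sketch below is one way to do that; the function name `matching_pods` is hypothetical, and it assumes the alias Service uses a plain label selector, as described above.

```shell
# Hypothetical helper: lists the pods that the alias Service selector matches.
# Empty output means the selector matches no pod in the namespace.
matching_pods() {
  ns="${1:-ilum}"
  svc="${2:-ilum-objectstorage}"
  # Render the selector map as "key=value,key=value," using a Go template.
  selector="$(kubectl -n "$ns" get svc "$svc" \
    -o go-template='{{range $k, $v := .spec.selector}}{{$k}}={{$v}},{{end}}')"
  selector="${selector%,}"   # strip the trailing comma
  [ -n "$selector" ] || { echo "service has no selector" >&2; return 1; }
  kubectl -n "$ns" get pods -l "$selector" -o name
}
```

If `matching_pods` prints nothing while the provider pod is running, the pod is missing at least one of the labels the selector requires.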

Recovery

For pods deployed by a chart, ensure the chart sets both required labels. For hand-rolled Deployments (such as those created by the Add a New Provider procedure), patch the pod template to include the missing labels and re-roll the Deployment.

Stuck `pending-upgrade` after a failed `helm upgrade --wait`

Symptom

helm history ilum shows a revision in pending-upgrade state. Every subsequent helm upgrade fails immediately with a message similar to:

Error: UPGRADE FAILED: another operation (install/upgrade/rollback) is in progress

Likely cause

A previous helm upgrade --wait was interrupted (network drop, laptop crash, Ctrl-C). The release Secret recording the in-flight upgrade was never finalized.

Recovery

Delete the stuck release Secret and retry:

kubectl -n ilum get secret -l owner=helm,name=ilum
kubectl -n ilum delete secret sh.helm.release.v1.ilum.v<revision>
helm upgrade ilum ilum/helm_aio -n ilum --reuse-values

The revision number is the highest one listed by helm history ilum that is in pending-upgrade state.
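Helm 3 labels each release Secret with its status, so the stuck revision can also be located with a label selector rather than by reading the history table. A minimal sketch, assuming the default Secret storage backend; the function name `pending_release_secrets` is hypothetical.

```shell
# Hypothetical helper: prints the release Secret(s) still marked
# pending-upgrade for the given release (defaults: namespace and release "ilum").
pending_release_secrets() {
  ns="${1:-ilum}"
  release="${2:-ilum}"
  kubectl -n "$ns" get secret \
    -l "owner=helm,name=$release,status=pending-upgrade" -o name
}
```

Review the printed names before deleting anything; the output is only a candidate list for the delete command shown above.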

Cutover acknowledged but the alias still targets the old provider

Symptom

objectStorage.cutoverAcknowledged=true is set (or its legacy alias rustfs.migrationAcknowledged=true), but the alias annotation still shows the previous provider.

Likely cause

Either the Ilum UI's ConfigMap was not regenerated (the rollme: annotation that forces an ilum-ui rollout did not change), or the operator did not run helm upgrade after flipping the flag.

Recovery

Re-run helm upgrade and force a UI rollout:

helm upgrade ilum ilum/helm_aio -n ilum --reuse-values \
  --set objectStorage.cutoverAcknowledged=true
kubectl -n ilum rollout restart deploy/ilum-ui

Verify by inspecting the alias annotation:

kubectl -n ilum get svc ilum-objectstorage \
  -o jsonpath='{.metadata.annotations.ilum\.cloud/object-storage-active-provider}{"\n"}'
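For scripted verification, the annotation can be compared against the provider that is expected to be active. This is a sketch only; the function name `alias_targets` is hypothetical, and the annotation key is the one shown in the command above.

```shell
# Hypothetical check: succeeds (exit 0) when the alias annotation names the
# expected provider, fails otherwise (including when kubectl returns nothing).
alias_targets() {
  expected="$1"
  ns="${2:-ilum}"
  actual="$(kubectl -n "$ns" get svc ilum-objectstorage \
    -o jsonpath='{.metadata.annotations.ilum\.cloud/object-storage-active-provider}')"
  [ "$actual" = "$expected" ]
}
```

A call such as `alias_targets rustfs || echo "alias still on the old provider"` fits into a post-upgrade check.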

Bucket-init Job stays `Pending` or fails

Symptom

After helm install or helm upgrade, the init-rustfs-buckets or init-minio-policies Job does not reach Complete. helm install --wait times out, or the bundled consumers report missing buckets at startup.

Likely cause

One of the following:

The ilum-objectstorage-credentials Secret is missing or has empty values for access-key / secret-key.
The provider's Service is reachable on cluster DNS but the provider pod is not yet Ready; the init Job's wait-for-<provider> init container is still looping.
The provider rejected the credentials (the bundled image baked in a different default than the live Secret).

Diagnosis

kubectl -n ilum logs job/init-rustfs-buckets -c wait-for-rustfs --tail=50
kubectl -n ilum logs job/init-rustfs-buckets --tail=200
kubectl -n ilum get secret ilum-objectstorage-credentials \
  -o jsonpath='{.data.access-key}' | base64 -d; echo

Recovery

Populate the credentials Secret with all six aliased keys (access-key, secret-key, root-user, root-password, RUSTFS_ACCESS_KEY, RUSTFS_SECRET_KEY) and re-run the upgrade. The init Job is idempotent; retry it by deleting the Job and letting helm upgrade re-create it:

kubectl -n ilum delete job init-rustfs-buckets || true
helm upgrade ilum ilum/helm_aio -n ilum --reuse-values
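Before re-running the upgrade, it can help to confirm which of the six aliased keys are actually missing or empty in the live Secret. The following is a minimal sketch; the function name `missing_credential_keys` is hypothetical, and the key list is the one given in the recovery text above.

```shell
# Hypothetical helper: prints each required key that is absent from, or empty
# in, the ilum-objectstorage-credentials Secret. No output means all six exist.
missing_credential_keys() {
  ns="${1:-ilum}"
  # List only the keys whose (base64) value is non-empty, one per line.
  present="$(kubectl -n "$ns" get secret ilum-objectstorage-credentials \
    -o go-template='{{range $k, $v := .data}}{{if $v}}{{$k}}{{"\n"}}{{end}}{{end}}')"
  for key in access-key secret-key root-user root-password \
             RUSTFS_ACCESS_KEY RUSTFS_SECRET_KEY; do
    printf '%s\n' "$present" | grep -qxF -- "$key" || echo "$key"
  done
}
```

If the Secret does not exist at all, kubectl prints an error and the helper lists all six keys as missing.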

Credentials lookup error on `helm upgrade`

Symptom

helm upgrade fails at render time with a message similar to:

Error: ... values don't meet the specifications of the schema(s) ...
... ilum-objectstorage-credentials lookup is missing required keys ...

Likely cause

The chart resolves credentials in this order: live Secret values via lookup (when objectStorage.credentials.preserveExisting=true), then the literal defaults in values.yaml. When the live Secret exists but is missing one of the six aliased keys, the lookup returns an incomplete dictionary and the template fails the schema check.

Recovery

Either re-create the Secret with all six aliased keys, or disable the lookup and let the chart re-render the defaults:

# Option A: repopulate the Secret.
kubectl -n ilum delete secret ilum-objectstorage-credentials
helm upgrade ilum ilum/helm_aio -n ilum --reuse-values

# Option B: force deterministic render (loses any rotated credentials).
helm upgrade ilum ilum/helm_aio -n ilum --reuse-values \
  --set objectStorage.credentials.preserveExisting=false

PVC bound to wrong `StorageClass`

Symptom

The provider's StatefulSet or Deployment stays Pending. The pod's events log a message similar to:

0/3 nodes are available: pod has unbound immediate PersistentVolumeClaims

Likely cause

The chart-default storageClassName resolves to a class that does not match a CSI driver available on the cluster. This is common when moving the chart between cloud providers without overriding the storage class.

Recovery

Destructive

Deleting an existing PersistentVolumeClaim deletes the underlying volume on most CSI drivers. Use this recipe on net-new installs only.

Set the correct storage class and re-roll the PVCs:

kubectl get storageclass
kubectl -n ilum delete pvc -l app.kubernetes.io/name=rustfs
helm upgrade ilum ilum/helm_aio -n ilum --reuse-values \
  --set rustfs.persistence.storageClass=<cluster-storage-class>

For pre-existing data, snapshot the source PVC and restore against the correct storage class before deletion. See Back Up and Restore Object Storage.
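To see which claims are affected before deleting anything, each PVC's requested class can be compared against the classes the cluster actually offers. This is a sketch under the assumptions of this page (namespace `ilum`); the function name `pvcs_with_unknown_class` is hypothetical.

```shell
# Hypothetical helper: prints "<pvc> <class>" for every PVC in the namespace
# whose storageClassName is not among the cluster's StorageClasses.
# A PVC with no explicit class is printed with an empty class column.
pvcs_with_unknown_class() {
  ns="${1:-ilum}"
  classes="$(kubectl get storageclass \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}')"
  kubectl -n "$ns" get pvc \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.storageClassName}{"\n"}{end}' |
    while read -r pvc class; do
      [ -n "$pvc" ] || continue
      # Exact, whole-line match against the list of known classes.
      printf '%s\n' "$classes" | grep -qxF -- "$class" || echo "$pvc $class"
    done
}
```

The helper only reads cluster state, so it is safe to run against installations that hold data.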

Post-cutover consumer still writes to the previous provider

Symptom

objectStorage.cutoverAcknowledged=true is set and mc diff confirms data parity, but one or more bundled consumers continue writing into the old provider's bucket.

Likely cause

The consumer cached its S3 endpoint at startup and has not refreshed since the cutover. The ilum-objectstorage Service alias re-targets the new provider instantly, but consumers that resolve the alias once on Pod startup do not pick up the change until they restart.

The Ilum UI rolls automatically when helm upgrade regenerates the ilum-ui ConfigMap. Other consumers do not.

Recovery

Restart every consumer that targets the alias:

kubectl -n ilum rollout restart \
  deploy/ilum-core \
  deploy/ilum-jupyter \
  deploy/ilum-mlflow \
  deploy/ilum-kestra \
  deploy/ilum-langfuse-web \
  statefulset/ilum-hive-metastore

Long-running Spark driver Pods are unaffected: each Spark job creates its own S3 client and resolves the alias afresh.
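When restarting several consumers, it is often useful to wait for each rollout to finish and to see which one failed. The loop below is a minimal sketch of that pattern; the function name `restart_consumers` and the 300-second timeout are assumptions, not chart defaults.

```shell
# Hypothetical helper: restarts each workload in turn and waits for its
# rollout to complete, reporting any workload that fails or times out.
# Usage: restart_consumers <namespace> <kind/name> [<kind/name> ...]
restart_consumers() {
  ns="$1"
  shift
  for workload in "$@"; do
    kubectl -n "$ns" rollout restart "$workload" &&
      kubectl -n "$ns" rollout status "$workload" --timeout=300s ||
      echo "restart failed: $workload" >&2
  done
}
```

For example, `restart_consumers ilum deploy/ilum-core statefulset/ilum-hive-metastore` restarts the two workloads one after the other instead of all at once.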

Reference

Object Storage Overview for the alias model.
Migrate Between Providers for the data migration playbook.
Add a New Provider for plugging in new backends.
Back Up and Restore Object Storage for data protection recipes.
Object Storage Helm Values for the value reference.
