Troubleshoot Object Storage
Overview
This page catalogs the symptoms an operator most commonly encounters
when something is off with the object-storage layer, the underlying
cause, and the recovery procedure. Each recipe ends in one or two
concrete kubectl or helm commands.
502 Bad Gateway from /external/object-storage/ or /external/minio/
Symptom
Loading http://<ingress>/external/object-storage/ or http://<ingress>/external/minio/ returns 502 Bad Gateway from nginx. The Object Storage view in the Ilum UI shows the gateway
error inside the iframe.
Likely cause
The ilum-objectstorage Service alias has no endpoints. The selector
points at a label that no pod carries.
Diagnosis
Inspect the alias annotation, selector, and endpoints:
kubectl -n ilum get svc ilum-objectstorage \
-o jsonpath='active-provider: {.metadata.annotations.ilum\.cloud/object-storage-active-provider}{"\n"}selector: {.spec.selector}{"\n"}'
kubectl -n ilum get endpoints ilum-objectstorage
If the endpoints column shows <none>, the selector does not match any
pod. Common causes:
- objectStorage.activeProvider was set to a name that does not match any running provider's app.kubernetes.io/name label.
- The provider's chart was disabled (<provider>.enabled=false) without flipping activeProvider to a still-running provider.
- A pre-upgrade override left the alias selector in an inconsistent state.
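One way to check the first cause directly is to turn the alias selector into a label query and ask for pods that match it. The helper below is a minimal sketch: selector_to_label_query is a hypothetical name, and it assumes kubectl prints the selector as compact JSON (older kubectl versions print a map[...] form that this does not handle).

```shell
# Hypothetical helper: convert a JSON selector such as
#   {"app.kubernetes.io/instance":"ilum","app.kubernetes.io/name":"minio"}
# into the comma-separated key=value form accepted by `kubectl get pods -l`.
selector_to_label_query() {
  # Drop braces, quotes, and spaces, then turn each key:value into key=value.
  printf '%s\n' "$1" | tr -d '{}" ' | sed 's/:/=/g'
}

# Usage (not run here):
#   sel=$(kubectl -n ilum get svc ilum-objectstorage -o jsonpath='{.spec.selector}')
#   kubectl -n ilum get pods -l "$(selector_to_label_query "$sel")"
```

If the second command lists no pods, the selector and the pod labels disagree, which is consistent with the empty endpoints.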
Recovery
Roll back to the last release revision whose values are known to be correct:
helm history ilum -n ilum
helm rollback ilum <revision> -n ilum
kubectl -n ilum rollout restart deploy/ilum-ui
Alternatively, point activeProvider at a still-running provider, or reset
it to auto so the chart resolves the provider itself, and re-upgrade:
helm upgrade ilum ilum/helm_aio -n ilum --reuse-values \
--set objectStorage.activeProvider=auto
kubectl -n ilum rollout restart deploy/ilum-ui
/external/object-storage/ redirects in a loop
Symptom
The browser keeps bouncing between /external/object-storage/ and the
provider-specific console path; the page never renders.
Likely cause
The active provider's consoleMode is nginx-rewrite and its
consolePath is /external/object-storage/ itself, so the redirect
sends the browser back to where it came from.
Recovery
Set the provider's consolePath to a provider-specific path so the
redirect target is distinct:
helm upgrade ilum ilum/helm_aio -n ilum --reuse-values \
--set objectStorage.providers.<provider>.consolePath=/external/<provider>/
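The loop condition described above can be expressed as a small check. This is a sketch only: would_loop is a hypothetical helper, and the two arguments are assumed to be read by hand from the rendered values (for example from helm get values ilum -n ilum --all).

```shell
ALIAS_PATH=/external/object-storage/

# Hypothetical helper: return 0 (true) when a provider's consoleMode and
# consolePath would redirect the alias path back to itself.
would_loop() {
  mode=$1
  path=${2%/}/    # normalise to exactly one trailing slash
  [ "$mode" = "nginx-rewrite" ] && [ "$path" = "$ALIAS_PATH" ]
}

# Usage:
#   would_loop nginx-rewrite /external/object-storage/ && echo "redirect loop"
```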
Object Storage nav button does not load
Symptom
Clicking the Object Storage entry in the Ilum UI loads a blank iframe or shows a "file not found" message.
Likely cause
ILUM_OBJECT_STORAGE_PATH in the ilum-ui ConfigMap resolves to a
path that the nginx proxy does not route, or no provider is active
and the path falls back to the chart-wide default
/external/object-storage/ which then 404s because no upstream is
configured.
Diagnosis
Inspect the runtime path the UI uses:
kubectl -n ilum get configmap ilum-ui \
-o jsonpath='ILUM_OBJECT_STORAGE_PATH={.data.ILUM_OBJECT_STORAGE_PATH}{"\n"}'
Cross-check against the nginx configuration for the matching location
block:
kubectl -n ilum exec deploy/ilum-ui -c ilum-ui -- \
grep -A5 'location /external/' /etc/nginx/conf.d/server.conf
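The cross-check can also be scripted. The function below is a rough sketch, not a full nginx matcher: path_is_routed is a hypothetical name, it only considers plain prefix locations (and ^~ prefixes), and it ignores regex (~, ~*) and exact (=) locations.

```shell
# Hypothetical helper: succeed if some prefix `location` in the nginx config
# read from stdin is a prefix of the request path given as $1.
path_is_routed() {
  awk -v path="$1" '
    $1 == "location" {
      prefix = ($2 == "^~") ? $3 : $2
      sub(/[{]$/, "", prefix)                  # tolerate "location /x/{"
      if (substr(prefix, 1, 1) != "/") next    # skip regex and exact locations
      if (index(path, prefix) == 1) found = 1
    }
    END { exit (found ? 0 : 1) }
  '
}

# Usage (not run here):
#   kubectl -n ilum exec deploy/ilum-ui -c ilum-ui -- cat /etc/nginx/conf.d/server.conf \
#     | path_is_routed /external/object-storage/ || echo "no matching location block"
```

A catch-all location / block would count as a match here, so a positive result only shows that some block covers the path, not that an upstream is configured behind it.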
Recovery
Ensure an in-cluster provider is enabled and either rely on the resolved
default or override objectStorage.providers.<provider>.consolePath
explicitly. Then restart the Ilum UI to pick up the new ConfigMap:
helm upgrade ilum ilum/helm_aio -n ilum --reuse-values \
--set <provider>.enabled=true
kubectl -n ilum rollout restart deploy/ilum-ui
helm template fails with "3 providers enabled"
Symptom
A helm install or helm upgrade fails at render time with a message
similar to:
Error: ... objectStorage: 3 providers enabled ([minio rustfs seaweedfs]);
set objectStorage.activeProvider=<name> to pick which one user traffic
routes through
Likely cause
More than two providers are enabled simultaneously, and
objectStorage.activeProvider is left at auto. The chart refuses to
guess.
Recovery
Set the active provider explicitly:
helm upgrade ilum ilum/helm_aio -n ilum --reuse-values \
--set objectStorage.activeProvider=<provider>
Alternatively, disable the providers that are not relevant to user
traffic by setting their enabled flags to false.
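For the second option, the --set flags can be generated from the provider list printed in the error. This is a sketch that assumes each provider's toggle is <provider>.enabled, as used elsewhere on this page; disable_flags is a hypothetical helper name.

```shell
# Hypothetical helper: given the provider to keep, followed by every enabled
# provider, print the --set flags that disable all the others.
disable_flags() {
  keep=$1
  shift
  flags=""
  for p in "$@"; do
    [ "$p" = "$keep" ] && continue
    flags="$flags --set $p.enabled=false"
  done
  printf '%s\n' "${flags# }"
}

# Usage (not run here; word splitting of the output is intentional):
#   helm upgrade ilum ilum/helm_aio -n ilum --reuse-values \
#     $(disable_flags rustfs minio rustfs seaweedfs)
```

Check what disabling a provider does to its data before applying this to a provider that still holds buckets you need.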
Alias has no endpoints despite a running provider
Symptom
A provider pod is running and ready, but kubectl get endpoints ilum-objectstorage shows <none>.
Likely cause
The pod's labels do not match the alias Service selector. The selector
requires both app.kubernetes.io/name: <provider> and app.kubernetes.io/instance: <release>.
Diagnosis
kubectl -n ilum get pod -l app.kubernetes.io/name=<provider> \
-o jsonpath='{.items[*].metadata.labels}'
kubectl -n ilum get svc ilum-objectstorage -o jsonpath='{.spec.selector}'
Recovery
For pods deployed by a chart, ensure the chart sets both required
labels. For hand-rolled Deployments (such as those created by the
Add a New Provider procedure), patch the pod
template to include the missing labels and re-roll the Deployment.
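For the hand-rolled case, one possible approach is a JSON merge patch on the pod template. The sketch below only builds the patch body; label_patch is a hypothetical helper, and <deployment> in the usage line is a placeholder for your own Deployment name.

```shell
# Hypothetical helper: print a JSON merge patch that adds the two labels the
# alias selector requires to a Deployment's pod template.
label_patch() {
  provider=$1
  release=$2
  printf '{"spec":{"template":{"metadata":{"labels":{"app.kubernetes.io/name":"%s","app.kubernetes.io/instance":"%s"}}}}}\n' \
    "$provider" "$release"
}

# Usage (not run here):
#   kubectl -n ilum patch deployment <deployment> --type merge \
#     -p "$(label_patch rustfs ilum)"
```

Changing the pod template triggers a new rollout by itself. The patch is rejected if it would make the template labels stop matching the Deployment's own spec.selector, so check that selector first.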
Stuck pending-upgrade after a failed helm upgrade --wait
Symptom
helm history ilum shows a revision in pending-upgrade state. Every
subsequent helm upgrade fails immediately with a message similar to:
Error: UPGRADE FAILED: another operation (install/upgrade/rollback) is in progress
Likely cause
A previous helm upgrade --wait was interrupted (network drop, laptop
crash, Ctrl-C). The release Secret recording the in-flight upgrade
was never finalized.
Recovery
Delete the stuck release Secret and retry:
kubectl -n ilum get secret -l owner=helm,name=ilum
kubectl -n ilum delete secret sh.helm.release.v1.ilum.v<revision>
helm upgrade ilum ilum/helm_aio -n ilum --reuse-values
The revision number is the highest one listed by helm history ilum
that is in pending-upgrade state.
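Picking the revision and building the Secret name can be scripted. This is a minimal sketch that scans the default table output of helm history; stuck_revision and release_secret are hypothetical helper names, and the Secret name pattern is the one shown in the delete command above.

```shell
# Hypothetical helper: read `helm history` table output on stdin and print the
# highest revision whose row mentions pending-upgrade (prints nothing if none).
stuck_revision() {
  awk '/pending-upgrade/ { if ($1 + 0 > max) max = $1 + 0 }
       END { if (max) print max }'
}

# Hypothetical helper: build the release Secret name for a release and revision.
release_secret() {
  printf 'sh.helm.release.v1.%s.v%s\n' "$1" "$2"
}

# Usage (not run here):
#   rev=$(helm history ilum -n ilum | stuck_revision)
#   [ -n "$rev" ] && kubectl -n ilum delete secret "$(release_secret ilum "$rev")"
```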
Cutover acknowledged but the alias still targets the old provider
Symptom
objectStorage.cutoverAcknowledged=true is set (or its legacy alias
rustfs.migrationAcknowledged=true), but the alias annotation still
shows the previous provider.
Likely cause
Either the Ilum UI's ConfigMap was not regenerated (the
rollme: <random> annotation that forces an ilum-ui rollout did not
change), or the operator did not run helm upgrade after flipping the
flag.
Recovery
Re-run helm upgrade and force a UI rollout:
helm upgrade ilum ilum/helm_aio -n ilum --reuse-values \
--set objectStorage.cutoverAcknowledged=true
kubectl -n ilum rollout restart deploy/ilum-ui
Verify by inspecting the alias annotation:
kubectl -n ilum get svc ilum-objectstorage \
-o jsonpath='{.metadata.annotations.ilum\.cloud/object-storage-active-provider}{"\n"}'
Bucket-init Job stays Pending or fails
Symptom
After helm install or helm upgrade, the init-rustfs-buckets or init-minio-policies Job does not reach Complete. helm install --wait
times out, or the bundled consumers report missing buckets at startup.
Likely cause
One of the following:
- The ilum-objectstorage-credentials Secret is missing or has empty values for access-key / secret-key.
- The provider's Service is reachable on cluster DNS but the provider pod is not yet Ready; the init Job's wait-for-<provider> init container is still looping.
- The provider rejected the credentials (the bundled image baked in a different default than the live Secret).
Diagnosis
kubectl -n ilum logs job/init-rustfs-buckets -c wait-for-rustfs --tail=50
kubectl -n ilum logs job/init-rustfs-buckets --tail=200
kubectl -n ilum get secret ilum-objectstorage-credentials \
-o jsonpath='{.data.access-key}' | base64 -d; echo
Recovery
Populate the credentials Secret with all six aliased keys
(access-key, secret-key, root-user, root-password, RUSTFS_ACCESS_KEY, RUSTFS_SECRET_KEY) and re-run the upgrade.
The init Job is idempotent; it can be retried by deleting it and
re-applying via helm upgrade:
kubectl -n ilum delete job init-rustfs-buckets || true
helm upgrade ilum ilum/helm_aio -n ilum --reuse-values
Credentials lookup error on helm upgrade
Symptom
helm upgrade fails at render time with a message similar to:
Error: ... values don't meet the specifications of the schema(s) ...
... ilum-objectstorage-credentials lookup is missing required keys ...
Likely cause
The chart resolves credentials in this order: live Secret values via
lookup (when objectStorage.credentials.preserveExisting=true), then
the literal defaults in values.yaml. When the live Secret exists
but is missing one of the six aliased keys, the lookup returns an
incomplete dictionary and the template fails the schema check.
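To see which of the six aliased keys the live Secret lacks, a check along these lines may help. It is a sketch: missing_credential_keys is a hypothetical helper, the key names are the six listed in the Bucket-init section above, and it assumes kubectl prints the Secret's data map as compact JSON.

```shell
# Hypothetical helper: given the Secret's .data as compact JSON, print each of
# the six aliased keys that is absent or has an empty value.
missing_credential_keys() {
  data=$1
  for key in access-key secret-key root-user root-password \
             RUSTFS_ACCESS_KEY RUSTFS_SECRET_KEY; do
    case $data in
      *"\"$key\":\"\""*) printf '%s (empty)\n' "$key" ;;
      *"\"$key\":"*)     ;;  # present with a value
      *)                 printf '%s (missing)\n' "$key" ;;
    esac
  done
}

# Usage (not run here):
#   missing_credential_keys "$(kubectl -n ilum get secret \
#     ilum-objectstorage-credentials -o jsonpath='{.data}')"
```

No output means all six keys are present and non-empty.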
Recovery
Either re-create the Secret with all six aliased keys, or disable the
lookup and let the chart re-render the defaults:
# Option A: repopulate the Secret.
kubectl -n ilum delete secret ilum-objectstorage-credentials
helm upgrade ilum ilum/helm_aio -n ilum --reuse-values
# Option B: force deterministic render (loses any rotated credentials).
helm upgrade ilum ilum/helm_aio -n ilum --reuse-values \
--set objectStorage.credentials.preserveExisting=false
PVC bound to wrong StorageClass
Symptom
The provider's StatefulSet or Deployment stays Pending. The pod's
events log a message similar to:
0/3 nodes are available: pod has unbound immediate PersistentVolumeClaims
Likely cause
The chart-default storageClassName resolves to a class that does not
match a CSI driver available on the cluster. This is common when moving
the chart between cloud providers without overriding the storage class.
Recovery
Deleting an existing PersistentVolumeClaim deletes the underlying
volume on most CSI drivers. Use this recipe on net-new installs only.
Set the correct storage class and re-roll the PVCs:
kubectl get storageclass
kubectl -n ilum delete pvc -l app.kubernetes.io/name=rustfs
helm upgrade ilum ilum/helm_aio -n ilum --reuse-values \
--set rustfs.persistence.storageClass=<cluster-storage-class>
For pre-existing data, snapshot the source PVC and restore against the correct storage class before deletion. See Back Up and Restore Object Storage.
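Before deleting any PVC, it may be worth confirming that the class you are about to set actually exists on the cluster. The check below is a sketch; storageclass_exists is a hypothetical helper that reads the output of kubectl get storageclass -o name.

```shell
# Hypothetical helper: succeed if the StorageClass named $1 appears in the
# `kubectl get storageclass -o name` output read from stdin.
storageclass_exists() {
  # -x matches the whole line, -F treats dots in the name literally.
  grep -qxF "storageclass.storage.k8s.io/$1"
}

# Usage (not run here):
#   kubectl get storageclass -o name | storageclass_exists <cluster-storage-class> \
#     || echo "storage class not found; do not delete the PVCs yet"
```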
Post-cutover consumer still writes to the previous provider
Symptom
objectStorage.cutoverAcknowledged=true is set and mc diff confirms
data parity, but one or more bundled consumers continue writing into
the old provider's bucket.
Likely cause
The consumer cached its S3 endpoint at startup and has not refreshed
since the cutover. The ilum-objectstorage Service alias re-targets
the new provider instantly, but consumers that resolve the alias once
on Pod startup do not pick up the change until they restart.
The Ilum UI rolls automatically when helm upgrade regenerates
the ilum-ui ConfigMap. Other consumers do not.
Recovery
Restart every consumer that targets the alias:
kubectl -n ilum rollout restart \
deploy/ilum-core \
deploy/ilum-jupyter \
deploy/ilum-mlflow \
deploy/ilum-kestra \
deploy/ilum-langfuse-web \
statefulset/ilum-hive-metastore
Long-running Spark driver Pods are unaffected: each Spark job creates its own S3 client and resolves the alias afresh.
Reference
- Object Storage Overview for the alias model.
- Migrate Between Providers for the data migration playbook.
- Add a New Provider for plugging in new backends.
- Back Up and Restore Object Storage for data protection recipes.
- Object Storage Helm Values for the value reference.