Securely Managing Secrets for Spark Jobs
Security is critical when handling sensitive data like passwords, API keys, and connection strings. Hardcoding credentials in your job code or Docker images is a major security risk. Instead, Ilum leverages native Kubernetes Secrets to inject credentials securely at runtime.
- Create: Store sensitive data in Kubernetes Secrets (`kubectl create secret generic`).
- Mount: Inject secrets into Spark drivers/executors as environment variables (recommended) or volume mounts.
- Verify: Access them in your code (e.g., `os.environ.get`) without exposing them in logs.
Step 1: Creating Kubernetes Secrets for Spark
First, you need to create a Kubernetes Secret object. This secret must reside in the same namespace where your Ilum Job will be executed (typically `default` or your specific tenant namespace).
```shell
kubectl create secret generic my-db-creds \
  --from-literal=username=admin \
  --from-literal=password=SuperSecret123 \
  -n default
```
Secrets are namespaced objects and must be created in the same namespace where your Spark jobs run. For Ilum, this is typically `default`. Use the `-n <namespace>` flag to specify the target namespace.
Step 2: Mounting Secrets in Ilum Spark Jobs
You can expose the secret to your Spark driver and executors in two ways: as environment variables or as files via volume mounts.
Method A: Environment Variables (Recommended)
This is the most common method for passing simple keys like database passwords or API tokens.
- Spark Configuration
- Python (PySpark)
- Scala / Java
When submitting a job (via Ilum UI, API, or CLI), add the following Spark configurations to map secret keys to environment variables:
```properties
# Valid for Spark on Kubernetes
# Syntax: spark.kubernetes.[driver|executor].secretKeyRef.[ENV_VAR_NAME]=name-of-secret:key-in-secret
spark.kubernetes.driver.secretKeyRef.DB_PASSWORD=my-db-creds:password
spark.kubernetes.executor.secretKeyRef.DB_PASSWORD=my-db-creds:password
spark.kubernetes.driver.secretKeyRef.DB_USER=my-db-creds:username
spark.kubernetes.executor.secretKeyRef.DB_USER=my-db-creds:username
```
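If you build job submission payloads programmatically (e.g., for the Ilum API), these repetitive driver/executor entries can be generated from a single mapping. A minimal sketch — the helper name `secret_env_conf` is illustrative and not part of any Ilum or Spark API:

```python
def secret_env_conf(mapping: dict[str, str], secret_name: str) -> dict[str, str]:
    """Build Spark-on-Kubernetes secretKeyRef configs for both driver and executor.

    mapping: {ENV_VAR_NAME: key-in-secret}
    """
    conf = {}
    for env_var, secret_key in mapping.items():
        for role in ("driver", "executor"):
            conf[f"spark.kubernetes.{role}.secretKeyRef.{env_var}"] = f"{secret_name}:{secret_key}"
    return conf


# Generates the four configuration entries shown above
conf = secret_env_conf({"DB_PASSWORD": "password", "DB_USER": "username"}, "my-db-creds")
```

Generating the entries in one place keeps driver and executor configurations from drifting apart.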
In your Python code, access the credentials using the standard `os` module:
```python
import os

# Retrieve credentials securely from the environment
db_user = os.environ.get("DB_USER")
db_pass = os.environ.get("DB_PASSWORD")

if not db_pass:
    raise ValueError("Database password not found in environment variables")

print(f"Connecting as user: {db_user}")  # Safe to log the username, never the password!
```
In Scala, use `System.getenv`:
```scala
val dbUser = System.getenv("DB_USER")
val dbPass = System.getenv("DB_PASSWORD")
```
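Whichever language you use, it is good practice to fail fast when an expected variable is missing, so misconfigured jobs die with a clear message instead of a cryptic authentication error later. A hypothetical Python helper (not part of Ilum) sketching this pattern:

```python
import os


def require_env(*names: str) -> dict[str, str]:
    """Return the requested environment variables, raising if any is unset."""
    missing = [name for name in names if name not in os.environ]
    if missing:
        raise EnvironmentError(f"Missing required environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in names}


# In a real job, Spark injects these from the Secret; set them here for the demo
os.environ["DB_USER"] = "admin"
os.environ["DB_PASSWORD"] = "SuperSecret123"
creds = require_env("DB_USER", "DB_PASSWORD")
```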
Method B: Volume Mounts
Use this method when your application expects a file path (e.g., SSL certificates, Keytab files, or configuration files).
- Spark Configuration
- Python Code
Mount the entire secret as a directory. Each key in the secret becomes a file in that directory.
```properties
# Mount secret 'my-db-creds' to /etc/secrets/db
spark.kubernetes.driver.secrets.my-db-creds=/etc/secrets/db
spark.kubernetes.executor.secrets.my-db-creds=/etc/secrets/db
```
Read the file contents from the mounted path:
```python
# Read the credentials from the mounted files
with open('/etc/secrets/db/password', 'r') as f:
    db_pass = f.read().strip()

with open('/etc/secrets/db/username', 'r') as f:
    db_user = f.read().strip()
```
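Since Kubernetes mounts every key of the secret as a separate file in the target directory, you can also load the whole directory at once. A minimal sketch — `load_secret_dir` is an illustrative helper, and the demo uses a temporary directory standing in for `/etc/secrets/db`:

```python
import os
import tempfile


def load_secret_dir(path: str) -> dict[str, str]:
    """Read every key of a mounted Kubernetes Secret directory into a dict."""
    secrets = {}
    for name in os.listdir(path):
        full = os.path.join(path, name)
        # Each secret key is mounted as a file; skip Kubernetes' internal ..data entries
        if os.path.isfile(full) and not name.startswith(".."):
            with open(full, "r") as f:
                secrets[name] = f.read().strip()
    return secrets


# Demo: fake mount directory with the two keys from 'my-db-creds'
demo = tempfile.mkdtemp()
for key, value in {"username": "admin", "password": "SuperSecret123"}.items():
    with open(os.path.join(demo, key), "w") as f:
        f.write(value + "\n")
creds = load_secret_dir(demo)
```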
Best Practices: Env Vars vs. Volumes
| Feature | Environment Variables | Volume Mounts |
|---|---|---|
| Best for | Simple strings (API keys, passwords, URLs) | Files (certificates, JSON configs, keytabs) |
| Complexity | Low (standard `os.environ` access) | Medium (file I/O required) |
| Updates | Requires job restart | Can update live (if the app supports reloading) |
| Security | Visible in a process dump (rare risk) | Written to tmpfs (memory); safer for large secrets |
Step 3: Using Secrets in Airflow DAGs
If you are orchestrating jobs with the Ilum Airflow integration, you can define these configurations directly in your DAGs using the IlumSparkSubmitOperator.
```python
from airflow import DAG
from ilum.airflow.operators import IlumSparkSubmitOperator

# ... (DAG definition)

submit_spark_job = IlumSparkSubmitOperator(
    task_id="secure_spark_job",
    spark_conf={
        "spark.kubernetes.driver.secretKeyRef.DB_PASSWORD": "my-db-creds:password",
        "spark.kubernetes.executor.secretKeyRef.DB_PASSWORD": "my-db-creds:password"
    }
    # ... other configurations
)
```
Troubleshooting & Verification
Verifying Secret Existence
Before running your job, verify the secret exists and contains the correct data:
```shell
# List secrets in the namespace
kubectl get secrets -n default

# Decode secret values (for debugging only!)
kubectl get secret my-db-creds -o jsonpath='{.data.password}' -n default | base64 --decode
```
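The `base64 --decode` step is needed because Kubernetes stores Secret values base64-encoded in the object's `.data` field. If you ever inspect a Secret from Python (e.g., via the Kubernetes API), the same round trip applies — a minimal sketch:

```python
import base64

# Value as it would appear in the Secret's .data field (base64-encoded)
encoded = base64.b64encode(b"SuperSecret123").decode("ascii")

# Decoding recovers the original plaintext
decoded = base64.b64decode(encoded).decode("utf-8")
```

Note that base64 is an encoding, not encryption: anyone who can read the Secret object can recover the plaintext.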
Debugging Running Pods
If your job fails with authentication errors, inspect the running pod to ensure variables are mounted correctly.
```shell
# 1. Find the driver pod name
kubectl get pods -n default | grep spark-driver

# 2. Check environment variables inside the pod
kubectl exec -it <spark-driver-pod-name> -n default -- env | grep DB_
```
Never print secret values to the console or logs in your production code. If you must debug, print only the first few characters or a checksum/hash of the secret.
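As an example of the checksum approach, here is a hedged sketch of a small masking helper (not part of Ilum) that logs a short SHA-256 digest prefix and the value's length instead of the secret itself:

```python
import hashlib


def mask_secret(value: str) -> str:
    """Return a log-safe representation: a short hash prefix plus the value's length."""
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()[:8]
    return f"<secret sha256:{digest} len={len(value)}>"


masked = mask_secret("SuperSecret123")
```

The digest prefix is enough to tell two different secrets apart in logs (e.g., to confirm the pod received the value you expect) without revealing the plaintext.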
Frequently Asked Questions (FAQ)
How do I access secrets in PySpark?
The most secure way is to map the Kubernetes secret to an environment variable using the `spark.kubernetes.driver.secretKeyRef.[VAR_NAME]` configuration. Then, in your PySpark script, use `import os` and `os.environ.get('VAR_NAME')` to retrieve the value.
Can I use external secret stores (Vault, AWS Secrets Manager)?
Yes. You can use the Kubernetes Secrets Store CSI Driver to sync secrets from external providers (HashiCorp Vault, AWS, Azure, GCP) into native Kubernetes Secrets. Once synced, Ilum consumes them just like standard Kubernetes secrets.
Why is my secret not visible to the Spark job?
Common reasons include:
- Namespace Mismatch: The secret exists in `default` but the job is running in `spark-jobs`.
- Typo in Key Name: The key in the secret (e.g., `db-pass`) doesn't match the config (e.g., `db_pass`).
- Service Account Permissions: The Spark service account might lack `get`/`list` permissions for Secrets (though standard Ilum setups handle this).