Run Spark Jobs via REST API
Overview
Ilum provides a robust REST API that allows you to manage, submit, and execute Apache Spark jobs programmatically. This capability is essential for organizations running Spark on Kubernetes who need to automate their data workflows.
Using the API is particularly effective for:
- CI/CD Integration: Seamlessly trigger Spark jobs from GitLab CI, Jenkins, GitHub Actions, or Airflow.
- Custom Orchestration: Build your own data platforms or internal tools on top of Ilum.
- Automation: Replace manual spark-submit CLI commands with reliable, code-driven API calls.
REST API vs. Spark CLI
| Feature | REST API | Spark CLI (spark-submit) |
|---|---|---|
| Primary Use Case | Automation, CI/CD, Web Apps | Ad-hoc testing, Local development |
| Client Requirement | curl or HTTP Client | Spark Binaries & Java installed |
| Feedback Loop | JSON Response (Job ID) | Console Logs (Streamed) |
| Firewall Friendly | Yes (Single HTTP Port) | No (Requires Random Ports) |
In this guide, you will learn how to:
- Submit a Spark job using the multipart/form-data endpoint.
- Monitor the job's status via the API.
Prerequisites
To follow this example, you will need the curl command-line tool and a sample Spark JAR file.
- Download Example JAR: spark-examples_2.12-3.5.7.jar
Accessing the API
The Ilum Core API is exposed by default on port 9888. Depending on your environment, you can access it using one of the following methods:
In the examples below, replace http://localhost:9888 with your actual Ilum Core address.
1. Port Forwarding (Development)
If you are running the API on your local machine using a Kubernetes cluster (like Minikube or MicroK8s), you can use kubectl port-forward to access it locally:

```bash
kubectl port-forward svc/ilum-core 9888:9888
```

The API will then be available at http://localhost:9888/api/v1.
2. NodePort
If your Ilum installation is configured with a NodePort service type, you can access it via any Kubernetes node IP:
```bash
# Get the node IP
kubectl get nodes -o wide

# Get the assigned NodePort
kubectl get svc ilum-core
```
Access the API at http://<NODE_IP>:<NODE_PORT>/api/v1.
3. Ingress (Production)
For production environments, use an Ingress controller to expose the API. This allows you to use a custom domain and SSL/TLS encryption.
```yaml
- path: /api/v1/(.*)
  pathType: ImplementationSpecific
  backend:
    service:
      name: ilum-core
      port:
        number: 9888
```
Access the API at https://your-domain.com/api/v1.
Which Method Should I Use?
| Method | Best For | Requirement |
|---|---|---|
| Port Forwarding | Local development, one-off tests | kubectl access to the cluster |
| NodePort | Internal lab environments, simple setups | Access to Kubernetes Node IPs |
| Ingress | Production, Team collaboration, CI/CD | Ingress Controller (Nginx, Traefik, etc.) |
Submit Apache Spark Jobs Programmatically
To submit a new Spark application, use the POST /api/v1/job/submit endpoint. This endpoint accepts multipart/form-data requests, allowing you to upload your application JAR or Python script along with the job configuration. This method is the programmatic equivalent of spark-submit.
Example: Submitting MiniReadWriteTest
The following curl command submits the MiniReadWriteTest example job (from the downloaded JAR). This job writes a file and then reads it back to verify the setup.
```bash
curl -X POST "http://localhost:9888/api/v1/job/submit" \
  -F "name=MiniReadWriteTest" \
  -F "clusterName=default" \
  -F "language=SCALA" \
  -F "jobClass=org.apache.spark.examples.MiniReadWriteTest" \
  -F "jobConfig=spark.executor.instances=2" \
  -F "args=/opt/spark/examples/src/main/resources/kv1.txt" \
  -F "jars=@spark-examples_2.12-3.5.7.jar"
```
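The same submission can be scripted from an automation tool instead of the shell. The sketch below is a hypothetical Python equivalent of the curl command above, assuming the third-party requests library is installed; the endpoint path and form-field names are taken directly from the curl example, everything else (function names, structure) is illustrative.

```python
# Hypothetical Python equivalent of the curl submission above.
# Endpoint and form-field names come from the documented curl example;
# the helper names are illustrative, not part of the Ilum API.

def build_submit_payload(name, cluster_name, language, job_class,
                         job_config=None, args=None):
    """Assemble the form fields expected by POST /api/v1/job/submit."""
    data = {
        "name": name,
        "clusterName": cluster_name,
        "language": language,
        "jobClass": job_class,
    }
    if job_config is not None:
        data["jobConfig"] = job_config
    if args is not None:
        data["args"] = args
    return data


def submit_job(base_url, jar_path, **fields):
    """Upload the JAR with its form fields and return the jobId string."""
    import requests  # third-party: pip install requests

    with open(jar_path, "rb") as jar:
        resp = requests.post(
            f"{base_url}/api/v1/job/submit",
            data=build_submit_payload(**fields),
            files={"jars": jar},  # multipart file part, like -F "jars=@..."
        )
    resp.raise_for_status()
    return resp.json()["jobId"]
```

A call such as `submit_job("http://localhost:9888", "spark-examples_2.12-3.5.7.jar", name="MiniReadWriteTest", cluster_name="default", language="SCALA", job_class="org.apache.spark.examples.MiniReadWriteTest")` would then mirror the curl invocation.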
Parameter Reference
| Parameter | Type | Description | Required | Example |
|---|---|---|---|---|
| name | String | A unique identifier for your job. | Yes | MiniReadWriteTest |
| clusterName | String | The name of the Kubernetes cluster registered in Ilum. | Yes | default |
| language | String | The programming language of the job (SCALA or PYTHON). | Yes | SCALA |
| jobClass | String | Scala: The fully qualified main class name. Python: The script filename (without extension). | Yes | org.apache.spark.examples.MiniReadWriteTest |
| jobConfig | String | Semicolon-separated list of Spark configuration properties in key=value format. | No | spark.executor.instances=2 |
| args | String | Semicolon-separated list of arguments to pass to the job's main method. | No | /path/to/input.txt |
| jars | File | The application JAR file. Use the @ prefix in curl to upload the file. | Yes (for Scala) | @app.jar |
| pyFiles | File | The main Python script or ZIP package. | Yes (for Python) | @job.py |
For a complete list of all available parameters and their detailed descriptions, refer to the Ilum API documentation.
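Since jobConfig and args are both semicolon-separated strings, it is easy to build them incorrectly by hand. A minimal sketch of two plain-Python helpers (the function names are my own, not part of any Ilum client) that compose those fields from native structures:

```python
# Helpers for composing the semicolon-separated jobConfig and args fields
# described in the parameter table above. Plain Python, no dependencies;
# the function names are illustrative.

def format_job_config(conf: dict) -> str:
    """Join Spark properties into key=value;key=value form."""
    return ";".join(f"{key}={value}" for key, value in conf.items())


def format_args(args: list) -> str:
    """Join positional job arguments with semicolons."""
    return ";".join(str(a) for a in args)
```

For example, `format_job_config({"spark.executor.instances": "2", "spark.executor.memory": "2g"})` yields `spark.executor.instances=2;spark.executor.memory=2g`, ready to pass as the jobConfig form field.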
Monitor Spark Job Status
Upon successful submission, the API returns a JSON response containing the jobId. You can use this ID to poll for the job's completion status, making it easy to build wait-logic into your automation scripts.
```json
{
  "jobId": "20251222-0931-f56pqk5y1ap"
}
```
You can use this jobId to check the current status of your job:
```bash
curl "http://localhost:9888/api/v1/job/{jobId}"
```
The response provides a comprehensive overview of the job's configuration, state, and execution timing.
```json
{
  "jobId": "20251222-0931-f56pqk5y1ap",
  "jobName": "MiniReadWriteTest",
  "jobType": "SINGLE",
  "language": "SCALA",
  "appId": "spark-92b3da7ee0fa4d1e965b521ba356544c",
  "state": "FINISHED",
  "submitTime": 1766395898079,
  "startTime": 1766395899941,
  "endTime": 1766395905785,
  "jobConfig": {
    "spark.executor.instances": "2",
    "spark.kubernetes.namespace": "default",
    "spark.eventLog.enabled": "true",
    "...": "..."
  }
}
```
Key fields to monitor include:
- state: The current lifecycle phase (e.g., SUBMITTED, RUNNING, FINISHED, FAILED).
- appId: The Spark Application ID assigned by the cluster manager.
- startTime / endTime: Epoch timestamps (ms) for performance tracking.
- error: If the state is FAILED, this field will contain the error message or stack trace.
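The wait-logic mentioned above can be kept small and testable by injecting the status fetcher rather than hard-coding the HTTP call. A minimal polling sketch, assuming FINISHED and FAILED are the terminal states (as shown in the response fields above); the function names are illustrative:

```python
# Minimal wait-logic sketch for polling a job until it reaches a
# terminal state. fetch_status is any callable returning the job JSON,
# e.g. a wrapper around GET /api/v1/job/{jobId}; injecting it keeps the
# loop free of network dependencies and easy to test.
import time

# Terminal states inferred from the status response documented above.
TERMINAL_STATES = {"FINISHED", "FAILED"}


def wait_for_job(fetch_status, timeout=600, interval=5):
    """Poll fetch_status() until the job finishes, fails, or times out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = fetch_status()
        if job.get("state") in TERMINAL_STATES:
            return job
        time.sleep(interval)
    raise TimeoutError("job did not reach a terminal state in time")
```

In a real script, fetch_status might be `lambda: requests.get(f"{base_url}/api/v1/job/{job_id}").json()` (with the requests library installed); in tests it can be a stub that replays canned states.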
Troubleshooting Common Issues
If you encounter issues while submitting jobs, refer to the table below for common error codes and solutions.
| HTTP Code | Error | Possible Cause & Solution |
|---|---|---|
| 400 | Bad Request | Missing Parameters: Ensure jobClass, clusterName, and jars (for Scala) are provided correctly in the form data. |
| 401 | Unauthorized | Auth Failure: Check if your cluster requires an API Token or Basic Auth header. |
| 404 | Not Found | Invalid Cluster: The clusterName specified does not exist. Verify active clusters via GET /api/v1/cluster. |
| 500 | Internal Server Error | Cluster Connection: Ilum cannot talk to the K8s API server. Check the ilum-core logs for connectivity issues. |
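In automation scripts it helps to surface these hints directly instead of a bare status code. A small sketch that mirrors the troubleshooting table above (the mapping and function name are illustrative, not part of any Ilum client):

```python
# Map the HTTP codes from the troubleshooting table to actionable hints
# for automation scripts. The messages paraphrase the table above.

ERROR_HINTS = {
    400: "Bad Request: check the jobClass, clusterName, and jars form fields.",
    401: "Unauthorized: supply the required API token or Basic Auth header.",
    404: "Not Found: verify the clusterName via GET /api/v1/cluster.",
    500: "Internal Server Error: check the ilum-core logs for K8s connectivity.",
}


def explain_submit_error(status_code: int) -> str:
    """Return a human-readable hint for a failed submit response."""
    return ERROR_HINTS.get(status_code, f"Unexpected HTTP {status_code}")
```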
Frequently Asked Questions (FAQ)
Can I upload Python dependencies?
Yes. For PySpark jobs, use the pyFiles parameter to upload your .py script or a .zip archive containing your Python modules.
How do I secure the API?
We recommend placing the Ilum API behind an Ingress Controller with Basic Auth or OAuth2 enabled. You can then pass the credentials via standard HTTP headers.
What is the maximum JAR size?
The default limit is usually 100MB (configured in your Ingress or Spring Boot settings). For larger JARs, we recommend uploading them to S3/HDFS first and referencing them via spark.jars config, rather than uploading directly.