
Run Interactive Spark Job Service on Kubernetes

The Job Type Service in Ilum is designed for scalable batch processing and production data pipelines on Kubernetes. Unlike Code type services that maintain persistent sessions, Job type services execute predefined artifacts (JARs or Python scripts) in isolated runs. This ensures complete repeatability and a clean state for each execution, which makes them ideal for scheduled transformations, automated ETL pipelines, and reliable production workloads.

You can use the example Python script from the Python Spark Job Example.

Interactive Demo


Service Creation Methods

Step-by-Step Guide: How to Run a Spark Job Service

Here is a step-by-step tutorial for setting up an interactive Spark job service using the Ilum UI. This guide covers configuration, resource allocation, and job execution.

  1. Navigate to Services: Access the 'Services' section in your Ilum dashboard.

  2. Create New Service: Click the "New Service +" button to start setting up your interactive service environment.

  3. Configure General Settings:

    • Name: Enter a unique name for your service (e.g., MyJobService).
    • Cluster: Select the Kubernetes cluster where you want to deploy the service.
    • Type: Select Job for batch processing with isolated executions.
    • Language: Choose Python (or Scala/Java).

    Configure General Settings for Spark Job Service

  4. Configure Resources:

    • Go to the Resources tab.
    • Add Python Script: In the PyFiles section, click "Click To Upload" and select your Python script file.
    • Requirement: The script must contain a class that extends IlumJob with a run method accepting spark and config parameters.

    Upload Python Script for Spark Job

    All configuration options are explained in detail in the Service Configuration Reference section below.

  5. Activate the Service: Submit the form to deploy the service.

  6. Verify Deployment: Once activated, locate your service in the Services list. Wait for the status to change from "Starting" to "Running", then click on the service to view its dashboard.
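For reference, a Job-type script that meets the requirement from step 4 might look like the minimal sketch below. It assumes Ilum's Python package exposes the base class as `ilum.api.IlumJob` and that `run` returns a string result. Treat both as assumptions and check the Ilum Python package documentation. The sketch falls back to a local stub so the file can be imported without Ilum installed. The Monte Carlo estimate of pi is only an illustrative workload.

```python
from operator import add
from random import random

try:
    # Assumed import path for Ilum's Python job base class.
    from ilum.api import IlumJob
except ImportError:
    # Local stand-in so this file can be imported and tested without Ilum.
    class IlumJob:
        def run(self, spark, config):
            raise NotImplementedError


def hit(_: int) -> int:
    """Return 1 if a random point in [-1, 1] x [-1, 1] falls inside the unit circle."""
    x = random() * 2 - 1
    y = random() * 2 - 1
    return 1 if x * x + y * y <= 1 else 0


def estimate_pi(count: int, n: int) -> float:
    """The circle covers pi/4 of the square, so pi is about 4 * hits / samples."""
    return 4.0 * count / n


class SparkPiInteractiveExample(IlumJob):
    def run(self, spark, config):
        # JSON parameters such as {"partitions": "5"} may arrive as strings.
        partitions = int(config.get("partitions", 5))
        n = 100_000 * partitions
        # Spread the sampling over the executors and sum the hits on the driver.
        count = (
            spark.sparkContext.parallelize(range(n), partitions)
            .map(hit)
            .reduce(add)
        )
        return f"Pi is roughly {estimate_pi(count, n)}"
```

The helpers are kept at module level so Spark can ship them to the executors and so they can be checked without a cluster.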

Execution Methods

  1. Execute the Spark Job:

    • Class Name: Enter the fully qualified class name: Ilum_interactive_spark_pi.SparkPiInteractiveExample.
    • Parameters: Provide job parameters in JSON format (e.g., {"partitions": "5"}).

    Execute Interactive Spark Job in Ilum UI

    Info

    Unlike Code type services, each execution triggers a fresh, isolated run of your job on the cluster.

  2. Run and Optimize:

    • Click Execute. The first run may experience a "cold start" delay as resources are allocated.
    • Subsequent runs will benefit from caching if resources are kept alive (depending on dynamic allocation settings).
  3. Iterate with Parameters:

    • Adjust the JSON parameters and re-execute to test different scenarios without redeploying the service.
  4. Monitoring and Adjustments:
    • Navigate to the service details to review all requests sent to the service.
    • Inspect the parameters and results of each individual request.
    • Monitor the execution timeline and per-executor memory usage in the Executors section.
    • Check the logs for detailed execution information.
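Before iterating, it can help to validate the JSON you plan to paste into the Parameters field. The sketch below uses a hypothetical local helper, not part of Ilum. It parses each candidate string with Python's `json` module and rejects anything that is not a JSON object. It also shows that a value written as `"5"` stays a string, which is why a job typically converts it with `int(...)`.

```python
import json


def load_params(raw: str) -> dict:
    """Parse the text typed into the Parameters field (hypothetical helper)."""
    params = json.loads(raw)
    if not isinstance(params, dict):
        raise ValueError('Parameters must be a JSON object, e.g. {"partitions": "5"}')
    return params


# Scenarios to run one after another without redeploying the service.
scenarios = ['{"partitions": "5"}', '{"partitions": "10"}', "{}"]

for raw in scenarios:
    params = load_params(raw)
    # JSON strings stay strings, so the job has to convert them itself.
    partitions = int(params.get("partitions", 5))
    print(raw, "->", partitions)
```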

Conclusion: Congratulations on successfully setting up and running your interactive service in Ilum!

Note

This guide demonstrates the UI workflow for learning and testing purposes. In production environments, job executions are typically automated through the Ilum API, allowing integration with orchestration tools, schedulers, and CI/CD pipelines. The API provides programmatic access to all service operations shown in this guide.
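As a rough illustration of API-driven execution, the sketch below builds, but does not send, an HTTP request using only Python's standard library. The base URL, the endpoint path, and the JSON field names (`jobClass`, `jobConfig`) are hypothetical placeholders, not the documented Ilum endpoint. Take the real values from the Ilum API reference.

```python
import json
import urllib.request

# Hypothetical placeholders: replace with your Ilum address and the
# execute endpoint documented in the Ilum API reference.
ILUM_BASE_URL = "http://ilum.example.com"
EXECUTE_PATH = "/api/v1/services/{service_id}/execute"


def build_execute_request(service_id: str, job_class: str, params: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request that triggers one job execution."""
    body = {
        "jobClass": job_class,  # hypothetical field name
        "jobConfig": params,    # hypothetical field name
    }
    return urllib.request.Request(
        url=ILUM_BASE_URL + EXECUTE_PATH.format(service_id=service_id),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_execute_request(
    "my-job-service",
    "Ilum_interactive_spark_pi.SparkPiInteractiveExample",
    {"partitions": "5"},
)
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```

Keeping request construction separate from sending makes the call easy to reuse from a scheduler or CI/CD step.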

Frequently Asked Questions (FAQ)


How is the Ilum Job Service different from spark-submit? Ilum Job Service wraps Spark applications in a managed Kubernetes service. Unlike raw spark-submit, Ilum provides a REST API for execution, automatic resource management, history tracking, and the ability to re-run jobs with different parameters without redeploying artifacts.


Can I schedule these interactive jobs? Yes. Since every job execution is triggered via a REST API call, you can easily integrate Ilum Job Services with orchestrators like Apache Airflow, Dagster, or simple cron jobs to schedule your data pipelines.


Does the Job Service maintain state between runs? No. The Job Service is designed for stateless batch processing. Each execution starts in a clean environment to ensure reproducibility. If you need to share state (e.g., DataFrames) between steps, consider using the Interactive Code Service.

For further information or support, reach out to us at [email protected].

What is a Job Type Service?

A Job type service is designed for batch processing with isolated, repeatable executions. Here's how it works:

  • No Persistent Session: You prepare an artifact (JAR or Python script), execution parameters, and Spark configuration. Ilum starts a fresh set of Kubernetes pods (driver + executors) only for the duration of the task.
  • Clean State: After completion, everything shuts down, and no state is carried over to subsequent runs. Each execution starts with a clean slate, ensuring full repeatability.
  • Cold Start: You accept a cold start with each invocation in exchange for complete isolation and reproducibility.
  • Perfect For: Scheduled transformations, large predefined processing tasks, reliable data pipelines that run cyclically without manual supervision, and production ETL workflows.
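The clean-state behaviour can be illustrated with a small local model; this is not Ilum code. Every invocation constructs a brand-new job object and discards it afterwards, so nothing a previous run stored is visible to the next one. The `run_isolated` helper and `CountingJob` class are hypothetical stand-ins for the fresh driver and executor pods Ilum creates for each run.

```python
class CountingJob:
    """Toy job that records how many times *this instance* has run."""

    def __init__(self) -> None:
        self.runs = 0

    def run(self, config: dict) -> int:
        self.runs += 1
        return self.runs


def run_isolated(job_cls, config: dict) -> int:
    # A fresh instance per execution mirrors fresh pods per run; the
    # instance is dropped (the "pods shut down") when this returns.
    job = job_cls()
    return job.run(config)


results = [run_isolated(CountingJob, {}) for _ in range(3)]
print(results)  # [1, 1, 1]: no state carried over between runs
```

In a persistent-session model, reusing a single `CountingJob` instance across the three calls would instead yield `[1, 2, 3]`.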

Understanding Services in Ilum

In Ilum, a service is an abstraction that defines a computational environment: essentially a scalable group of Spark jobs with a shared language (e.g., PySpark or Scala), an image or environment with the necessary libraries, and a Spark configuration (number of executors, memory, cores). Services can optionally include additional files or JARs.

Under the hood, each service consists of Kubernetes pods: one driver pod that manages Spark work, receives code, divides it into tasks, and collects results, plus executor pods that perform computations in parallel. The speed and scale of your work depend on the configured number of pods, memory, and CPU.
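The resources a service can use follow directly from its Spark configuration. The keys below are standard Spark properties (`spark.executor.instances`, `spark.executor.cores`, `spark.executor.memory`, `spark.driver.memory`), but the values are illustrative. The helper only does the arithmetic and makes three simplifications:

- It assumes one core per task, which is the `spark.task.cpus` default.
- It handles only `g` and `m` memory suffixes.
- It ignores Kubernetes memory overhead (`spark.executor.memoryOverhead`).

```python
def parse_mem_gib(value: str) -> float:
    """Convert Spark memory strings like '4g' or '512m' to GiB (g/m suffixes only)."""
    value = value.strip().lower()
    if value.endswith("g"):
        return float(value[:-1])
    if value.endswith("m"):
        return float(value[:-1]) / 1024
    raise ValueError(f"unsupported memory unit: {value}")


def service_footprint(conf: dict) -> dict:
    executors = int(conf["spark.executor.instances"])
    cores = int(conf["spark.executor.cores"])
    exec_mem = parse_mem_gib(conf["spark.executor.memory"])
    driver_mem = parse_mem_gib(conf["spark.driver.memory"])
    return {
        # Task slots across all executor pods (one core per task).
        "parallel_tasks": executors * cores,
        "executor_memory_gib": executors * exec_mem,
        "total_memory_gib": executors * exec_mem + driver_mem,
    }


conf = {
    "spark.executor.instances": "3",
    "spark.executor.cores": "2",
    "spark.executor.memory": "4g",
    "spark.driver.memory": "2g",
}
print(service_footprint(conf))
# {'parallel_tasks': 6, 'executor_memory_gib': 12.0, 'total_memory_gib': 14.0}
```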

Service Types in Ilum

Ilum offers two types of services, each designed for different use cases:

Job Type Service (This Guide)

The Job type is what this guide focuses on, designed for batch processing and production pipelines:

  • Isolated Executions: Each run is completely independent, starting fresh and shutting down after completion
  • Repeatability: Every execution begins with a clean state, ensuring consistent, reproducible results
  • Artifact-Based: You submit complete JARs or Python scripts with all dependencies
  • API Operations: Execute synchronously (wait for result) or asynchronously (track with jobInstanceId)
  • Use Cases: Scheduled ETL, data pipelines, batch transformations, production workloads
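In the asynchronous mode, a client typically submits the job, keeps the returned `jobInstanceId`, and polls until the run finishes. The sketch below shows that polling loop with the status lookup passed in as a function, because the actual Ilum status endpoint and state names are not covered here. `wait_for_job`, the state strings, and the fake lookup are assumptions for illustration only.

```python
import time
from typing import Callable

TERMINAL_STATES = {"FINISHED", "FAILED", "CANCELLED"}  # assumed state names


def wait_for_job(
    job_instance_id: str,
    get_state: Callable[[str], str],
    poll_seconds: float = 5.0,
    timeout_seconds: float = 3600.0,
) -> str:
    """Poll get_state(job_instance_id) until a terminal state or a timeout."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        state = get_state(job_instance_id)
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{job_instance_id} still {state} after {timeout_seconds}s")
        time.sleep(poll_seconds)


# Fake lookup standing in for an API call: two polls return RUNNING, then FINISHED.
_states = iter(["RUNNING", "RUNNING", "FINISHED"])
final = wait_for_job("job-123", lambda _id: next(_states), poll_seconds=0.01)
print(final)  # FINISHED
```

Injecting `get_state` keeps the loop testable and lets you swap in a real HTTP call once you know the endpoint.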

Code Type Service

The Code type provides an interactive REPL environment (covered in the Interactive Code Service guide):

  • Persistent Session: Maintains running pods with preserved state between executions
  • Stateful: DataFrames, variables, and libraries remain available across requests
  • Snippet-Based: Send code fragments on demand via UI or API
  • Use Cases: Exploratory analysis, notebooks (Jupyter/Zeppelin), rapid prototyping

Comparison: Spark Job Service vs. Interactive Code Service

The following table compares the two service types to help you choose the right execution model for your workload.

| Aspect | Job Type (This Guide) | Code Type |
|---|---|---|
| Session Model | Stateless; fresh start for every execution | Persistent session; always running |
| State Retention | Isolated runs; no shared state | Preserves DataFrames, variables, and libraries |
| Startup Latency | Cold start (pod creation) per run | Warm start; immediate execution |
| Ideal Use Case | Batch processing, scheduled ETL, production pipelines | Interactive analysis, notebooks, rapid prototyping |
| Resource Efficiency | Consumes resources only during execution | Continuous consumption (unless paused) |
| Input Method | Pre-compiled artifacts (JAR/Script) | Ad-hoc code snippets |
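To make the Resource Efficiency row concrete, here is a back-of-the-envelope comparison of executor core-hours per day. The workload numbers are hypothetical: four 15-minute runs a day on 3 executors with 2 cores each, versus a session kept up around the clock. The model also ignores driver pods, startup time, and pausing.

```python
def job_core_hours(runs_per_day: int, minutes_per_run: float, executors: int, cores: int) -> float:
    """Job type: executor pods exist only while a run is in progress."""
    return runs_per_day * (minutes_per_run / 60) * executors * cores


def session_core_hours(hours_up: float, executors: int, cores: int) -> float:
    """Code type: a persistent session holds its executors the whole time it is up."""
    return hours_up * executors * cores


job = job_core_hours(runs_per_day=4, minutes_per_run=15, executors=3, cores=2)
session = session_core_hours(hours_up=24.0, executors=3, cores=2)
print(job, session)  # 6.0 144.0
```

Under these assumptions the Job type uses 6 core-hours per day against 144 for an always-on session. The gap narrows as runs become longer or more frequent.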

Use Job Type when:

  • You have predefined, repeatable batch processing tasks
  • You need complete isolation between runs for reproducibility
  • You're running scheduled or automated data pipelines
  • You want to minimize resource consumption (only run when needed)
  • You need audit trails and reproducible production workflows

Use Code Type when:

  • You need to explore data interactively and iterate quickly
  • You're developing and testing code in a notebook environment
  • You want to maintain context between multiple operations
  • You need immediate feedback without cold start delays

Benefits of Job Type Services

Job type services provide key advantages for production data workflows:

  1. Complete Isolation and Repeatability:

    • Each execution starts with a clean slate, ensuring consistent, reproducible results
    • No state leakage between runs eliminates unexpected behavior
    • Perfect for production environments requiring audit trails and deterministic outcomes
  2. Resource Efficiency:

    • Resources are allocated only during job execution, then released
    • No idle resource consumption when jobs aren't running
    • Cost-effective for scheduled or infrequent workloads
  3. Production-Ready Workflows:

    • Designed for scheduled transformations and automated pipelines
    • Integrates seamlessly with orchestration tools and schedulers
    • Reliable execution with comprehensive logging and monitoring
  4. Scalability:

    • Easy to run multiple job instances in parallel
    • Built-in load distribution across available cluster resources
    • Handles large-scale batch processing efficiently
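Because Job-type runs are independent, several can be triggered at once. The sketch below fans out parameter sets with `concurrent.futures.ThreadPoolExecutor`. `trigger_run` is a hypothetical stand-in for whatever call starts an execution (UI, API, or orchestrator). Here it just echoes its input so the example runs locally.

```python
from concurrent.futures import ThreadPoolExecutor


def trigger_run(params: dict) -> str:
    """Hypothetical: start one job execution and return a result or identifier."""
    return f"run(partitions={params['partitions']})"


param_sets = [{"partitions": p} for p in (2, 4, 8)]

with ThreadPoolExecutor(max_workers=3) as pool:
    # map() returns results in input order even though the calls run concurrently.
    results = list(pool.map(trigger_run, param_sets))

print(results)
```

Threads are enough here because triggering a run is I/O-bound; the heavy computation happens on the cluster, not in the client.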

Service Configuration Reference

General Tab

  • Name: A unique identifier for the interactive service. This name will be used to track the service's status and logs within the Ilum dashboard.
  • Cluster: The specific cluster where the service's resources will be allocated. Ensure the selected cluster has sufficient capacity for your interactive session.
  • Scale: The number of replicas to launch for this service. Increasing the scale allows for load balancing and higher concurrency if multiple users are accessing the same service.
  • Type: The operational mode of the service. Select Job to run a specific job class interactively with isolated executions, or Code for an interactive REPL environment with persistent session state.
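The General tab fields can be summarised as a small settings object. The dataclass below is a hypothetical client-side model whose field names mirror the UI labels; it is not a documented Ilum schema. It checks the constraints described above: a non-empty name, a scale of at least one replica, and a type of either Job or Code.

```python
from dataclasses import dataclass

VALID_TYPES = {"Job", "Code"}


@dataclass
class GeneralSettings:
    """Hypothetical model of the General tab; not an official Ilum schema."""

    name: str
    cluster: str
    scale: int = 1              # number of replicas
    service_type: str = "Job"   # "Job" or "Code"

    def __post_init__(self) -> None:
        if not self.name.strip():
            raise ValueError("name must not be empty")
        if self.scale < 1:
            raise ValueError("scale must be at least 1")
        if self.service_type not in VALID_TYPES:
            raise ValueError(f"service_type must be one of {sorted(VALID_TYPES)}")


settings = GeneralSettings(name="MyJobService", cluster="default", scale=2)
print(settings)
```

Name uniqueness can only be verified against the Ilum server, so the sketch leaves that check out.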