
Scheduling Apache Spark Jobs

Scheduling in Ilum allows you to automate the execution of Apache Spark jobs on Kubernetes clusters at specified intervals using CRON expressions. This is essential for reliable ETL pipelines, recurring data analysis, or maintenance tasks that must run without manual intervention.

Example JAR Source

You can use a JAR file with Spark examples from this link.

Step-by-Step Guide: Scheduling a Spark Job

  1. Navigate to Schedules: Access the Schedules section in your Ilum dashboard.

  2. Create New Schedule: Click the New Schedule + button to start setting up your automated job.

  3. Fill Out Schedule Details:

    • General Tab:

      • Name: Enter ScheduledMiniReadWriteTest
      • Cluster: Select your target cluster
      • Class: Enter org.apache.spark.examples.MiniReadWriteTest
      • Language: Select Scala
    • Timing Tab:

      • CRON Expression: Select the Custom tab.
      • Custom expression: Enter @daily

      This configuration triggers the job to run once every day at midnight. You can adjust it to any valid CRON expression (e.g., 0 */12 * * * for every 12 hours).

    • Configuration Tab:

      • Arguments: Enter /opt/spark/examples/src/main/resources/kv1.txt
    • Resources Tab:

      • Jars: Upload the JAR file from the link above.
    • Memory Tab:

      • Leave all settings at their default values for this example.
  4. Submit and Monitor:

    • Click Submit to create the schedule.
    • Your new schedule will appear in the list.
    • When the scheduled time arrives, a new job instance is launched automatically. You can view these instances in the Jobs section.
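The CRON expression entered in the Timing tab follows the standard five-field Unix layout. The sketch below is illustrative only; the describe_cron helper is hypothetical and not part of Ilum:

```python
# Illustrative helper (not part of Ilum): label the five fields of a
# standard Unix CRON expression such as "0 */12 * * *".
FIELD_NAMES = ["minute", "hour", "day_of_month", "month", "day_of_week"]

def describe_cron(expr: str) -> dict:
    """Return a mapping from CRON field name to field value."""
    fields = expr.split()
    if len(fields) != len(FIELD_NAMES):
        raise ValueError("expected a 5-field CRON expression")
    return dict(zip(FIELD_NAMES, fields))

print(describe_cron("0 */12 * * *"))
# {'minute': '0', 'hour': '*/12', 'day_of_month': '*', 'month': '*', 'day_of_week': '*'}
```

Here "*/12" in the hour field means "every 12th hour", which is why the expression fires twice a day.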

Schedule Configuration Reference

Below is a detailed breakdown of all available settings, organized by tab as they appear in the UI.

Parameter     Description
Name          A unique identifier for the schedule.
Cluster       The target cluster where the scheduled jobs will be executed.
Class         The fully qualified class name of the application (e.g., org.apache.spark.examples.SparkPi) or the filename for Python scripts.
Language      The programming language used for the job (Scala or Python).
Description   An optional description to explain the purpose of this schedule.
Max Retries   The maximum number of times Ilum will attempt to restart the job if it fails.

Frequently Asked Questions


Can I schedule PySpark jobs using Ilum? Yes, Ilum fully supports scheduling for both Scala/Java (JARs) and Python (PySpark) jobs. Simply select "Python" as the language in the General tab and provide your script.


How does the retry mechanism work? If a scheduled job fails, Ilum can automatically attempt to restart it based on the "Max Retries" configuration. This ensures transient issues don't break your pipelines.
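The retry behavior can be pictured as a bounded-retry loop. This is a conceptual sketch only, not Ilum's actual implementation:

```python
# Conceptual sketch of a "Max Retries" policy (not Ilum's real code):
# run the job once, then restart it up to `max_retries` more times on failure.
def run_with_retries(job, max_retries: int):
    attempts = 0
    while True:
        try:
            return job()
        except Exception:
            attempts += 1
            if attempts > max_retries:
                raise  # retries exhausted; surface the failure

# Example: a job that fails twice with a transient error, then succeeds.
calls = {"n": 0}
def flaky_job():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "done"

print(run_with_retries(flaky_job, max_retries=3))  # done
```

With max_retries=3, the two transient failures above are absorbed and the third attempt succeeds; a persistent failure would still surface after the retry budget is spent.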


What CRON formats are supported? Ilum supports standard Unix-style CRON expressions (e.g., 0 12 * * *) as well as predefined macros like @daily, @hourly, etc.
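The predefined macros are shorthand for fixed five-field expressions. The mapping below is the conventional Unix set, and the expand_macro helper is hypothetical (check Ilum's UI for the exact macros it accepts):

```python
# Conventional expansions of common CRON macros (standard Unix set;
# confirm against Ilum's UI which macros it accepts).
CRON_MACROS = {
    "@hourly":  "0 * * * *",   # start of every hour
    "@daily":   "0 0 * * *",   # every day at midnight
    "@weekly":  "0 0 * * 0",   # every Sunday at midnight
    "@monthly": "0 0 1 * *",   # first of each month at midnight
    "@yearly":  "0 0 1 1 *",   # January 1st at midnight
}

def expand_macro(expr: str) -> str:
    """Hypothetical helper: return the 5-field form of a macro, else expr unchanged."""
    return CRON_MACROS.get(expr, expr)

print(expand_macro("@daily"))      # 0 0 * * *
print(expand_macro("0 12 * * *"))  # 0 12 * * *
```

So the @daily macro used in the guide above is equivalent to entering 0 0 * * * directly.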