Interactive Spark Code Service on Kubernetes

Running interactive Apache Spark sessions on Kubernetes can be complex. Ilum simplifies this with an Interactive Code Service for real-time code execution and data analysis.

Designed for an iterative development workflow, the service maintains a persistent interpreter session (REPL). Data scientists and developers can execute Spark code fragments on the fly while context and variables are retained between steps. This suits data exploration, rapid prototyping, and integration with notebooks such as Jupyter or Zeppelin.

Creating Interactive Spark Services

You can create interactive Spark sessions through the Ilum UI or programmatically via the REST API.
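For the programmatic route, the sketch below shows the general shape of such a request. It only builds the request object and does not send it. The host, the endpoint path, and the JSON field names are placeholders chosen for illustration (the field names mirror the UI labels described in this guide); they are assumptions, not the documented Ilum API, so check the Ilum API reference for the real values.

```python
import json
import urllib.request

# Placeholder values (assumptions): replace with your deployment's address
# and the endpoint path given in the Ilum API reference.
ILUM_URL = "http://ilum.example.com"
CREATE_SERVICE_PATH = "/api/hypothetical/services"


def build_create_request(name, cluster, language, scale=1):
    """Build (but do not send) a POST request describing a Code-type service."""
    # Field names are hypothetical and simply mirror the General tab labels.
    payload = {
        "name": name,
        "cluster": cluster,
        "type": "code",
        "language": language,
        "scale": scale,
    }
    return urllib.request.Request(
        ILUM_URL + CREATE_SERVICE_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_create_request("MyCodeService", "default", "python")
# urllib.request.urlopen(request) would send it; omitted so the sketch runs offline.
```

Keeping request construction separate from sending makes it easy to inspect the payload before pointing it at a real cluster.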

Step-by-Step Guide: Setting Up an Interactive Spark Session

Here's a guide to launching an interactive Spark environment using the Ilum dashboard.

  1. Navigate to Services: Access the 'Services' section in your Ilum dashboard.

  2. Create New Service: Click the "New Service +" button to start setting up your interactive service environment.

  3. Configure General Settings:

    • Name: Enter a unique name for your service (e.g., MyCodeService).
    • Cluster: Select the Kubernetes cluster where you wish to run your interactive service.
    • Type: Select Code to initialize the interactive REPL environment.
    • Language: Choose Scala or Python (PySpark).

    Creating an Interactive Spark Service

  4. Optional Configuration: Navigate through the tabs to configure resources, Spark configurations, and dependencies. See the Interactive Service Configuration Reference below for details.

  5. Activate the Service: Click Create to launch the persistent Spark session.

  6. Locate Your Service: Once the service is activated, find it in the Services list. The service may take a few moments to start up and will show a "Starting" status. Wait until the status shows "Running" before proceeding. Click on the service to view the service details.

Execution Methods

Once your service is running, you can start executing code snippets interactively.

Using the Interactive Execution Panel

  1. Open the Execution Panel: Click on the "Execute Job" button in the service details view.

  2. Write Code: Enter your Scala or Python code in the editor.

  3. Execute: Run the code to see immediate results.

    Interactive Spark Job Execution

    The execution panel allows you to run Spark code and view results in real-time.

  4. Verify Results: Output is displayed below the editor.

    Interactive Result Example

Conclusion: You have now set up and run an Interactive Code Service in Ilum. It gives developers and data scientists a flexible, interactive environment for working directly with Spark in real time.

What is an Interactive Spark Service?

An Interactive Spark Service (Code Type) is a persistent environment on Kubernetes that launches its pods as soon as it is created and maintains an active session for real-time interaction. Here's how it works:

  • Persistent Session: Once created, the service launches Kubernetes pods (one driver + multiple executors) that remain running, so a compute engine is ready and waiting for your commands.
  • Stateful Execution: You send code fragments (e.g., PySpark or Scala snippets) via the UI or API, and the engine executes them within the ongoing session context. Previously loaded DataFrames, defined variables, and imported libraries remain available across requests.
  • No Cold Start: Since the session is always warm, you get immediate execution without waiting for pod initialization on each request.
  • Perfect For: Exploratory data analysis, iterative development, rapid prototyping, and seamless integration with tools like Jupyter or Zeppelin.
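The stateful behaviour described above can be illustrated with a small local analogy. This is a toy sketch in plain Python, not Ilum's implementation and not Spark: the hypothetical `ReplSession` class keeps one namespace alive across calls, the way the warm session keeps DataFrames and variables alive across requests.

```python
import contextlib
import io


class ReplSession:
    """Toy stand-in for a persistent interpreter session (illustrative only)."""

    def __init__(self):
        # One namespace lives as long as the session object does.
        self.namespace = {}

    def execute(self, code: str) -> str:
        """Run one code fragment in the shared namespace; return what it printed."""
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(code, self.namespace)
        return buffer.getvalue()


session = ReplSession()
session.execute("rows = [1, 2, 3, 4]")     # first request defines state
session.execute("total = sum(rows)")       # second request reuses it
output = session.execute("print(total)")   # third request reads the result
```

A second `ReplSession` would start with an empty namespace, which is the analogue of a batch job starting fresh on every run.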

Interactive Code vs Batch Jobs

| Aspect | Interactive Service (Code) | Batch Job |
|---|---|---|
| Session | Persistent session, always running | No session, fresh start each time |
| State | Preserves DataFrames, variables, libraries | Isolated runs, no state between executions |
| Startup | Warm, ready to execute immediately | Cold start for each execution |
| Best for | Interactive analysis, notebooks, prototyping | Batch processing, scheduled pipelines, production ETL |
| Resource Usage | Continuous (can be paused when idle) | Resources only during execution |
| Execution Model | Send code snippets on demand | Submit complete artifacts (JAR/script) |

Use Code Type when:

  • You need to explore data interactively and iterate quickly
  • You're developing and testing code in a notebook environment
  • You want to maintain context between multiple operations
  • You need immediate feedback without cold start delays
  • You're prototyping or experimenting with data transformations

Use Job Type when:

  • You have predefined, repeatable batch processing tasks
  • You need complete isolation between runs for reproducibility
  • You're running scheduled or automated data pipelines
  • You want to minimize resource consumption (only run when needed)

Benefits of Code Type Services

Code type services provide key advantages for interactive data work:

  1. Persistent Session State:

    • DataFrames, variables, and libraries remain available across requests
    • No need to reload data or reinitialize environments between operations
    • Seamless continuation of work where you left off
  2. Immediate Execution:

    • Warm session means no cold start delays
    • Get instant feedback on code changes
    • Perfect for iterative development and experimentation
  3. Notebook Integration:

    • Seamlessly integrates with Jupyter, Zeppelin, and other notebook environments
    • Send code cells directly to the service via API
    • Collaborative data exploration across teams
  4. Flexible Exploration:

    • Test hypotheses quickly without waiting for job initialization
    • Refine analyses on the fly based on intermediate results
    • Ideal for data scientists and analysts doing exploratory work
  5. Resource Optimization:

    • Auto-pause feature conserves resources during inactivity
    • Quick resume when you're ready to continue
    • Balance between always-available and cost-effective

Interactive Service Configuration Reference

General Tab

  • Name: A unique identifier for the interactive service. This name will be used to track the service's status and logs within the Ilum dashboard.
  • Cluster: The specific cluster where the service's resources will be allocated. Ensure the selected cluster has sufficient capacity for your interactive session.
  • Description: A brief explanation of the service's purpose. This helps other users understand what the service is used for.
  • Type: The operational mode of the service. Select Code to establish an interactive REPL environment for running ad-hoc code snippets, or Job to run a specific job class interactively.
  • Language: The programming language used for the service (Scala or Python). This determines the runtime environment for your code.
  • Scale: The number of replicas (instances) to launch for this service. Increasing the scale allows for load balancing and higher concurrency if multiple users are accessing the same service.
  • Auto Pause: Automatically scales the service to zero replicas after a specified period of inactivity. This helps conserve resources when the service is not in use.
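How Scale and Auto Pause interact can be sketched as a simple decision rule. The function and parameter names below are hypothetical and the logic is only a reading of the descriptions above, not Ilum's actual controller: once the idle time reaches the configured threshold the service scales to zero, otherwise it keeps the configured Scale.

```python
from typing import Optional


def desired_replicas(idle_seconds: float, scale: int,
                     auto_pause_after: Optional[float]) -> int:
    """Return how many replicas should run, given time since the last request."""
    if auto_pause_after is not None and idle_seconds >= auto_pause_after:
        return 0      # idle long enough: pause by scaling to zero
    return scale      # otherwise keep the configured Scale


active = desired_replicas(idle_seconds=30, scale=2, auto_pause_after=600)
paused = desired_replicas(idle_seconds=900, scale=2, auto_pause_after=600)
always_on = desired_replicas(idle_seconds=900, scale=2, auto_pause_after=None)
```

Passing `None` for the threshold models a service with Auto Pause disabled, which keeps its replicas regardless of idle time.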

Common Questions (FAQ)

How do I persist variables between executions?

In an Interactive Code Service, the Spark session remains active between requests. Any variable defined or DataFrame loaded in one execution step is stored in the memory of the driver or executors and is available for subsequent steps, just like in a Jupyter notebook.

Can I use external libraries?

Yes. You can attach JARs (for Scala/Java) or PyFiles/Requirements (for Python) in the Resources tab during service creation. These dependencies are distributed to all Spark executors.

Does the service consume resources when idle?

Yes, because the pods are kept running to ensure immediate execution. However, you can configure Auto Pause in the General tab to automatically scale down the service to zero replicas after a set period of inactivity, saving costs.