
Create and Connect a Remote GKE Cluster to Data Lakehouse

Introduction

Ilum empowers you to manage a powerful multi-cluster setup from a single, central control plane. While Ilum automates the deployment and configuration of its core components, setting up the underlying infrastructure requires precise coordination.

This guide provides a comprehensive walkthrough for setting up a multi-cluster architecture on Google Kubernetes Engine (GKE). You will learn how to:

  1. Provision a central control plane (Master Cluster).
  2. Set up a dedicated execution environment (Remote Cluster).
  3. Establish secure communication between them using client certificates and ingress rules.

This guide walks you through the steps required to launch your first Ilum Job on a remote cluster. We use Google Kubernetes Engine (GKE) as an example, but you can follow the same flow with any Kubernetes distribution.

Prerequisites

Before starting the tutorial, make sure you have:

  1. Access to a Google Cloud project with billing enabled.
  2. kubectl installed and configured on your machine (version compatible with your GKE cluster).
  3. Helm installed (v3+).
  4. The Google Cloud CLI (gcloud) installed and initialized (you can run gcloud auth login and gcloud config list without errors).
  5. The gke-gcloud-auth-plugin installed and available in your PATH so that kubectl can authenticate to GKE clusters.
  6. Permissions in the target Google Cloud project to:
    • create and manage GKE clusters (e.g. Kubernetes Engine Cluster Admin or equivalent),
    • create and use Cloud Storage buckets if you plan to use GCS for data.
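A quick way to confirm these prerequisites is to run each tool's version/status command. This is a convenience sketch, assuming the tools are already on your PATH:

```shell
# Verify each prerequisite is installed and authenticated.
gcloud --version                      # Google Cloud CLI
gcloud auth list                      # should show your active account
kubectl version --client              # kubectl client version
helm version --short                  # should report v3.x
gke-gcloud-auth-plugin --version      # GKE auth plugin for kubectl
```

If any command is not found, revisit the corresponding prerequisite before continuing.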

What you'll accomplish in this guide:

| Step | Task | Purpose |
|------|------|---------|
| 1 | Create two GKE clusters | Set up master (control plane) and remote (job execution) |
| 2 | Install Ilum on master | Deploy Ilum's core components |
| 3 | Set up authentication | Create secure credentials for remote cluster access |
| 4 | Register remote cluster | Add cluster to Ilum's management interface |
| 5 | Configure networking | Enable communication between clusters |
| 6 | Run your first job | Verify the multi-cluster setup works |

Step 1. Provision Master and Remote GKE Clusters

The foundation of a multi-cluster setup consists of two distinct entities:

  • Master Cluster: Hosts the Ilum control plane (UI, API, Scheduler).
  • Remote Cluster: Dedicated environment where Ilum executes the Spark jobs dispatched from the master.

Create a Project

  • Open Google Cloud Console.
  • Click the Project selector in the top-left corner.
  • Click New Project.
  • Enter a project name and (if applicable) select an Organization/Folder.
  • Click Create.
  • Select the newly created project in the project selector.

Enable Google Kubernetes Engine API

  • In the Console search bar, type Kubernetes Engine.
  • Open Kubernetes Engine.
  • Click Enable to enable the Google Kubernetes Engine API for the selected project.

Switch to the chosen project in gcloud

  • In the Console, open the project selector and copy the Project ID.
  • In your terminal, set this project as active:
Set Project
gcloud config set project PROJECT_ID
  • (Optional) If you plan to create clusters in a specific region often, set a default region:
Set Default Region
gcloud config set compute/region europe-central2

This saves you from passing --region/--zone on every command.

Create the Clusters

Create the master cluster first:

Create Master Cluster
gcloud container clusters create master-cluster \
--machine-type=n1-standard-8 \
--num-nodes=1

Create the remote cluster with a different name:

Create Remote Cluster
gcloud container clusters create remote-cluster \
--machine-type=n1-standard-4 \
--num-nodes=1
important

Resource Requirements & Architecture:

Why two clusters? The master cluster runs Ilum's control plane (UI, API, scheduler). The remote cluster executes your Spark jobs. This separation allows independent scaling and multi-cluster management from one interface.

Sizing:

  • Master cluster: This example uses n1-standard-8 (8 vCPU, 30 GB RAM) for testing only. Minimum recommended: 12 vCPUs and 48 GB RAM (e.g., n1-standard-12). Production environments with many users need significantly more.
  • Remote cluster: This example uses n1-standard-4 (4 vCPU, 15 GB RAM) for testing only. Production workloads require larger machines (e.g., n1-standard-16+) and multiple nodes depending on your Spark job requirements.
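If you are unsure which machine types are available to you, gcloud can list them along with their CPU and memory. The zone below is only an example; substitute your own:

```shell
# List n1-standard machine types in an example zone, with CPU/RAM columns.
gcloud compute machine-types list \
  --filter="zone:europe-central2-a AND name~'n1-standard'" \
  --format="table(name, guestCpus, memoryMb)"
```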

Step 2. Install Ilum Control Plane on Master Cluster

Once your clusters are running, the next step is to deploy the Ilum platform on the master cluster.

Switch kubectl to the master cluster context

Switch Context
# list your kubectl contexts and find the one for the master cluster
kubectl config get-contexts
# switch to it using this command
kubectl config use-context MASTER_CLUSTER_CONTEXT

Create a namespace and switch to it in kubectl

Setup Namespace
kubectl create namespace ilum
kubectl config set-context --current --namespace=ilum

Install Ilum using Helm charts:

Install Ilum
helm repo add ilum https://charts.ilum.cloud
helm install ilum -n ilum ilum/ilum
note

What just happened? You've installed Ilum's control plane on the master cluster. You can now access Ilum's interface to manage jobs across multiple clusters.

Step 3. Configure Authentication for Remote Cluster Access

important

Goal: Generate secure credentials (client certificates) that authorize the Ilum control plane to deploy and manage resources on the remote cluster.

Important: All steps in this section must be performed while connected to the remote cluster (not the master cluster).

Why this is needed:

When you created the remote cluster with gcloud, it was automatically added to your kubeconfig with your Google Cloud account credentials. However, Ilum doesn't have access to your Google account.

Ilum needs its own authentication method - client certificates - to connect to and manage the remote cluster. This section creates those certificates.

| Step | Action | Purpose |
|------|--------|---------|
| 1 | Create Client Key | Generate private key for authentication |
| 2 | Create CSR | Request identity verification from cluster |
| 3 | Register & Approve CSR | Cluster admin validates and signs request |
| 4 | Get Signed Certificate | Retrieve the "ID card" for Ilum |
| 5 | Grant Admin Permissions | Authorize Ilum to manage cluster resources |

Result: Ilum can securely manage the cluster.

This is similar to how kubectl authenticates to clusters, but we're doing it programmatically for Ilum.

tip

Before you start: Make sure you're connected to the remote cluster context:

Switch to Remote Context
kubectl config use-context gke_<your-project>_<region>_remote-cluster

You can verify with: kubectl config current-context

Step 3.1: Create a Client Key

What we're doing: Generating a private key that will be used for authentication.

First, create a dedicated directory for the certificates and work there:

Create Workspace
mkdir -p ~/remote-cluster
cd ~/remote-cluster

Now generate the private key:

Generate Private Key
openssl genpkey -algorithm RSA -out client.key 
danger

Security: This private key is like a password. Keep client.key secure and never commit it to version control.

Step 3.2: Create Certificate Request

What we're doing: Creating a Certificate Signing Request (CSR) that asks the remote cluster to verify and sign our identity.

Replace myuser with a username of your choice:

Create CSR
openssl req -new -key client.key -out csr.csr -subj "/CN=myuser"
tip

What is a CSR? A Certificate Signing Request asks the Kubernetes cluster to create a signed certificate for your user. The cluster will verify and sign it, creating a trusted identity.

Step 3.3: Encode the CSR

What we're doing: Converting the CSR to base64 format, which is required by Kubernetes.

Encode CSR
cat csr.csr | base64 | tr -d '\n'

Copy the output - you'll need it in the next step.

Step 3.4: Register CSR in Kubernetes

What we're doing: Submitting the CSR to the remote cluster for approval.

Create csr.yaml (replace myuser and paste your encoded CSR):

csr.yaml
apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
metadata:
  name: myuser # Replace with your username
spec:
  request: <paste your base64 encoded CSR here>
  signerName: kubernetes.io/kube-apiserver-client
  usages:
  - client auth

Apply the CSR to the remote cluster:

Register CSR
kubectl apply -f csr.yaml

Step 3.5: Approve and Retrieve the Certificate

What we're doing: As cluster admin, we approve our own CSR and retrieve the signed certificate.

Approve the CSR:

Approve CSR
kubectl certificate approve myuser

Retrieve the signed certificate:

Retrieve Certificate
kubectl get csr myuser -o jsonpath='{.status.certificate}' | base64 --decode > client.crt
note

What happened? You've registered your certificate request with Kubernetes, approved it (as cluster admin), and retrieved the signed certificate. This certificate + private key pair is now your authentication credential for the remote cluster.
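Optionally, you can confirm that the certificate and key actually form a matching pair before handing them to Ilum. This sketch assumes the files from the steps above are in the current directory and uses standard openssl flags:

```shell
# The two digests must be identical; otherwise the certificate
# was signed for a different key than the one you generated.
openssl x509 -noout -modulus -in client.crt | openssl md5
openssl rsa  -noout -modulus -in client.key | openssl md5

# Also confirm the certificate subject carries your chosen username.
openssl x509 -noout -subject -in client.crt   # subject should include CN = myuser
```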

Step 3.6: Get CA Certificate

What we're doing: Extracting the cluster's CA certificate. This will be needed for the optional test in the next section, and later in Step 4 when adding the cluster to Ilum.

Get and save the CA certificate:

Get CA Certificate
kubectl config view --minify --raw -o jsonpath='{.clusters[0].cluster.certificate-authority-data}' | base64 --decode > ca.crt

Verify the certificate was saved correctly:

Verify CA Certificate
openssl x509 -in ca.crt -text -noout | head -n 5
tip

Why do we need this?

  • CA Certificate: Verifies the cluster's identity (prevents man-in-the-middle attacks)
  • Flags: --minify gets only current context, --raw gets actual data instead of file paths

Step 3.7: Grant Permissions with Role Binding

What we're doing: Giving the certificate user (myuser) admin permissions on the remote cluster.

Create rolebinding.yaml:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: myuser-admin-binding
subjects:
- kind: User
  name: myuser # Match your username
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io

Apply the role binding to the remote cluster:

Apply Role Binding
kubectl apply -f rolebinding.yaml
warning

Security: We're granting cluster-admin for simplicity. In production, create a custom role with only the permissions Ilum needs:

  • Create/delete pods, services, configmaps
  • Read secrets
  • Manage persistent volumes

Step 3.8: Configure Service Account Permissions

What we're doing: Granting permissions to the ServiceAccount that Spark pods will use.

important

What is a Service Account? It's the Kubernetes equivalent of a user for pods. Spark driver pods use it to create executor pods and manage resources in the cluster.

By default, pods use the default ServiceAccount in their namespace, but it doesn't have sufficient privileges to create other pods.

Create sa_role_binding.yaml:

sa_role_binding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: ilum-default-admin-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: default
  namespace: ilum # Namespace where Spark jobs will run

Apply the role binding to the remote cluster:

Apply SA Role Binding
kubectl apply -f sa_role_binding.yaml
warning

Production Security: We're granting cluster-admin to the default ServiceAccount for simplicity. In production:

  1. Create a dedicated ServiceAccount (e.g., spark-driver) with minimal permissions
  2. Specify it in your cluster configuration:
    spark.kubernetes.authenticate.driver.serviceAccountName=spark-driver
  3. Grant only the permissions needed: create/delete pods, read configmaps/secrets, etc.
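As a sketch of that production setup, the ClusterRoleBinding above could be replaced with a namespaced ServiceAccount and Role. The names (spark-driver, spark-driver-role) and the exact verb list below are illustrative assumptions, not Ilum requirements — adjust them to what your jobs actually need:

```yaml
# Hypothetical minimal RBAC for Spark drivers, scoped to the ilum namespace.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark-driver
  namespace: ilum
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: spark-driver-role
  namespace: ilum
rules:
- apiGroups: [""]
  resources: ["pods", "services", "configmaps", "persistentvolumeclaims"]
  verbs: ["create", "get", "list", "watch", "delete"]
- apiGroups: [""]
  resources: ["secrets"]
  verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: spark-driver-binding
  namespace: ilum
subjects:
- kind: ServiceAccount
  name: spark-driver
  namespace: ilum
roleRef:
  kind: Role
  name: spark-driver-role
  apiGroup: rbac.authorization.k8s.io
```

You would then point Spark at this account via spark.kubernetes.authenticate.driver.serviceAccountName=spark-driver, as noted above.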

(Optional) Test the Client Certificates

tip

Why this step? This optional step verifies that the certificates work correctly before we give them to Ilum in Step 4.

We'll create a temporary kubectl context that uses the certificates (instead of your Google Cloud credentials) to confirm they have the correct permissions.

1. Verify certificates are in the directory:

All certificate files should already be in ~/remote-cluster/ from the previous steps:

List Workspace Content
ls ~/remote-cluster/
# You should see: ca.crt client.crt client.key csr.csr csr.yaml

2. Get your remote cluster name:

Get Cluster Name
kubectl config view -o jsonpath='{.clusters[*].name}' | tr ' ' '\n' | grep remote

You should see something like: gke_my-project_europe-central2_remote-cluster

Copy this cluster name.

3. Add the certificate-based user to kubectl:

Set Credentials
kubectl config set-credentials myuser-cert \
--client-certificate=$HOME/remote-cluster/client.crt \
--client-key=$HOME/remote-cluster/client.key

This creates a new user in your kubeconfig that uses certificates instead of your Google account.

4. Create a test context:

Set Context
kubectl config set-context test-remote-certs \
--cluster=<paste-your-remote-cluster-name-here> \
--namespace=default \
--user=myuser-cert

Replace <paste-your-remote-cluster-name-here> with the cluster name from step 2.

5. Test the certificate-based authentication:

Test Authentication
kubectl config use-context test-remote-certs
kubectl get pods

Expected output:

No resources found in default namespace.

If you see this (or a list of pods), the certificates work! ✅

6. Switch back to your normal context:

Revert Context
# Find your original context
kubectl config get-contexts

# Switch back (replace with your actual context name)
kubectl config use-context gke_<your-project>_<region>_remote-cluster
note

What we just verified: The client certificates can successfully authenticate to the remote cluster with admin permissions. In Step 4, we'll provide these same certificates to Ilum so it can manage the cluster.
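Once the check passes, you can remove the temporary context and user again. These are standard kubectl config subcommands; keep the certificate files themselves, since Ilum still needs them in Step 4:

```shell
# Remove only the kubeconfig entries created for this test.
kubectl config delete-context test-remote-certs
kubectl config delete-user myuser-cert
```

On older kubectl versions that lack delete-user, `kubectl config unset users.myuser-cert` achieves the same result.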


Step 4. Register the Remote Cluster in Ilum UI

With your certificates generated and permissions granted, you are now ready to connect the remote cluster to the Ilum control plane.

What you'll provide: All the credentials and connection details you created in Step 3:

| Item | File/Value | Purpose |
|------|-----------|---------|
| CA Certificate | ca.crt | Verifies cluster identity |
| Client Certificate | client.crt | Your authentication credential |
| Client Key | client.key | Private key for authentication |
| Server URL | From kubectl | Cluster API endpoint |
| Username | myuser | User identity |

Access Ilum UI

Before you can add the remote cluster to Ilum, you need to access the Ilum web interface.

1. Switch to the master cluster context:

Switch Context
kubectl config use-context gke_<your-project>_<region>_master-cluster

You can verify with: kubectl config current-context

2. Set up port forwarding to Ilum UI:

Port Forward UI
kubectl port-forward -n ilum svc/ilum-ui 9777:9777

Keep this terminal window open - it needs to run continuously.

3. Open Ilum in your browser:

Open http://localhost:9777 in your web browser.

4. Log in:

  • Username: admin
  • Password: admin

Now you're ready to add the remote cluster to Ilum!


Demo: Adding a Kubernetes Cluster


Navigate to the Cluster Creation Page

  • Go to the Clusters section under Workloads
  • Click the New Cluster button

Specify the general settings:

  • Choose any name and description you like.
  • Set the cluster type to Kubernetes
  • Choose a Spark version

The Spark version is determined by selecting an image for Spark jobs.

Ilum requires its own images, which come with all the necessary dependencies preinstalled.

Below is a list of all available images:

Ilum Spark Versions Selection Screen

Specify Spark Configurations

Spark configurations defined at the cluster level are applied to every individual Ilum Job deployed on that cluster.

This can be useful if you want to avoid duplicating configurations. For example, if you use Iceberg as your Spark catalog, you can configure it once at the cluster level, and all Ilum jobs deployed on the cluster will pick up those configurations automatically.

Important: Set the namespace for Spark jobs

In the Parameters section, add:

spark.kubernetes.namespace: ilum
note

This ensures all Spark jobs run in the ilum namespace, where you will create the external services (ilum-core, ilum-grpc, ilum-minio) in Step 5. Without this, jobs would run in the default namespace and couldn't connect to master cluster services.

إضافة مساحات تخزين

The default cluster storage is used by Ilum to store all the files required to run Ilum Jobs, both those provided by the user and those provided by Ilum itself.

You can choose any type of storage: S3, GCS, WASBS, HDFS.

tip

For detailed instructions on setting up storage (especially GCS), see the Create a Storage guide.

Quick setup in the UI:

  • Click the Add Storage button
  • Specify the name and choose the storage type (S3, GCS, etc.)
  • Choose a Spark bucket - the main bucket used to store Ilum Jobs files
  • Choose a Data bucket - required if you use the Ilum Tables Spark format
  • Specify storage endpoint and credentials
  • Click the Submit button

You can add any number of storages, and Ilum jobs will be configured to use each of them.

Go to the Kubernetes Section and Specify the Kubernetes Configurations

Here we provide Ilum with all the details required to connect to our Kubernetes cluster. The process is similar to connecting to the cluster with the kubectl tool, but here it is done through the UI.

You will need to provide the following:

| Field | What to provide | File location |
|-------|-----------------|---------------|
| Url | Cluster API endpoint (text field) | ⚠️ Switch to remote cluster first, then: kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}' |
| CaCert | Upload CA certificate file | ~/remote-cluster/ca.crt (from Step 3.6) |
| ClientKey | Upload client private key file | ~/remote-cluster/client.key (from Step 3.1) |
| ClientCert | Upload client certificate file | ~/remote-cluster/client.crt (from Step 3.5) |
| Username | Username (text field) | myuser (or whatever you used in Step 3.2) |

Once you have specified these items, Ilum will be able to access your Kubernetes cluster.

Additionally, you can:

  • Configure Kubernetes to require a password for your user. In that case, you will need to specify the password here in the UI.
  • Add a passphrase when generating the client key. If you do, you will need to specify the passphrase here in the UI.
  • Specify the key algorithm used in client.key. This is not mandatory, since this information is usually stored inside the key itself, but there may be cases where you need to define it explicitly.

Finally, click the Submit button to add the cluster.



Step 5. Configure Multi-Cluster Networking

warning

The Challenge: While Ilum can now dispatch jobs to the remote cluster, those jobs need a way to report their status, logs, and metrics back to the master cluster.

Why jobs need to communicate:

| Component | Action | Target Service | Location |
|-----------|--------|----------------|----------|
| Spark Jobs | Send status updates | Ilum Core (gRPC) | Master Cluster |
| Spark Jobs | Send metrics & logs | MinIO Event Log | Master Cluster |
| Scheduled Jobs | Trigger new jobs | Ilum Core | Master Cluster |

The Solution: Expose Ilum's services from the master cluster and create DNS aliases in the remote cluster.

Step 5.1: Configure Firewall

First of all, you need to allow inbound traffic to your master GKE cluster. In a Google Cloud project, traffic access is managed by its firewall, which is configured through rules. To allow the remote cluster to reach the master cluster's services, add this rule to the master cluster's project:

Create Firewall Rule
gcloud compute firewall-rules create allow-traffic \
--network default \
--direction INGRESS \
--action ALLOW \
--rules tcp:80,tcp:443 \
--source-ranges 0.0.0.0/0 \
--description "Allow HTTP/HTTPS"
danger

Security Warning: --source-ranges 0.0.0.0/0 allows traffic from anywhere. In production, restrict this to your remote cluster's IP range:

--source-ranges 10.0.0.0/8  # Example: your cluster's CIDR
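For example, once you know your remote cluster's CIDR, you could tighten the rule and open only the ports the jobs actually use (9888, 9999, and 9000, as listed in the troubleshooting section). The rule name and CIDR below are assumptions carried over from the example above — substitute your own:

```shell
# Restrict the existing rule to the remote cluster's range and the Ilum ports.
gcloud compute firewall-rules update allow-traffic \
  --allow tcp:9888,tcp:9999,tcp:9000 \
  --source-ranges 10.0.0.0/8
```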

Step 5.2: Expose Services from Master Cluster

note

What is LoadBalancer? It's a Kubernetes service type that provisions a public IP address, making the service accessible from outside the cluster (including from your remote cluster).

You need to expose the services in the master cluster to the outside world. To achieve this, change their type to LoadBalancer. Ilum makes this process straightforward through Helm configuration values.

To expose Ilum Core, gRPC, and MinIO to the outside world, run the following command in your terminal:

Expose Services
helm upgrade ilum -n ilum ilum/ilum \
--set ilum-core.service.type="LoadBalancer" \
--set ilum-core.grpc.service.type="LoadBalancer" \
--set minio.service.type="LoadBalancer" \
--reuse-values

Then wait a few minutes and check your services by running:

Check Services
kubectl get services

You should see something like this:

GKE LoadBalancer Service Configuration showing External IPs

Here you can notice that the ilum-core, ilum-grpc, and minio services changed their type to LoadBalancer and received public IPs. Now you can go to http://<public-ip>:9888/api/v1/group to check whether everything is working correctly.
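To collect the three external IPs without reading them off the screen, a small loop over the service names works. Run it in the master cluster context; the service names are the ones shown above:

```shell
# Print each exposed service with its LoadBalancer IP, for use in Step 5.3.
for svc in ilum-core ilum-grpc minio; do
  ip=$(kubectl get svc "$svc" -n ilum \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
  echo "$svc -> $ip"
done
```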

Step 5.3: Create External Services in Remote Cluster

tip

What is ExternalName? It creates a local DNS alias in the remote cluster. When a job tries to connect to ilum-core:9888, Kubernetes redirects it to the public IP. Jobs use familiar service names as if everything were in one cluster.

Switch to remote cluster context:

Before creating external services, make sure you're working on the remote cluster:

Switch to Remote Cluster
kubectl config use-context gke_<your-project>_<region>_remote-cluster

Create external_services.yaml in the remote cluster:

note

Use the EXTERNAL-IP addresses from the kubectl get services output in your master cluster (the public IPs shown in the screenshot above). Replace the placeholders below with these actual IP addresses.

external_services.yaml
apiVersion: v1
kind: Service
metadata:
  name: ilum-core
  namespace: ilum # Must match the namespace where jobs will run
spec:
  type: ExternalName
  externalName: 34.118.72.123 # Replace with actual EXTERNAL-IP from master cluster
  ports:
  - port: 9888
    targetPort: 9888

---
apiVersion: v1
kind: Service
metadata:
  name: ilum-grpc
  namespace: ilum # Must match the namespace where jobs will run
spec:
  type: ExternalName
  externalName: 34.118.72.124 # Replace with actual EXTERNAL-IP from master cluster
  ports:
  - port: 9999
    targetPort: 9999

---
apiVersion: v1
kind: Service
metadata:
  name: ilum-minio
  namespace: ilum # Must match the namespace where jobs will run
spec:
  type: ExternalName
  externalName: 34.118.72.125 # Replace with actual EXTERNAL-IP from master cluster
  ports:
  - port: 9000
    targetPort: 9000
important

Before applying:

  1. Replace the example IPs (34.118.72.123, etc.) with your actual EXTERNAL-IP addresses from kubectl get services in the master cluster
  2. Make sure the namespace: ilum matches the namespace where your Spark jobs will run in the remote cluster
  3. If you haven't created the ilum namespace in the remote cluster yet, create it first:
    Create Namespace
    kubectl create namespace ilum

Apply the external services:

Apply External Services
kubectl apply -f external_services.yaml
note

Multi-Cluster Bridge Complete: Jobs running on the remote cluster can now reach Ilum's services on the master cluster using familiar service names (ilum-core, ilum-grpc, ilum-minio), even though they're actually connecting over the internet via public IPs.
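You can probe the bridge from a throwaway pod in the remote cluster. This sketch uses a public busybox image and the health endpoint from Step 5.2; note that ExternalName entries pointing at raw IPs depend on your DNS setup, so if the alias does not resolve, test the EXTERNAL-IP directly instead:

```shell
# Run a one-off pod in the ilum namespace and call Ilum Core through the alias.
kubectl run net-test -n ilum --rm -it --restart=Never --image=busybox:1.36 -- \
  wget -qO- http://ilum-core:9888/api/v1/group
```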

Components in Multi-Cluster Architecture

Components compatible with multi-cluster (with additional networking):

| Component | Purpose | Requires Exposure |
|-----------|---------|-------------------|
| Hive Metastore | Metadata management for tables | ✅ Yes |
| Marquez | Data lineage tracking | ✅ Yes |
| History Server | Spark application history | ✅ Yes |
| Graphite | Metrics collection | ✅ Yes |

Components restricted to single-cluster:

| Component | Reason |
|-----------|--------|
| Kube Prometheus Stack | Prometheus needs direct pod access for metrics scraping, which is challenging with dynamic pods across clusters. |
| Loki and Promtail | Promtail collects logs similarly to how Prometheus collects metrics, so the same multi-cluster limitations apply. |
info

All other Ilum services not listed above are cluster-independent and work in both single- and multi-cluster setups.


Step 6. Verify the Multi-Cluster Setup

The final step is to validate your configuration by running a real Spark job on the remote cluster.

Step 1: Access Ilum UI

  1. Switch to master cluster context:

    kubectl config use-context gke_<your-project>_<region>_master-cluster
  2. Set up port-forwarding:

    kubectl port-forward -n ilum svc/ilum-ui 9777:9777

    Keep this terminal window open.

  3. Open Ilum in browser:

    • Navigate to http://localhost:9777
    • Login with default credentials: admin / admin

Step 2: Create a Test Job

  1. Navigate to Jobs section in Ilum UI

  2. Click the "New Job +" button

  3. Configure the job:

    • Name: RemoteClusterTest
    • Job Type: Spark Job
    • Cluster: Select your remote cluster (not master)
    • Class: org.apache.spark.examples.SparkPi
    • Language: Scala
  4. Add Resources:

    • Go to the Resources tab
    • Jars: Upload this jar
  5. Submit the job

Step 3: Verify Execution

If everything is configured correctly:

Job starts successfully - Pods are created in the remote cluster

Logs appear - You can see Spark initialization and execution logs

Job completes - Final output shows: Pi is roughly 3.14...

Check remote cluster pods:

Check Remote Pods
kubectl config use-context gke_<your-project>_<region>_remote-cluster
kubectl get pods -n ilum

You should see Spark driver and executor pods running or completed.

tip

For detailed job configuration options, see the Run Simple Spark Job guide.


Troubleshooting & FAQ

Here are solutions to common issues you might encounter when connecting a remote GKE cluster.

Why is my Spark Job stuck in "Pending" state?

This usually happens if the remote cluster lacks resources or can't pull images.

  • Check Resources: Ensure your node pool has enough CPU/RAM.
  • Check Events: Run kubectl get events -n ilum on the remote cluster to see scheduling errors.
Why can't the remote cluster connect to Ilum Core?

If the job runs but fails to report status:

  • Verify Firewall: Ensure the master cluster's firewall allows ingress on ports 9888 (Core), 9999 (gRPC), and 9000 (MinIO).
  • Check DNS: Verify the ExternalName services in the remote cluster resolve to the master's public IP.
Can I use a private GKE cluster?

Yes, but you will need to configure VPC Peering or a VPN between the master and remote networks instead of using public LoadBalancers.


Next Steps

Congratulations! You've successfully set up a robust multi-cluster Ilum environment. You can now:

  1. Deploy Spark jobs to your remote cluster from the Ilum UI
  2. Monitor job execution across all clusters from one interface
  3. Scale horizontally by adding more remote clusters using the same process
  4. Optimize costs by using different machine types for different workloads
important

Production Checklist:

  • Restrict firewall rules to specific IP ranges
  • Create custom RBAC roles instead of cluster-admin
  • Set up TLS certificates for LoadBalancer services
  • Configure resource quotas and limits
  • Enable monitoring and alerting
  • Document your cluster configurations