
Configure Cloud Object Storage (GCS, S3, Azure) for Data Lake

Ilum lets you link GCS, S3, WASBS, and HDFS storage to your clusters. Once a storage is linked, Ilum automatically configures every job on that cluster to access your cloud data lake, so you do not have to set Spark parameters manually.
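For context, the manual alternative for GCS means passing connector properties to every job yourself. The sketch below is illustrative only and is not necessarily what Ilum generates internally. It assumes the Google Cloud Storage connector 2.x is on the classpath and uses that connector's service-account property names. `ManualGcsConf`, `sparkProps`, and `asConfArgs` are hypothetical names.

```scala
// Hypothetical helper: the Spark/Hadoop properties you would otherwise pass by hand
// (e.g. as spark-submit --conf arguments) to read gs:// paths with a service-account key.
// Property names assume the GCS connector 2.x configuration scheme.
object ManualGcsConf {
  def sparkProps(clientEmail: String, privateKeyId: String, privateKey: String): Map[String, String] =
    Map(
      "spark.hadoop.fs.gs.impl" -> "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem",
      "spark.hadoop.fs.gs.auth.service.account.enable" -> "true",
      "spark.hadoop.fs.gs.auth.service.account.email" -> clientEmail,
      "spark.hadoop.fs.gs.auth.service.account.private.key.id" -> privateKeyId,
      "spark.hadoop.fs.gs.auth.service.account.private.key" -> privateKey
    )

  // Renders the properties as "--conf key=value" pairs, sorted by key for stable output.
  def asConfArgs(props: Map[String, String]): Seq[String] =
    props.toSeq.sortBy(_._1).flatMap { case (k, v) => Seq("--conf", s"$k=$v") }
}
```

Every job would need these entries, including the secret key, repeated in its configuration. Linking the storage in Ilum removes that step.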

Supported Storage Providers

| Provider | Type | Description |
|---|---|---|
| Google Cloud Storage | GCS | Native integration for GCP projects. |
| Amazon S3 | S3 | Standard S3 and S3-compatible storage support. |
| Azure Blob Storage | WASBS/ABFS | Integration for Azure data lakes. |
| HDFS | HDFS | Connect to existing Hadoop Distributed File Systems. |

Google Cloud Storage (GCS)

Step 1: Create a GCS Bucket


  1. Create a Google Cloud Project

    • Open the Google Cloud Console and go to the project selector / Manage Resources.
    • Click New Project / Create Project.
    • Enter a Project name, then choose an Organization and Location.
  2. Create a GCS Bucket

    • In the Console, navigate to Cloud Storage → Buckets.
    • Click Create.
    • Enter a globally unique Bucket name (e.g., my-ilum-bucket) and select your Region.
    Note

    Remember the bucket name you created; you will need it when adding this storage to Ilum.

  3. Create a Service Account and JSON Key

    • Go to IAM & Admin → Service Accounts.
    • Click Create Service Account, fill in the details, and grant it the Storage Admin role.
    • Click the new service account's email, open the Keys tab, and create a new key of type JSON.
    • Save the downloaded JSON file securely.
    Important

    Organization Policy Update: In new organizations, creating service account keys might be disabled by default. Contact your administrator if you cannot create keys.

Step 2: Add GCS to Ilum Cluster


  1. Navigate to Workloads → Clusters → Edit → Storage → Add Storage.

  2. Configure General Settings:

| Parameter | Value Example | Description |
|---|---|---|
| Name | my-gcs-storage | Unique name for this storage configuration. |
| Type | GCS | Select the GCS provider. |
| Spark Bucket | my-ilum-bucket | Bucket for Spark logs and events. |
| Data Bucket | my-ilum-bucket | Bucket for your data. |

  3. Configure GCS Authorization: open your JSON key file and copy the following values:

| Parameter | Source Key | Description |
|---|---|---|
| Client Email | client_email | Service account email address. |
| Private Key | private_key | Full key, including the -----BEGIN PRIVATE KEY----- and -----END PRIVATE KEY----- lines. |
| Private Key ID | private_key_id | Key ID string. |

  4. Click Submit to save.
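The mapping in the authorization table can be sketched as a small dependency-free helper that reads the three fields out of the key file's contents. `ServiceAccountKey` and `parse` are hypothetical names. A regex stands in for a real JSON parser only to keep the sketch self-contained. Values are returned exactly as they appear between the quotes in the file, the same text you would copy by hand.

```scala
// Hypothetical helper: extracts client_email, private_key_id and private_key from a
// service-account JSON key. Regex-based for brevity; prefer a JSON library in real code.
object ServiceAccountKey {
  final case class Fields(clientEmail: String, privateKeyId: String, privateKey: String)

  private val q = "\""

  // Matches "name": "value", where value may contain escapes such as \n or \".
  // The captured value is returned as-is (escape sequences are not decoded).
  private def field(json: String, name: String): Option[String] = {
    val pattern =
      (q + java.util.regex.Pattern.quote(name) + q + """\s*:\s*""" + q + """((?:[^"\\]|\\.)*)""" + q).r
    pattern.findFirstMatchIn(json).map(_.group(1))
  }

  // Left carries a message naming the first missing field.
  def parse(json: String): Either[String, Fields] =
    for {
      email <- field(json, "client_email").toRight("client_email missing")
      keyId <- field(json, "private_key_id").toRight("private_key_id missing")
      key   <- field(json, "private_key").toRight("private_key missing")
    } yield Fields(email, keyId, key)
}
```

Each value in the returned `Fields` corresponds to one row of the table: Client Email, Private Key ID, and Private Key.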

Step 3: Verify Connection

To ensure your storage is correctly configured, run a simple Spark job.

  1. Create a Code Service:

    • Go to Workloads → Services → New Service +.
    • Select Type: Code, Language: Scala, and your Cluster.
  2. Execute Test Code: Paste and run the following Scala code:

    Test Storage Connection
    // Write test data (assumes `spark` is the SparkSession predefined in the code service)
    val data = Seq(("Alice", 34), ("Bob", 45))
    val df = spark.createDataFrame(data).toDF("name", "age")

    // Replace with your bucket path (e.g., gs://..., s3a://..., wasbs://...)
    val path = "gs://my-ilum-bucket/output/"

    // Overwrite any previous test output; the header row keeps column names on read-back
    df.write.mode("overwrite").option("header", "true").format("csv").save(path)

    // Read back data
    spark.read.option("header", "true").format("csv").load(path).show()
  3. Check Results: If the job completes and displays the data table, your storage connection is active.
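To run the same test against another provider, only `path` changes. The helper below is a hypothetical sketch of the usual Hadoop connector URI shapes for each storage type in the supported-providers table. The Azure host suffixes assume the public Azure cloud. `StoragePaths` and its case classes are illustrative names, not part of Ilum.

```scala
// Hypothetical helper: builds the value to use for `val path = ...` in the test job.
object StoragePaths {
  sealed trait Target
  final case class Gcs(bucket: String) extends Target
  final case class S3(bucket: String) extends Target
  final case class AzureBlob(container: String, account: String) extends Target     // wasbs
  final case class AzureDataLake(container: String, account: String) extends Target // abfss
  final case class Hdfs(nameNode: String, port: Int) extends Target

  def uri(target: Target, subPath: String): String = {
    val rel = subPath.stripPrefix("/") // avoid a double slash after the authority
    target match {
      case Gcs(b)              => s"gs://$b/$rel"
      case S3(b)               => s"s3a://$b/$rel"
      case AzureBlob(c, a)     => s"wasbs://$c@$a.blob.core.windows.net/$rel"
      case AzureDataLake(c, a) => s"abfss://$c@$a.dfs.core.windows.net/$rel"
      case Hdfs(host, port)    => s"hdfs://$host:$port/$rel"
    }
  }
}
```

For example, `StoragePaths.uri(StoragePaths.S3("my-ilum-bucket"), "output/")` yields the S3 equivalent of the GCS path used above.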


Common Issues & FAQ

Why do I get a "Permission Denied" error?

Cause: The service account or user doesn't have permission to access the bucket. Solution:

  1. Go to your cloud provider's console (e.g., Google Cloud Console).
  2. Navigate to the bucket's Permissions tab.
  3. Grant your service account the Storage Admin or Storage Object Admin role.

Why does it say "Bucket does not exist"?

Cause: The bucket name in your code doesn't match the actual bucket name, or the region is incorrect. Solution:

  1. Verify the bucket exists in your cloud console.
  2. Check that the bucket name in your code matches exactly (names are often case-sensitive).
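One way to catch common mismatches before running a job is to check the name locally. The sketch below covers only a subset of the Cloud Storage bucket-naming rules, listed in its comments. It also pulls the bucket out of a `gs://` or `s3a://` path so you can compare it with what the console shows. `BucketNameCheck` is a hypothetical helper, not an Ilum or Google API.

```scala
// Hypothetical helper. Checks a simplified subset of Cloud Storage bucket-naming rules:
// 3-63 characters; only lowercase letters, digits, '-', '_' and '.';
// must start and end with a letter or digit. Not covered: dotted names up to
// 222 characters, the "goog" prefix restriction, and IP-address-like names.
object BucketNameCheck {
  private val Allowed = "^[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]$".r

  def problems(name: String): List[String] = {
    val issues = List.newBuilder[String]
    if (name != name.trim) issues += "contains leading or trailing whitespace"
    if (name.exists(_.isUpper)) issues += "contains uppercase letters"
    if (Allowed.findFirstIn(name).isEmpty) issues += "does not match the allowed pattern"
    issues.result()
  }

  // Extracts the bucket (authority) from a path such as gs://bucket/dir/ or s3a://bucket/dir/.
  def bucketOf(path: String): Option[String] =
    """^[a-z0-9]+://([^/]+)""".r.findFirstMatchIn(path).map(_.group(1))
}
```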

Why do I get "Invalid credentials"?

Cause: The keys (JSON or access keys) were not copied correctly. Solution:

  1. Re-open your key file.
  2. Carefully copy the values again. For GCS, make sure you include the -----BEGIN PRIVATE KEY----- and -----END PRIVATE KEY----- lines.
  3. Re-save the storage configuration in Ilum.
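Before re-saving, you can check the pasted Private Key value for its header and footer lines. This hypothetical `PrivateKeyCheck` sketch only verifies that those two markers survived the copy. It tolerates the surrounding JSON quotes and the trailing `\n` escape that come with a value copied straight from the key file, and it does not validate the key material itself.

```scala
// Hypothetical pre-flight check for the Private Key field: confirms the PEM
// header and footer lines are present. It does not validate the key itself.
object PrivateKeyCheck {
  val Begin = "-----BEGIN PRIVATE KEY-----"
  val End   = "-----END PRIVATE KEY-----"

  /** Returns human-readable problems; an empty list means both markers are present. */
  def problems(pastedKey: String): List[String] = {
    // Tolerate the surrounding JSON quotes and the trailing literal "\n" escape
    // that come along when the value is copied straight out of the key file.
    val key = pastedKey.trim.stripPrefix("\"").stripSuffix("\"").stripSuffix("\\n")
    List(
      (key.startsWith(Begin), s"missing the $Begin line at the start"),
      (key.endsWith(End), s"missing the $End line at the end")
    ).collect { case (false, msg) => msg }
  }
}
```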