Cost metrics configuration

When observability is enabled, Ilum attributes the cost of every Spark run across the datasets the job actually touched. Each job detail view gains a Cost tab showing a per-table breakdown, and a top-level Cost route aggregates across jobs.

The breakdown is composed of cost dimensions. Some dimensions are built in and always tracked; others are counted request metrics that an operator defines, controlling which storage and RPC operations are counted toward a job's cost. This guide covers both and how to configure them.

Built-in cost dimensions

The following dimensions are always tracked for every traced job. They are derived from the per-stage metrics Spark already reports, so they require no configuration:

| Dimension | Unit | What it measures |
| --- | --- | --- |
| Executor run time | seconds | Total executor wall-clock time, the basis for the modeled compute cost. |
| Input bytes | bytes | Bytes read by the job's stages. |
| Output bytes | bytes | Bytes written by the job's stages. |
| Remote shuffle read | bytes | Shuffle data fetched from other executors over the network. |
| Shuffle write | bytes | Shuffle data written during redistribution. |
| Disk spill | bytes | Data spilled to disk when a stage exceeded available memory. |
| GC time | milliseconds | Time spent in JVM garbage collection. |

Each dimension is apportioned across the datasets a stage read or wrote, so the Cost tab can answer "which table drove the I/O and compute on this run".
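
This guide does not state the rule Ilum uses to apportion a stage's value across datasets. As a minimal sketch only, assuming the value is split in proportion to the bytes the stage exchanged with each table (an illustrative assumption, not Ilum's documented method), the idea can be expressed as follows. The function name `apportion` and the table names are hypothetical.

```python
from __future__ import annotations


def apportion(stage_total: float, bytes_by_table: dict[str, int]) -> dict[str, float]:
    """Split one stage-level value across tables, weighted by bytes touched.

    Assumption: byte-weighted splitting is illustrative; the real rule used
    by Ilum is not described in this guide.
    """
    total_bytes = sum(bytes_by_table.values())
    if total_bytes == 0:
        # No bytes recorded: fall back to an even split so the total is preserved.
        share = stage_total / len(bytes_by_table) if bytes_by_table else 0.0
        return {table: share for table in bytes_by_table}
    return {
        table: stage_total * nbytes / total_bytes
        for table, nbytes in bytes_by_table.items()
    }


# Hypothetical stage: 120 s of executor run time, two tables touched.
shares = apportion(120.0, {"sales.orders": 3_000, "sales.customers": 1_000})
```

Whatever the weighting, the per-table shares sum back to the stage total, which is what lets a per-table row be compared against the job-level figure.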

Counted request metrics

Beyond the built-in dimensions, an operator can define counted request metrics: metrics that count how many times the job invoked a particular set of storage or RPC operations. This is how object-storage request cost (which cloud providers bill per request, not only per byte) becomes visible per table.

A metric definition consists of:

  • Key — a stable identifier (for example s3.put_requests). The key becomes the column name in the Cost tab and the cost rollup.
  • Label and unit — display text shown in the UI (for example "S3 PUT requests", requests).
  • Match type — which span attribute the metric reads. By default this is the RPC method name (rpc.method), which carries the operation issued against object storage or a remote service.
  • Match values — the set of operations to count. Every operation in the run whose method is in this set adds one to the metric.

Ilum ships two such metrics by default, which an operator can keep, edit, or remove:

  • s3.put_requests counts write operations: PutObject, CompleteMultipartUpload, and UploadPart.
  • s3.get_requests counts read operations: GetObject, ListObjectsV2, and HeadObject.

These illustrate the typical "writes" and "reads" pattern: a metric that groups the methods a provider uses for a logical action so the request count surfaces as a single number per table. The same approach extends to other providers — for example a metric matching a GCS or Azure write method vocabulary — without any code change, since a metric definition is pure configuration.
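
The counting rule described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Ilum's implementation: the `MetricDefinition` class, the `count_requests` function, and the span shape (a dict with a `table` name and an `attributes` map) are all hypothetical. Only the metric keys, the `rpc.method` attribute, and the operation names come from this guide.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDefinition:
    key: str                      # column name in the Cost tab and the rollup
    label: str                    # display text
    unit: str                     # display unit
    attribute: str                # match type: the span attribute to read
    match_values: frozenset[str]  # operations that add one to the metric


DEFAULT_METRICS = [
    MetricDefinition("s3.put_requests", "S3 PUT requests", "requests", "rpc.method",
                     frozenset({"PutObject", "CompleteMultipartUpload", "UploadPart"})),
    MetricDefinition("s3.get_requests", "S3 GET requests", "requests", "rpc.method",
                     frozenset({"GetObject", "ListObjectsV2", "HeadObject"})),
]


def count_requests(spans, metrics):
    """Return {(table, metric key): count}, one increment per matching span."""
    counts = Counter()
    for span in spans:
        for metric in metrics:
            if span["attributes"].get(metric.attribute) in metric.match_values:
                counts[(span["table"], metric.key)] += 1
    return counts


# Hypothetical spans from one run; DeleteObject matches no metric and is ignored.
spans = [
    {"table": "sales.orders", "attributes": {"rpc.method": "PutObject"}},
    {"table": "sales.orders", "attributes": {"rpc.method": "UploadPart"}},
    {"table": "sales.orders", "attributes": {"rpc.method": "GetObject"}},
    {"table": "sales.customers", "attributes": {"rpc.method": "HeadObject"}},
    {"table": "sales.customers", "attributes": {"rpc.method": "DeleteObject"}},
]
counts = count_requests(spans, DEFAULT_METRICS)
```

Supporting another provider in this model means adding one more `MetricDefinition` with that provider's method names; the counting function itself does not change.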

Note

A metric with one or more match values is counted as a request counter (one increment per matching operation). A metric defined with no match values is instead treated as a byte gauge: the numeric value of the named attribute is summed rather than counted. The two default metrics above are request counters.
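
The difference between the two modes can be sketched as a single function. This is a hedged illustration: `metric_value`, the flat span dicts, and the attribute name `payload.bytes` are hypothetical and are not taken from Ilum.

```python
def metric_value(spans, attribute, match_values):
    """Evaluate one metric over a list of span-attribute dicts.

    With match values: request counter, one increment per matching span.
    Without match values: byte gauge, the attribute's numeric values are summed.
    """
    if match_values:
        return sum(1 for span in spans if span.get(attribute) in match_values)
    return sum(
        span[attribute]
        for span in spans
        if isinstance(span.get(attribute), (int, float))
    )


# Hypothetical spans; "payload.bytes" is an invented attribute name.
spans = [
    {"rpc.method": "PutObject", "payload.bytes": 512},
    {"rpc.method": "GetObject", "payload.bytes": 2048},
    {"rpc.method": "PutObject"},
]
put_count = metric_value(spans, "rpc.method", {"PutObject"})  # counter mode
total_bytes = metric_value(spans, "payload.bytes", set())     # gauge mode
```

In this sketch, spans that lack the attribute or carry a non-numeric value are skipped in gauge mode; how Ilum handles such spans is not stated in this guide.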

Configuring metrics in the UI

Counted request metrics are managed in the Cost Settings view, alongside the pricing rate card. To add or change a metric:

  1. Open the Cost Settings view.
  2. Add a metric definition, supplying a key, a label and unit for display, a match type (the span attribute to read, the RPC method by default), and the set of operations (match values) to count.
  3. Save. New and edited definitions apply to traced jobs going forward.

Changes made in the UI are persisted and take precedence over the seeded defaults — Ilum never overwrites them on a later upgrade.

Configuring metrics via Helm

To ship a different default set on a fresh deployment, define the metrics under the ilum-core chart's observabilityDefaults.metricDefinitions values. Each entry mirrors the fields above:

ilum-core:
  observabilityDefaults:
    metricDefinitions:
      - key: "s3.put_requests"
        label: "S3 PUT requests"
        unit: "requests"
        attribute: "rpc.method"
        matchValues: ["PutObject", "CompleteMultipartUpload", "UploadPart"]
      - key: "s3.get_requests"
        label: "S3 GET requests"
        unit: "requests"
        attribute: "rpc.method"
        matchValues: ["GetObject", "ListObjectsV2", "HeadObject"]

Note

The Helm values seed the metric set only on first boot, when no settings have been saved yet. Once metrics exist (seeded or edited in the UI), the saved set is authoritative and subsequent Helm upgrades do not clobber it. To change metrics on a running deployment, edit them in the Cost Settings view rather than in Helm values.
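
The seed-once behaviour described in this note can be modeled as follows. This is a sketch of the semantics only: the `SettingsStore` class and its methods are hypothetical, and the metric key `gcs.write_requests` is an invented example. Treating a deliberately emptied metric set as still "saved" is also an assumption.

```python
class SettingsStore:
    """Toy model of the persisted metric set."""

    def __init__(self):
        self.saved = None  # None means nothing has ever been saved

    def boot(self, helm_defaults):
        # Helm values are applied only when no settings exist yet.
        if self.saved is None:
            self.saved = list(helm_defaults)
        return self.saved

    def save(self, metric_keys):
        # Models an edit made in the Cost Settings view.
        self.saved = list(metric_keys)


store = SettingsStore()
first_boot = list(store.boot(["s3.put_requests", "s3.get_requests"]))

store.save(["gcs.write_requests"])               # operator edits in the UI
after_upgrade = list(store.boot(["s3.put_requests"]))  # later Helm upgrade
```

After the edit, the upgrade leaves the saved set untouched, which is why changes on a running deployment belong in the Cost Settings view.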

Where configured metrics appear

Once defined, each metric becomes a column in the cost breakdown:

  • In the Cost tab of every job detail view, each metric appears as a per-table value alongside the built-in dimensions, so a single table's executor time, bytes, and request counts sit on the same row.
  • In the cost rollup that backs the cross-job Cost route, the metric key becomes a stored column, so the same counts aggregate across jobs over time.

Removing a metric stops it being counted on future runs; historical rows already written to the rollup are unaffected.