DuckDB
DuckDB is an embedded analytical database that runs in-process inside ilum-core. It executes SQL on a single node with no cluster startup overhead, which suits small-to-medium data and ad-hoc exploration. Combined with the DuckLake catalog, DuckDB is a first-class option for fast local analytics over object storage.
DuckDB is enabled by default in Ilum.
When to use DuckDB
DuckDB is the right engine for:
- Quick queries on small-to-medium datasets.
- Ad-hoc exploration where pod startup latency would be a bottleneck.
- Analytics over DuckLake-managed tables.
- Single-user, single-node workloads.
- Rapid prototyping before scaling out to Spark or Trino.
For distributed workloads on large data, prefer Apache Spark. For interactive analytics on large data with concurrent users, prefer Trino.
Execution model
DuckDB runs in-process within ilum-core:
- No driver pod, no executor pods, no network round-trips for query execution.
- Single-node parallelism via DuckDB's vectorized execution engine.
- Direct reads from object storage (MinIO, S3, GCS, Azure Blob, HDFS) without copying data into a cluster.
This model delivers sub-second response times on small queries that would otherwise be dominated by Spark or Trino startup overhead.
DuckLake catalog
DuckLake is a DuckDB-native catalog enabled by default in Ilum. Tables created through DuckLake are stored on S3-compatible object storage and accessible through DuckDB SQL with no additional configuration.
DuckLake is the default catalog for new DuckDB workloads. Hive Metastore tables remain accessible to DuckDB through the hive_metastore extension, which Ilum loads when a Hive metastore is configured.
Supported table formats
DuckDB reads and writes:
- Parquet: Native, with predicate pushdown and zone maps.
- CSV, JSON: Direct read with schema inference.
- DuckLake-managed tables: ACID writes through DuckLake.
- Delta Lake, Iceberg: Read access through DuckDB extensions.
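Because DuckDB reads files in place, a query over object storage is just a table function applied to a path. The sketch below is a hypothetical helper (not part of Ilum) that picks DuckDB's built-in read_parquet, read_csv or read_json reader from the file suffix and emits the SQL. The bucket path in the usage line is an assumption; per the extension notes on this page, touching an s3:// path is what triggers the httpfs autoload.

```python
"""Hypothetical helper: build a DuckDB query that reads a file straight from object storage."""

# File suffix -> DuckDB built-in table function.
# read_csv and read_json infer the schema when no explicit columns are given.
READERS = {
    ".parquet": "read_parquet",
    ".csv": "read_csv",
    ".json": "read_json",
}


def scan_query(path: str, limit: int = 10) -> str:
    """Return a SELECT that reads `path` with the matching DuckDB reader."""
    for suffix, reader in READERS.items():
        if path.lower().endswith(suffix):
            # Double single quotes so the path is a valid SQL string literal.
            literal = path.replace("'", "''")
            return f"SELECT * FROM {reader}('{literal}') LIMIT {limit}"
    raise ValueError(f"no DuckDB reader registered for {path!r}")


if __name__ == "__main__":
    # Hypothetical bucket and key; globs such as part-*.parquet also match the suffix.
    print(scan_query("s3://warehouse/events/part-*.parquet"))
```

Delta Lake and Iceberg tables are directories with metadata rather than single files, so they go through their extensions' scan functions instead of a suffix lookup like this one.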
Configuration
DuckDB and DuckLake are enabled out of the box. The relevant Helm values:
ilum-core:
  sql:
    duckdb:
      enabled: true
      idleTimeout: 1h
      ducklake:
        enabled: true
DuckLake table data is stored in MinIO (or any configured S3-compatible backend) at a path configurable through ilum-core.sql.duckdb.ducklake.path.
Extension management
The Ilum image ships with DuckDB extensions pre-baked so that runtime sessions never reach the DuckDB extension registry. Two mechanisms coexist:
- Pre-populated extension cache at ~/.duckdb/extensions/ inside the ilum-core container. The standard extensions httpfs, iceberg, postgres_scanner and ducklake are placed here at image build time. DuckDB's autoload mechanism picks them up transparently the first time a session touches an s3:// path, an Iceberg table, a Postgres ATTACH, or a DuckLake catalog; no INSTALL or LOAD is required, and no outbound call is made.
- Local extension repository at /duckdbExt inside the container, holding hive_metastore and duck_lineage. These are loaded explicitly by Ilum when a Hive metastore or Marquez lineage backend is configured.
| Extension | Source | How it loads |
|---|---|---|
| httpfs | Pre-populated cache (~/.duckdb) | Autoload on first s3:// or https:// access, or INSTALL httpfs; LOAD httpfs; |
| iceberg | Pre-populated cache (~/.duckdb) | Autoload on first iceberg_scan(...) call, or INSTALL iceberg; LOAD iceberg; |
| postgres_scanner | Pre-populated cache (~/.duckdb) | Autoload on first Postgres ATTACH (including DuckLake's catalog), or INSTALL postgres_scanner; LOAD postgres_scanner; |
| ducklake | Pre-populated cache (~/.duckdb) | Autoload on ATTACH 'ducklake:...', or INSTALL ducklake; LOAD ducklake; |
| hive_metastore | Local repository (/duckdbExt) | Explicit INSTALL hive_metastore FROM '/duckdbExt'; LOAD hive_metastore; when a Hive metastore is configured |
| duck_lineage | Local repository (/duckdbExt) | Explicit INSTALL duck_lineage FROM '/duckdbExt'; LOAD duck_lineage; when Marquez is configured |
The bare DuckDB form INSTALL <extension_name>; LOAD <extension_name>; continues to work for all of these, but resolves differently: for the cache-backed set DuckDB resolves the extension locally, while for hive_metastore and duck_lineage it would contact community-extensions.duckdb.org. Ilum therefore uses the explicit FROM '/duckdbExt' form internally, which also works in air-gapped and MITM-restricted deployments.
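The table reduces to a simple rule: cache-backed extensions need no repository argument, while the two locally shipped ones must name /duckdbExt. The sketch below encodes that rule so the SQL that avoids any registry call can be generated per extension. It is an illustration of the table, not Ilum's internal loader.

```python
"""Sketch: offline-safe INSTALL/LOAD statements, derived from the extension table."""

# Pre-populated cache (~/.duckdb/extensions/): autoloaded, explicit INSTALL/LOAD optional.
CACHED = {"httpfs", "iceberg", "postgres_scanner", "ducklake"}
# Shipped in the container's local repository.
LOCAL_REPO = "/duckdbExt"
LOCAL = {"hive_metastore", "duck_lineage"}


def load_statements(name: str) -> list[str]:
    """Return SQL that loads `name` without contacting an extension registry."""
    if name in CACHED:
        # Resolved from the local cache, so the bare form stays offline.
        return [f"INSTALL {name};", f"LOAD {name};"]
    if name in LOCAL:
        # Without FROM, DuckDB would try community-extensions.duckdb.org.
        return [f"INSTALL {name} FROM '{LOCAL_REPO}';", f"LOAD {name};"]
    raise KeyError(f"{name!r} is not pre-staged in the Ilum image")


if __name__ == "__main__":
    print("\n".join(load_statements("hive_metastore")))
```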
Adding custom extensions
To bundle extensions beyond the default set (for example, a private build or a community extension that Ilum does not pre-stage), use ilum-core.sql.duckdb.extraExtensions. Exactly one source must be configured: a PersistentVolumeClaim (recommended) or a node hostPath. Files must follow the DuckDB layout v<duckdb-version>/<platform>/<name>.duckdb_extension (for example, v1.5.1/linux_amd64/myext.duckdb_extension).
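A misplaced file is the most likely reason a custom extension fails to install, so it can help to build and check paths before copying anything. The sketch below uses hypothetical helper names; its pattern accepts only release-style versions (such as v1.5.1) and lowercase platform names (such as linux_amd64 or linux_arm64), which is an assumption rather than DuckDB's full rule set.

```python
"""Hypothetical helpers: build and check paths in DuckDB's extension directory layout."""
import re
from pathlib import PurePosixPath

# v<duckdb-version>/<platform>/<name>.duckdb_extension, e.g. v1.5.1/linux_amd64/myext.duckdb_extension
LAYOUT = re.compile(r"v\d+\.\d+\.\d+/[a-z0-9_]+/[A-Za-z0-9_]+\.duckdb_extension")


def extension_path(version: str, platform: str, name: str) -> PurePosixPath:
    """Return the relative path at which DuckDB expects extension `name`."""
    version = version if version.startswith("v") else f"v{version}"
    return PurePosixPath(version, platform, f"{name}.duckdb_extension")


def is_valid_layout(relative_path: str) -> bool:
    """True if a path inside the extension source matches the expected layout."""
    return LAYOUT.fullmatch(relative_path) is not None


if __name__ == "__main__":
    p = extension_path("1.5.1", "linux_amd64", "myext")
    print(p, is_valid_layout(str(p)))
```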
The full schema of sql.duckdb.extraExtensions.* is documented on the ilum-core chart parameters page on ArtifactHub.
Source comparison
| Source | Holds the full v<duckdb-version>/<platform>/ tree? | Notes |
|---|---|---|
| PersistentVolumeClaim | Yes, backed by a filesystem. | Recommended for production. Multi-arch and multi-version friendly. |
| hostPath | Yes, backed by a node filesystem. | Single-node only; binds the deployment to a specific node. |
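Because exactly one source must be set, a quick pre-flight check on the values file can catch a common mistake before helm upgrade. The sketch below is hypothetical: it only inspects the keys shown on this page (enabled, mountPath, existingClaim, hostPath) and does not cover the full chart schema documented on ArtifactHub.

```python
"""Sketch: pre-flight check of sql.duckdb.extraExtensions values before a helm upgrade."""


def check_extra_extensions(values: dict) -> list[str]:
    """Return a list of problems; an empty list means the block looks consistent."""
    ext = (
        values.get("ilum-core", {})
        .get("sql", {})
        .get("duckdb", {})
        .get("extraExtensions", {})
    )
    if not ext.get("enabled", False):
        return []  # Feature disabled: nothing else to check.
    problems = []
    # The source must be either a PVC (existingClaim) or a node directory (hostPath), not both.
    sources = [key for key in ("existingClaim", "hostPath") if ext.get(key)]
    if len(sources) != 1:
        problems.append(f"exactly one source must be set, found {sources or 'none'}")
    mount = ext.get("mountPath")
    if mount is not None and not str(mount).startswith("/"):
        problems.append("mountPath must be an absolute container path")
    return problems
```

Parsing the YAML file itself is left to whatever YAML library is already in use; the function takes the resulting dictionary.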
Example: PVC source
Create a small PVC in the same namespace as ilum-core:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ilum-duckdb-extra-extensions
  namespace: ilum
spec:
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 200Mi
Populate it once via a temporary loader Pod that mounts the same claim. The Pod stays running long enough for the operator to copy files in, then is deleted:
apiVersion: v1
kind: Pod
metadata:
  name: duckdb-ext-loader
  namespace: ilum
spec:
  restartPolicy: Never
  containers:
    - name: shell
      image: busybox:1.36
      command: ["sleep", "1800"]
      volumeMounts:
        - name: ext
          mountPath: /data
  volumes:
    - name: ext
      persistentVolumeClaim:
        claimName: ilum-duckdb-extra-extensions
Copy the extension into the PVC and clean up:
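# Wait until the loader Pod is running before copying into it
kubectl -n ilum wait --for=condition=Ready pod/duckdb-ext-loader --timeout=120s
kubectl -n ilum exec duckdb-ext-loader -- mkdir -p /data/v1.5.1/linux_amd64
kubectl -n ilum cp ./myext.duckdb_extension duckdb-ext-loader:/data/v1.5.1/linux_amd64/myext.duckdb_extension
kubectl -n ilum delete pod duckdb-ext-loader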
The loader Pod must be deleted before ilum-core rolls out the new revision that mounts the PVC. ReadWriteOnce (the default access mode on most storage classes) lets only a single node mount the volume read-write at a time, so a loader Pod left running on another node prevents the new ilum-core Pod from attaching the claim.
Reference the PVC from helm_aio values:
ilum-core:
  sql:
    duckdb:
      extraExtensions:
        enabled: true
        mountPath: "/duckdbExt-extra"
        existingClaim: "ilum-duckdb-extra-extensions"
After helm upgrade, reference the extension from SQL:
INSTALL myext FROM '/duckdbExt-extra';
LOAD myext;
Example: hostPath source
For single-node clusters (development, edge deployments) where a PVC is overkill, hostPath mounts a directory directly from the Kubernetes node's filesystem into the ilum-core container.
Prepare the directory on the node where ilum-core will run, with the extensions laid out in DuckDB's expected v<duckdb-version>/<platform>/ structure:
# On the Kubernetes node (e.g. via SSH):
sudo mkdir -p /srv/duckdb-extra/v1.5.1/linux_amd64
sudo cp ./myext.duckdb_extension /srv/duckdb-extra/v1.5.1/linux_amd64/
# Make the files readable by the ilum-core container (UID 1001 by default):
sudo chmod -R a+rX /srv/duckdb-extra
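The same preparation can be scripted, for example to build the tree locally and then copy it to the node. The sketch below uses a hypothetical function name; it mirrors the mkdir, cp and chmod steps above, with the permission step only approximating chmod -R a+rX (read for everyone, plus traverse permission on directories) so that the container's non-root user (UID 1001 by default, per the comment above) can read the files.

```python
"""Sketch: stage an extension into the v<version>/<platform>/ layout and make it world-readable."""
import shutil
import stat
from pathlib import Path


def stage_extension(src: Path, root: Path, version: str, platform: str) -> Path:
    """Copy `src` under root/v<version>/<platform>/ and open up read permissions."""
    target_dir = root / f"v{version.lstrip('v')}" / platform
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / src.name
    shutil.copy2(src, target)
    # Approximates `chmod -R a+rX`: add read for everyone, and execute only on directories.
    for path in [root, *root.rglob("*")]:
        mode = stat.S_IMODE(path.stat().st_mode) | stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH
        if path.is_dir():
            mode |= stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
        path.chmod(mode)
    return target
```

Called as stage_extension(Path("myext.duckdb_extension"), Path("/srv/duckdb-extra"), "1.5.1", "linux_amd64"), it produces the same tree as the shell commands, assuming the script runs with write access to /srv.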
Reference the host path from helm values:
ilum-core:
  sql:
    duckdb:
      extraExtensions:
        enabled: true
        mountPath: "/duckdbExt-extra"
        hostPath: "/srv/duckdb-extra"
After helm upgrade, the SQL form is identical to the PVC case:
INSTALL myext FROM '/duckdbExt-extra';
LOAD myext;
hostPath mounts pin the deployment to the node holding the files. If ilum-core is rescheduled to a different node, the mount will fail and the Pod will not start. Use a node selector or affinity rule to keep ilum-core on the prepared node, or migrate to a PVC for multi-node clusters.
The extraExtensions mechanism is the recommended way to add extensions in MITM-restricted or air-gapped environments where the DuckDB extension registry (extensions.duckdb.org) is unreachable. To make DuckDB's native HTTPS extensions (such as httpfs and aws) also trust an internal Certificate Authority in the same environment, follow the corporate MITM proxy walkthrough. For the broader deployment context, see the Air-gapped Installation Guide.
Selecting DuckDB in the SQL Editor
In the Ilum SQL Editor, the Engine Selector dropdown lets you choose DuckDB for any query. The engine status indicator confirms the in-process engine is ready.
When the automatic engine router is enabled, DuckDB is selected automatically for queries that target small datasets, DuckLake-managed tables, or ad-hoc exploration patterns.
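The router's internals and thresholds are not described on this page. As a hypothetical illustration only, the guidance from "When to use DuckDB" and "Limitations" can be written as a routing rule; the QueryProfile fields and the 10 GiB cut-off below are assumptions, not Ilum settings.

```python
"""Hypothetical routing rule mirroring this page's guidance; not Ilum's actual router."""
from dataclasses import dataclass

# Assumed cut-off for "small" data; illustrative only.
SMALL_SCAN_BYTES = 10 * 1024**3  # 10 GiB


@dataclass
class QueryProfile:
    estimated_scan_bytes: int
    targets_ducklake: bool
    concurrent_users: int = 1
    long_running: bool = False


def choose_engine(q: QueryProfile) -> str:
    """Pick an engine name: "duckdb", "trino" or "spark"."""
    small = q.estimated_scan_bytes <= SMALL_SCAN_BYTES
    # DuckDB: small data or DuckLake tables, single user, short queries.
    if (small or q.targets_ducklake) and q.concurrent_users == 1 and not q.long_running:
        return "duckdb"
    # Trino: interactive analytics with concurrent users.
    if q.concurrent_users > 1:
        return "trino"
    # Spark: distributed or long-running work, with isolation and failure recovery.
    return "spark"
```

Keeping long-running work off DuckDB reflects the Limitations below: the query would otherwise compete for ilum-core's own resources and could not recover from a failure.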
Limitations
- DuckDB is single-node; it does not scale horizontally across executors.
- Query concurrency is bounded by the resources allocated to ilum-core.
- Long-running queries should use Spark or Trino instead, both for resource isolation and for failure recovery.