Iceberg REST Gateway (Floecat)¶
This module (protocol-gateway/iceberg-rest) implements the Apache Iceberg REST catalog contract on top of Floecat’s gRPC services. The gateway translates HTTP requests into the existing table, namespace, view, snapshot, planning, and connector APIs so Iceberg clients (Trino, DuckDB, Spark, custom services) can manage Floecat tables without native gRPC bindings.
Scope and Goals¶
- Protocol adapter: expose the official Iceberg REST catalog surface while reusing Floecat’s authentication, tenancy, logging, and connector infrastructure.
- Parity with Iceberg REST spec: support namespace CRUD, table CRUD/commit/register, view CRUD/commit, scan planning (
/plan+/tasks), table-level credentials, and transactional commit via Iceberg table-change payloads. - Single catalog focus: one Floecat catalog per REST prefix (multi-catalog ACID transactions remain out of scope).
- Reusability: keep Iceberg-specific plumbing isolated, allowing additional protocols to reuse the same staging, metadata, and connector services.
Out of scope:
- Serving manifests/files directly from the gateway.
- Async/streaming scan planning.
- Stats persistence (metrics payloads are validated and logged, not stored).
REST Surface vs Floecat gRPC¶
| Iceberg REST surface | Floecat service(s) | Notes |
|---|---|---|
/v1/config |
Gateway config only | Synthesizes prefixes, default properties, supported endpoints; warehouse is optional and defaults to the configured prefix. |
| Namespace CRUD | NamespaceLookupSupport, NamespaceMutationSupport, NamespacePropertyUpdateSupport |
Includes property mutation and existence checks (HEAD); create returns 200 with CreateNamespaceResponse. |
| Table CRUD, commit, register | TableService, SnapshotService, connector services |
Stage-create uses staged metadata; commit delegates to transactional commit orchestration. Register imports Iceberg metadata via TableMetadataImportService; overwrite-register reuses the existing table identity and snapshot lineage instead of forcing a fresh metadata snapshot during registration. |
| Table rename/move | TableService.UpdateTable |
Uses field masks for namespace + display name changes. |
/tables/{table}/plan, /tasks |
QueryService, PlanTaskManager |
Runs synchronous planning, persists result, and exposes per-task payloads; failures return error responses (not 200). |
/tables/{table}/credentials |
StorageCredentialAuthority + StorageAuthorities gRPC service |
Returns vended credentials for the resolved table storage prefix when access delegation requests them. |
/tables/{table}/metrics |
Logging only | Validates payloads and logs them; the Iceberg REST contract does not map them into Floecat stats APIs. |
/tables/rename, /transactions/commit |
Table services + transaction service | Validates requirement/update payloads and commits all table changes in one backend transaction (with idempotent replay). |
| View CRUD/commit/rename | ViewService + ViewMetadataService |
Maintains Iceberg view schemas, versions, and summaries. |
/oauth/tokens |
Disabled | Floecat uses existing auth headers; endpoint returns OAuth error unsupported_grant_type (400). |
/register-view |
ViewService + ViewMetadataService |
Registers an Iceberg view from a metadata location. |
| Snapshots/history endpoints | Not in OpenAPI | Snapshot metadata is surfaced via table load/commit responses; gRPC snapshot APIs are used internally for writes. |
| Schema fetch endpoints | Not in OpenAPI | Schema gRPC stubs exist but no REST endpoint calls them today. |
Module & Package Layout¶
protocol-gateway/iceberg-rest/
├── src/main/java/ai/floedb/floecat/gateway/iceberg/rest
│ ├── api/ # Request/response DTOs & serializers
│ ├── catalog/ # Catalog/resource resolver records and lookup helpers
│ ├── common/ # Cross-cutting helpers (filters, error mappers, metadata utils)
│ ├── resources/ # JAX-RS controllers (config, namespace, table, view, system)
│ └── services/ # Adapters and workflows (accounts, catalog, table, metadata, planning, staging, clients)
└── src/test/java/... # RestAssured tests, unit tests mirroring the same package structure
Key packages under services:
services.catalog– low-level helpers (table lifecycle, connector wiring, requirement enforcement, metadata sync).services.table– top-level table workflows plus subpackages forload,materialization,metadata, andtransaction.services.view/services.namespace– high-level view and namespace use cases.services.metadata– metadata import/materialization (Iceberg metadata files, snapshot transforms).services.planning– scan planning orchestration plusPlanTaskManager.services.staging– staged payload repository and TTL management.services.client– shared gRPC facade/wrappers used by the higher-level services.
Tests mirror this layout so package-private collaborators (e.g., staged table repositories, planners) are still accessible without widening visibility.
Runtime Architecture Overview¶
- Request entry: Quarkus REST controllers receive Iceberg REST requests.
AccountHeaderFilterenforces tenant/auth headers and optionally rewrites “prefix-less” paths to a configured default. - Context resolution:
ResourceResolverbuilds the catalog/resource references used throughout the gateway (CatalogRef,NamespaceRef,TableRef,ViewRef). - Service orchestration: Controllers delegate to
services.table/*,services.view/*,services.namespace/*, etc. Table orchestration is further split by concern intoload,materialization,metadata, andtransaction. - gRPC translation: service orchestration flows use
GrpcServiceFacadeso backend calls inherit Floecat’s auth context and telemetry consistently. - Response mapping:
TableResponseMapper,ViewResponseMapper,NamespaceResponseMapper, and metadata builders synthesize the Iceberg contract (schemas, specs, refs, history) from Floecat responses. They also inject config overrides (e.g.,write.metadata.path, storage credentials). - Connectors & credentials: registered Iceberg tables provision a Floecat-backed connector during commit. Storage authorities are the source of truth for client-safe storage config and temporary credential vending, and later reconciliation is owned by the service scheduler rather than immediate gateway follow-up.
- Plan/task caching:
PlanTaskManagerpersists planning results with TTL (default 10 minutes) and chunk size limits, exposing read-once task IDs for/tasks.
Authentication Notes¶
- The gateway uses
floecat.gateway.auth-mode=oidcto require a JWT and derives the account from the configured claim (floecat.gateway.account-claim, defaultaccount_id). - The JWT is read from
floecat.gateway.authHeader(defaultauthorization) and is validated by Quarkus OIDC. - For OIDC validation you must enable/configure Quarkus OIDC in the gateway:
quarkus.oidc.tenant-enabled=true- One of:
quarkus.oidc.auth-server-url=...orquarkus.oidc.public-key=... - Optional:
quarkus.oidc.token.audience=... - If you customize the auth header, continue sending
Bearer <jwt>as the value.
Stage-create & Commit Flow¶
Stage-create (POST /v1/{prefix}/namespaces/{namespace}/tables with stage-create=true)¶
- Validate schema, spec, write order, and properties. Normalize namespace/table identifiers.
- Compute metadata location, connector configuration, and storage config/credentials. Storage
authority resolution prefers the concrete table
location, then the current snapshotmetadata_location-derived table root, and only falls back toupstream.uriwhen that URI is a real storage URI such ass3://.... - Persist a
StagedTableEntrykeyed by account + catalog + namespace + table + stage-id (provided viaIceberg-Transaction-Id/Idempotency-Keyor generated).StagedTableServiceenforces TTL and idempotency. - Return
StageCreateResponse(stage-id, requirements, config overrides, storage credentials). No catalog mutation occurs yet.
Commit (POST /tables/{table})¶
- Resolve catalog/namespace/table context and reject unsupported commit modes (for example Delta read-only tables).
- Wrap the table commit payload into a single-entry
TransactionCommitRequestand delegate toTransactionCommitService. TransactionCommitServicebegins/loads a backend transaction, validates idempotency + request-hash replay semantics, validates requirements/updates, and plans table/snapshot pointer changes with optimistic preconditions.- Metadata materialization and backend snapshot
metadata_locationupdates are prepared before backend apply so pointer updates commit atomically with table state. - Backend
prepareTransaction+commitTransactionapply all prepared changes atomically; success returns HTTP 204 from the transactional layer. - The table endpoint then builds and returns
CommitTableResponseDto(HTTP 200) from committed state.
Idempotency behavior:
- Idempotency-Key is the request replay key used by commit orchestration.
- Same key + same payload replays the prior response.
- Same key + different payload returns 409 Conflict.
- IN_PROGRESS records are guarded with timeout; stale records can be retried.
- Prior 5xx failures are retryable; prior 4xx failures are terminal and replayed.
/transactions/commit¶
Receives Iceberg’s transaction payload (table-changes with identifier, requirements, updates).
The gateway validates each change, builds one backend transaction containing all table and snapshot
pointer mutations, and commits atomically. The endpoint returns:
204only when backend state isTS_APPLIED.409for deterministic conflicts/failed preconditions.5xxwhen commit state is unknown.
For ambiguous backend outcomes (for example retryable/aborted ambiguity), the gateway performs a short bounded confirmation poll before returning unknown-state errors. Poll behavior is configurable:
floecat.gateway.commit.confirm.max-attempts(default6)floecat.gateway.commit.confirm.initial-sleep-ms(default25)floecat.gateway.commit.confirm.max-sleep-ms(default200)
Scan Planning & Task Consumption¶
TablePlanServiceresolves the table ID, applies snapshot filters (start/end snapshot, stats fields, filter expressions), and issuesBeginQuery+FetchScanBundleagainst Floecat’sQueryService.- The resulting plan bundle is immediately completed (no async state today).
PlanTaskManagerregisters the descriptor (plan-id, namespace, table, credentials, delete files) and chunks file scan tasks into deterministic task IDs ({planId}-task-{n}) based on configured chunk size. /tables/{table}/planreturns the plan descriptor, aggregated file scan tasks, delete files, storage credentials, and the list ofplanTasks./tables/{table}/tasksaccepts aplanTaskID and consumes it exactly once, returning only the payload for that task. Invalid/consumed IDs return Iceberg-style 404 responses./tables/{table}/plan/{planId}GET/DELETE expose cached descriptors and allow clients to cancel a plan (which also cancels the underlying query if still open).
Limits/Follow-ups:
- Plans are returned as "completed" today; failures return Iceberg error responses instead of status=failed.
- TTL (default 10 minutes) and chunk size (default 128 files per task) are configurable via application.properties.
View Semantics¶
ViewMetadataServicebuilds Iceberg-compatible view metadata blobs (schemas, versions, version logs, representations) and stores them along with user properties.- REST view requests/responses mirror the OpenAPI contract (SQL text, schema JSON, properties, requirements/updates).
- View rename/move paths map to
ViewService.UpdateViewby updatingnamespace_id+display_name. POST /register-viewpopulates the same canonical backend fields as normal create, including SQL definitions, creation search path, and output columns.- View commit updates keep the backend’s canonical SQL/search-path/output-column fields in sync with the Iceberg metadata payload when that information is present.
- Floecat’s backend view record is current-state only. Iceberg-style version history is stored in the metadata JSON property for REST compatibility, not as first-class backend view versions.
Delta Compatibility Layer¶
The gateway supports loading Floecat Delta tables through the Iceberg REST surface so engines
like DuckDB can query examples.delta.<table> via the same REST attach used for native Iceberg
tables.
How it works¶
- On Delta table load, the gateway translates Delta table/snapshot/schema state into Iceberg
metadata JSON (including
snapshot-log,refs, and Iceberg-compatible primitive type names). - For each returned Delta snapshot that lacks a manifest list, the gateway materializes Iceberg compat artifacts:
- data manifest:
<table-root>/metadata/<snapshot-id>-compat-m0.avro - delete manifest (when Delta delete vectors exist):
<table-root>/metadata/<snapshot-id>-compat-d0.avro - position-delete files (generated from Delta DV bitmaps):
<table-root>/metadata/<snapshot-id>-compat-pd-*.avro - manifest list:
<table-root>/metadata/snap-<snapshot-id>-compat.avro - On each Delta load/query, compat artifacts are resolved by deterministic snapshot path:
- existing
snap-<snapshot-id>-compat.avro: reuse - missing
snap-<snapshot-id>-compat.avro: regenerate manifest + manifest-list from the original Delta snapshot state at read time
This gives "refresh-on-read" behavior without requiring clients to know anything about Delta, and no marker file/state is required.
Load responses follow Iceberg REST snapshots semantics:
- snapshots=all returns all valid snapshots
- snapshots=refs returns only snapshots currently referenced by branches/tags (empty if no refs)
ETags for load responses are representation-aware and vary by snapshots mode.
Configuration¶
floecat.gateway.delta-compat.enabled=trueenables Delta compatibility translation/materialization.floecat.gateway.delta-compat.read-only=truekeeps behavior read-only from the compatibility path.
Storage behavior¶
- Compat files are written to object storage under the Delta table’s own
metadata/prefix (same bucket/prefix family as the source Delta table), not served from in-memory-only state.
Current limitations¶
- Supported delete behavior:
- Delta
removeactions that fully remove parquet files are reflected correctly (removed files are absent from generated Iceberg data manifests). - Copy-on-write deletes (remove old parquet file, add rewritten parquet file) are reflected correctly from the active Delta snapshot file set.
- Only on-disk Delta deletion vectors are projected today; inline deletion vectors are currently skipped.
- Equality-delete projection is not implemented; compatibility materialization emits Iceberg position deletes.
Commit Guarantees (Current)¶
- Single-table core state: synchronous and strongly consistent within the request. Table and snapshot metadata needed for the next client commit or read is advanced in the core path.
- Post-core side effects: none in the gateway commit path. Reconciliation is scheduled separately.
- Multi-table
/transactions/commit: atomic backend transaction across all table changes in one request.
Testing¶
- REST contract tests:
*ResourceTest(RestAssured) validates namespace/table/view endpoints against mocked services. - Integration tests:
IcebergRestFixtureITboots real services (viaRealServiceTestResource) and exercises stage-create, commit, plan, and view flows end-to-end. - Unit tests: live under
src/test/java/.../services/*mirroring the main packages so service collaborators (planners, staged repositories, metadata builders) can be verified with Mockito. - Compose smoke:
make compose-smokeruns DuckDB table federation checks plus Trino table and view lifecycle checks in LocalStack mode. To exercise split reconciler deployment, runCOMPOSE_SMOKE_MODES=localstack-remote make compose-smoke, which starts the service inreconciler-controlmode and enables the remoteexecutorprofile. The Trino block creates and replaces an Iceberg view via SQL and verifies both definitions are queryable. The samelocalstack-remotesmoke run also exercises upstream Polaris Iceberg REST credential vending by verifying that PolarisloadTablereturns non-emptystorage-credentialswhen sentX-Iceberg-Access-Delegation: vended-credentials, then importing the upstream table through that vended-credentials path in the same stack. For the OIDC split deployment, runCOMPOSE_SMOKE_MODES=localstack-oidc-remote make compose-smoke, which adds Keycloak and exercises remote executor worker machine auth as well as user-propagated OIDC auth.
Operational Notes & Current Limitations¶
- Register IO scope:
POST /v1/{prefix}/namespaces/{namespace}/registertreats FileIO properties as request-scoped connector config. Use the register payloadpropertiesforio-impl,s3.endpoint,s3.region,s3.access-key-id,s3.secret-access-key,s3.path-style-access, etc. when non-default storage wiring is needed (for example LocalStack). Request-supplied FileIO properties are merged over any non-secret gateway defaults such as endpoint/path-style settings. These are request-scoped import credentials for register-time server-side reads. They are not persisted as storage-authority credentials and are not vended back to REST clients. - Register overwrite semantics:
POST /v1/{prefix}/namespaces/{namespace}/registerwithoverwrite=trueis intended to preserve the existing Floecat table identity and imported snapshot lineage for an already-registered table. The overwrite path reuses the current table and snapshot IDs instead of treating register as a brand-new create flow. - Registered Iceberg connectors: tables registered or committed through the gateway are wired
back to Floecat as gateway-managed REST connectors tagged with
floecat.connector.mode=capture-only. The upstream source may still be modeled withiceberg.source=rest, but capture-only enforcement comes from the explicit mode property rather than the source type. Steady-state discovery comes from Floecat’s own REST catalog, while the table record’slocationplus backend snapshotmetadata_locationremain the source of truth for Iceberg clients. - Credentials:
/tables/{table}/credentials, table load responses, and plan responses return vended credentials only when access delegation explicitly requests them and a matching storage authority is configured for the resolved storage prefix. Floecat does not vend static long-lived credentials; storage authorities are expected to resolve or mint temporary credentials. - Client-safe config: table load responses may include non-secret storage settings such as endpoint, region, and path-style access. Those values come from storage-authority client-safe config rather than from the upstream connector definition.
- Credential source separation: connector secrets are never used for Iceberg REST client credential vending. Storage-authority secrets are the only credential source used for client vending.
- Metrics persistence:
/tables/{table}/metricsvalidates and logs payloads but does not persist them toTableStatisticsService. - Async planning: plans are synchronous and completed inline; streaming manifests and async planning (
/plans/{id}) are not supported. - Multi-table ACID scope: atomic within a single
/transactions/commitbackend transaction; request validation rejects duplicate table identifiers. - Reconciliation freshness: stats and other connector-derived state are eventually refreshed by scheduled reconciliation after the commit has already succeeded.
- Manifest/file serving: the gateway does not serve manifests or data files directly; clients access storage through the credentials/config returned in REST responses.
Client Quick Start¶
DuckDB¶
Client-managed credentials example (no delegation):
INSTALL httpfs;
LOAD httpfs;
INSTALL aws;
LOAD aws;
INSTALL iceberg;
LOAD iceberg;
CREATE OR REPLACE SECRET floe_secret (
TYPE s3,
KEY_ID '<access-key>',
SECRET '<secret-key>',
SCOPE 's3://<bucket>/',
REGION '<AWS region>'
);
ATTACH 'analytics' AS iceberg_floecat
(TYPE iceberg,
ENDPOINT 'http://localhost:9200/',
AUTHORIZATION_TYPE none,
ACCESS_DELEGATION_MODE 'none');
CREATE TABLE iceberg_floecat.core.quark_events (event_id INTEGER);
INSERT INTO iceberg_floecat.core.quark_events VALUES (1), (2), (3), (4);
SELECT * FROM iceberg_floecat.core.quark_events;
This example is intentionally non-vended: DuckDB is configured with its own S3 secret and asks the
gateway for ACCESS_DELEGATION_MODE 'none'.
Trino¶
Client-managed credentials example (no delegation):
etc/catalog/analytics_rest.properties:
connector.name=iceberg
iceberg.catalog.type=rest
iceberg.rest-catalog.prefix=analytics
iceberg.rest-catalog.uri=http://host.docker.internal:9200
iceberg.rest-catalog.warehouse=s3://my-warehouse/
iceberg.rest-catalog.view-endpoints-enabled=false
fs.native-s3.enabled=true
s3.aws-access-key=<access-key>
s3.aws-secret-key=<secret-key>
s3.region=<AWS region>
# OIDC (Keycloak)
iceberg.rest-catalog.security=OAUTH2
iceberg.rest-catalog.oauth2.credential=trino-client:trino-secret
iceberg.rest-catalog.oauth2.server-uri=http://host.docker.internal:8080/realms/floecat/protocol/openid-connect/token
iceberg.rest-catalog.oauth2.scope=openid
Note: if Trino runs on the same Docker network as Keycloak (docker_floecat), you can use
http://keycloak:8080/realms/floecat/protocol/openid-connect/token instead. If it does not share
the network, use host.docker.internal.
Restart Trino and run:
CREATE TABLE analytics.sales.stream_orders (
order_id BIGINT,
region VARCHAR
) WITH (
format = 'PARQUET',
location = 's3://my-warehouse/analytics/sales/stream_orders/'
);
INSERT INTO analytics.sales.stream_orders VALUES (1, 'east'), (2, 'west');
SELECT * FROM analytics.sales.stream_orders;
This example is also intentionally non-vended: Trino is configured with static object-store credentials. Use this mode when the client is expected to bring its own storage credentials.
Preferred model: storage-authority-backed credential vending
- Configure upstream catalog access on the Floecat connector as needed for Polaris, Glue, Unity, Delta, or other source systems.
- Configure storage authorities for the table prefixes you want Floecat to vend.
- Use client access delegation only on engines that support consuming Iceberg REST
storage-credentials.
In that preferred model, Floecat vends temporary object-store credentials from storage authorities; the client does not need static S3 keys in its catalog config.
References¶
- Iceberg REST catalog spec
- Floecat module:
protocol-gateway/iceberg-rest - Configuration:
protocol-gateway/iceberg-rest/src/main/resources/application.properties