# Iceberg REST Gateway (Floecat)
This module (protocol-gateway/iceberg-rest) implements the Apache Iceberg REST catalog contract on top of Floecat’s gRPC services. The gateway translates HTTP requests into the existing table, namespace, view, snapshot, planning, and connector APIs so Iceberg clients (Trino, DuckDB, Spark, custom services) can manage Floecat tables without native gRPC bindings.
## Scope and Goals
- Protocol adapter: expose the official Iceberg REST catalog surface while reusing Floecat’s authentication, tenancy, logging, and connector infrastructure.
- Parity with Iceberg REST spec: support namespace CRUD, table CRUD/commit/register, view CRUD/commit, scan planning (`/plan` + `/tasks`), table-level credentials, and transactional commit via Iceberg table-change payloads.
- Single catalog focus: one Floecat catalog per REST prefix (multi-catalog ACID transactions remain out of scope).
- Reusability: keep Iceberg-specific plumbing isolated, allowing additional protocols to reuse the same staging, metadata, and connector services.
Non-goals for the current release:
- Serving manifests/files directly from the gateway.
- Async/streaming scan planning.
- Stats persistence (metrics payloads are validated and logged, not stored).
## REST Surface vs Floecat gRPC
| Iceberg REST surface | Floecat service(s) | Notes |
|---|---|---|
| `/v1/config` | Gateway config only | Synthesizes prefixes, default properties, supported endpoints; `warehouse` is optional and defaults to the configured prefix. |
| Namespace CRUD | `NamespaceService` | Includes property mutation and existence checks (HEAD); create returns 200 with `CreateNamespaceResponse`. |
| Table CRUD, commit, register | `TableService`, `SnapshotService`, connector services | Stage-create uses staged metadata; commit delegates to transactional commit orchestration. Register imports Iceberg metadata via `TableMetadataImportService`. |
| Table rename/move | `TableService.UpdateTable` | Uses field masks for namespace + display name changes. |
| `/tables/{table}/plan`, `/tasks` | `QueryService`, `PlanTaskManager` | Runs synchronous planning, persists results, and exposes per-task payloads; failures return error responses (not 200). |
| `/tables/{table}/credentials` | `ConnectorClient` + gateway defaults | Returns vended credentials based on access delegation mode (defaults to gateway config). |
| `/tables/{table}/metrics` | Logging only | Validates payloads; wiring to `TableStatisticsService` is future work (spec has no stats surface). |
| `/tables/rename`, `/transactions/commit` | Table services + transaction service | Validates requirement/update payloads and commits all table changes in one backend transaction (with idempotent replay). |
| View CRUD/commit/rename | `ViewService` + `ViewMetadataService` | Maintains Iceberg view schemas, versions, and summaries. |
| `/oauth/tokens` | Disabled | Floecat uses existing auth headers; the endpoint returns OAuth error `unsupported_grant_type` (400). |
| `/register-view` | `ViewService` + `ViewMetadataService` | Registers an Iceberg view from a metadata location. |
| Snapshots/history endpoints | Not in OpenAPI | Snapshot metadata is surfaced via table load/commit responses; gRPC snapshot APIs are used internally for writes. |
| Schema fetch endpoints | Not in OpenAPI | Schema gRPC stubs exist, but no REST endpoint calls them today. |
## Module & Package Layout
```text
protocol-gateway/iceberg-rest/
├── src/main/java/ai/floedb/floecat/gateway/iceberg/rest
│   ├── api/         # Request/response DTOs & serializers
│   ├── common/      # Cross-cutting helpers (context factory, filters, error mappers, metadata utils)
│   ├── resources/   # JAX-RS controllers (config, namespace, table, view, system)
│   └── services/    # Adapters and workflows (accounts, catalog, table, metadata, planning, staging, clients)
└── src/test/java/...  # RestAssured tests, unit tests mirroring the same package structure
```
Key packages under `services`:

- `services.catalog` – low-level helpers (table lifecycle, connector wiring, requirement enforcement, metadata sync).
- `services.table` / `services.view` / `services.namespace` – high-level use cases (create, commit, register, delete, property updates, rename).
- `services.metadata` – metadata import/materialization (Iceberg metadata files, snapshot transforms).
- `services.planning` – scan planning orchestration plus `PlanTaskManager`.
- `services.staging` – staged payload repository and TTL management.
- `services.client` – typed gRPC clients (`TableClient`, `NamespaceClient`, `ViewClient`, `SnapshotClient`, `QueryClient`, `ConnectorClient`, etc.).
Tests mirror this layout so package-private collaborators (e.g., staged table repositories, planners) are still accessible without widening visibility.
## Runtime Architecture Overview
- Request entry: Quarkus REST controllers receive Iceberg REST requests. `AccountHeaderFilter` enforces tenant/auth headers and optionally rewrites “prefix-less” paths to a configured default.
- Context resolution: `RequestContextFactory` resolves catalog prefixes, namespace paths, and table IDs by calling Floecat’s `DirectoryService` and `TableLifecycleService`. Context records travel with each request.
- Service orchestration: controllers delegate to `services.table/*`, `services.view/*`, `services.namespace/*`, etc. These orchestrators build gRPC requests, enforce requirements, and interact with staging, metadata, connectors, or planning as needed.
- gRPC translation: typed clients (`TableClient`, `SnapshotClient`, `ViewClient`, etc.) wrap `GrpcWithHeaders` so every call inherits Floecat’s auth context and telemetry.
- Response mapping: `TableResponseMapper`, `ViewResponseMapper`, `NamespaceResponseMapper`, and metadata builders synthesize the Iceberg contract (schemas, specs, refs, history) from Floecat responses. They also inject config overrides (e.g., `write.metadata.path`, storage credentials).
- Connectors & credentials: registered Iceberg tables provision a Floecat-backed connector during commit; later reconciliation is owned by the service scheduler rather than immediate gateway follow-up.
- Plan/task caching: `PlanTaskManager` persists planning results with a TTL (default 10 minutes) and chunk-size limits, exposing read-once task IDs for `/tasks`.
## Authentication Notes
- The gateway uses `floecat.gateway.auth-mode=oidc` to require a JWT and derives the account from the configured claim (`floecat.gateway.account-claim`, default `account_id`).
- The JWT is read from `floecat.gateway.authHeader` (default `authorization`) and is validated by Quarkus OIDC.
- For OIDC validation you must enable/configure Quarkus OIDC in the gateway:
  - `quarkus.oidc.tenant-enabled=true`
  - One of: `quarkus.oidc.auth-server-url=...` or `quarkus.oidc.public-key=...`
  - Optional: `quarkus.oidc.token.audience=...`
- If you customize the auth header, continue sending `Bearer <jwt>` as the value.
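Taken together, a minimal OIDC setup in the gateway's `application.properties` might look like the following sketch (the auth-server URL and audience are illustrative placeholders, not shipped defaults):

```properties
# Require a JWT and derive the tenant account from its claim
floecat.gateway.auth-mode=oidc
floecat.gateway.account-claim=account_id

# Quarkus OIDC token validation (substitute your own issuer)
quarkus.oidc.tenant-enabled=true
quarkus.oidc.auth-server-url=https://idp.example.com/realms/floecat
quarkus.oidc.token.audience=floecat-gateway
```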
## Stage-create & Commit Flow
### Stage-create (`POST /v1/{prefix}/namespaces/{namespace}/tables` with `stage-create=true`)
1. Validate schema, spec, write order, and properties. Normalize namespace/table identifiers.
2. Compute the metadata location, connector configuration, and default storage credentials.
3. Persist a `StagedTableEntry` keyed by account + catalog + namespace + table + stage-id (provided via `Iceberg-Transaction-Id`/`Idempotency-Key` or generated). `StagedTableService` enforces TTL and idempotency.
4. Return `StageCreateResponse` (stage-id, requirements, config overrides, storage credentials). No catalog mutation occurs yet.
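The staging keying and TTL/idempotency rules above can be sketched in a few lines. This is an illustrative in-memory model, not the actual `StagedTableService` implementation:

```python
import time
import uuid


class StagedTableStore:
    """Illustrative staged-table store: entries are keyed by
    account + catalog + namespace + table + stage-id, expire after a TTL,
    and replay idempotently for a repeated stage-id."""

    def __init__(self, ttl_seconds=600):
        self.ttl = ttl_seconds
        self.entries = {}  # key -> (payload, expires_at)

    def stage(self, account, catalog, namespace, table, payload, stage_id=None):
        # Stage-id comes from Iceberg-Transaction-Id / Idempotency-Key, else is generated.
        stage_id = stage_id or uuid.uuid4().hex
        key = (account, catalog, namespace, table, stage_id)
        now = time.time()
        existing = self.entries.get(key)
        if existing and existing[1] > now:
            return stage_id, existing[0]  # idempotent replay of the staged payload
        self.entries[key] = (payload, now + self.ttl)
        return stage_id, payload
```

Repeating a stage-create with the same stage-id returns the originally staged payload instead of creating a second entry.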
### Commit (`POST /tables/{table}`)
1. Resolve catalog/namespace/table context and reject unsupported commit modes (for example, Delta read-only tables).
2. Wrap the table commit payload into a single-entry `TransactionCommitRequest` and delegate to `TransactionCommitService`.
3. `TransactionCommitService` begins/loads a backend transaction, validates idempotency + request-hash replay semantics, validates requirements/updates, and plans table/snapshot pointer changes with optimistic preconditions.
4. Metadata materialization and the `metadata-location` update are prepared before the backend apply so pointer updates commit atomically with table state.
5. Backend `prepareTransaction` + `commitTransaction` apply all prepared changes atomically; success returns HTTP 204 from the transactional layer.
6. The table endpoint then builds and returns `CommitTableResponseDto` (HTTP 200) from committed state.
Idempotency behavior:
- `Idempotency-Key` is the request replay key used by commit orchestration.
- Same key + same payload replays the prior response.
- Same key + different payload returns 409 Conflict.
- `IN_PROGRESS` records are guarded by a timeout; stale records can be retried.
- Prior 5xx failures are retryable; prior 4xx failures are terminal and replayed.
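A minimal model of these replay rules (hypothetical names; the real logic lives in the commit orchestration):

```python
import hashlib
import json


class IdempotencyStore:
    """Illustrative replay cache keyed by Idempotency-Key plus a hash of the payload."""

    def __init__(self):
        self.records = {}  # key -> (request_hash, http_status, response)

    @staticmethod
    def _hash(payload):
        return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()

    def check(self, key, payload):
        record = self.records.get(key)
        if record is None:
            return "proceed", None
        prior_hash, status, response = record
        if prior_hash != self._hash(payload):
            return "conflict", None            # same key, different payload -> 409
        if 500 <= status < 600:
            return "proceed", None             # prior 5xx failure is retryable
        return "replay", (status, response)    # prior 2xx/4xx result is replayed

    def record(self, key, payload, status, response):
        self.records[key] = (self._hash(payload), status, response)
```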
### `/transactions/commit`
Receives Iceberg’s transaction payload (table-changes with identifier, requirements, updates). The gateway validates each change, builds one backend transaction containing all table and snapshot pointer mutations, and commits atomically. The endpoint returns:

- `204` only when backend state is `TS_APPLIED`.
- `409` for deterministic conflicts/failed preconditions.
- `5xx` when commit state is unknown.
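That status mapping can be summarized as a small function. Only `TS_APPLIED` is named by this document; the other state names below are illustrative assumptions:

```python
def commit_http_status(backend_state):
    """Map a backend transaction outcome to the HTTP status returned by
    /transactions/commit. State names other than TS_APPLIED are illustrative."""
    if backend_state == "TS_APPLIED":
        return 204  # all table changes applied atomically
    if backend_state in ("TS_CONFLICT", "TS_PRECONDITION_FAILED"):
        return 409  # deterministic conflict / failed precondition
    return 503      # commit state unknown -> 5xx
```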
For ambiguous backend outcomes (for example retryable/aborted ambiguity), the gateway performs a short bounded confirmation poll before returning unknown-state errors. Poll behavior is configurable:
- `floecat.gateway.commit.confirm.max-attempts` (default `6`)
- `floecat.gateway.commit.confirm.initial-sleep-ms` (default `25`)
- `floecat.gateway.commit.confirm.max-sleep-ms` (default `200`)
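The bounded confirmation poll amounts to capped exponential backoff. A sketch using the documented defaults (`poll_status` is an injected callable, and terminal state names besides `TS_APPLIED` are illustrative):

```python
import time


def confirm_commit(poll_status, max_attempts=6, initial_sleep_ms=25, max_sleep_ms=200):
    """Bounded confirmation poll: retry with capped exponential backoff until a
    deterministic outcome is observed or the attempt budget is exhausted."""
    sleep_ms = initial_sleep_ms
    for _ in range(max_attempts):
        status = poll_status()
        if status in ("TS_APPLIED", "TS_ABORTED"):
            return status                      # deterministic outcome: stop polling
        time.sleep(sleep_ms / 1000.0)
        sleep_ms = min(sleep_ms * 2, max_sleep_ms)
    return "UNKNOWN"                           # still ambiguous after the bounded poll
```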
## Scan Planning & Task Consumption
- `TablePlanService` resolves the table ID, applies snapshot filters (start/end snapshot, stats fields, filter expressions), and issues `BeginQuery` + `FetchScanBundle` against Floecat’s `QueryService`.
- The resulting plan bundle is immediately completed (no async state today).
- `PlanTaskManager` registers the descriptor (plan-id, namespace, table, credentials, delete files) and chunks file scan tasks into deterministic task IDs (`{planId}-task-{n}`) based on the configured chunk size.
- `/tables/{table}/plan` returns the plan descriptor, aggregated file scan tasks, delete files, storage credentials, and the list of `planTasks`.
- `/tables/{table}/tasks` accepts a `planTask` ID and consumes it exactly once, returning only the payload for that task. Invalid/consumed IDs return Iceberg-style 404 responses.
- `/tables/{table}/plan/{planId}` GET/DELETE exposes cached descriptors and allows clients to cancel a plan (which also cancels the underlying query if still open).
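The deterministic chunking into `{planId}-task-{n}` IDs can be sketched as follows (function name is illustrative; the 128-file default mirrors the configuration described below):

```python
def chunk_plan_tasks(plan_id, scan_files, chunk_size=128):
    """Chunk a plan's file scan tasks into deterministic task IDs of the
    form {planId}-task-{n}, each holding at most chunk_size files."""
    tasks = {}
    for n, start in enumerate(range(0, len(scan_files), chunk_size)):
        tasks[f"{plan_id}-task-{n}"] = scan_files[start:start + chunk_size]
    return tasks
```

Because the IDs are derived from the plan ID and chunk index, re-planning the same bundle yields the same task identifiers.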
Limits/Follow-ups:
- Plans are returned as "completed" today; failures return Iceberg error responses instead of `status=failed`.
- The TTL (default 10 minutes) and chunk size (default 128 files per task) are configurable via `application.properties`.
## View Semantics
- `ViewMetadataService` builds Iceberg-compatible view metadata blobs (schemas, versions, version logs, representations) and stores them along with user properties.
- REST view requests/responses mirror the OpenAPI contract (SQL text, schema JSON, properties, requirements/updates).
- View rename/move paths map to `ViewService.UpdateView` by updating `namespace_id` + `display_name`.
- `POST /register-view` populates the same canonical backend fields as a normal create, including SQL definitions, creation search path, and output columns.
- View commit updates keep the backend’s canonical SQL/search-path/output-column fields in sync with the Iceberg metadata payload when that information is present.
- Floecat’s backend view record remains current-state only. Iceberg-style version history is stored in the metadata JSON property for REST compatibility, not as first-class backend view versions.
## Delta Compatibility Layer
The gateway supports loading Floecat Delta tables through the Iceberg REST surface, so engines like DuckDB can query `examples.delta.<table>` via the same REST attach used for native Iceberg tables.
### How it works
- On Delta table load, the gateway translates Delta table/snapshot/schema state into Iceberg metadata JSON (including `snapshot-log`, `refs`, and Iceberg-compatible primitive type names).
- For each returned Delta snapshot that lacks a manifest list, the gateway materializes Iceberg compat artifacts:
  - data manifest: `<table-root>/metadata/<snapshot-id>-compat-m0.avro`
  - delete manifest (when Delta delete vectors exist): `<table-root>/metadata/<snapshot-id>-compat-d0.avro`
  - position-delete files (generated from Delta DV bitmaps): `<table-root>/metadata/<snapshot-id>-compat-pd-*.avro`
  - manifest list: `<table-root>/metadata/snap-<snapshot-id>-compat.avro`
- On each Delta load/query, compat artifacts are resolved by deterministic snapshot path:
  - existing `snap-<snapshot-id>-compat.avro`: reuse
  - missing `snap-<snapshot-id>-compat.avro`: regenerate the manifest + manifest list from the original Delta snapshot state at read time
This gives "refresh-on-read" behavior without requiring clients to know anything about Delta, and no marker file/state is required.
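A sketch of the deterministic path scheme and the refresh-on-read resolution (function names are illustrative; `exists` and `regenerate` stand in for object-storage operations):

```python
def compat_artifact_paths(table_root, snapshot_id):
    """Deterministic Iceberg-compat artifact paths for a Delta snapshot,
    following the layout described above."""
    metadata = f"{table_root}/metadata"
    return {
        "data_manifest": f"{metadata}/{snapshot_id}-compat-m0.avro",
        "delete_manifest": f"{metadata}/{snapshot_id}-compat-d0.avro",
        "manifest_list": f"{metadata}/snap-{snapshot_id}-compat.avro",
    }


def resolve_manifest_list(table_root, snapshot_id, exists, regenerate):
    """Refresh-on-read: reuse the manifest list if it already exists at its
    deterministic path, otherwise regenerate it from Delta snapshot state.
    No marker file or extra state is needed, only the path convention."""
    path = compat_artifact_paths(table_root, snapshot_id)["manifest_list"]
    if not exists(path):
        regenerate(path)
    return path
```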
Load responses follow Iceberg REST snapshot semantics:

- `snapshots=all` returns all valid snapshots.
- `snapshots=refs` returns only snapshots currently referenced by branches/tags (empty if no refs).

ETags for load responses are representation-aware and vary by `snapshots` mode.
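One way to make ETags representation-aware is to fold the `snapshots` mode into the hashed cache key; the gateway's actual ETag derivation is not specified here, so this is only a sketch:

```python
import hashlib


def load_table_etag(metadata_location, snapshots_mode):
    """Illustrative representation-aware ETag: the same table state yields
    different ETags for snapshots=all vs snapshots=refs, so a cached 'refs'
    response is never served for an 'all' request."""
    key = f"{metadata_location}|snapshots={snapshots_mode}"
    return hashlib.sha256(key.encode()).hexdigest()[:16]
```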
### Configuration
- `floecat.gateway.delta-compat.enabled=true` enables Delta compatibility translation/materialization.
- `floecat.gateway.delta-compat.read-only=true` keeps behavior read-only from the compatibility path.
### Storage behavior
- Compat files are written to object storage under the Delta table’s own `metadata/` prefix (same bucket/prefix family as the source Delta table), not served from in-memory-only state.
### Current limitations
- Supported delete behavior:
  - Delta `remove` actions that fully remove Parquet files are reflected correctly (removed files are absent from generated Iceberg data manifests).
  - Copy-on-write deletes (remove the old Parquet file, add the rewritten Parquet file) are reflected correctly from the active Delta snapshot file set.
- Only on-disk Delta deletion vectors are projected today; inline deletion vectors are currently skipped.
- Equality-delete projection is not implemented; compatibility materialization emits Iceberg position deletes.
## Commit Guarantees (Current)
- Single-table core state: synchronous and strongly consistent within the request. Table/snapshot metadata needed for the next client commit/read is advanced in the core path.
- Post-core side effects: none in the gateway commit path. Reconciliation happens later via the service scheduler.
- Multi-table `/transactions/commit`: atomic backend transaction across all table changes in one request.
## Testing
- REST contract tests: `*ResourceTest` (RestAssured) validates namespace/table/view endpoints against mocked services.
- Integration tests: `IcebergRestFixtureIT` boots real services (via `RealServiceTestResource`) and exercises stage-create, commit, plan, and view flows end-to-end.
- Unit tests: live under `src/test/java/.../services/*`, mirroring the main packages so service collaborators (planners, staged repositories, metadata builders) can be verified with Mockito.
- Compose smoke: `make compose-smoke` runs DuckDB table federation checks plus Trino table and view lifecycle checks in LocalStack mode. The Trino block creates and replaces an Iceberg view via SQL and verifies both definitions are queryable.
## Operational Notes & Current Limitations
- Register IO scope: `POST /v1/{prefix}/namespaces/{namespace}/register` treats FileIO properties as request-scoped connector config. Runtime/global storage wiring (`floecat.storage.aws.*`) is no longer required for register flows. Use the register payload `properties` for `io-impl`, `s3.endpoint`, `s3.region`, `s3.access-key-id`, `s3.secret-access-key`, `s3.path-style-access`, etc. when non-default storage wiring is needed (for example, LocalStack). Request-supplied FileIO properties are merged over gateway defaults from `floecat.gateway.storage-credential.properties.*`.
- Registered Iceberg connectors: tables registered or committed through the gateway are wired back to Floecat as ordinary `iceberg.source=rest` connectors. Steady-state discovery comes from Floecat’s own REST catalog, while the table record’s `location` and `metadata-location` remain the source of truth for Iceberg clients.
- Credentials: `/tables/{table}/credentials` returns vended credentials based on access delegation; per-request signing is not yet implemented. Auth resolution supports `aws.profile` and `aws.profile_path` when clients expect AWS SDK profile-based access.
- Metrics persistence: `/tables/{table}/metrics` validates and logs payloads but does not persist them to `TableStatisticsService`.
- Async planning: plans are synchronous/completed only; streaming manifests and async planning (`/plans/{id}`) are future work.
- Multi-table ACID scope: atomic within a single `/transactions/commit` backend transaction; request validation rejects duplicate table identifiers.
- Reconciliation freshness: stats and other connector-derived state are eventually refreshed by scheduled reconciliation after the commit has already succeeded.
- Manifest/file serving: the gateway does not serve manifests or data files directly; clients access storage through the credentials/config returned in REST responses.
## Client Quick Start
### DuckDB
```sql
INSTALL httpfs;
LOAD httpfs;
INSTALL aws;
LOAD aws;
INSTALL iceberg;
LOAD iceberg;

CREATE OR REPLACE SECRET floe_secret (
    TYPE s3,
    KEY_ID '<access-key>',
    SECRET '<secret-key>',
    SCOPE 's3://<bucket>/',
    REGION '<AWS region>'
);

ATTACH 'analytics' AS iceberg_floecat
    (TYPE iceberg,
     ENDPOINT 'http://localhost:9200/',
     AUTHORIZATION_TYPE none,
     ACCESS_DELEGATION_MODE 'none');

CREATE TABLE iceberg_floecat.core.quark_events (event_id INTEGER);
INSERT INTO iceberg_floecat.core.quark_events VALUES (1), (2), (3), (4);
SELECT * FROM iceberg_floecat.core.quark_events;
```
### Trino
`etc/catalog/analytics_rest.properties`:

```properties
connector.name=iceberg
iceberg.catalog.type=rest
iceberg.rest-catalog.prefix=analytics
iceberg.rest-catalog.uri=http://host.docker.internal:9200
iceberg.rest-catalog.warehouse=s3://my-warehouse/
iceberg.rest-catalog.view-endpoints-enabled=false
fs.native-s3.enabled=true
s3.aws-access-key=<access-key>
s3.aws-secret-key=<secret-key>
s3.region=<AWS region>
# OIDC (Keycloak)
iceberg.rest-catalog.security=OAUTH2
iceberg.rest-catalog.oauth2.credential=trino-client:trino-secret
iceberg.rest-catalog.oauth2.server-uri=http://host.docker.internal:8080/realms/floecat/protocol/openid-connect/token
iceberg.rest-catalog.oauth2.scope=openid
```
Note: if Trino runs on the same Docker network as Keycloak (`docker_floecat`), you can use `http://keycloak:8080/realms/floecat/protocol/openid-connect/token` instead. If it does not share the network, use `host.docker.internal`.
Restart Trino and run:
```sql
CREATE TABLE analytics.sales.stream_orders (
    order_id BIGINT,
    region VARCHAR
) WITH (
    format = 'PARQUET',
    location = 's3://my-warehouse/analytics/sales/stream_orders/'
);

INSERT INTO analytics.sales.stream_orders VALUES (1, 'east'), (2, 'west');
SELECT * FROM analytics.sales.stream_orders;
```
Trino picks up the REST prefix/warehouse from the catalog properties, while the gateway injects consistent metadata paths and credentials.
## References
- Iceberg REST catalog spec
- Floecat module: `protocol-gateway/iceberg-rest`
- Configuration: `protocol-gateway/iceberg-rest/src/main/resources/application.properties`