Transactions (gRPC Core)
This document describes Floecat's core gRPC transaction mechanism used for Iceberg REST commits.
It is the backend used by both:
- POST /v1/{prefix}/transactions/commit (multi-table)
- POST /v1/{prefix}/namespaces/{namespace}/tables/{table} (single-table commit)
Overview
Transactions provide optimistic, pointer-CAS-based atomic commit across one or more tables:
- BeginTransaction creates a transaction in TS_OPEN.
- PrepareTransaction validates changes, writes intent records, then moves to TS_PREPARED.
- CommitTransaction transitions through TS_APPLYING, applies intents, then returns a terminal apply state.
- AbortTransaction marks TS_ABORTED and cleans up intents.
State Model
- TS_OPEN: new transaction, no committed prepare yet.
- TS_PREPARED: intents are persisted and ready to apply.
- TS_APPLYING: apply phase started; non-abortable in-place.
- TS_ABORTED: transaction aborted explicitly or due to expiration handling.
- TS_APPLIED: all intent pointer updates applied.
- TS_APPLY_FAILED_RETRYABLE: apply failed with a retryable/storage conflict.
- TS_APPLY_FAILED_CONFLICT: apply failed with deterministic conflict semantics.
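The state list above, combined with the flows described under Flow Details, implies a small transition table. The following is a minimal sketch of that table, not the service's implementation: the edge set is inferred from this document, and the names ALLOWED_TRANSITIONS and can_transition are hypothetical.

```python
from enum import Enum


class TxState(Enum):
    TS_OPEN = "TS_OPEN"
    TS_PREPARED = "TS_PREPARED"
    TS_APPLYING = "TS_APPLYING"
    TS_ABORTED = "TS_ABORTED"
    TS_APPLIED = "TS_APPLIED"
    TS_APPLY_FAILED_RETRYABLE = "TS_APPLY_FAILED_RETRYABLE"
    TS_APPLY_FAILED_CONFLICT = "TS_APPLY_FAILED_CONFLICT"


# Edges inferred from the begin/prepare/commit/abort flows in this document.
# The real service may permit or forbid edges that are not listed here.
ALLOWED_TRANSITIONS = {
    TxState.TS_OPEN: {TxState.TS_PREPARED, TxState.TS_ABORTED},
    TxState.TS_PREPARED: {TxState.TS_APPLYING, TxState.TS_ABORTED},
    TxState.TS_APPLYING: {
        TxState.TS_APPLIED,
        TxState.TS_APPLY_FAILED_RETRYABLE,
        TxState.TS_APPLY_FAILED_CONFLICT,
    },
    TxState.TS_APPLY_FAILED_RETRYABLE: {TxState.TS_APPLYING, TxState.TS_ABORTED},
    TxState.TS_ABORTED: set(),
    TxState.TS_APPLIED: set(),
    TxState.TS_APPLY_FAILED_CONFLICT: set(),
}


def can_transition(current: TxState, target: TxState) -> bool:
    """Return True if the inferred table allows moving from current to target."""
    return target in ALLOWED_TRANSITIONS[current]
```

Note that TS_APPLYING has no edge to TS_ABORTED in this sketch, which reflects the "non-abortable in-place" wording above.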
Request/Data Model
TxChange:
- Targets a table by table_id or table_fq.
- table_fq format is catalog.namespace1.namespace2.table (dot-separated, no escaping).
- Payload is one of: table, opaque payload, or intended_blob_uri.
- Optional pointer precondition.expected_version.
TransactionIntent:
- One intent per target pointer key.
- Stores target pointer key, blob URI, and optional expected version.
- On failed apply, stores diagnostic fields (apply_error_*).
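Because table_fq is dot-separated with no escaping, it can be split purely on dots. The sketch below illustrates that parsing rule; the TableFq type and parse_table_fq function are hypothetical, and the requirement of at least one namespace level is an assumption not stated in this document.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TableFq:
    catalog: str
    namespace: Tuple[str, ...]
    table: str


def parse_table_fq(table_fq: str) -> TableFq:
    """Split catalog.namespace1.namespace2.table on dots; no escaping is handled."""
    parts = table_fq.split(".")
    # Assumption: catalog, at least one namespace level, and a table name are
    # all required, and no segment may be empty.
    if len(parts) < 3 or any(part == "" for part in parts):
        raise ValueError(f"malformed table_fq: {table_fq!r}")
    return TableFq(catalog=parts[0], namespace=tuple(parts[1:-1]), table=parts[-1])
```

One consequence of this format is that a name containing a literal dot cannot be expressed unambiguously in table_fq; targeting by table_id avoids the issue.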
Flow Details
BeginTransaction
- Authorize table.write; require account ID.
- Create transaction (TS_OPEN) with TTL:
  - request TTL if provided
  - otherwise default 600s
- Persist transaction and return it.
- If idempotency key is supplied, wrapped by IdempotencyGuard.runOnce.
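The begin steps can be sketched as follows. This is an in-memory illustration only: RunOnceGuard is a hypothetical stand-in for IdempotencyGuard.runOnce (whose real signature is not shown in this document), authorization of table.write is omitted, and treating a non-positive TTL as "not provided" is an assumption.

```python
import time
import uuid
from typing import Callable, Dict, Optional

DEFAULT_TTL_SECONDS = 600  # default stated in this document


def effective_ttl(request_ttl_seconds: Optional[int]) -> int:
    """Use the request TTL if provided, otherwise the 600s default."""
    # Assumption: a missing or non-positive TTL counts as "not provided".
    if request_ttl_seconds is None or request_ttl_seconds <= 0:
        return DEFAULT_TTL_SECONDS
    return request_ttl_seconds


class RunOnceGuard:
    """Hypothetical in-memory run-once guard keyed by idempotency key."""

    def __init__(self) -> None:
        self._results: Dict[str, dict] = {}

    def run_once(self, key: Optional[str], op: Callable[[], dict]) -> dict:
        if key is None:
            return op()  # no key supplied: every call executes
        if key not in self._results:
            self._results[key] = op()
        return self._results[key]  # replay returns the stored result


def begin_transaction(
    guard: RunOnceGuard,
    account_id: str,
    request_ttl_seconds: Optional[int] = None,
    idempotency_key: Optional[str] = None,
) -> dict:
    if not account_id:
        raise PermissionError("account ID is required")

    def create() -> dict:
        return {
            "tx_id": str(uuid.uuid4()),
            "account_id": account_id,
            "state": "TS_OPEN",
            "expires_at": time.time() + effective_ttl(request_ttl_seconds),
        }

    return guard.run_once(idempotency_key, create)
```

With this shape, a retried begin carrying the same key gets back the original transaction instead of creating a second one.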
PrepareTransaction
- Load transaction by tx_id.
- Expired transaction:
  - transition to TS_ABORTED
  - return error (transaction expired)
- State handling:
  - TS_PREPARED => validate incoming request shape against stored intents, then return existing transaction
  - non-TS_OPEN => error
- Build intents from all changes:
  - resolve table ID
  - reject duplicate target pointer keys in one request
  - read current pointer version
  - validate precondition.expected_version (if set)
  - materialize payload blob URI:
    - table: validate resource ID/account match, write content-addressed table blob
    - payload: write tx-scoped content-addressed blob
    - intended_blob_uri: must be non-empty and within account prefix
- Before creating each intent, enforce target lock availability:
  - if existing lock belongs to another tx and owner tx is stale/terminal/missing/expired, clean it up and continue
  - if owner tx is still active (including TS_APPLY_FAILED_RETRYABLE), fail with lock error
- Create intents and transition transaction to TS_PREPARED.
- On failures after partial creation, delete created intent indices best-effort.
Note: prepared blob URIs are content-addressed; cleanup intentionally does not delete blobs on failure.
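The duplicate-target and expected_version checks performed while building intents can be sketched as below. This is an illustration under assumptions, not the service code: Change, PrepareError, and build_intents are hypothetical names, blob materialization and locking are left out, and treating a pointer that does not exist yet as version 0 is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Change:
    target_key: str                         # resolved target pointer key
    blob_uri: str                           # already-materialized payload blob URI
    expected_version: Optional[int] = None  # optional pointer precondition


class PrepareError(Exception):
    pass


def build_intents(changes: List[Change], current_versions: Dict[str, int]) -> List[dict]:
    """Validate all changes and return one intent per target pointer key."""
    seen = set()
    intents = []
    for change in changes:
        if change.target_key in seen:
            raise PrepareError(f"duplicate target pointer key: {change.target_key}")
        seen.add(change.target_key)

        # Assumption: a pointer that does not exist yet reads as version 0.
        current = current_versions.get(change.target_key, 0)
        if change.expected_version is not None and change.expected_version != current:
            raise PrepareError(
                f"version mismatch on {change.target_key}: "
                f"expected {change.expected_version}, found {current}"
            )

        intents.append(
            {
                "target_key": change.target_key,
                "blob_uri": change.blob_uri,
                "expected_version": change.expected_version,
            }
        )
    return intents
```

Validating every change before persisting any intent keeps the partial-creation cleanup path described above as small as possible.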
CommitTransaction
- Load transaction by tx_id.
- Expired transaction:
  - only for non-TS_APPLYING states: transition to TS_ABORTED, clean up intents, return error (transaction expired)
  - TS_APPLYING skips expiry teardown
- State handling:
  - TS_APPLIED => return as-is
  - TS_APPLY_FAILED_CONFLICT => return as-is
  - allowed apply states: TS_PREPARED, TS_APPLYING, TS_APPLY_FAILED_RETRYABLE
  - other states => error
- Load intents by tx; empty intent set is an error; apply order is deterministic (sorted by target key).
- If not already TS_APPLYING, transition from TS_PREPARED/TS_APPLY_FAILED_RETRYABLE to TS_APPLYING.
- Apply intents via pointer batch CAS:
  - table intents include by-id pointer plus table name-pointer ownership updates
  - max pointer CAS ops per apply is 100
- Commit result mapping:
  - apply success => TS_APPLIED
  - conflict:
    - if pointers already match intent blob URIs, finalize TS_APPLIED (recovery path)
    - otherwise annotate intents and return TS_APPLY_FAILED_CONFLICT
  - retryable apply failure => annotate intents, TS_APPLY_FAILED_RETRYABLE
- On successful apply, intent index cleanup is best-effort; warnings are logged if not fully removed.
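The commit result mapping, including the recovery path for a conflict that turns out to be an earlier successful apply, can be sketched as follows. The batch_cas callback and its "ok" / "conflict" / "retryable" return values are hypothetical, as is the apply_error_message field (this document only names the apply_error_* prefix); the 100-op cap and name-pointer updates are not modeled.

```python
from typing import Callable, Dict, List


def apply_intents(
    intents: List[dict],
    pointers: Dict[str, str],
    batch_cas: Callable[[List[dict]], str],
) -> str:
    """Apply intents in deterministic order and map the result to a state name.

    pointers maps target key -> blob URI currently stored at that pointer.
    batch_cas is a hypothetical callback returning "ok", "conflict", or "retryable".
    """
    if not intents:
        raise ValueError("transaction has no intents")

    # Deterministic apply order: sorted by target key.
    ordered = sorted(intents, key=lambda intent: intent["target_key"])
    result = batch_cas(ordered)

    if result == "ok":
        return "TS_APPLIED"

    if result == "conflict":
        # Recovery path: a previous attempt may already have applied everything.
        already_applied = all(
            pointers.get(intent["target_key"]) == intent["blob_uri"]
            for intent in ordered
        )
        if already_applied:
            return "TS_APPLIED"
        for intent in ordered:
            intent["apply_error_message"] = "pointer CAS conflict"
        return "TS_APPLY_FAILED_CONFLICT"

    # Anything else is treated as a retryable storage failure in this sketch.
    for intent in ordered:
        intent["apply_error_message"] = "retryable apply failure"
    return "TS_APPLY_FAILED_RETRYABLE"
```

The recovery check is what makes a retried commit safe after a lost response: the CAS fails because the pointers already moved, but they moved to exactly the blobs this transaction intended.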
AbortTransaction
- Load transaction by tx_id.
- State handling:
  - already TS_ABORTED => return as-is
  - TS_APPLYING => error (transaction apply is in progress)
  - terminal non-abortable states (TS_APPLIED, TS_APPLY_FAILED_CONFLICT) => error
  - TS_APPLY_FAILED_RETRYABLE => abort is allowed to release intent locks
- Transition to TS_ABORTED.
- Delete intents for that tx (best-effort via index deletes).
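The abort state handling reduces to a short guard sequence. The sketch below assumes a transaction is a plain dict and that delete_intents is a hypothetical callback; swallowing its failure is one way to model "best-effort" and may differ from how the service handles it.

```python
from typing import Callable


class AbortRejected(Exception):
    pass


def abort_transaction(tx: dict, delete_intents: Callable[[str], None]) -> dict:
    """Return the aborted transaction, or raise if the state is non-abortable."""
    state = tx["state"]
    if state == "TS_ABORTED":
        return tx  # already aborted: return as-is
    if state == "TS_APPLYING":
        raise AbortRejected("transaction apply is in progress")
    if state in ("TS_APPLIED", "TS_APPLY_FAILED_CONFLICT"):
        raise AbortRejected(f"cannot abort terminal state {state}")

    # TS_OPEN, TS_PREPARED, and TS_APPLY_FAILED_RETRYABLE all reach this point.
    aborted = {**tx, "state": "TS_ABORTED"}
    try:
        delete_intents(aborted["tx_id"])
    except Exception:
        # Best-effort cleanup: leftover intents are left for later reclamation.
        pass
    return aborted
```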
Idempotency
Idempotency keys are supported on:
- BeginTransaction
- PrepareTransaction
- CommitTransaction
AbortTransaction has no idempotency key in the proto API.
REST /transactions/commit Bridge (Current Behavior)
The Iceberg REST gateway uses this gRPC flow as follows:
- Begins a transaction and stores iceberg.commit.request-hash in transaction properties.
- If Idempotency-Key is absent, begin idempotency falls back to req:<catalog>:<request-hash> so retries can resume the same backend transaction without cross-catalog key collisions.
- Reads current transaction state:
  - TS_APPLIED => returns HTTP 204 immediately.
  - TS_APPLY_FAILED_CONFLICT => returns HTTP 409 immediately.
  - For open/retryable paths, plans table changes, prepares intents, applies pre-commit snapshot ops, then calls commit.
- During prepare, each table change now includes precondition.expected_version sourced from table MutationMeta.pointer_version fetched at planning time.
- REST returns 204 only when backend is TS_APPLIED; deterministic failures map to 409.
- Unknown commit states map to:
  - 503 for unavailable/retryable unknown state
  - 502 for upstream unknown response
  - 504 for deadline exceeded
  - 500 fallback for other unknowns
- Auth failures map to:
  - 401 for unauthenticated
  - 403 for permission denied
- There are no gateway-managed post-commit side effects in this path; once the backend apply succeeds, later reconciliation is handled independently by the service scheduler.
- Stage-create materialization (stage-id) is not supported in multi-table /transactions/commit; staged create should use single-table commit flow.
- Unknown requirement types and update actions are rejected with HTTP 400 before commit orchestration, including replay (TS_APPLIED) paths.
- Ambiguous commit outcomes use a short bounded confirmation poll before returning unknown-state:
  - floecat.gateway.commit.confirm.max-attempts (default 6)
  - floecat.gateway.commit.confirm.initial-sleep-ms (default 25)
  - floecat.gateway.commit.confirm.max-sleep-ms (default 200)
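The gateway behavior above can be summarized in three small helpers: the fallback idempotency key, the outcome-to-status mapping, and the confirmation poll schedule. This is a sketch under assumptions: the outcome labels other than the TS_* states are hypothetical, and the doubling backoff with one sleep per attempt is an assumption, since this document only gives the initial sleep, the cap, and the attempt count.

```python
from typing import List, Optional


def begin_idempotency_key(header_key: Optional[str], catalog: str, request_hash: str) -> str:
    """Use the Idempotency-Key header if present, else req:<catalog>:<request-hash>."""
    return header_key if header_key else f"req:{catalog}:{request_hash}"


# Hypothetical outcome labels mapped to the HTTP statuses listed in this document.
STATUS_BY_OUTCOME = {
    "TS_APPLIED": 204,
    "TS_APPLY_FAILED_CONFLICT": 409,
    "UNAVAILABLE": 503,
    "UPSTREAM_UNKNOWN": 502,
    "DEADLINE_EXCEEDED": 504,
    "UNAUTHENTICATED": 401,
    "PERMISSION_DENIED": 403,
}


def http_status(outcome: str) -> int:
    """Map an outcome label to an HTTP status, with 500 as the fallback."""
    return STATUS_BY_OUTCOME.get(outcome, 500)


def confirm_sleep_schedule(
    max_attempts: int = 6,
    initial_sleep_ms: int = 25,
    max_sleep_ms: int = 200,
) -> List[int]:
    """Return the sleep before each confirmation attempt, in milliseconds."""
    # Assumption: the sleep doubles after every attempt and is capped at max_sleep_ms.
    sleeps = []
    sleep_ms = initial_sleep_ms
    for _ in range(max_attempts):
        sleeps.append(min(sleep_ms, max_sleep_ms))
        sleep_ms *= 2
    return sleeps
```

Under the doubling assumption the defaults give sleeps of 25, 50, 100, 200, 200, and 200 ms, so the poll adds at most 775 ms of sleep before the gateway reports an unknown state.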
REST Single-Table Commit Bridge (Current Behavior)
Single-table commit uses the same backend transaction flow as multi-table commit, with a single planned table change:
- TableCommitService forwards the incoming table commit request to TransactionCommitService.
- TransactionCommitService runs begin/prepare/commit against the same gRPC transaction APIs described above, using one table-change entry.
- Atomicity guarantees are therefore identical at the backend CAS/apply layer; only the REST request/response envelope differs (single-table commit returns table metadata in its response).
Failure and Concurrency Notes
- Prepare may write blobs before intent creation; blob cleanup is intentionally skipped because blobs are content-addressed and may be shared.
- Intent target locks are reclaimed from stale owners during prepare (ownership-checked deletes).
- TS_APPLY_FAILED_RETRYABLE owners are reclaimable once expired; before expiry they are treated as active.
- TS_APPLYING is treated as in-flight and non-abortable/non-expirable in-place.
- Commit conflict/retryable diagnostics are persisted onto each intent (apply_error_*) when possible.
- Intent index cleanup is retried and verified but remains best-effort.
- GC skips TTL collection for TS_APPLYING transactions and uses stale-owner semantics for dangling target-intent cleanup (TS_ABORTED, TS_APPLIED, TS_APPLY_FAILED_CONFLICT, and expired TS_APPLY_FAILED_RETRYABLE).
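The stale-owner rule shared by prepare-time lock reclamation and GC can be expressed as a single predicate. The sketch below is an interpretation of the notes above rather than the service's logic: lock_is_reclaimable and the dict shape are hypothetical, and treating expires_at equal to now as expired is an assumption.

```python
from typing import Optional

TERMINAL_STATES = {"TS_ABORTED", "TS_APPLIED", "TS_APPLY_FAILED_CONFLICT"}


def lock_is_reclaimable(owner: Optional[dict], now: float) -> bool:
    """Decide whether a target lock held by another transaction may be cleaned up."""
    if owner is None:
        return True  # owner transaction is missing
    state = owner["state"]
    if state == "TS_APPLYING":
        return False  # in-flight: never expired in place
    if state in TERMINAL_STATES:
        return True
    # TS_OPEN, TS_PREPARED, and TS_APPLY_FAILED_RETRYABLE stay active until expiry.
    return owner["expires_at"] <= now
```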
Relevant Code
- gRPC service: service/src/main/java/ai/floedb/floecat/service/transaction/impl/TransactionsServiceImpl.java
- Intent applier: service/src/main/java/ai/floedb/floecat/service/transaction/impl/TransactionIntentApplierSupport.java
- Intent repo: service/src/main/java/ai/floedb/floecat/service/repo/impl/TransactionIntentRepository.java
- REST bridge: protocol-gateway/iceberg-rest/src/main/java/ai/floedb/floecat/gateway/iceberg/rest/services/table/TransactionCommitService.java
- Single-table entrypoint: protocol-gateway/iceberg-rest/src/main/java/ai/floedb/floecat/gateway/iceberg/rest/services/table/TableCommitService.java