# Architecture
## System Overview
Floecat tracks a resource graph anchored at Accounts and spanning Catalogs, Namespaces, Tables, Views, Snapshots, statistics, and query-lifecycle artifacts.

    Account
    └── Catalog (logical catalog-of-catalogs)
        └── Namespace (hierarchical path)
            ├── Table (Iceberg/Delta metadata, upstream reference)
            │   └── Snapshot (immutable upstream checkpoints)
            └── View (stored definition)

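The containment rules in this tree can be checked mechanically. The Python sketch below is a hypothetical illustration, not Floecat code. `ResourceKind`, `ALLOWED_CHILDREN`, and `is_valid_lineage` are invented names. The map only transcribes the parent/child edges shown in the tree, and it treats a Namespace's hierarchical path as a single node.

```python
from enum import Enum
from typing import Dict, List, Set


class ResourceKind(Enum):
    ACCOUNT = "Account"
    CATALOG = "Catalog"
    NAMESPACE = "Namespace"
    TABLE = "Table"
    SNAPSHOT = "Snapshot"
    VIEW = "View"


# Hypothetical: parent kind -> kinds allowed directly beneath it, transcribed from the tree above.
ALLOWED_CHILDREN: Dict[ResourceKind, Set[ResourceKind]] = {
    ResourceKind.ACCOUNT: {ResourceKind.CATALOG},
    ResourceKind.CATALOG: {ResourceKind.NAMESPACE},
    ResourceKind.NAMESPACE: {ResourceKind.TABLE, ResourceKind.VIEW},
    ResourceKind.TABLE: {ResourceKind.SNAPSHOT},
    ResourceKind.SNAPSHOT: set(),  # leaf
    ResourceKind.VIEW: set(),  # leaf
}


def is_valid_lineage(lineage: List[ResourceKind]) -> bool:
    """True if the path starts at an Account and every step is an allowed parent->child edge."""
    if not lineage or lineage[0] is not ResourceKind.ACCOUNT:
        return False
    return all(child in ALLOWED_CHILDREN[parent] for parent, child in zip(lineage, lineage[1:]))
```

For example, `Account → Catalog → Namespace → Table → Snapshot` passes, while a Snapshot attached under a View is rejected.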
Two storage primitives underpin every service:
- BlobStore – immutable protobuf payloads such as `catalog.pb`, `table.pb`, `snapshots/{snapshot_id}/snapshot/{sha}.pb`, and stats blobs. Blobs are content-addressed via SHA-256 ETags.
- PointerStore – versioned key→blob indirection that supports atomic compare-and-set (CAS), hierarchical listing, and name→ID lookups. Keys use deterministic prefixes such as `/accounts/{account_id}/catalogs/by-name/{name}` and `/accounts/{account_id}/tables/{table_id}/snapshots/by-id/{snapshot_id}`.
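The sketch below shows both primitives as in-memory stores. It makes three assumptions. An ETag is the hex SHA-256 digest of the payload. A pointer's version starts at 1. `expected_version=0` means "this key must not exist yet". All class and method names are hypothetical. The real contracts live in `core/storage-spi/`, and the actual implementations are in `storage/memory/` and `storage/aws/`.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional


class InMemoryBlobStore:
    """Content-addressed (assumed scheme): a blob's ETag is the hex SHA-256 digest of its bytes."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, payload: bytes) -> str:
        etag = hashlib.sha256(payload).hexdigest()
        # Identical payloads hash to the same ETag, so repeated puts are no-ops.
        self._blobs.setdefault(etag, payload)
        return etag

    def get(self, etag: str) -> bytes:
        return self._blobs[etag]


@dataclass(frozen=True)
class Pointer:
    blob_etag: str
    version: int


class InMemoryPointerStore:
    """Hypothetical versioned key -> blob indirection with compare-and-set."""

    def __init__(self) -> None:
        self._pointers: Dict[str, Pointer] = {}

    def get(self, key: str) -> Optional[Pointer]:
        return self._pointers.get(key)

    def compare_and_set(self, key: str, expected_version: int, blob_etag: str) -> bool:
        """Swing the pointer only if its version matches; 0 means the key must not exist."""
        current = self._pointers.get(key)
        current_version = current.version if current else 0
        if current_version != expected_version:
            return False  # lost the race: caller should re-read and retry
        self._pointers[key] = Pointer(blob_etag, current_version + 1)
        return True

    def list_prefix(self, prefix: str) -> List[str]:
        """Hierarchical listing: every key under a deterministic prefix, sorted."""
        return sorted(k for k in self._pointers if k.startswith(prefix))
```

A writer that loses a CAS race can re-read the pointer, rebuild its blob, and retry against the new version. Because blobs are content-addressed, re-putting an identical payload is harmless.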
The gRPC service (built on Quarkus) enforces tenancy, authorization, and idempotency. It also orchestrates connectors that ingest upstream metadata, reconciles that metadata into the canonical blob/pointer stores, and serves execution-ready scan bundles.
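Tenancy and authorization checks can be pictured as an interceptor chain that runs in front of each handler. The Python sketch below models only that control flow, under stated assumptions:

- the principal is resolved from an `authorization` header;
- a caller-supplied `x-correlation-id` is reused when present;
- permission strings such as `catalog.create` are hypothetical.

It does not reproduce Floecat's actual gRPC interceptors or the fields of its `PrincipalContext`.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class PrincipalContext:
    # Hypothetical fields; the real PrincipalContext may carry more.
    account_id: str
    permissions: FrozenSet[str]


@dataclass
class Request:
    account_id: str  # tenant the request targets
    action: str  # hypothetical permission name, e.g. "catalog.create"
    headers: Dict[str, str] = field(default_factory=dict)
    principal: Optional[PrincipalContext] = None
    correlation_id: Optional[str] = None


Handler = Callable[[Request], str]
Interceptor = Callable[[Request, Handler], str]


def correlation_interceptor(req: Request, nxt: Handler) -> str:
    # Reuse a caller-supplied ID if present, otherwise mint a new one.
    req.correlation_id = req.headers.get("x-correlation-id") or str(uuid.uuid4())
    return nxt(req)


def make_auth_interceptor(resolve: Callable[[str], Optional[PrincipalContext]]) -> Interceptor:
    def auth_interceptor(req: Request, nxt: Handler) -> str:
        principal = resolve(req.headers.get("authorization", ""))
        if principal is None:
            raise PermissionError("unauthenticated")
        if principal.account_id != req.account_id:
            raise PermissionError("cross-tenant access denied")
        if req.action not in principal.permissions:
            raise PermissionError(f"missing permission: {req.action}")
        req.principal = principal  # downstream handlers read the injected principal
        return nxt(req)

    return auth_interceptor


def chain(interceptors: List[Interceptor], handler: Handler) -> Handler:
    """Wrap `handler` so the interceptors run in list order before it."""

    def wrap(interceptor: Interceptor, nxt: Handler) -> Handler:
        return lambda req: interceptor(req, nxt)

    for interceptor in reversed(interceptors):
        handler = wrap(interceptor, handler)
    return handler
```

In this sketch the correlation interceptor runs first, so a request rejected by the auth check already carries a correlation ID.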
## Components
The following modules compose the system (see linked docs for deep dives):
| Component | Responsibility |
|---|---|
| `proto/` | All protobuf/gRPC contracts (catalog, query lifecycle, execution scans, connectors, statistics, types). |
| `service/` | Quarkus runtime, resource repositories, query lifecycle service, GC, security, metrics. |
| `client-cli/` | Interactive shell for humans; exercises every public RPC. |
| `core/connectors/spi/` | Connector interfaces, stats engines, NDV helpers, auth shims. |
| `connectors/catalogs/iceberg/` | Iceberg REST + AWS Glue connector implementation. |
| `connectors/catalogs/delta/` | Unity Catalog/Delta Lake connector using Delta Kernel + Databricks APIs. |
| `core/connectors/common/` | Shared connector utilities (Parquet stats, NDV sketches, planners). |
| `reconciler/` | Connector scheduler/worker, reconciliation orchestration, job store. |
| `core/storage-spi/` | Blob/pointer persistence contracts shared by service and GC. |
| `storage/memory/` | In-memory dev/test stores (CAS semantics maintained). |
| `storage/aws/` | Production DynamoDB pointer store + S3 blob store. |
| `types/` | Logical type system utilities, coercions, min/max encoding. |
| `extensions/builtin/` | Plugin architecture for engine-specific builtin catalogs (functions, operators, types, etc.). |
| `log/` | Runtime log directory layout (service log + audit channel). |
## Data & Control Flow
- Connectors (Delta/Iceberg) enumerate upstream namespaces, tables, snapshots, and file-level stats via the shared SPI.
- The Reconciler schedules connector runs, materializes local Tables/Snapshots/Stats through repository APIs, and records incremental NDV, histograms, and scan manifests. Reconcile execution is split by mode:
    - `METADATA_ONLY` for table/snapshot state.
    - `STATS_ONLY` for stats enrichment only (via the stats control plane / engine registry). A batch fails when nothing in it is captured and reports degraded success when capture is partial.
    - `METADATA_AND_STATS` for metadata ingest plus a queued `STATS_ONLY` follow-up capture that uses table-scoped stats requests.
- The Service exposes CRUD RPCs for catalogs/namespaces/tables/views, plus query-lifecycle and statistics APIs. Requests pass through interceptors that inject `PrincipalContext`, correlation IDs, and optional query leases before reaching service implementations.
- Repositories translate RPCs into pointer/blob mutations, enforce optimistic concurrency, and update idempotency records.
- Query lifecycle RPCs hand planners lease descriptors (snapshot pins, obligations) plus any connector-provided scan metadata needed before execution.
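The mode split and the `STATS_ONLY` outcome rule fit in a few lines. This is a hypothetical sketch. `ReconcileJob`, `Reconciler`, and `stats_batch_outcome` are invented names. Log strings replace connector I/O, and an in-process list stands in for the job store.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ReconcileMode(Enum):
    METADATA_ONLY = "METADATA_ONLY"
    STATS_ONLY = "STATS_ONLY"
    METADATA_AND_STATS = "METADATA_AND_STATS"


class StatsOutcome(Enum):
    SUCCEEDED = "SUCCEEDED"
    DEGRADED = "DEGRADED"
    FAILED = "FAILED"


def stats_batch_outcome(requested: int, captured: int) -> StatsOutcome:
    """Fail when nothing in the batch was captured; degrade when only part of it was."""
    if requested <= 0 or not 0 <= captured <= requested:
        raise ValueError("expected requested > 0 and 0 <= captured <= requested")
    if captured == 0:
        return StatsOutcome.FAILED
    if captured < requested:
        return StatsOutcome.DEGRADED
    return StatsOutcome.SUCCEEDED


@dataclass
class ReconcileJob:
    table_id: str
    mode: ReconcileMode


@dataclass
class Reconciler:
    queue: List[ReconcileJob] = field(default_factory=list)  # stand-in for the job store
    log: List[str] = field(default_factory=list)  # stand-in for connector/repository calls

    def run(self, job: ReconcileJob) -> None:
        if job.mode in (ReconcileMode.METADATA_ONLY, ReconcileMode.METADATA_AND_STATS):
            self.log.append(f"ingest metadata for {job.table_id}")
        if job.mode is ReconcileMode.STATS_ONLY:
            self.log.append(f"capture stats for {job.table_id}")
        if job.mode is ReconcileMode.METADATA_AND_STATS:
            # Stats are not captured inline; a table-scoped STATS_ONLY follow-up is queued.
            self.queue.append(ReconcileJob(job.table_id, ReconcileMode.STATS_ONLY))
```

One plausible reason to queue stats as a separate `STATS_ONLY` job is that metadata ingest then does not wait on file-level stats capture.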
## Consistency Model (Current)
- Core table state (single-table commit): the synchronous request path updates table/snapshot state before returning success.
- Post-core side effects (connector/snapshot sync actions): run best-effort after the commit is applied and are not part of the atomic commit state.
- Multi-table transaction endpoint: the whole request is applied atomically through backend transactions, with idempotent replay and optimistic preconditions.
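The multi-table guarantee can be illustrated with a single-process sketch. It checks every optimistic precondition before mutating anything, and it caches results by idempotency key. `TransactionalStore`, `commit`, and the `(table_id, expected_version, new_snapshot_id)` tuple shape are hypothetical. A real backend would use its own transaction mechanism rather than a Python dict.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical update shape: (table_id, expected_version, new_snapshot_id).
Update = Tuple[str, int, str]


class PreconditionFailed(Exception):
    """Raised when a table's current version does not match the caller's expectation."""


@dataclass
class TableState:
    version: int
    snapshot_id: str


@dataclass
class TransactionalStore:
    tables: Dict[str, TableState] = field(default_factory=dict)
    # Idempotency key -> result recorded by the first successful apply.
    applied: Dict[str, List[Tuple[str, int]]] = field(default_factory=dict)

    def commit(self, idempotency_key: str, updates: List[Update]) -> List[Tuple[str, int]]:
        """Apply all updates or none; replays with a known key return the recorded result."""
        if idempotency_key in self.applied:
            # Idempotent replay: nothing is re-applied. (This sketch does not check
            # that the replayed payload matches the original one.)
            return self.applied[idempotency_key]
        if len({table_id for table_id, _, _ in updates}) != len(updates):
            raise ValueError("each table may appear at most once per transaction")
        # Validate every precondition before touching any state.
        for table_id, expected_version, _ in updates:
            current = self.tables.get(table_id)
            current_version = current.version if current else 0
            if current_version != expected_version:
                raise PreconditionFailed(
                    f"{table_id}: expected v{expected_version}, found v{current_version}"
                )
        # All checks passed, so apply every update.
        result = []
        for table_id, expected_version, snapshot_id in updates:
            self.tables[table_id] = TableState(expected_version + 1, snapshot_id)
            result.append((table_id, expected_version + 1))
        self.applied[idempotency_key] = result
        return result
```

A retry with the same idempotency key returns the first result without bumping versions again. A stale `expected_version` rejects the whole batch and leaves every table untouched.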