
Architecture

System Overview

Floecat tracks a resource graph anchored at Accounts and spanning Catalogs, Namespaces, Tables, Views, Snapshots, statistics, and query-lifecycle artifacts.

Account
 └── Catalog (logical catalog-of-catalogs)
      └── Namespace (hierarchical path)
           ├── Table (Iceberg/Delta metadata, upstream reference)
           │    └── Snapshot (immutable upstream checkpoints)
           └── View (stored definition)

Two storage primitives underpin every service:

  • BlobStore – immutable protobuf payloads such as catalog.pb, table.pb, snapshots/{snapshot_id}/snapshot/{sha}.pb, and stats blobs. Blobs are content-addressed via SHA-256 ETags.
  • PointerStore – versioned key→blob indirection to support atomic compare-and-set (CAS), hierarchical listing, and name→ID lookups. Keys use deterministic prefixes such as /accounts/{account_id}/catalogs/by-name/{name} and /accounts/{account_id}/tables/{table_id}/snapshots/by-id/{snapshot_id}.
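The interplay between the two primitives can be illustrated with a minimal in-memory sketch. It assumes that a blob's ETag is the hex-encoded SHA-256 of its bytes and that each pointer carries a version counter used for CAS. The class names (`MemoryBlobStore`, `MemoryPointerStore`, `PointerKeys`) and their methods are hypothetical and are not the `core/storage-spi/` contracts.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: names and signatures are illustrative, not Floecat's storage SPI.
final class MemoryBlobStore {
    private final Map<String, byte[]> blobs = new ConcurrentHashMap<>();

    // Content address: hex-encoded SHA-256 of the payload bytes (assumed ETag format).
    static String sha256Hex(byte[] payload) {
        try {
            return HexFormat.of().formatHex(MessageDigest.getInstance("SHA-256").digest(payload));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    // Blobs are immutable: identical bytes map to the same key and are stored once.
    String put(byte[] payload) {
        String etag = sha256Hex(payload);
        blobs.putIfAbsent(etag, payload.clone());
        return etag;
    }

    Optional<byte[]> get(String etag) {
        byte[] stored = blobs.get(etag);
        return stored == null ? Optional.empty() : Optional.of(stored.clone());
    }
}

// A pointer maps a key to a blob ETag plus a version used for compare-and-set.
record Pointer(String blobEtag, long version) {}

final class MemoryPointerStore {
    private final Map<String, Pointer> pointers = new ConcurrentHashMap<>();

    Optional<Pointer> get(String key) {
        return Optional.ofNullable(pointers.get(key));
    }

    // Repoints key only if its current version equals expectedVersion (0 = key must be absent).
    // ConcurrentHashMap.compute runs atomically per key, so two concurrent writers cannot both win.
    boolean compareAndSet(String key, long expectedVersion, String newEtag) {
        boolean[] swapped = {false};
        pointers.compute(key, (k, current) -> {
            long currentVersion = current == null ? 0L : current.version();
            if (currentVersion != expectedVersion) {
                return current; // conflict: leave the pointer unchanged
            }
            swapped[0] = true;
            return new Pointer(newEtag, currentVersion + 1);
        });
        return swapped[0];
    }

    // Hierarchical listing: every key under a prefix, in lexicographic order.
    List<String> list(String prefix) {
        return pointers.keySet().stream().filter(k -> k.startsWith(prefix)).sorted().toList();
    }
}

// Key builders following the deterministic prefixes quoted above.
final class PointerKeys {
    static String catalogByName(String accountId, String name) {
        return "/accounts/" + accountId + "/catalogs/by-name/" + name;
    }

    static String snapshotById(String accountId, String tableId, String snapshotId) {
        return "/accounts/" + accountId + "/tables/" + tableId + "/snapshots/by-id/" + snapshotId;
    }
}
```

In this sketch, a writer that loses a CAS race re-reads the pointer, rebuilds its blob, and retries against the newer version. Because blobs are immutable and content-addressed, the blob written by the losing attempt is never referenced and can simply be left for garbage collection.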

The gRPC service (Quarkus) enforces tenancy, authorization, and idempotency. It orchestrates connectors that ingest upstream metadata, reconciles that metadata into the canonical blob/pointer stores, and serves execution-ready scan bundles.

Components

The following modules compose the system (see linked docs for deep dives):

Component                      Responsibility
proto/                         All protobuf/gRPC contracts (catalog, query lifecycle, execution scans, connectors, statistics, types).
service/                       Quarkus runtime, resource repositories, query lifecycle service, GC, security, metrics.
client-cli/                    Interactive shell for humans; exercises every public RPC.
core/connectors/spi/           Connector interfaces, stats engines, NDV helpers, auth shims.
connectors/catalogs/iceberg/   Iceberg REST + AWS Glue connector implementation.
connectors/catalogs/delta/     Unity Catalog/Delta Lake connector using Delta Kernel + Databricks APIs.
core/connectors/common/        Shared connector utilities (Parquet stats, NDV sketches, planners).
reconciler/                    Connector scheduler/worker, reconciliation orchestration, job store.
core/storage-spi/              Blob/pointer persistence contracts shared by service and GC.
storage/memory/                In-memory dev/test stores (CAS semantics maintained).
storage/aws/                   Production DynamoDB pointer store + S3 blob store.
types/                         Logical type system utilities, coercions, min/max encoding.
extensions/builtin/            Plugin architecture for engine-specific builtin catalogs (functions, operators, types, etc.).
log/                           Runtime log directory layout (service log + audit channel).

Data & Control Flow

  1. Connectors (Delta/Iceberg) enumerate upstream namespaces, tables, snapshots, and file-level stats via the shared SPI.
  2. The Reconciler schedules connector runs, materializes local Tables/Snapshots/Stats through repository APIs, and records incremental NDV, histograms, and scan manifests. Reconcile execution is split by mode:
       • METADATA_ONLY ingests table/snapshot state only.
       • STATS_ONLY performs stats enrichment only (via the stats control plane / engine registry). A batch in which no stats are captured fails; a batch with partial capture completes as a degraded success.
       • METADATA_AND_STATS ingests metadata, then queues a STATS_ONLY follow-up that captures stats through table-scoped stats requests.
  3. The Service exposes CRUD RPCs for catalogs/namespaces/tables/views, plus query-lifecycle and statistics APIs. Requests traverse interceptors that inject PrincipalContext, correlation IDs, and optional query leases before hitting service implementations.
  4. Repositories translate RPCs into pointer/blob mutations, enforce optimistic concurrency, and update idempotency records.
  5. Query lifecycle RPCs hand planners lease descriptors (snapshot pins, obligations) plus any connector-provided scan metadata needed before execution.
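The reconcile mode split described above can be condensed into a small decision sketch. The enum, record, and method names are hypothetical, and treating an empty STATS_ONLY batch as a success is an assumption that the text does not settle.

```java
import java.util.List;

// Hypothetical names mirroring the three reconcile modes; not Floecat's actual types.
enum ReconcileMode { METADATA_ONLY, STATS_ONLY, METADATA_AND_STATS }

enum StatsOutcome { SUCCESS, DEGRADED, FAILED }

final class ReconcilePlanner {
    // Table-scoped stats request queued as a STATS_ONLY follow-up.
    record StatsRequest(String tableId) {}

    // Every mode except STATS_ONLY ingests table/snapshot state.
    static boolean ingestsMetadata(ReconcileMode mode) {
        return mode != ReconcileMode.STATS_ONLY;
    }

    // Only STATS_ONLY captures stats within the run itself.
    static boolean capturesStatsInline(ReconcileMode mode) {
        return mode == ReconcileMode.STATS_ONLY;
    }

    // METADATA_AND_STATS defers stats: it queues one request per reconciled table.
    static List<StatsRequest> followUps(ReconcileMode mode, List<String> reconciledTableIds) {
        if (mode != ReconcileMode.METADATA_AND_STATS) {
            return List.of();
        }
        return reconciledTableIds.stream().map(StatsRequest::new).toList();
    }

    // STATS_ONLY batch outcome: nothing captured fails, partial capture is a degraded success.
    // Assumption: an empty batch counts as SUCCESS.
    static StatsOutcome classify(int requested, int captured) {
        if (requested < 0 || captured < 0 || captured > requested) {
            throw new IllegalArgumentException("invalid counts: " + captured + "/" + requested);
        }
        if (requested == 0 || captured == requested) {
            return StatsOutcome.SUCCESS;
        }
        return captured == 0 ? StatsOutcome.FAILED : StatsOutcome.DEGRADED;
    }
}
```

Separating the follow-up queue from the metadata pass lets a metadata ingest succeed even when stats capture is slow or only partially available.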

Consistency Model (Current)

  • Core table state (single-table commit): synchronous request path updates table/snapshot state before returning success.
  • Post-core side effects (connector/snapshot sync actions): run best-effort after the commit is applied and are not part of the atomic commit state.
  • Multi-table transaction endpoint: request-level atomic apply through backend transactions, with idempotent replay and optimistic preconditions.
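The multi-table guarantees can be illustrated with a small in-memory sketch. It assumes that each table carries a version counter, which callers pass back as an optimistic precondition, and it uses a `synchronized` method as a stand-in for the backend transaction. `TableChange`, `TxnResult`, and `MultiTableCommitter` are hypothetical names. Post-core side effects are deliberately left out because they are not part of the atomic state.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical types; the real endpoint applies changes through backend transactions.
record TableChange(String tableId, long expectedVersion, String newMetadataEtag) {}

enum TxnResult { COMMITTED, REPLAYED, CONFLICT }

final class MultiTableCommitter {
    private final Map<String, Long> versions = new HashMap<>();
    private final Map<String, String> metadata = new HashMap<>();
    private final Set<String> committedKeys = new HashSet<>();

    // synchronized stands in for a backend transaction: the whole request applies or none of it does.
    synchronized TxnResult commit(String idempotencyKey, List<TableChange> changes) {
        // Idempotent replay: a retried request with an already-committed key is not reapplied.
        if (committedKeys.contains(idempotencyKey)) {
            return TxnResult.REPLAYED;
        }
        Set<String> seen = new HashSet<>();
        for (TableChange change : changes) {
            if (!seen.add(change.tableId())) {
                throw new IllegalArgumentException("duplicate table " + change.tableId());
            }
            // Optimistic precondition: the caller's expected version must still be current.
            if (versions.getOrDefault(change.tableId(), 0L) != change.expectedVersion()) {
                return TxnResult.CONFLICT; // nothing written yet, so no partial state
            }
        }
        // All preconditions hold: apply every change.
        for (TableChange change : changes) {
            versions.merge(change.tableId(), 1L, Long::sum);
            metadata.put(change.tableId(), change.newMetadataEtag());
        }
        committedKeys.add(idempotencyKey);
        return TxnResult.COMMITTED;
    }

    synchronized long version(String tableId) {
        return versions.getOrDefault(tableId, 0L);
    }

    synchronized String metadataEtag(String tableId) {
        return metadata.get(tableId);
    }
}
```

Because all preconditions are checked before any write, a single stale table version rejects the whole request and leaves every table untouched. In this sketch, only committed keys are recorded, so a caller that hits a conflict can refresh its versions and retry.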