# Protobuf & RPC Contracts

## Overview
Floecat's public surface is entirely gRPC. The `core/proto/` module defines canonical protobuf structures for resource identifiers, catalog services, query lifecycle metadata, connectors, statistics, and helper schemas. Every other module depends on these contracts for serialization, validation, and compatibility.

The contract files are organised by domain (`common/`, `catalog/`, `query/`, `execution/`, `connector/`, `account/`, `types/`, `statistics/`). Generated Java stubs live under the `ai.floedb.floecat.*.rpc` packages and are consumed by the Quarkus service, connectors, CLI, and reconciler.
## Architecture & Responsibilities
- `common/common.proto` – Defines `QueryInput`, `ResourceId`, `NameRef`, `SnapshotRef`, pagination, rich error payloads, `PrincipalContext`, and idempotency/optimistic-concurrency helpers. Every other schema imports this file.
- `catalog/*.proto` – CRUD APIs for catalogs, namespaces, tables, views, snapshots, directory lookups, and table statistics. Each service exposes the same Create/List/Get/Update/Delete (CLGUD) lifecycle with `PageRequest`/`PageResponse` support; stats schemas also cover per-file statistics.
- `connector/connector.proto` – Connector management RPCs plus reconciliation job tracking and validation routines.
- `query/lifecycle.proto` – Query lifecycle (`BeginQuery`, `RenewQuery`, `EndQuery`, `GetQuery`) and the snapshot pin metadata sent down to the SQL planner.
- `query/system_objects_registry.proto` – `GetSystemObjects` plus message definitions for builtin functions, operators, casts, collations, aggregates, and types loaded from static files.
- `query/user_objects_bundle.proto` – `GetUserObjects` streams resolved relation metadata for planner binding, including per-column `ColumnResult` outcomes (`READY` with `ColumnInfo` or `FAILED` with `ColumnFailure`).
- `execution/scan.proto` – Scan metadata (data/delete files + per-file stats) produced by connectors and consumed at execution time.
- `types/types.proto` – Logical type registry (Boolean/Decimal/etc.) and scalar encodings used by statistics and bundles.
- `account/account.proto` – Account CRUD service for multi-tenancy.
## Public API / Surface Area

### Core Services
| Service | Key RPCs | Inputs/Outputs |
|---|---|---|
| `CatalogService` | `ListCatalogs`, `GetCatalog`, `CreateCatalog`, `UpdateCatalog`, `DeleteCatalog` | Accepts `CatalogSpec`, optional `IdempotencyKey`, `Precondition`, and `FieldMask` for partial updates. Returns `Catalog` + `MutationMeta`. |
| `NamespaceService` | `ListNamespaces`, `GetNamespace`, `CreateNamespace`, `UpdateNamespace`, `DeleteNamespace` | Supports hierarchical selectors (`path`, `recursive`, `children_only`). |
| `TableService` | `ListTables`, `GetTable`, `CreateTable`, `UpdateTable`, `DeleteTable` | `TableSpec` carries `UpstreamRef` with connector link, schema JSON, and partition info. |
| `ViewService` | Similar CRUD semantics, storing SQL definitions and metadata. | |
| `SnapshotService` | `ListSnapshots`, `GetSnapshot`, `CreateSnapshot`, `DeleteSnapshot` | Pins upstream checkpoints and timestamps. |
| `TableStatisticsService` | `GetTargetStats`, `ListTargetStats`, client-streaming `PutTargetStats` | Accepts per-snapshot target stats envelopes (table/column/expression/file). `ListTargetStats` supports target-kind filtering (currently at most one kind per request); streaming writes collapse multiple batches into a single call. |
| `TableConstraintsService` | `GetTableConstraints`, `ListTableConstraints`, `PutTableConstraints`, `MergeTableConstraints`, `AppendTableConstraints`, `DeleteTableConstraints`, `AddTableConstraint`, `DeleteTableConstraint` | Snapshot-scoped constraints CRUD for user tables. `PutTableConstraints` is a full-bundle upsert; `MergeTableConstraints` is a server-side merge by `constraint.name` plus a shallow merge of bundle properties (incoming keys win); `AppendTableConstraints` is server-side append-only (duplicate names rejected); `AddTableConstraint`/`DeleteTableConstraint` are single-constraint partial mutations. All write operations require snapshot existence (`NOT_FOUND` when missing). |
| `DirectoryService` | `Resolve*` & `Lookup*` RPCs | Translates between names and `ResourceId`s with pagination for batched lookups. |
| `AccountService` | Account CRUD. | |
| `Connectors` | Connector CRUD, `ValidateConnector`, `StartCapture`, `GetReconcileJob`. | |
| `QueryService` | `BeginQuery`, `RenewQuery`, `EndQuery`, `GetQuery`, `FetchScanBundle`. | |
| `PlannerStatsService` | `GetTargetStats`, `GetTableConstraints` | Split planner-facing streams for target stats and table constraints; `GetTargetStats(include_constraints=true)` remains as a combined single-roundtrip convenience mode. |
| `UserObjectsService` | `GetUserObjects` | Streams catalog metadata chunks (header → relations → end) as the service resolves each relation so planners can start binding earlier. |
| — Consumption pattern | | Clients read `UserObjectsBundleChunk` in three phases: 1) a header chunk (cheap metadata); 2) zero or more resolutions chunk batches, where each `RelationResolution` carries `input_index` + `FOUND`/`NOT_FOUND`/`ERROR`; and 3) a single end chunk with summary counts. Use `input_index` to map back to planner `TableReferenceCandidate`s and bind as soon as a `FOUND` arrives. For each `RelationInfo`, inspect `columns[*].status`: `COLUMN_STATUS_OK` exposes `columns[*].column`, while `COLUMN_STATUS_FAILED` exposes `columns[*].failure` with a typed `ColumnFailureCode` plus details. Extension-defined failures must use `COLUMN_FAILURE_CODE_ENGINE_EXTENSION` and set `extension_code_value`; clients branch on `extension_code_value` inside the engine domain (for FloeDB, see `FloeDecorationFailureCode` in `extensions/floedb/src/main/proto/engine_floe.proto`). |
| `SystemObjectsService` | `GetSystemObjects` | Returns the builtin catalog filtered by the `x-engine-kind` / `x-engine-version` headers supplied with the request. |
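The `GetUserObjects` consumption pattern above can be sketched as a small simulation. This is illustrative only: chunks are modeled as plain dicts, whereas a real client iterates the gRPC stream of `UserObjectsBundleChunk` messages, and the field shapes here are assumptions rather than the generated API.

```python
# Sketch: consuming a GetUserObjects stream in three phases
# (header -> resolutions -> end). Chunks are plain dicts standing in
# for UserObjectsBundleChunk messages; shapes are illustrative.

def consume_user_objects(chunks, candidates):
    """Bind planner candidates as soon as FOUND resolutions arrive."""
    bound, failed = {}, {}
    for chunk in chunks:
        kind = chunk["kind"]
        if kind == "header":
            pass  # cheap metadata; ignored in this sketch
        elif kind == "resolutions":
            for res in chunk["resolutions"]:
                cand = candidates[res["input_index"]]  # map back to planner input
                if res["status"] == "FOUND":
                    bound[cand] = res["relation"]      # eligible for early binding
                else:                                  # NOT_FOUND / ERROR
                    failed[cand] = res["status"]
        elif kind == "end":
            return bound, failed, chunk["summary"]
    raise RuntimeError("stream ended without an end chunk")

chunks = [
    {"kind": "header"},
    {"kind": "resolutions", "resolutions": [
        {"input_index": 0, "status": "FOUND", "relation": {"name": "events"}},
        {"input_index": 1, "status": "NOT_FOUND"},
    ]},
    {"kind": "end", "summary": {"found": 1, "not_found": 1}},
]
bound, failed, summary = consume_user_objects(chunks, ["sales.events", "sales.ghost"])
print(bound)   # {'sales.events': {'name': 'events'}}
print(failed)  # {'sales.ghost': 'NOT_FOUND'}
```

A real consumer would additionally inspect `columns[*].status` on each `RelationInfo` before binding, as described in the table above.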
Each RPC requires a populated `account_id` within the `ResourceId`s; the Quarkus service checks this before hitting repository storage.
## Planner Lifecycle & Execution Scan Schemas
`query/lifecycle.proto` captures everything the planner needs to hold a lease:

- `QueryDescriptor` mirrors the live query context (IDs, expiry timestamps, snapshot pins, expansion maps, and table obligations). Per-table scan manifests are retrieved lazily via `QueryService.FetchScanBundle`, which returns the `execution/scan.proto` records for a specific table.
`execution/scan.proto` describes the scan inputs that executors consume:

- `ScanFile` entries include the file path, size, record count, format, per-column stats, and whether the file is data vs equality/position deletes.
- `ScanFileContent` enumerates the delete/data categories.
`query/system_objects_registry.proto` exposes immutable builtin metadata via `SystemObjectsService.GetSystemObjects` so planners can hydrate functions/operators/types once per engine version. Clients send the `x-engine-kind` and `x-engine-version` headers and always receive the filtered catalog for that engine release.
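Because the builtin catalog is immutable per engine release, clients can fetch it once and cache by engine identity. A minimal sketch, assuming a hypothetical `fetch_system_objects` stand-in for a `GetSystemObjects` call carrying the `x-engine-kind` / `x-engine-version` headers:

```python
from functools import lru_cache

# CALLS records simulated RPC round-trips; fetch_system_objects is a
# hypothetical stand-in for the real GetSystemObjects RPC.
CALLS = []

def fetch_system_objects(engine_kind: str, engine_version: str) -> tuple:
    CALLS.append((engine_kind, engine_version))
    return ("lower", "upper")  # illustrative builtin catalog payload

@lru_cache(maxsize=None)
def system_objects(engine_kind: str, engine_version: str) -> tuple:
    # The builtin catalog is immutable per engine release, so the
    # (kind, version) pair is a safe cache key.
    return fetch_system_objects(engine_kind, engine_version)

system_objects("floedb", "1.4.0")
system_objects("floedb", "1.4.0")   # served from cache: no second RPC
system_objects("floedb", "1.5.0")   # new engine version -> new fetch
print(CALLS)  # [('floedb', '1.4.0'), ('floedb', '1.5.0')]
```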
## Important Internal Details
- **Field numbering** – All proto files reserve low numbers for required identity fields and push experimental metadata to `map<string,string> properties = 99`. Add new fields by appending to the end to preserve wire compatibility.
- **`ResourceKind` enforcement** – Services verify that IDs have the expected kind (for example, tables must be `RK_TABLE`). Clients should populate the `kind` enum to improve error messages.
- **`SnapshotRef` semantics** – `oneof which { snapshot_id | as_of | special }`. `special` currently allows `SS_CURRENT`. Planner RPCs interpret `as_of` timestamps when enumerating snapshots.
- **File-level stats** – `FileTargetStats` anchors counts and sketches to a file path. File stats are written as `TargetStatsRecord` values with `target.file` identity via `PutTargetStats`; the service enforces a consistent `table_id`/`snapshot_id` within a stream.
- **Stats vs constraints snapshot policy** – `PutTargetStats` currently accepts unknown snapshots (lenient ordering), while `PutTableConstraints` is strict and requires a materialized snapshot row before the write. Rationale: stats keeps existing capture-ordering compatibility, while constraints are modeled as snapshot-attached relational facts.
- **Planner split vs combined retrieval** – `PlannerStatsService.GetTableConstraints` provides a dedicated constraints-only stream, while `GetTargetStats(include_constraints=true)` remains available as a combined convenience mode. Split mode is relation-scoped (table-visibility pruning only) because `FetchTableConstraintsRequest` does not carry column projection context; combined mode can apply relation+column request-shape-aware pruning. For constraints lookups, `provider_missing` means no bundle exists, while `provider_empty` means a bundle exists and is explicitly empty. For planner client simplicity, both are currently surfaced as `BUNDLE_RESULT_STATUS_NOT_FOUND` (same as `pruned_empty`), with `failure.details.reason` preserving the distinction. For CHECK masking to work correctly, connector constraint payloads should populate `ConstraintDefinition.columns` with the referenced local column IDs.
- **FOREIGN KEY metadata model** – `ConstraintDefinition` now carries ANSI-style FK behavior fields (`referenced_constraint_name`, `match_option`, `update_rule`, `delete_rule`) so `information_schema.referential_constraints` can be populated without connector-specific interpretation. Writers may omit these fields; scanners default unspecified rules to `NONE`/`NO ACTION`/`NO ACTION`.
- **Idempotency/Preconditions** – Mutating RPCs accept `IdempotencyKey` or `Precondition` (expected CAS version/ETag). Repository logic mirrors these fields, so clients should reuse the same values when retrying.
- **Query lifecycle** – `QueryDescriptor.query_status` moves through `SUBMITTED → COMPLETED/FAILED` depending on connector planning success. Lease expirations are surfaced via `expires_at`.
- **AuthConfig** – Connector auth now carries structured `credentials` (for example `bearer`, `cli`, `client`, `token-exchange-*`) plus free-form properties; the service resolves secrets and exchanges before connectors consume them.
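The `MergeTableConstraints` / `AppendTableConstraints` semantics can be illustrated with a self-contained sketch. Plain dicts stand in for the protobuf constraint bundle, and the field shapes are illustrative, not the generated API.

```python
# Sketch of the constraints-bundle write semantics: merge keys on
# constraint name (incoming properties win), append rejects duplicates.

def merge_bundle(existing: dict, incoming: dict) -> dict:
    """Server-side merge by constraint name; incoming property keys win."""
    by_name = {c["name"]: c for c in existing["constraints"]}
    by_name.update({c["name"]: c for c in incoming["constraints"]})
    return {
        "constraints": list(by_name.values()),
        # Shallow merge of bundle properties: incoming keys win.
        "properties": {**existing["properties"], **incoming["properties"]},
    }

def append_bundle(existing: dict, incoming: dict) -> dict:
    """Append-only write: duplicate constraint names are rejected."""
    dupes = ({c["name"] for c in existing["constraints"]}
             & {c["name"] for c in incoming["constraints"]})
    if dupes:
        raise ValueError(f"duplicate constraint names: {sorted(dupes)}")
    return {
        "constraints": existing["constraints"] + incoming["constraints"],
        "properties": existing["properties"],
    }

old = {"constraints": [{"name": "pk"}], "properties": {"a": "1", "b": "2"}}
inc = {"constraints": [{"name": "pk", "columns": [1]}], "properties": {"b": "9"}}
merged = merge_bundle(old, inc)
print(merged["properties"])  # {'a': '1', 'b': '9'}  (incoming keys win)
```

Note that the real services additionally require the target snapshot row to exist before any constraints write, returning `NOT_FOUND` otherwise.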
## Data Flow & Lifecycle
- Clients authenticate using the configured OIDC session/authorization headers (see `docs/service.md`) and call gRPC endpoints.
- Mutations include an `IdempotencyKey` for once-and-only-once semantics; the service persists a hash of the request along with the resultant `MutationMeta`, so replays yield the previous payload.
- Connectors written against the SPI return `ScanFile` and stats payloads that exactly match the protos defined here; the reconciler pipes them back via the catalog/statistics services.
- Planners call `QueryService.BeginQuery` to create query leases, optionally extend them via `RenewQuery`, call `FetchScanBundle` per table when they need manifests, and close leases out via `EndQuery` once execution is complete.
- `BeginQuery` now allows clients to provide an optional `query_id` (duplicates are rejected) and a list of `common.QueryInput` records, so the lifecycle service can pin snapshots and expansions at creation time for deterministic replay. Schema resolution and planning still occur in the downstream services.
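The idempotent-replay behaviour above can be modeled in a few lines. This is a sketch of the contract, not the service's storage logic; names like `apply_mutation` are illustrative.

```python
import hashlib
import json

# Sketch: the service keys a stored result by idempotency key and a hash
# of the request, so a retry with the same key and payload returns the
# previously persisted result instead of re-executing the mutation.
_store: dict = {}

def apply_mutation(idempotency_key: str, request: dict, mutate):
    digest = hashlib.sha256(
        json.dumps(request, sort_keys=True).encode()
    ).hexdigest()
    cached = _store.get(idempotency_key)
    if cached is not None:
        if cached["digest"] != digest:
            raise ValueError("idempotency key reused with a different payload")
        return cached["result"]  # replay: previous payload, no re-execution
    result = mutate(request)
    _store[idempotency_key] = {"digest": digest, "result": result}
    return result

req = {"display_name": "events"}
r1 = apply_mutation("create-events", req, lambda r: {"version": 1})
r2 = apply_mutation("create-events", req, lambda r: {"version": 2})
print(r1 == r2)  # True: the retry returns the first persisted result
```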
State diagram for the query lease protocol:

```text
[BeginQuery] --> (QueryContext: SUBMITTED)
      | planning succeeds
      v
(QueryContext: COMPLETED) --renew--> (extend expires_at)
      | EndQuery(commit=true/false)
      v
(ENDED_COMMIT or ENDED_ABORT) --grace--> [expiry]
```
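The diagram can be restated as a tiny state machine. This is a simulation of the lease protocol under the stated transitions, not the service implementation; the class and method names are invented for illustration.

```python
# Sketch of the query lease protocol: SUBMITTED -> COMPLETED/FAILED,
# renewals extend expires_at, EndQuery moves to a terminal state.

class QueryLease:
    def __init__(self, ttl_s: float = 60.0, now: float = 0.0):
        self.status = "SUBMITTED"
        self.expires_at = now + ttl_s

    def complete_planning(self, ok: bool = True) -> None:
        assert self.status == "SUBMITTED"
        self.status = "COMPLETED" if ok else "FAILED"

    def renew(self, now: float, ttl_s: float = 60.0) -> None:
        # Renew before expiry; an expired lease cannot be extended.
        assert self.status == "COMPLETED" and now < self.expires_at
        self.expires_at = now + ttl_s

    def end(self, commit: bool) -> None:
        # Terminal states; pins are released after a grace period.
        self.status = "ENDED_COMMIT" if commit else "ENDED_ABORT"

lease = QueryLease(ttl_s=60.0, now=0.0)
lease.complete_planning(ok=True)
lease.renew(now=30.0)            # extend expires_at to 90.0
lease.end(commit=True)
print(lease.status)  # ENDED_COMMIT
```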
## Configuration & Extensibility
- **Evolving protos** – Prefer `optional` fields for new metadata. Keep enum values stable; add new entries at the end. Reserve field numbers explicitly when deprecating to avoid reuse.
- **Temporal precision** – `types.LogicalType.temporal_precision` is optional. Absence means default microsecond precision, while an explicit `0` represents second precision.
- **Interval range** – `types.LogicalType.interval_range` distinguishes `INTERVAL YEAR TO MONTH` vs `INTERVAL DAY TO SECOND`. In the JVM model, absence is normalised to `IR_UNSPECIFIED`. Leading and fractional precisions live in `interval_leading_precision` and `interval_fractional_precision`.
- **Custom properties** – Many records expose `map<string,string> properties` for lightweight extensions. Document keys in the consuming module (for example, connector-specific hints in `docs/connectors-spi.md`).
- **Query leases** – Clients decide how aggressively to renew leases; planners should renew before `expires_at` and call `EndQuery` even on failure so `QueryContextStore` can release pins eagerly.
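The `temporal_precision` convention (absence means microseconds, explicit `0` means whole seconds) is easy to get wrong in client code. A minimal helper capturing it, assuming `None` models an unset optional field; the function name is illustrative:

```python
from typing import Optional

# Encodes the temporal_precision convention: an absent field (None)
# means the default microsecond precision, while an explicit 0 means
# second precision. Other values are fractional-second digit counts.

def fractional_digits(temporal_precision: Optional[int]) -> int:
    return 6 if temporal_precision is None else temporal_precision

print(fractional_digits(None))  # 6 (default: microseconds)
print(fractional_digits(0))     # 0 (second precision)
print(fractional_digits(3))     # 3 (milliseconds)
```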
## Examples & Scenarios

### Creating a table via gRPC
```shell
grpcurl -plaintext -d '{
  "spec": {
    "catalog_id": {"account_id": "T", "id": "C", "kind": "RK_CATALOG"},
    "namespace_id": {"account_id": "T", "id": "N", "kind": "RK_NAMESPACE"},
    "display_name": "events",
    "schema_json": "{...Iceberg schema...}",
    "upstream": {
      "connector_id": {"account_id": "T", "id": "conn", "kind": "RK_CONNECTOR"},
      "uri": "s3://warehouse",
      "namespace_path": ["prod"],
      "table_display_name": "events"
    }
  },
  "idempotency": {"key": "create-events"}
}' localhost:9100 ai.floedb.floecat.catalog.TableService/CreateTable
```
### Beginning a query lifecycle lease
```shell
grpcurl -plaintext -d '{
  "inputs": [
    {"name": {"catalog": "demo", "path": ["sales"], "name": "events"}}
  ]
}' localhost:9100 ai.floedb.floecat.query.QueryService/BeginQuery
```
## Cross-References
- Service runtime, interceptors, and repository adapters: `docs/service.md`
- Connector SPI implementations consuming these protos: `docs/connectors-spi.md`, `docs/connectors-iceberg.md`, `docs/connectors-delta.md`
- Query lifecycle internals: `docs/service.md#query-lifecycle-service`