System Objects¶
System objects are the catalog-level row sources that live inside the _system account: information_schema.*, pg_catalog.*, and any engine/plugin-provided tables or views. Unlike user tables, these objects are not backed by persisted blobs – they are synthesised from the builtin catalog metadata, cached graph snapshots, and bespoke scanners. The builtin catalog load/caching pipeline that feeds these snapshots is described in Builtin catalog architecture, and EngineSystemCatalogExtension is the single SPI that now ships both the builtin data and the corresponding system-object definitions.
Client contract for GetSystemObjects¶
GetSystemObjects never returns namespace/table/view metadata – it only streams the function/operator/type/cast/collation/aggregate definitions plus any registry hints. The _system overlay exposes the actual relations (namespaces, tables, views, SystemTableNodes, etc.) for scanning and planner resolution, so duplicate metadata would confuse downstream consumers and could expose synthetic tables twice. Use the _system overlay (SystemGraph/CatalogOverlay) whenever you need the actual relation graph; rely on GetSystemObjects solely for the recursive builtin registry data.
System resource identifiers¶
Every system node exposes a ResourceId whose id field looks like a UUID but is deterministically computed. The canonical identity string for the object (namespace-qualified name plus any versioned signatures) is hashed with SHA-256, truncated to 16 bytes, forced to carry the fixed two-byte marker 0xABCD, and then XORed with an engine-specific mask. The marker distinguishes system-generated IDs from arbitrary UUIDs, and swapping masks lets the resolver translate a per-engine ID back to the default floecat_internal catalog without any extra maps. Because the 0xABCD marker survives the XOR translation, the resolver applies the fallback only to UUIDs stamped by this generator; random or user-generated UUIDs are left untouched. The generated bytes also follow the RFC 4122 version 4 layout (version nibble 4, variant bits 10), so serializers and tooling can treat them like canonical UUIDs while the engine mask still exposes a deterministic default ID.
Every engine contributes exactly one system catalog container (the catalog whose engineKind equals the engine name), so ResourceKind=RK_CATALOG always refers to the engine catalog container. Use SystemNodeRegistry.systemCatalogContainerId(engineKind) whenever you need that identifier during builtin graph construction or tests.
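The ID scheme above can be illustrated with a small, self-contained sketch. It is not the SystemNodeRegistry implementation: the marker position (the last two bytes), the way the engine mask is derived, the all-zero mask for floecat_internal, and the class and method names are all assumptions made for illustration. The only properties taken from the text are the SHA-256 truncation, the 0xABCD marker, the XOR with a per-engine mask, and the version/variant bits.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.UUID;

final class SystemIdSketch {
  // Assumption: the default catalog uses an all-zero mask, so its IDs are the unmasked base IDs.
  static final String DEFAULT_ENGINE = "floecat_internal";

  private static byte[] sha256Prefix16(String input) {
    try {
      byte[] digest =
          MessageDigest.getInstance("SHA-256").digest(input.getBytes(StandardCharsets.UTF_8));
      return Arrays.copyOf(digest, 16); // truncate the 32-byte digest to 16 bytes
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("SHA-256 is available on every Java platform", e);
    }
  }

  /** Hypothetical mask derivation; version, variant, and marker bits are cleared so XOR keeps them. */
  static byte[] engineMask(String engineKind) {
    if (DEFAULT_ENGINE.equals(engineKind)) {
      return new byte[16];
    }
    byte[] mask = sha256Prefix16("mask:" + engineKind);
    mask[6] &= 0x0F; // leave the version nibble untouched
    mask[8] &= 0x3F; // leave the variant bits untouched
    mask[14] = 0;    // leave the marker untouched (assumed position)
    mask[15] = 0;
    return mask;
  }

  static UUID systemId(String canonicalIdentity, String engineKind) {
    byte[] id = sha256Prefix16(canonicalIdentity);
    id[6] = (byte) ((id[6] & 0x0F) | 0x40); // version nibble 4
    id[8] = (byte) ((id[8] & 0x3F) | 0x80); // variant bits 10
    id[14] = (byte) 0xAB;                   // fixed two-byte marker
    id[15] = (byte) 0xCD;
    return toUuid(xor(id, engineMask(engineKind)));
  }

  static boolean hasSystemMarker(UUID id) {
    return (id.getLeastSignificantBits() & 0xFFFFL) == 0xABCDL;
  }

  /** Translates a per-engine ID to the default catalog; IDs without the marker pass through. */
  static UUID toDefault(UUID id, String engineKind) {
    if (!hasSystemMarker(id)) {
      return id;
    }
    return toUuid(xor(toBytes(id), engineMask(engineKind)));
  }

  private static byte[] xor(byte[] a, byte[] b) {
    byte[] out = new byte[16];
    for (int i = 0; i < 16; i++) {
      out[i] = (byte) (a[i] ^ b[i]);
    }
    return out;
  }

  private static UUID toUuid(byte[] bytes) {
    ByteBuffer buf = ByteBuffer.wrap(bytes);
    return new UUID(buf.getLong(), buf.getLong());
  }

  private static byte[] toBytes(UUID id) {
    return ByteBuffer.allocate(16)
        .putLong(id.getMostSignificantBits())
        .putLong(id.getLeastSignificantBits())
        .array();
  }

  public static void main(String[] args) {
    UUID pg = systemId("pg_catalog.pg_class", "postgres"); // "postgres" is an example engine kind
    System.out.println(pg + " -> " + toDefault(pg, "postgres"));
  }
}
```

Because XOR is its own inverse, applying the same mask a second time recovers the default ID, which is why no lookup table is needed in this sketch. Production code should still obtain IDs through the registry helper rather than reimplementing the scheme.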
System catalog translation helpers¶
Metagraph and system-graph callers never rewrite _system IDs themselves. Instead we now have a dedicated SystemCatalogTranslator that knows how to:
* normalize any UUID carrying the system marker so its account_id is _system before the graph cache looks it up,
* map user-provided NameRefs into the current engine’s catalog context (rewriting the catalog to EngineContext.effectiveEngineKind() before a namespace or table lookup),
* alias the matched system name back to the user catalog so responses still show examples.information_schema.pg_class.
These translator helpers keep ID/name wrangling centralized, which makes it obvious why the overlay can resolve _system objects even when the CLI is using a different account or catalog. Callers simply run user inputs through SystemCatalogTranslator before hitting SystemGraph/MetaGraph, and every system snapshot is still keyed by the _system account in the registry.
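The three translator responsibilities can be pictured with the sketch below. The NameRef and ResourceId records are simplified local stand-ins, not the real types, and the method names and the string-based marker check (which assumes the marker occupies the last two bytes of the UUID) are hypothetical. The actual SystemCatalogTranslator API may look quite different.

```java
import java.util.List;
import java.util.Locale;

final class TranslatorSketch {
  /** Simplified stand-in for the real NameRef. */
  record NameRef(String catalog, List<String> path, String name) {}

  /** Simplified stand-in for the real ResourceId. */
  record ResourceId(String accountId, String id) {}

  static final String SYSTEM_ACCOUNT = "_system";

  /** Step 1: any UUID carrying the system marker is re-homed to the _system account. */
  static ResourceId normalize(ResourceId rid) {
    // Assumption: the 0xABCD marker is visible as the last four hex digits of the UUID string.
    boolean marked = rid.id().toLowerCase(Locale.ROOT).endsWith("abcd");
    return marked ? new ResourceId(SYSTEM_ACCOUNT, rid.id()) : rid;
  }

  /** Step 2: rewrite the catalog to the effective engine kind before the lookup. */
  static NameRef toEngineCatalog(NameRef userRef, String effectiveEngineKind) {
    return new NameRef(effectiveEngineKind, userRef.path(), userRef.name());
  }

  /** Step 3: alias the matched system name back to the catalog the user asked for. */
  static NameRef aliasToUserCatalog(NameRef matched, String userCatalog) {
    return new NameRef(userCatalog, matched.path(), matched.name());
  }

  public static void main(String[] args) {
    NameRef user = new NameRef("examples", List.of("information_schema"), "tables");
    NameRef lookup = toEngineCatalog(user, "postgres"); // "postgres" is an example engine kind
    System.out.println(lookup + " -> " + aliasToUserCatalog(lookup, user.catalog()));
  }
}
```

Keeping the round trip in one place means a response can always be expressed in the caller's catalog, regardless of which engine catalog actually resolved the name.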
Naming requirements for builtin relations¶
Every builtin object (namespaces, tables, views, functions, operators, etc.) must be namespace-qualified. Definitions without a namespace (no dot in the canonical name) are rejected by catalog validation in normal provider flows, and the graph build still keeps a defensive skip fallback for malformed data that bypasses validation (for example, ad-hoc test catalogs). This keeps namespace buckets deterministic and avoids accidentally seeding the graph with tables that should live in the root namespace. When designing engine-specific catalog fragments, always include the namespace path (e.g., pg_catalog.pg_class, not just pg_class).
SystemNodeRegistry uses that canonical qualified name for identity (ResourceId) and namespace resolution, but display labels are intentionally independent. By default the registry materializes the leaf name (name.name) for function/operator/type/collation/aggregate node displayNames, while namespace/table/view display labels still come from the catalog definition. Providers may still override display labels explicitly when they need qualified names for their own UX.
Namespace-scoping validation defaults are defined in SystemCatalogValidator.NamespaceScopePolicy.defaultPolicy() and are intentionally explicit for OSS plugin authors:
- scoped by default (must be namespace-qualified and reference a known namespace): function, type, table, view
- not scoped by default (no namespace requirement unless a stricter policy is selected): operator, cast, collation, aggregate
Both ServiceLoaderSystemCatalogProvider and FloecatInternalProvider now fail fast on Severity.ERROR validation issues during catalog load, so extension authors should treat these defaults as part of the public contract.
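As a rough illustration of the default scoping rules, the sketch below checks a canonical name against a set of known namespaces. The Kind enum, the validate method, and the error strings are hypothetical; only the split between scoped and unscoped kinds is taken from the list above, and the real SystemCatalogValidator may report issues in a different form.

```java
import java.util.EnumSet;
import java.util.Optional;
import java.util.Set;

final class NamespaceScopeSketch {
  enum Kind { FUNCTION, TYPE, TABLE, VIEW, OPERATOR, CAST, COLLATION, AGGREGATE }

  // Mirrors the documented defaults: these kinds must be namespace-qualified.
  static final Set<Kind> SCOPED_BY_DEFAULT =
      EnumSet.of(Kind.FUNCTION, Kind.TYPE, Kind.TABLE, Kind.VIEW);

  /** Returns everything before the final dot, or empty when the name has no namespace part. */
  static Optional<String> namespaceOf(String canonicalName) {
    int dot = canonicalName.lastIndexOf('.');
    return dot <= 0 ? Optional.empty() : Optional.of(canonicalName.substring(0, dot));
  }

  /** Returns an error message for a violation, or empty when the definition passes. */
  static Optional<String> validate(Kind kind, String canonicalName, Set<String> knownNamespaces) {
    if (!SCOPED_BY_DEFAULT.contains(kind)) {
      return Optional.empty(); // operator, cast, collation, aggregate: no requirement by default
    }
    Optional<String> namespace = namespaceOf(canonicalName);
    if (namespace.isEmpty()) {
      return Optional.of(canonicalName + " is not namespace-qualified");
    }
    if (!knownNamespaces.contains(namespace.get())) {
      return Optional.of(canonicalName + " references unknown namespace " + namespace.get());
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    Set<String> known = Set.of("pg_catalog", "information_schema");
    System.out.println(validate(Kind.TABLE, "pg_class", known));
    System.out.println(validate(Kind.TABLE, "pg_catalog.pg_class", known));
  }
}
```

A stricter policy could be modelled by passing in a different set of scoped kinds instead of the default constant.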
Architecture overview¶
┌────────────────────────────────────────────────┐
│ ServiceLoaderSystemCatalogProvider             │
│ - discovers EngineSystemCatalogExtension       │
│ - merges SystemObjectScannerProvider defs      │
└────────────────────────────────────────────────┘
                        ↓
┌────────────────────────────────────────────────┐
│ SystemCatalogData + SystemEngineCatalog        │
└────────────────────────────────────────────────┘
                        ↓
┌────────────────────────────────────────────────┐
│ SystemNodeRegistry → BuiltinNodes              │
│ - filters EngineSpecific rules                 │
└────────────────────────────────────────────────┘
                        ↓
┌────────────────────────────────────────────────┐
│ SystemGraph (GraphSnapshot cache)              │
│ - reuses BuiltinNodes to build namespace       │
│   buckets & SystemTableNode instances          │
└────────────────────────────────────────────────┘
                        ↓
┌────────────────────────────────────────────────┐
│ MetaGraph / CatalogOverlay                     │
│ (implements SystemObjectGraphView)             │
│ - resolves `_system` nodes via SystemGraph     │
│ - delegates to MetadataGraph for user data     │
└────────────────────────────────────────────────┘
                        ↓
┌────────────────────────────────────────────────┐
│ SystemObjectScanContext (ctx)                  │
│ - built with CatalogOverlay/SystemObjectGraph  │
└────────────────────────────────────────────────┘
                        ↓
┌────────────────────────────────────────────────┐
│ SystemObjectScanner → SystemObjectRow          │
└────────────────────────────────────────────────┘
The core pieces:
- SystemObjectDef – Describes a system row source (NameRef, SchemaColumn[], scannerId, TableBackendKind, etc.). Implemented by SystemNamespaceDef, SystemTableDef, SystemViewDef.
- SystemObjectScanner (core/catalog, ai.floedb.floecat.systemcatalog.spi.scanner) – Exposes schema() and a lazy scan(SystemObjectScanContext) that returns Stream<SystemObjectRow>. Scanners must be allocation-light, respect the Arrow schema, and keep rows as Object[].
- SystemObjectScannerProvider – SPI for providers of definitions and scanners. definitions() lists every SystemObjectDef (no filtering). supportsEngine/supports(NameRef, engineKind) gate when a definition applies. provide(scannerId, engineKind, engineVersion) lets the runtime look up the scanner for a node’s scannerId. The SPI now exposes version-aware helpers (definitions(engineKind, engineVersion) and supports(NameRef, engineKind, engineVersion)) so providers can evolve schemas per engine version. During catalog assembly SystemNodeRegistry.mergeCatalogData seeds the map with the floecat_internal base from FloecatInternalProvider, overlays the plugin catalog, and finally applies provider definitions so every overlay can override earlier entries.
- ServiceLoaderSystemCatalogProvider – Discovers EngineSystemCatalogExtensions (which already extend SystemObjectScannerProvider), loads the catalog for each normalized engine kind, fingerprints it, and hands it to SystemDefinitionRegistry. It exposes internalProvider() for the shared floecat_internal layer and providers() for the extension-only overlays; provider merges happen later in SystemNodeRegistry.
- SystemNodeRegistry + SystemGraph – The registry filters the snapshot for the requested engine/version, materialises GraphNodes (functions, types, aggregates) and SystemTableNodes (with their scannerIds), and caches the result, seeding each merge with the shared information_schema definitions provided by FloecatInternalProvider. Overlays are only applied when EngineContext.enginePluginOverlaysEnabled() returns true. Unknown or missing headers fall back to the base view while still exposing information_schema. SystemGraph builds GraphSnapshots from those nodes and keeps them in an LRU LinkedHashMap keyed by (engineKind, engineVersion). Each snapshot stores namespace buckets, table relations, and a nodesById map for constant-time resolution.
- System constraint catalog – Planner/system constraint lookups are backed by an immutable cache keyed by (engineKind, engineVersion, systemRelationId) and built from the pbtxt-backed builtin table definitions loaded by FloecatInternalProvider/SystemNodeRegistry (not from information_schema scans). Constraints are explicit metadata (SystemTable.constraints) and are validated during builtin catalog load. The runtime currently de-duplicates only CT_NOT_NULL between explicit and nullable-derived implicit entries (and keeps other constraint kinds as declared), so semantic de-duplication of PK/UNIQUE/FK/CHECK is intentionally out of scope here.
  - Validator note: FK referenced-column target validation resolves through referenced_table when provided.
  - Column reference note: use column_id only when the corresponding SystemColumn.id is defined. If a column has no id, use column_name (and constraint-local ordinal) and omit column_id.
  - pbtxt authoring example (common constraint types):

        system_tables {
          name { path: "information_schema" name: "orders" }
          display_name: "orders"
          backend_kind: TABLE_BACKEND_KIND_FLOECAT
          floecat { scanner_id: "orders_scanner" }
          columns { name: "id" type { name: "BIGINT" } nullable: false ordinal: 1 id: 1 }
          columns { name: "customer_id" type { name: "BIGINT" } nullable: false ordinal: 2 id: 2 }
          columns { name: "order_no" type { name: "VARCHAR" } nullable: false ordinal: 3 id: 3 }
          columns { name: "total_amount" type { name: "DECIMAL" } nullable: false ordinal: 4 id: 4 }
          constraints {
            name: "pk_orders"
            type: CT_PRIMARY_KEY
            columns { column_id: 1 column_name: "id" ordinal: 1 }
          }
          constraints {
            name: "uq_orders_order_no"
            type: CT_UNIQUE
            columns { column_id: 3 column_name: "order_no" ordinal: 1 }
          }
          constraints {
            name: "ck_orders_total_non_negative"
            type: CT_CHECK
            check_expression: "total_amount >= 0"
            columns { column_id: 4 column_name: "total_amount" ordinal: 1 }
          }
          constraints {
            name: "fk_orders_customers"
            type: CT_FOREIGN_KEY
            columns { column_id: 2 column_name: "customer_id" ordinal: 1 }
            referenced_table { path: "information_schema" name: "customers" }
            referenced_columns { column_id: 1 column_name: "id" ordinal: 1 }
          }
          constraints {
            name: "nn_orders_total_amount"
            type: CT_NOT_NULL
            columns { column_id: 4 column_name: "total_amount" ordinal: 1 }
          }
        }

- Namespace resolution – Objects are mapped to namespaces by canonical name, trimming everything after the final dot. That means t resolves to namespace t rather than a default schema, so every system function/table/view must be defined with a fully qualified name (e.g., foo.bar). If you need to expose unqualified identifiers you must provide a dedicated namespace node whose canonical path matches the identifier you intend to publish; otherwise the node will be skipped during merging because findNamespaceId can’t map it to an existing namespace.
- CatalogOverlay / MetaGraph – The overlay implements SystemObjectGraphView and exposes an immutable view over MetadataGraph plus the _system snapshot from SystemGraph. SystemObjectScanContext receives this view and uses it for every lookup, so scanners consistently benefit from the metadata graph’s caches.
- Registry-level engine hints – SystemObjectsRegistry now carries a repeated EngineSpecific engine_specific section that plugins can use to ship shared dictionaries (pg_opfamily/opclass/amop-like payloads) or other planner-only metadata once per (engine_kind, engine_version). These payloads are filtered by EngineSpecificRule and travel alongside the node-level definitions, so the planner and scanners can deserialize them without adding new system tables (see SystemObjectsRegistry.engine_specific in core/proto/src/main/proto/floecat/query/system_objects_registry.proto).
  - Each payload_type identifies one dictionary; the validator enforces uniqueness on the (payload_type, engine_kind, min_version, max_version) tuple so you can still ship the same dictionary payload for disjoint version windows but duplicates with the same engine kind and window will be rejected. Treat schema changes as a new payload_type (or add a suffix) so the planner knows when you actually mean a new payload.
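The merge order described for SystemNodeRegistry.mergeCatalogData (floecat_internal base, then the plugin catalog, then provider definitions) amounts to a last-writer-wins overlay keyed by canonical name. The sketch below shows that idea only; the Def record and merge method are hypothetical, and the real merge also filters by engine kind and version.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class MergeOrderSketch {
  /** Simplified stand-in for a system object definition. */
  record Def(String canonicalName, String source) {}

  /** Applies layers in order; a later layer replaces an earlier entry with the same name. */
  @SafeVarargs
  static Map<String, Def> merge(List<Def>... layers) {
    Map<String, Def> merged = new LinkedHashMap<>();
    for (List<Def> layer : layers) {
      for (Def def : layer) {
        merged.put(def.canonicalName(), def);
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    List<Def> base = List.of(new Def("information_schema.tables", "floecat_internal"));
    List<Def> plugin = List.of(new Def("pg_catalog.pg_class", "plugin"));
    List<Def> provider = List.of(new Def("information_schema.tables", "provider"));
    merge(base, plugin, provider).values().forEach(System.out::println);
  }
}
```

Seeding with the base layer first is what keeps information_schema present even when no plugin or provider layer contributes anything.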
Built-in information_schema¶
InformationSchemaProvider is the default SystemObjectScannerProvider. It unconditionally registers:
- The information_schema namespace (SystemNamespaceDef).
- Tables information_schema.tables, information_schema.columns, information_schema.schemata, information_schema.table_constraints, information_schema.key_column_usage, information_schema.referential_constraints, information_schema.check_constraints, information_schema.constraint_column_usage, and information_schema.constraint_table_usage (SystemTableDefs) with scanner identifiers.
FloecatInternalProvider wraps InformationSchemaProvider and is the first layer merged into every catalog; because SystemNodeRegistry seeds every merge with the floecat_internal definitions, information_schema is always available even when headers, plugins, or overlays are missing. Plugin/provider overlays run afterward and can override the same canonical names deterministically.
Each table wires to a lightweight scanner (TablesScanner, ColumnsScanner, SchemataScanner, plus constraint scanners). These scanners rely on SystemObjectScanContext for cached lookups:
- ctx.listNamespaces()/ctx.listTables() reuse the overlay, so the catalog’s graph snapshot is only enumerated once per scan.
- columns resolves schemas via ctx.graph().tableSchema(table.id()), which delegates through the overlay/metadata graph caches so repeated scans avoid redundant schema materialization work.
Constraint view semantics are ANSI-oriented:
- key_column_usage includes PK/UNIQUE/FK key columns (not NOT NULL).
- constraint_column_usage includes columns referenced by each constraint definition: PK/UNIQUE/NOT NULL/CHECK use local referenced columns, while FK uses referenced-target columns (not FK local key columns).
- For CHECK, rows are emitted only when ConstraintDefinition.columns is populated. The scanner does not parse check_expression to infer referenced columns.
- referential_constraints maps FK metadata from ConstraintDefinition: unique_constraint_* uses FK target table/catalog/schema plus referenced_constraint_name when available, and match_option/update_rule/delete_rule default to NONE/NO ACTION/NO ACTION when source metadata omits them.
- If referenced_constraint_name is omitted, the scanner infers it only when there is exactly one matching PK/UNIQUE on the referenced columns in the current constraint index scope; otherwise unique_constraint_* remains NULL.
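The inference rule for an omitted referenced_constraint_name can be sketched as follows. The Constraint record and method name are hypothetical, and "matching" is assumed here to mean an ordered, exact match on the column list. The real scanner may compare columns differently.

```java
import java.util.List;
import java.util.Optional;

final class ReferencedConstraintSketch {
  enum Type { PRIMARY_KEY, UNIQUE, FOREIGN_KEY, CHECK, NOT_NULL }

  /** Simplified stand-in for a constraint definition on the referenced table. */
  record Constraint(String name, Type type, List<String> columns) {}

  /** Returns the single PK/UNIQUE constraint covering the referenced columns, if unambiguous. */
  static Optional<String> inferReferencedConstraint(
      List<Constraint> targetTableConstraints, List<String> referencedColumns) {
    List<String> matches =
        targetTableConstraints.stream()
            .filter(c -> c.type() == Type.PRIMARY_KEY || c.type() == Type.UNIQUE)
            .filter(c -> c.columns().equals(referencedColumns))
            .map(Constraint::name)
            .toList();
    // Zero or several candidates: leave unique_constraint_* as NULL rather than guess.
    return matches.size() == 1 ? Optional.of(matches.get(0)) : Optional.empty();
  }

  public static void main(String[] args) {
    List<Constraint> customers =
        List.of(new Constraint("pk_customers", Type.PRIMARY_KEY, List.of("id")));
    System.out.println(inferReferencedConstraint(customers, List.of("id")));
  }
}
```

Refusing to pick among several candidates keeps the view output deterministic when a table declares both a primary key and a unique constraint on the same columns.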
Writing your own provider¶
- Implement SystemObjectScannerProvider (or build this into your EngineSystemCatalogExtension). Provide every SystemObjectDef/SchemaColumn pair, return the matching scanner from provide(...), and let supports(...) gate whether your definition overrides a builtin.
- Optional: implement EngineSystemCatalogExtension – it already extends SystemObjectScannerProvider, so you can ship both catalog definitions and system tables in one jar. SystemNodeRegistry now merges your definitions (via the new version-aware hooks) and provider overlays with the base floecat_internal data so the resulting _system graph reflects the requested version.
- Emit rows with SystemObjectRow – SystemObjectRow is a cheap wrapper around Object[]. Use SystemObjectScanContext for every graph lookup (catalog, namespace, table, schema) so you benefit from CatalogOverlay’s caches.
- Register via META-INF/services/ai.floedb.floecat.systemcatalog.provider.SystemObjectScannerProvider (and/or EngineSystemCatalogExtension). CDI exposes the provider list through ServiceLoaderSystemCatalogProvider.providers(), so downstream services can resolve scanners by ID.
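To make the shape of a provider and scanner concrete, here is a self-contained sketch. It deliberately does not compile against the real SPI: ScanContext, Scanner, and DemoProvider are local stand-ins whose method names only echo the ones mentioned above, and the scanner ID, engine kind, and column name are invented for illustration. Check the actual interfaces before copying any signature.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Stream;

final class ProviderSketch {
  /** Local stand-in for the scan context; the real context offers more cached lookups. */
  interface ScanContext {
    List<String> listNamespaces();
  }

  /** Local stand-in for a scanner: a schema plus a lazy row stream. */
  interface Scanner {
    List<String> schema();

    Stream<Object[]> scan(ScanContext ctx);
  }

  /** Emits one row per namespace; rows are produced only as the stream is consumed. */
  static final class NamespacesScanner implements Scanner {
    @Override
    public List<String> schema() {
      return List.of("schema_name");
    }

    @Override
    public Stream<Object[]> scan(ScanContext ctx) {
      // Each row matches the one-column schema exactly and holds no scanner state.
      return ctx.listNamespaces().stream().map(ns -> new Object[] {ns});
    }
  }

  /** Hypothetical provider that serves a single scanner for a single engine kind. */
  static final class DemoProvider {
    boolean supportsEngine(String engineKind) {
      return "demo_engine".equals(engineKind);
    }

    Optional<Scanner> provide(String scannerId, String engineKind, String engineVersion) {
      if (!supportsEngine(engineKind) || !"demo_namespaces_scanner".equals(scannerId)) {
        return Optional.empty();
      }
      return Optional.of(new NamespacesScanner());
    }
  }

  public static void main(String[] args) {
    ScanContext ctx = () -> List.of("information_schema", "pg_catalog");
    new DemoProvider()
        .provide("demo_namespaces_scanner", "demo_engine", "1.0")
        .ifPresent(s -> s.scan(ctx).forEach(row -> System.out.println(row[0])));
  }
}
```

The scanner reads everything it needs from the context and keeps no fields, which is what allows one instance to be shared across scans.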
Provider version contract¶
Providers must keep their definitions and scanners aligned with the version they advertise. When emitting versioned metadata, override definitions(engineKind, engineVersion) and supports(NameRef, engineKind, engineVersion) so SystemNodeRegistry.mergeCatalogData can select the right overlays. This merge is later replayed for scanners with the same (engineKind, engineVersion) tuple, so the scannerId in each definition must resolve to a scanner whose provide(scannerId, engineKind, engineVersion) honours the same version constraint. If your provider doesn’t need per-version behavior, rely on the simpler definitions()/supports(...) helpers, but document that the merged overlay is still cached per (engineKind, engineVersion): changing the schema without bumping the version string may leave caches stale until they are invalidated.
Performance & scalability notes¶
- SystemGraph caches snapshots in an access-ordered LinkedHashMap sized by floecat.system.graph.snapshot-cache-size (default 16). Each snapshot already buckets namespaces, relations, and node lookups, so repeated _system scans never rebuild the graph.
- SystemObjectScanContext keeps the catalog/namespace ResourceIds, reuses CatalogOverlay enumeration methods, and caches listNamespaces, listTables, and columnTypes. Scanners should not duplicate this caching logic – the context forwards to the cached overlay that already talks to MetadataGraph.
- SystemNodeRegistry.resourceId now generates deterministic UUIDs for engine/kind/signature tuples, so avoid parsing ResourceId.id strings and build every system ResourceId via the helper.
- CatalogOverlay/MetaGraph implement SystemObjectGraphView (ai.floedb.floecat.systemcatalog.spi.scanner) so that the core catalog module stays unaware of the full metadata graph. SystemGraph answers _system requests while MetadataGraph handles user objects, but both feed into the same overlay.
- SystemTableNode.scannerId() carries the bridge between metadata and row generation. Row-oriented components can look up the correct SystemObjectScanner by calling provide(scannerId, engineKind, engineVersion) on the discovered providers list.
- Keep scanners lazy and stateless. Every SystemObjectScanner should stream rows, avoid boxing, and match its SchemaColumn[] exactly. The shared SystemObjectScanContext is the only place scanners should touch metadata – everything else (name resolution, schema parsing, column typing) is already cached.
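The snapshot cache mentioned in the first note follows a standard JDK pattern: a LinkedHashMap in access order with removeEldestEntry overridden. The sketch below shows that pattern in isolation; the class, Key record, and method names are hypothetical and the real SystemGraph may handle locking and sizing differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

final class SnapshotCacheSketch<V> {
  /** Cache key mirroring the documented (engineKind, engineVersion) tuple. */
  record Key(String engineKind, String engineVersion) {}

  private final Map<Key, V> cache;

  SnapshotCacheSketch(int maxEntries) {
    // accessOrder=true moves an entry to the tail on every get/put,
    // so the head of the map is always the least recently used entry.
    this.cache =
        new LinkedHashMap<>(16, 0.75f, true) {
          @Override
          protected boolean removeEldestEntry(Map.Entry<Key, V> eldest) {
            return size() > maxEntries;
          }
        };
  }

  /** Returns the cached snapshot, building and storing it on a miss. */
  synchronized V get(String engineKind, String engineVersion, Function<Key, V> builder) {
    Key key = new Key(engineKind, engineVersion);
    V snapshot = cache.get(key);
    if (snapshot == null) {
      snapshot = builder.apply(key);
      cache.put(key, snapshot);
    }
    return snapshot;
  }

  /** Membership check that does not affect the access order. */
  synchronized boolean contains(String engineKind, String engineVersion) {
    return cache.containsKey(new Key(engineKind, engineVersion));
  }

  public static void main(String[] args) {
    SnapshotCacheSketch<String> cache = new SnapshotCacheSketch<>(16);
    System.out.println(cache.get("postgres", "16", k -> "snapshot for " + k));
  }
}
```

An access-ordered LinkedHashMap mutates its internal order on reads, so even lookups need synchronization, which is why both methods here are synchronized.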