System Objects

System objects are the catalog-level row sources that live inside the _system account: information_schema.*, pg_catalog.*, and any engine/plugin-provided tables or views. Unlike user tables, these objects are not backed by persisted blobs – they are synthesised from the builtin catalog metadata, cached graph snapshots, and bespoke scanners. The builtin catalog load/caching pipeline that feeds these snapshots is described in Builtin catalog architecture, and EngineSystemCatalogExtension is the single SPI that now ships both the builtin data and the corresponding system-object definitions.

Client contract for GetSystemObjects

GetSystemObjects never returns namespace/table/view metadata – it only streams the function/operator/type/cast/collation/aggregate definitions plus any registry hints. The _system overlay exposes the actual relations (namespaces, tables, views, SystemTableNodes, etc.) for scanning and planner resolution, so duplicate metadata would confuse downstream consumers and could expose synthetic tables twice. Use the _system overlay (SystemGraph/CatalogOverlay) whenever you need the actual relation graph; rely on GetSystemObjects solely for the recursive builtin registry data.

System resource identifiers

Every system node exposes a ResourceId whose id field looks like a UUID but is computed deterministically. The canonical identity string for the object (namespace-qualified name plus any versioned signatures) is hashed with SHA-256, truncated to 16 bytes, stamped with the fixed two-byte marker 0xABCD, and then XORed with an engine-specific mask. The marker distinguishes system-generated IDs from arbitrary UUIDs. Swapping masks (XORing with the engine mask to undo it, then with the default mask) lets the resolver translate a per-engine ID back to the default floecat_internal catalog without any extra maps. Because the masks leave the marker bytes untouched, the 0xABCD marker survives the XOR translation, so the resolver only applies the fallback to UUIDs stamped by this generator; random or user-generated UUIDs are left untouched. The generated bytes also follow the RFC 4122 version-4 layout (version nibble 4, variant bits 10), so serializers and tooling can treat them like canonical UUIDs while the engine mask still yields a deterministic default ID.

Every engine contributes exactly one system catalog container (the catalog whose engineKind equals the engine name), so ResourceKind=RK_CATALOG always refers to the engine catalog container. Use SystemNodeRegistry.systemCatalogContainerId(engineKind) whenever you need that identifier during builtin graph construction or tests.
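The ID scheme can be illustrated with a short Java sketch. It is a minimal illustration under stated assumptions and does not reproduce SystemNodeRegistry's code. Several details are hypothetical: the marker position (bytes 4 and 5), the mask derivation (SHA-256 of a made-up "mask:" prefix plus the engine kind), and every helper name. The sketch zeroes each mask over the marker, the version nibble, and the variant bits, so XOR leaves all three intact.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

/** Hypothetical sketch of the deterministic system-ID scheme; byte positions are assumptions. */
final class SystemIdSketch {
  // Assumed marker position (bytes 4-5); the real generator may place 0xABCD elsewhere.
  static final int MARKER_HI = 4;
  static final int MARKER_LO = 5;

  static byte[] sha256(String s) {
    try {
      return MessageDigest.getInstance("SHA-256").digest(s.getBytes(StandardCharsets.UTF_8));
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("SHA-256 unavailable", e);
    }
  }

  /** Hypothetical mask derivation, zeroed where the marker, version and variant bits live. */
  static byte[] engineMask(String engineKind) {
    byte[] m = new byte[16];
    System.arraycopy(sha256("mask:" + engineKind), 0, m, 0, 16);
    m[MARKER_HI] = 0;
    m[MARKER_LO] = 0;
    m[6] &= 0x0F; // leave the version nibble untouched by XOR
    m[8] &= 0x3F; // leave the variant bits untouched by XOR
    return m;
  }

  /** Hash, truncate, stamp the marker and RFC 4122 v4 bits, then apply the engine mask. */
  static UUID systemId(String canonicalIdentity, String engineKind) {
    byte[] b = new byte[16];
    System.arraycopy(sha256(canonicalIdentity), 0, b, 0, 16); // truncate to 16 bytes
    b[MARKER_HI] = (byte) 0xAB;
    b[MARKER_LO] = (byte) 0xCD;
    b[6] = (byte) ((b[6] & 0x0F) | 0x40); // version nibble 4
    b[8] = (byte) ((b[8] & 0x3F) | 0x80); // variant bits 10
    return toUuid(xor(b, engineMask(engineKind)));
  }

  static boolean hasSystemMarker(UUID id) {
    byte[] b = toBytes(id);
    return b[MARKER_HI] == (byte) 0xAB && b[MARKER_LO] == (byte) 0xCD;
  }

  /** Swap masks: undo fromEngine's mask, apply toEngine's; unmarked UUIDs pass through. */
  static UUID translate(UUID id, String fromEngine, String toEngine) {
    if (!hasSystemMarker(id)) {
      return id;
    }
    byte[] unmasked = xor(toBytes(id), engineMask(fromEngine));
    return toUuid(xor(unmasked, engineMask(toEngine)));
  }

  static byte[] xor(byte[] a, byte[] mask) {
    byte[] out = new byte[16];
    for (int i = 0; i < 16; i++) {
      out[i] = (byte) (a[i] ^ mask[i]);
    }
    return out;
  }

  static byte[] toBytes(UUID id) {
    return ByteBuffer.allocate(16)
        .putLong(id.getMostSignificantBits())
        .putLong(id.getLeastSignificantBits())
        .array();
  }

  static UUID toUuid(byte[] b) {
    ByteBuffer buf = ByteBuffer.wrap(b);
    return new UUID(buf.getLong(), buf.getLong());
  }
}
```

Calling translate(id, "postgres", "floecat_internal") on a postgres-masked ID produces the same UUID as generating it directly with the floecat_internal mask. XORing with the same mask twice cancels out, so no lookup table is needed.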

System catalog translation helpers

MetaGraph and system-graph callers never rewrite _system IDs themselves. Instead, a dedicated SystemCatalogTranslator knows how to:
* normalize any UUID carrying the system marker so its account_id is _system before the graph cache looks it up,
* map user-provided NameRefs into the current engine's catalog context (rewriting the catalog to EngineContext.effectiveEngineKind() before a namespace or table lookup), and
* alias the matched system name back to the user catalog so responses still show examples.pg_catalog.pg_class.

These translator helpers keep ID/name wrangling centralized, which makes it obvious why the overlay can resolve _system objects even when the CLI is using a different account or catalog. Callers simply run user inputs through SystemCatalogTranslator before hitting SystemGraph/MetaGraph, and every system snapshot is still keyed by the _system account in the registry.
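The three translator steps can be sketched as pure functions. The NameRef and ResourceId records below are simplified, hypothetical stand-ins for the real protobuf messages, and TranslatorSketch does not reproduce SystemCatalogTranslator's API. The marker check is passed in as a predicate so the sketch does not hard-code a byte layout.

```java
import java.util.List;
import java.util.UUID;
import java.util.function.Predicate;

// Hypothetical, simplified stand-ins; the real NameRef/ResourceId are protobuf messages.
record NameRef(String catalog, List<String> path, String name) {}
record ResourceId(String accountId, UUID id) {}

/** Sketch of the three translator steps; not the real SystemCatalogTranslator API. */
final class TranslatorSketch {
  /** Re-home marker-carrying IDs under the _system account before the graph cache lookup. */
  static ResourceId normalize(ResourceId rid, Predicate<UUID> hasSystemMarker) {
    return hasSystemMarker.test(rid.id()) ? new ResourceId("_system", rid.id()) : rid;
  }

  /** Rewrite the user's catalog to the effective engine kind before a namespace/table lookup. */
  static NameRef toEngineContext(NameRef userRef, String effectiveEngineKind) {
    return new NameRef(effectiveEngineKind, userRef.path(), userRef.name());
  }

  /** Alias the matched system name back into the catalog the user asked for. */
  static NameRef aliasToUser(NameRef matched, String userCatalog) {
    return new NameRef(userCatalog, matched.path(), matched.name());
  }

  static String qualified(NameRef ref) {
    return String.join(".", ref.catalog(), String.join(".", ref.path()), ref.name());
  }
}
```

With these helpers, examples.information_schema.tables is looked up as postgres.information_schema.tables. The result is then reported back under the examples catalog.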

Naming requirements for builtin relations

Every builtin object (namespaces, tables, views, functions, operators, etc.) must be namespace-qualified. Definitions without a namespace (no dot in the canonical name) are rejected by catalog validation in normal provider flows, and the graph build still keeps a defensive skip fallback for malformed data that bypasses validation (for example, ad-hoc test catalogs). This keeps namespace buckets deterministic and avoids accidentally seeding the root namespace with tables. When designing engine-specific catalog fragments, always include the namespace path (e.g., pg_catalog.pg_class, not just pg_class).

SystemNodeRegistry uses that canonical qualified name for identity (ResourceId) and namespace resolution, but display labels are intentionally independent. By default the registry materializes the leaf name (name.name) for function/operator/type/collation/aggregate node displayNames, while namespace/table/view display labels still come from the catalog definition. Providers may still override display labels explicitly when they need qualified names for their own UX.
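The split between identity and display can be sketched with a few string helpers. The sketch assumes the namespace is everything before the final dot and that an undotted name is treated as its own namespace path. CanonicalNames and its methods are hypothetical and are not part of SystemNodeRegistry.

```java
import java.util.Optional;
import java.util.Set;

/** Sketch of canonical-name handling; helper names are hypothetical. */
final class CanonicalNames {
  /** Namespace path = everything before the final dot; an undotted name is its own path. */
  static String namespacePath(String canonical) {
    int dot = canonical.lastIndexOf('.');
    return dot < 0 ? canonical : canonical.substring(0, dot);
  }

  /** Default display label for function/operator/type/collation/aggregate nodes: the leaf. */
  static String leafName(String canonical) {
    return canonical.substring(canonical.lastIndexOf('.') + 1);
  }

  /** Mirrors the defensive skip: empty when the namespace path is not a known namespace. */
  static Optional<String> resolveNamespace(String canonical, Set<String> knownNamespaces) {
    String ns = namespacePath(canonical);
    return knownNamespaces.contains(ns) ? Optional.of(ns) : Optional.empty();
  }
}
```

The ResourceId is always derived from the full canonical name. The leaf is only a presentation default that providers can override.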

Namespace-scoping validation defaults are defined in SystemCatalogValidator.NamespaceScopePolicy.defaultPolicy() and are intentionally explicit for OSS plugin authors:
- scoped by default (must be namespace-qualified and reference a known namespace): function, type, table, view
- not scoped by default (no namespace requirement unless a stricter policy is selected): operator, cast, collation, aggregate

Both ServiceLoaderSystemCatalogProvider and FloecatInternalProvider now fail fast on Severity.ERROR validation issues during catalog load, so extension authors should treat these defaults as part of the public contract.
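The default scope policy and the fail-fast load behavior can be approximated as follows. This is only a sketch: ObjectKind, ScopePolicySketch, and the error strings are hypothetical. The real SystemCatalogValidator presumably reports typed issues carrying a Severity rather than plain strings.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

/** Hypothetical mirror of the object kinds covered by the default policy. */
enum ObjectKind { FUNCTION, TYPE, TABLE, VIEW, OPERATOR, CAST, COLLATION, AGGREGATE }

final class ScopePolicySketch {
  // Scoped by default per defaultPolicy(); every other kind is unscoped.
  static final Set<ObjectKind> SCOPED =
      EnumSet.of(ObjectKind.FUNCTION, ObjectKind.TYPE, ObjectKind.TABLE, ObjectKind.VIEW);

  /** Returns ERROR-level messages; an empty list means the scope check passed. */
  static List<String> check(ObjectKind kind, String canonical, Set<String> knownNamespaces) {
    List<String> errors = new ArrayList<>();
    if (!SCOPED.contains(kind)) {
      return errors; // operators, casts, collations, aggregates: no namespace requirement
    }
    int dot = canonical.lastIndexOf('.');
    if (dot < 0) {
      errors.add(kind + " " + canonical + " is not namespace-qualified");
    } else if (!knownNamespaces.contains(canonical.substring(0, dot))) {
      errors.add(kind + " " + canonical + " references an unknown namespace");
    }
    return errors;
  }

  /** Fail fast on any ERROR, as the catalog providers do during load. */
  static void requireValid(ObjectKind kind, String canonical, Set<String> known) {
    List<String> errors = check(kind, canonical, known);
    if (!errors.isEmpty()) {
      throw new IllegalStateException(String.join("; ", errors));
    }
  }
}
```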

Architecture overview

┌───────────────────────────────────────────────┐
│ ServiceLoaderSystemCatalogProvider            │
│ - discovers EngineSystemCatalogExtension      │
│ - merges SystemObjectScannerProvider defs     │
└───────────────────────────────────────────────┘
                        ↓
┌───────────────────────────────────────────────┐
│ SystemCatalogData + SystemEngineCatalog       │
└───────────────────────────────────────────────┘
                        ↓
┌───────────────────────────────────────────────┐
│ SystemNodeRegistry → BuiltinNodes             │
│ - filters EngineSpecific rules                │
└───────────────────────────────────────────────┘
                        ↓
┌───────────────────────────────────────────────┐
│ SystemGraph (GraphSnapshot cache)             │
│ - reuses BuiltinNodes to build namespace      │
│   buckets & SystemTableNode instances         │
└───────────────────────────────────────────────┘
                        ↓
┌───────────────────────────────────────────────┐
│ MetaGraph / CatalogOverlay                    │
│ (implements SystemObjectGraphView)            │
│ - resolves `_system` nodes via SystemGraph    │
│ - delegates to MetadataGraph for user data    │
└───────────────────────────────────────────────┘
                        ↓
┌───────────────────────────────────────────────┐
│ SystemObjectScanContext (ctx)                 │
│ - built with CatalogOverlay/SystemObjectGraph │
└───────────────────────────────────────────────┘
                        ↓
┌───────────────────────────────────────────────┐
│ SystemObjectScanner → SystemObjectRow         │
└───────────────────────────────────────────────┘

The core pieces:

SystemObjectDef – Describes a system row source (NameRef, SchemaColumn[], scannerId, TableBackendKind, etc.). Implemented by SystemNamespaceDef, SystemTableDef, SystemViewDef.
SystemObjectScanner (core/catalog, ai.floedb.floecat.systemcatalog.spi.scanner) – Exposes schema() and a lazy scan(SystemObjectScanContext) that returns Stream<SystemObjectRow>. Scanners must be allocation-light, respect the Arrow schema, and keep rows as Object[].
SystemObjectScannerProvider – SPI for providers of definitions and scanners. definitions() lists every SystemObjectDef (no filtering). supportsEngine/supports(NameRef, engineKind) gate when a definition applies. provide(scannerId, engineKind, engineVersion) lets the runtime look up the scanner for a node’s scannerId. The SPI now exposes version-aware helpers (definitions(engineKind, engineVersion) and supports(NameRef, engineKind, engineVersion)) so providers can evolve schemas per engine version. During catalog assembly SystemNodeRegistry.mergeCatalogData seeds the map with the floecat_internal base from FloecatInternalProvider, overlays the plugin catalog, and finally applies provider definitions so every overlay can override earlier entries.
ServiceLoaderSystemCatalogProvider – Discovers EngineSystemCatalogExtensions (which already extend SystemObjectScannerProvider), loads the catalog for each normalized engine kind, fingerprints it, and hands it to SystemDefinitionRegistry. It exposes internalProvider() for the shared floecat_internal layer and providers() for the extension-only overlays; provider merges happen later in SystemNodeRegistry.
SystemNodeRegistry + SystemGraph – The registry filters the snapshot for the requested engine/version, materialises GraphNodes (functions, types, aggregates) and SystemTableNodes (with their scannerIds), and caches the result, seeding each merge with the shared information_schema definitions provided by FloecatInternalProvider. Overlays are only applied when EngineContext.enginePluginOverlaysEnabled() returns true. Unknown or missing headers fall back to the base view while still exposing information_schema. SystemGraph builds GraphSnapshots from those nodes and keeps them in an LRU LinkedHashMap keyed by (engineKind, engineVersion). Each snapshot stores namespace buckets, table relations, and a nodesById map for constant-time resolution.
System constraint catalog – Planner/system constraint lookups are backed by an immutable cache keyed by (engineKind, engineVersion, systemRelationId) and built from the pbtxt-backed builtin table definitions loaded by FloecatInternalProvider/SystemNodeRegistry (not from information_schema scans). Constraints are explicit metadata (SystemTable.constraints) and are validated during builtin catalog load. The runtime currently de-duplicates only CT_NOT_NULL between explicit and nullable-derived implicit entries (and keeps other constraint kinds as declared), so semantic de-duplication of PK/UNIQUE/FK/CHECK is intentionally out of scope here.
Validator note: FK referenced-column target validation resolves through referenced_table when provided.
Column reference note: use column_id only when the corresponding SystemColumn.id is defined. If a column has no id, use column_name (and constraint-local ordinal) and omit column_id.

pbtxt authoring example (common constraint types):

system_tables {
  name { path: "information_schema" name: "orders" }
  display_name: "orders"
  backend_kind: TABLE_BACKEND_KIND_FLOECAT
  floecat { scanner_id: "orders_scanner" }
  columns { name: "id" type { name: "BIGINT" } nullable: false ordinal: 1 id: 1 }
  columns { name: "customer_id" type { name: "BIGINT" } nullable: false ordinal: 2 id: 2 }
  columns { name: "order_no" type { name: "VARCHAR" } nullable: false ordinal: 3 id: 3 }
  columns { name: "total_amount" type { name: "DECIMAL" } nullable: false ordinal: 4 id: 4 }

  constraints {
    name: "pk_orders"
    type: CT_PRIMARY_KEY
    columns { column_id: 1 column_name: "id" ordinal: 1 }
  }
  constraints {
    name: "uq_orders_order_no"
    type: CT_UNIQUE
    columns { column_id: 3 column_name: "order_no" ordinal: 1 }
  }
  constraints {
    name: "ck_orders_total_non_negative"
    type: CT_CHECK
    check_expression: "total_amount >= 0"
    columns { column_id: 4 column_name: "total_amount" ordinal: 1 }
  }
  constraints {
    name: "fk_orders_customers"
    type: CT_FOREIGN_KEY
    columns { column_id: 2 column_name: "customer_id" ordinal: 1 }
    referenced_table { path: "information_schema" name: "customers" }
    referenced_columns { column_id: 1 column_name: "id" ordinal: 1 }
  }
  constraints {
    name: "nn_orders_total_amount"
    type: CT_NOT_NULL
    columns { column_id: 4 column_name: "total_amount" ordinal: 1 }
  }
}
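The CT_NOT_NULL rule from the constraint catalog notes can be checked against this orders example. Every column there is nullable: false, and total_amount also carries the explicit nn_orders_total_amount constraint. The following minimal sketch derives implicit NOT NULL entries from nullability and de-duplicates only those against explicit NOT NULL entries. Column, Constraint, and the implicit_nn_ naming scheme are hypothetical simplifications, not the runtime's types.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified, hypothetical stand-ins for SystemColumn and ConstraintDefinition.
record Column(String name, boolean nullable) {}
record Constraint(String name, String type, List<String> columns) {}

final class ConstraintMergeSketch {
  /** Explicit constraints plus implicit CT_NOT_NULL for non-nullable columns, de-duplicating only NOT NULL. */
  static List<Constraint> effective(List<Column> columns, List<Constraint> explicit) {
    List<Constraint> out = new ArrayList<>(explicit); // PK/UNIQUE/FK/CHECK kept exactly as declared
    Set<String> notNullColumns = new HashSet<>();
    for (Constraint c : explicit) {
      if (c.type().equals("CT_NOT_NULL")) {
        notNullColumns.addAll(c.columns());
      }
    }
    for (Column col : columns) {
      // Set.add returns false when an explicit NOT NULL already covers this column.
      if (!col.nullable() && notNullColumns.add(col.name())) {
        out.add(new Constraint("implicit_nn_" + col.name(), "CT_NOT_NULL", List.of(col.name())));
      }
    }
    return out;
  }
}
```

For orders, this yields the five explicit constraints plus implicit NOT NULL entries for id, customer_id, and order_no. total_amount keeps only its explicit entry.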

Namespace resolution – Objects are mapped to namespaces by canonical name, trimming everything after the final dot. An unqualified name such as t has no dot to trim, so it resolves to a namespace literally named t rather than to a default schema; this is why every system function/table/view must be defined with a fully qualified name (e.g., foo.bar). If you need to expose unqualified identifiers, you must provide a dedicated namespace node whose canonical path matches the identifier you intend to publish; otherwise the node is skipped during merging because findNamespaceId can't map it to an existing namespace.
CatalogOverlay / MetaGraph – The overlay implements SystemObjectGraphView and exposes an immutable view over MetadataGraph plus the _system snapshot from SystemGraph. SystemObjectScanContext receives this view and uses it for every lookup, so scanners consistently benefit from the metadata graph’s caches.
Registry-level engine hints – SystemObjectsRegistry now carries a repeated EngineSpecific engine_specific section that plugins can use to ship shared dictionaries (pg_opfamily/opclass/amop-like payloads) or other planner-only metadata once per (engine_kind, engine_version). These payloads are filtered by EngineSpecificRule and travel alongside the node-level definitions, so the planner and scanners can deserialize them without adding new system tables (see SystemObjectsRegistry.engine_specific in core/proto/src/main/proto/floecat/query/system_objects_registry.proto).
Each payload_type identifies one dictionary; the validator enforces uniqueness on the (payload_type, engine_kind, min_version, max_version) tuple so you can still ship the same dictionary payload for disjoint version windows but duplicates with the same engine kind and window will be rejected. Treat schema changes as a new payload_type (or add a suffix) so the planner knows when you actually mean a new payload.
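The uniqueness rule for engine_specific entries can be sketched with a set of tuple keys. EngineHint is a hypothetical record holding just the four key fields, and record equality supplies the tuple comparison. Following the rule as stated, only identical (payload_type, engine_kind, min_version, max_version) tuples are reported; entries that differ in their version window pass.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical, simplified key view of one EngineSpecific entry. */
record EngineHint(String payloadType, String engineKind, String minVersion, String maxVersion) {}

final class EngineHintValidatorSketch {
  /** Returns every entry whose (payload_type, engine_kind, min_version, max_version) was already seen. */
  static List<EngineHint> duplicates(List<EngineHint> hints) {
    Set<EngineHint> seen = new HashSet<>(); // record equals/hashCode cover all four fields
    List<EngineHint> dups = new ArrayList<>();
    for (EngineHint h : hints) {
      if (!seen.add(h)) {
        dups.add(h);
      }
    }
    return dups;
  }
}
```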

Built-in information_schema

InformationSchemaProvider is the default SystemObjectScannerProvider. It unconditionally registers:

The information_schema namespace (SystemNamespaceDef).
Tables information_schema.tables, information_schema.columns, information_schema.schemata, information_schema.table_constraints, information_schema.key_column_usage, information_schema.referential_constraints, information_schema.check_constraints, information_schema.constraint_column_usage, and information_schema.constraint_table_usage (SystemTableDefs) with scanner identifiers.

FloecatInternalProvider wraps InformationSchemaProvider and is the first layer merged into every catalog; because SystemNodeRegistry seeds every merge with the floecat_internal definitions, information_schema is always available even when headers, plugins, or overlays are missing. Plugin/provider overlays run afterward and can override the same canonical names deterministically.
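The layering can be pictured as a last-writer-wins map keyed by canonical name. In this sketch, the String values stand in for full definitions, and LayeredMergeSketch is hypothetical. The overlaysEnabled flag mirrors the EngineContext.enginePluginOverlaysEnabled() gate described in the architecture notes: when it is false, only the floecat_internal base survives, so information_schema stays available.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch of layered, last-writer-wins merging by canonical name (hypothetical types). */
final class LayeredMergeSketch {
  /**
   * base: floecat_internal definitions (always applied).
   * overlays: plugin catalog first, then provider definitions; later layers win.
   */
  static Map<String, String> merge(
      Map<String, String> base, List<Map<String, String>> overlays, boolean overlaysEnabled) {
    Map<String, String> merged = new LinkedHashMap<>(base);
    if (overlaysEnabled) {
      for (Map<String, String> layer : overlays) {
        merged.putAll(layer); // same canonical name: the later layer replaces the earlier entry
      }
    }
    return merged;
  }
}
```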

Each table wires to a lightweight scanner (TablesScanner, ColumnsScanner, SchemataScanner, plus constraint scanners). These scanners rely on SystemObjectScanContext for cached lookups:

ctx.listNamespaces()/ctx.listTables() reuse the overlay, so the catalog’s graph snapshot is only enumerated once per scan.
The columns scanner resolves schemas via ctx.graph().tableSchema(table.id()), which delegates through the overlay/metadata graph caches so repeated scans avoid redundant schema materialization work.

Constraint view semantics are ANSI-oriented:
- key_column_usage includes PK/UNIQUE/FK key columns (not NOT NULL).
- constraint_column_usage includes columns referenced by each constraint definition: PK/UNIQUE/NOT NULL/CHECK use local referenced columns, while FK uses referenced-target columns (not FK local key columns).
- For CHECK, rows are emitted only when ConstraintDefinition.columns is populated. The scanner does not parse check_expression to infer referenced columns.
- referential_constraints maps FK metadata from ConstraintDefinition: unique_constraint_* uses FK target table/catalog/schema plus referenced_constraint_name when available, and match_option/update_rule/delete_rule default to NONE/NO ACTION/NO ACTION when source metadata omits them.
- If referenced_constraint_name is omitted, the scanner infers it only when there is exactly one matching PK/UNIQUE on the referenced columns in the current constraint index scope; otherwise unique_constraint_* remains NULL.
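The referenced_constraint_name inference can be sketched as an exactly-one-match filter. The sketch assumes a match requires the same referenced table and the same ordered column list. KeyConstraint and ReferencedConstraintSketch are hypothetical simplifications of the scanner's constraint index.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical, simplified constraint entry used only for this sketch. */
record KeyConstraint(String name, String type, String table, List<String> columns) {}

final class ReferencedConstraintSketch {
  /** Exactly one PK/UNIQUE on (table, ordered columns) yields its name; zero or several yield empty. */
  static Optional<String> infer(String refTable, List<String> refColumns, List<KeyConstraint> scope) {
    List<KeyConstraint> matches = scope.stream()
        .filter(c -> c.type().equals("CT_PRIMARY_KEY") || c.type().equals("CT_UNIQUE"))
        .filter(c -> c.table().equals(refTable) && c.columns().equals(refColumns))
        .toList();
    return matches.size() == 1 ? Optional.of(matches.get(0).name()) : Optional.empty();
  }
}
```

An empty result corresponds to leaving unique_constraint_* as NULL rather than guessing between ambiguous candidates.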

Writing your own provider

Implement SystemObjectScannerProvider (or build this into your EngineSystemCatalogExtension). Provide every SystemObjectDef/SchemaColumn pair, return the matching scanner from provide(...), and let supports(...) gate whether your definition overrides a builtin.
Optional: implement EngineSystemCatalogExtension – it already extends SystemObjectScannerProvider, so you can ship both catalog definitions and system tables in one jar. SystemNodeRegistry now merges your definitions (via the new version-aware hooks) and provider overlays with the base floecat_internal data so the resulting _system graph reflects the requested version.
Emit rows with SystemObjectRow – SystemObjectRow is a cheap wrapper around Object[]. Use SystemObjectScanContext for every graph lookup (catalog, namespace, table, schema) so you benefit from CatalogOverlay’s caches.
Register via META-INF/services/ai.floedb.floecat.systemcatalog.provider.SystemObjectScannerProvider (and/or EngineSystemCatalogExtension). CDI exposes the provider list through ServiceLoaderSystemCatalogProvider.providers(), so downstream services can resolve scanners by ID.
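The steps above can be sketched with deliberately trimmed-down interfaces. SketchRow, SketchContext, SketchScanner, and SketchProvider are hypothetical stand-ins for SystemObjectRow, SystemObjectScanContext, SystemObjectScanner, and SystemObjectScannerProvider. Their real signatures (for example, whether provide returns an Optional) may differ, and the example_schemata scanner ID is made up. The sketch shows the shape only: a lazy scanner that reads metadata through the context, and a provider that resolves a scanner by ID.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Stream;

// Hypothetical, trimmed-down shapes; the real SPI types live in ai.floedb.floecat.systemcatalog.*.
record SketchRow(Object[] values) {}

interface SketchContext {
  List<String> listNamespaces(); // stands in for the cached overlay lookups
}

interface SketchScanner {
  List<String> schema();                      // column names only, for the sketch
  Stream<SketchRow> scan(SketchContext ctx);  // lazy: rows are produced as the stream is consumed
}

interface SketchProvider {
  Optional<SketchScanner> provide(String scannerId, String engineKind, String engineVersion);
}

/** Emits one row per namespace and touches metadata only through the context. */
final class SchemataSketchScanner implements SketchScanner {
  public List<String> schema() {
    return List.of("schema_name");
  }

  public Stream<SketchRow> scan(SketchContext ctx) {
    return ctx.listNamespaces().stream().map(ns -> new SketchRow(new Object[] {ns}));
  }
}

final class ExampleSketchProvider implements SketchProvider {
  public Optional<SketchScanner> provide(String scannerId, String engineKind, String engineVersion) {
    return "example_schemata".equals(scannerId)
        ? Optional.of(new SchemataSketchScanner())
        : Optional.empty();
  }
}
```

A real provider would also be listed in the META-INF/services file so ServiceLoader can discover it.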

Provider version contract

Providers must keep their definitions and scanners aligned with the version they advertise. When emitting versioned metadata, override definitions(engineKind, engineVersion) and supports(NameRef, engineKind, engineVersion) so SystemNodeRegistry.mergeCatalogData can select the right overlays. This merge is later replayed for scanners via the same (engineKind, engineVersion) tuple, so the scannerId in each definition must resolve to a scanner whose provide(scannerId, engineKind, engineVersion) honours the same version constraint. If your provider doesn't need per-version behavior, rely on the simpler definitions()/supports(...) helpers, but note that the merged overlay is still cached per (engineKind, engineVersion): changing the schema without bumping the version string may leave caches stale until they are invalidated.
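One way to keep definitions and scanners aligned is to route both through a single version predicate. Everything in this sketch is hypothetical: the Def record, the example.stats table, the stats_v1/stats_v2 scanner IDs, and the rule that engine major version 16 and later gets a wider schema (which also assumes numeric version strings). For brevity, provide() returns the scanner's column list as a stand-in for a scanner instance.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical definition shape: canonical name, scannerId, and column names. */
record Def(String name, String scannerId, List<String> columns) {}

/** Sketch of a provider whose definitions and scanners share one version rule. */
final class VersionedProviderSketch {
  // Assumed rule for the sketch: engine major version 16+ gets the wider v2 schema.
  static boolean isV2(String engineVersion) {
    return Integer.parseInt(engineVersion.split("\\.")[0]) >= 16;
  }

  static List<Def> definitions(String engineKind, String engineVersion) {
    return isV2(engineVersion)
        ? List.of(new Def("example.stats", "stats_v2", List.of("name", "rows", "bytes")))
        : List.of(new Def("example.stats", "stats_v1", List.of("name", "rows")));
  }

  /** Derive the scanner from the same definitions() call so the two can never disagree. */
  static Optional<List<String>> provide(String scannerId, String engineKind, String engineVersion) {
    return definitions(engineKind, engineVersion).stream()
        .filter(d -> d.scannerId().equals(scannerId))
        .map(Def::columns)
        .findFirst();
  }
}
```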

Performance & scalability notes

SystemGraph caches snapshots in an access-ordered LinkedHashMap sized by floecat.system.graph.snapshot-cache-size (default 16). Each snapshot already buckets namespaces, relations, and node lookups, so repeated _system scans never rebuild the graph.
SystemObjectScanContext keeps the catalog/namespace ResourceIds, reuses CatalogOverlay enumeration methods, and caches listNamespaces, listTables, and columnTypes. Scanners should not duplicate this caching logic – the context forwards to the cached overlay that already talks to MetadataGraph. SystemNodeRegistry.resourceId now generates deterministic UUIDs for engine/kind/signature tuples, so avoid parsing ResourceId.id strings and build every system ResourceId via the helper.
CatalogOverlay/MetaGraph implement SystemObjectGraphView (ai.floedb.floecat.systemcatalog.spi.scanner) so that the core catalog module stays unaware of the full metadata graph. SystemGraph answers _system requests while MetadataGraph handles user objects, but both feed into the same overlay.
SystemTableNode.scannerId() carries the bridge between metadata and row generation. Row-oriented components can look up the correct SystemObjectScanner by calling provide(scannerId, engineKind, engineVersion) on the discovered providers list.
Keep scanners lazy and stateless. Every SystemObjectScanner should stream rows, avoid boxing, and match its SchemaColumn[] exactly. The shared SystemObjectScanContext is the only place scanners should touch metadata – everything else (name resolution, schema parsing, column typing) is already cached.
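The snapshot cache described above follows the standard access-ordered LinkedHashMap LRU pattern. In the following minimal sketch, SnapshotKey and SnapshotCache are illustrative names rather than SystemGraph's internals. The value type is left generic in place of GraphSnapshot.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Cache key as described: (engineKind, engineVersion). */
record SnapshotKey(String engineKind, String engineVersion) {}

/** Minimal LRU in the style described for SystemGraph; not the actual class. */
final class SnapshotCache<V> {
  private final Map<SnapshotKey, V> map;

  SnapshotCache(int maxEntries) {
    // accessOrder=true: every hit moves the entry to the tail, so the head is least recently used.
    this.map = new LinkedHashMap<SnapshotKey, V>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<SnapshotKey, V> eldest) {
        return size() > maxEntries; // evict after an insert pushes us over capacity
      }
    };
  }

  /** Returns the cached snapshot or builds it once; both paths update LRU order. */
  synchronized V get(SnapshotKey key, Function<SnapshotKey, V> build) {
    return map.computeIfAbsent(key, build);
  }

  synchronized int size() {
    return map.size();
  }
}
```

In the described setup, the capacity would come from floecat.system.graph.snapshot-cache-size (default 16). Every method is synchronized because even a read reorders an access-ordered LinkedHashMap.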