Operations¶
Docker Compose Modes¶
Docker build, compose modes, and AWS credential options live in docs/docker.md.
Testing¶
# Unit + integration tests across modules (in-memory)
make test
# Run full test suite against LocalStack (fixtures + catalog storage)
make test-localstack
# Only run unit or integration suites
make unit-test
make integration-test
Builtin Catalog Validator¶
Validate bundled or bespoke builtin catalog protobufs without running the service.
mvn -pl tools/builtin-validator package
Engine mode – load via ServiceLoader (extension JAR must be on the classpath):
java -jar tools/builtin-validator/target/builtin-validator.jar \
--engine example
Directory mode – point at a catalog directory; the validator reads _index.txt automatically:
java -jar tools/builtin-validator/target/builtin-validator.jar \
/path/to/builtins/example
File mode – validate a single merged protobuf file (binary or text format):
java -jar tools/builtin-validator/target/builtin-validator.jar \
/path/to/catalog.pbtxt
Flags:
--engine <kind>– load the registeredEngineSystemCatalogExtensionfor the given engine kind viaServiceLoaderinstead of reading a file.--json– emit machine-readable output (for CI or scripting).--strict– fail the run when warnings are present (warnings are currently reserved for future checks).
Observability & Operations¶
Outbound token endpoint allowlist¶
Floecat validates outbound token endpoint hosts before performing client credentials or token exchange flows on behalf of connectors or internal workers. This is an SSRF guard on the shared auth resolution path, not a connector-specific feature.
The relevant settings are:
FLOECAT_SECURITY_ALLOWED_TOKEN_ENDPOINT_DOMAINS– comma-separated host/domain allowlist for outbound token endpoints. Exact hosts match exactly;*.example.commatches subdomains only.*allows any token endpoint host.FLOECAT_SECURITY_ALLOW_PRIVATE_TOKEN_ENDPOINTS_FOR_ALLOWED_HOSTS=true– permits allowlisted HTTPS hosts that resolve to private or loopback addresses.FLOECAT_SECURITY_ALLOW_LOOPBACK_TOKEN_ENDPOINTS=true– permits loopback-only HTTP token endpoints for local development.
Important behavior:
- These settings restrict token endpoint hosts, not upstream catalog hosts in general.
FLOECAT_SECURITY_ALLOWED_TOKEN_ENDPOINT_DOMAINS=*only bypasses the host allowlist check. It does not disable the private-address or loopback HTTP guards.- The same shared validation applies to Delta/Unity, Iceberg REST, and any other connector auth flow that performs service-side token acquisition.
Reconciler deployment modes¶
The reconciler runs in three shapes from the same artifact:
- All-in-one: default profile; public APIs, durable queue ownership, and local executor polling stay in one runtime.
- Control plane:
QUARKUS_PROFILE=reconciler-control; owns the queue, automatic enqueue, public reconcile APIs, and executor-control RPCs. - Executor plane:
QUARKUS_PROFILE=reconciler-executor; disables local queue ownership and automatic scheduling, then leases work remotely from the control plane over gRPC.
The durable queue is intentionally split into domains:
- canonical job-index state
- ready-queue state
- lease-coordination state
- canonical payload-artifact references on job rows
- projection/root-summary observability state
The control plane owns canonical job-state transitions and the derived job-index plus ready-queue mutations that move with them transactionally. Executors participate through the separate lease-coordination domain when they lease, renew, cancel, and complete work. Projection/root summary maintenance is best-effort observability only and does not participate in queue correctness.
Key reconciler mode flags live in service/src/main/resources/application.properties:
floecat.reconciler.worker.mode
reconciler.max-parallelism
floecat.reconciler.executor.remote-planner.enabled
floecat.reconciler.executor.remote-default.enabled
floecat.reconciler.executor.remote-snapshot-planner.enabled
floecat.reconciler.executor.remote-file-group.enabled
floecat.reconciler.executor.snapshot-finalize.enabled
floecat.reconciler.authorization.header
floecat.reconciler.oidc.issuer
floecat.reconciler.oidc.client-id
floecat.reconciler.oidc.client-secret
floecat.reconciler.oidc.token-refresh-skew-seconds
floecat.reconciler.oidc.connect-timeout
floecat.reconciler.auto.execution-class
floecat.reconciler.auto.execution-lane
Recommended split deployment:
- Control plane:
QUARKUS_PROFILE=reconciler-control - Executor plane:
QUARKUS_PROFILE=reconciler-executor - Shared settings: same blob/kv backend, same reconciler OIDC worker principal configuration, executor nodes pointed at the control-plane gRPC host/port
- Control-plane-specific setting:
reconciler.max-parallelism=0 - Executor-plane-specific setting:
floecat.reconciler.worker.mode=remote
Worker gRPC auth boundary:
- Remote reconcile workers authenticate to
ReconcileExecutorControlwith an explicit bearer token attached by the worker client itself. - In OIDC mode, that bearer token comes from the configured reconciler worker service principal.
- Internal user-context fanout still uses propagated request metadata where appropriate, but that is separate from reconcile worker auth and is not a fallback for worker control-plane RPCs.
In the split model, the control plane owns top-level PLAN_CONNECTOR jobs and public reconcile
APIs, while executor-plane nodes primarily run child PLAN_TABLE, PLAN_VIEW, PLAN_SNAPSHOT,
and EXEC_FILE_GROUP work. CaptureNow uses the same plan-plus-child execution path. File-group workers submit results through
SubmitLeasedFileGroupExecutionResult, which requires result_id so the control plane can
enforce replay safety across worker retries.
For floecat.kv=dynamodb, the durable reconcile hot paths now use native queue-oriented storage
layouts rather than broad generic prefix scans:
- job-index queries are partitioned by their query slice
- ready rows are stored in due-ordered ready slices
- lease rows and lease-expiry scans are stored in dedicated lease partitions
- projection state is separate from queue ownership and is not part of lease/read repair
If PLAN_CONNECTOR jobs can be enqueued, at least one enabled executor must support that job kind.
Planner jobs also need to remain leaseable under the configured execution lane semantics: planner
executors advertise wildcard lane support, while child planning and file-group execution jobs carry
the concrete lane policy that remote or local workers enforce later.
To scale executors horizontally, add more executor-plane instances. They greedily lease eligible jobs from the shared durable queue, so no leader election is required at the executor layer.
- Logging – JSON console logs plus rotating files under
log/. Audit logs route gRPC request summaries tolog/audit.json; seedocs/log.md. - Metrics – Micrometer/Prometheus exporters expose gRPC, storage, and GC metrics at the
/q/metricsendpoint (see the telemetry hub contract indocs/telemetry/contract.md). - Tracing – OpenTelemetry (TraceContext propagator) is always enabled for MDC correlation
(
traceId/spanId). The OTLP exporter is built-in; activate thetelemetry-otlpprofile to ship spans to a collector. Seetelemetry-demo.mdfor the full Prometheus + Tempo + Loki + Grafana demo stack.
Telemetry hub configuration¶
The service uses the telemetry hub core + Micrometer backend. The following flags are available in
service/src/main/resources/application.properties (the telemetry-otlp profile toggles OTLP tracing/log exports):
telemetry.strict=false
telemetry.exporters=prometheus
telemetry.contract.version=v1
%dev.telemetry.strict=true
%test.telemetry.strict=true
%telemetry-otlp.telemetry.exporters=prometheus,otlp
These settings keep production lenient (dropped-tag counters are exposed) while blowing up early in dev/test when metrics violate their contract. Run the docgen tool if you change any metrics to regenerate the catalog:
mvn -pl telemetry-hub/tool-docgen -am process-classes
Documented metrics are listed in docs/telemetry/contract.md and generated JSON docs/telemetry/contract.json.
Configuration flags are documented per module (for example storage backend selection in
docs/storage-spi.md, GC cadence in docs/service.md,
Secrets Manager in docs/secrets-manager.md).
Telemetry exporter matrix¶
The telemetry.exporters flag (defined in service/src/main/resources/application.properties) tells the hub which backends to activate. Supported values are:
| Exporter | Description | Activation | Notes |
|---|---|---|---|
prometheus |
Micrometer Prometheus registry that exposes floecat.core.* and floecat.service.* metrics via /q/metrics. |
Enabled by default and controlled on the Micrometer side via quarkus.micrometer.export.prometheus.enabled=true. |
This exporter simply scrapes the Micrometer registry that Observability feeds. Keep telemetry.exporters set to include prometheus in dev/test to keep dashboards working. |
otlp |
OpenTelemetry exporter that forwards traces (and hub metrics if wired) to an OTLP collector via gRPC. | Enabled only with the telemetry-otlp profile (e.g., QUARKUS_PROFILE=telemetry-otlp mvn -pl service -am quarkus:dev). |
Configure the OTLP endpoint with quarkus.otel.exporter.otlp.endpoint (runtime property). The exporter itself is cdi (Quarkus built-in, build-time); do not set quarkus.otel.traces.exporter=otlp. Run the full telemetry demo stack (examples/telemetry/docker-compose.yml) to bring up the collector, Tempo, Loki, and Grafana. |
Drop an exporter by removing it from telemetry.exporters (or setting the property to the empty string), e.g., %dev.telemetry.exporters=prometheus keeps strict mode local without OTLP traffic. The hub simply skips wiring backends it isn’t asked for, so only the listed exporters get the meters/traces.
Spans emitted by the service carry custom attributes floecat.component and floecat.operation (set by GrpcTelemetryServerInterceptor), matching the component/operation labels on Prometheus metrics. The Grafana dashboard uses this bridge to build TraceQL links: clicking a series on an RPC panel opens Tempo Explore with span.floecat.operation = "<operation>". Standard OTel rpc.* attributes are also present on every gRPC span for ad-hoc queries. Logs expose the same values through floecat_component and floecat_operation MDC keys, plus traceId and spanId for trace ↔ log correlation. The Tempo datasource is provisioned with tracesToLogsV2 so you can click through from a trace span to the correlated Loki logs.
Metrics¶
Micrometer + Prometheus export is enabled by default. The scrape endpoint is:
GET http://<host>:<http-port>/q/metrics
See docs/telemetry/overview.md for the naming, tagging, and contribution rules, and view the generated catalog (docs/telemetry/contract.md and docs/telemetry/contract.json) for the current set of metrics. Regenerate the catalog any time you add or modify a metric:
mvn -pl telemetry-hub/tool-docgen -am process-classes