Skip to content

Stats Sync-First Resolution Policy

StatsOrchestrator.resolve() follows a sync-first policy: on a store miss it attempts a bounded synchronous capture before falling back to an async reconcile job. This lets queries receive fresh statistics within a latency budget rather than always degrading to no-stats-available on a cold cache.

Resolution flow

resolve(StatsCaptureRequest)
   │
   ├─ store read ──────────────────────────────────────── HIT  (stats returned, no capture)
   │
   └─ miss
       │
       ├─ executionMode == SYNC
       │  && latencyBudget present
       │  && floecat.stats.sync.enabled == true
       │      │
       │      ├─ enqueue CAPTURE_ONLY reconcile job
       │      │   poll every 100 ms until terminal or budget exceeded
       │      │      │
       │      │      ├─ job succeeded → re-read store
       │      │      │       ├─ record present ───────── CAPTURED (stats returned, no follow-up)
       │      │      │       └─ record absent ─────────  PARTIAL  (async follow-up enqueued)
       │      │      │
       │      │      ├─ budget exceeded ──────────────── TIMEOUT  (async follow-up enqueued)
       │      │      └─ job failed / no connector ────── FAILED   (async follow-up enqueued)
       │
       └─ otherwise (ASYNC mode / no budget / sync disabled)
               └────────────────────────────────────── SKIPPED  (async job enqueued)

Outcome reference

Outcome Stats present Async follow-up When
HIT yes no Record already in store
CAPTURED yes no Sync capture succeeded; record now in store
PARTIAL no yes Sync capture succeeded but record not yet visible
TIMEOUT no yes Sync capture exceeded latency budget
FAILED no yes Sync capture error or no upstream connector
SKIPPED no yes ASYNC mode, no budget, or sync disabled

Configuration

Key Default Description
floecat.stats.sync.enabled true Master switch for sync-first resolution
floecat.stats.sync.latency-budget 1s Per-request sync capture budget
floecat.stats.sync.max-latency-budget 10s Hard ceiling applied to any request budget

Environment variable overrides follow the pattern FLOECAT_STATS_SYNC_ENABLED, FLOECAT_STATS_SYNC_LATENCY_BUDGET, FLOECAT_STATS_SYNC_MAX_LATENCY_BUDGET.

Planner integration

StatsProviderFactory reads floecat.stats.sync.enabled and floecat.stats.sync.latency-budget at startup and sets executionMode(SYNC) + latencyBudget(...) on every StatsCaptureRequest it creates for query-time resolution. Set enabled=false to revert to async-only for all queries without changing other behavior.

PlannerStatsBundleService maps StatsResolutionResult outcomes to planner quality:

  • HIT / CAPTUREDFOUND
  • TIMEOUT / FAILED / SKIPPEDNOT_FOUND (planner proceeds without stats)

Metrics

Metric Type Tags Description
floecat.service.stats.sync_outcomes.total counter result, component, operation Count per sync outcome
floecat.service.stats.sync.latency timer (ms) result, component, operation End-to-end resolve latency

result values match StatsSyncOutcome names: HIT, CAPTURED, PARTIAL, TIMEOUT, FAILED, SKIPPED.

Troubleshooting

High TIMEOUT rate

The most common cause is the capture budget being shorter than actual connector round-trip time. Check floecat.service.stats.sync.latency (p95) against floecat.stats.sync.latency-budget. Increase the budget (up to max-latency-budget) or set floecat.stats.sync.enabled=false to stop blocking queries on sync attempts.

FAILED outcomes with no connector

Tables without an upstream connector cannot be captured synchronously. The orchestrator returns FAILED immediately and enqueues an async follow-up. This is expected for system tables and non-connected tables.

Disabling sync globally

Set floecat.stats.sync.enabled=false (or FLOECAT_STATS_SYNC_ENABLED=false). All queries fall back to SKIPPED + async enqueue, which matches the pre-sync-policy behavior.

Verifying counters

Query /q/metrics after triggering a stats miss. Look for lines containing floecat_service_stats_sync_outcomes_total.