From b72ce8ca426bf77670a242a4a8c8e2f53de230b9 Mon Sep 17 00:00:00 2001
From: Alexander Whitestone
Date: Sat, 4 Apr 2026 17:31:46 -0400
Subject: [PATCH] docs: define operator command center requirements

---
 docs/operator-command-center-requirements.md | 251 +++++++++++++++++++
 1 file changed, 251 insertions(+)
 create mode 100644 docs/operator-command-center-requirements.md

diff --git a/docs/operator-command-center-requirements.md b/docs/operator-command-center-requirements.md
new file mode 100644
index 00000000..86af3888
--- /dev/null
+++ b/docs/operator-command-center-requirements.md
@@ -0,0 +1,251 @@
+# Sovereign Operator Command Center Requirements
+
+Status: requirements for #159
+Parent: #154
+Decision: v1 ownership stays in `timmy-config`
+
+## Goal
+
+Define the minimum viable operator command center for Timmy: a sovereign control surface that shows real system health, queue pressure, review load, and task state over a trusted network.
+
+This is an operator surface, not a public product surface, not a demo, and not a reboot of the archived dashboard lineage.
+
+## Non-goals
+
+- public internet exposure
+- a marketing or presentation dashboard
+- hidden queue mutation during polling or page refresh
+- a second shadow task database that competes with Gitea or Hermes runtime truth
+- personal-token fallback behavior hidden inside the UI or browser session
+- developer-specific local absolute paths in requirements, config, or examples
+
+## Hard requirements
+
+### 1. Access model: local or Tailscale only
+
+The operator command center must be reachable only from:
+- `localhost`, or
+- a Tailscale-bound interface or Tailscale-gated tunnel
+
+It must not:
+- bind a public-facing listener by default
+- require public DNS or public ingress
+- expose a login page to the open internet
+- degrade from Tailscale identity to ad hoc password sharing
+
+If trusted-network conditions are missing or ambiguous, the surface must fail closed.
+
+### 2. Truth model: operator truth beats UI theater
+
+The command center exists to expose operator truth. That means every status tile, counter, and row must be backed by a named authoritative source and a freshness signal.
+
+Authoritative sources for v1 are:
+- Gitea for issue, PR, review, assignee, and repo state
+- Hermes cron state and Huey runtime state for scheduled work
+- live runtime health checks, process state, and explicit agent heartbeat artifacts for agent liveness
+- direct model or service health endpoints for local inference and operator-facing services
+
+Non-authoritative signals must never be treated as truth on their own. Examples:
+- pane color
+- old dashboard screenshots
+- manually curated status notes
+- stale cached summaries without source timestamps
+- synthetic green badges produced when the underlying source is unavailable
+
+If a source is unavailable, the UI must say `unknown`, `stale`, or `degraded`.
+It must never silently substitute optimism.
+
+### 3. Mutation model: read-first, explicit writes only
+
+The default operator surface is read-only.
+
+For MVP, the five required views below are read-only views.
+They may link the operator to the underlying source-of-truth object, but they must not mutate state merely by rendering, refreshing, filtering, or opening detail drawers.
+
+If write actions are added later, they must live in a separate, explicit control surface with all of the following:
+- an intentional operator action
+- a confirmation step for destructive or queue-changing actions
+- a single named source-of-truth target
+- an audit trail tied to the action
+- idempotent behavior where practical
+- machine-scoped credentials, not a hidden fallback to a human personal token
+
+### 4. Repo boundary: visible world is not operator truth
+
+`the-nexus` is the visible world. It may eventually project summarized status outward, but it must not own the operator control surface.
+
+The operator command center belongs with the sidecar/control-plane boundary, where Timmy already owns:
+- orchestration policy
+- cron definitions
+- playbooks
+- sidecar scripts
+- deployment and runtime governance
+
+That makes the v1 ownership decision:
+- `timmy-config` owns the requirements and first implementation shape
+
+Allowed future extraction:
+- if the command center becomes large enough to deserve its own release cycle, implementation code may later move into a dedicated control-plane repo
+- if that happens, `timmy-config` still remains the source of truth for policy, access requirements, and operator doctrine
+
+Rejected owner for v1:
+- `the-nexus`, because it is the wrong boundary for an operator-only surface and invites demo/UI theater to masquerade as truth
+
+## Minimum viable views
+
+Every view must show freshness and expose drill-through links or identifiers back to the source object.
+
+| View | Must answer | Authoritative sources | MVP mutation status |
+|------|-------------|-----------------------|---------------------|
+| Brief status | What is red right now, what is degraded, and what needs operator attention first? | Derived rollup from the four views below; no standalone shadow state | Read-only |
+| Agent health | Which agents or loops are alive, stalled, rate-limited, missing, or working the wrong thing? | Runtime health checks, process state, agent heartbeats, active claim/assignment state, model/provider health | Read-only |
+| Review queue | Which PRs are waiting, blocked, risky, stale, or ready for review/merge? | Gitea PR state, review comments, checks, mergeability, labels, assignees | Read-only |
+| Cron state | Which scheduled jobs are enabled, paused, stale, failing, or drifting from intended schedule? | Hermes cron registry, Huey consumer health, last-run status, next-run schedule | Read-only |
+| Task board | What work is unassigned, assigned, in progress, blocked, or waiting on review across the active repos? | Gitea issues, labels, assignees, milestones, linked PRs, issue state | Read-only |
+
+## View requirements in detail
+
+### Brief status
+
+The brief status view is the operator's first screen.
+It must provide a compact summary of:
+- overall health state
+- current review pressure
+- current queue pressure
+- cron failures or paused jobs that matter
+- stale agent or service conditions
+
+It must be computed from the authoritative views below, not from a separate private cache.
+A red item in brief status must point to the exact underlying object that caused it.
+
+### Agent health
+
+Minimum fields per agent or loop:
+- agent name
+- current state: up, down, degraded, idle, busy, rate-limited, unknown
+- last successful activity time
+- current task or claim, if any
+- model/provider or service dependency in use
+- failure mode when degraded
+
+The view must distinguish between:
+- process missing
+- process present but unhealthy
+- healthy but idle
+- healthy and actively working
+- active but stale on one issue for too long
+
+This view must reflect real operator concerns, not just whether a shell process exists.
+
+### Review queue
+
+Minimum fields per PR row:
+- repo
+- PR number and title
+- author
+- age
+- review state
+- mergeability or blocking condition
+- sensitive-surface flag when applicable
+
+The queue must make it obvious which PRs require Timmy judgment versus routine review.
+It must not collapse all open PRs into a vanity count.
+
+### Cron state
+
+Minimum fields per scheduled job:
+- job name
+- desired state
+- actual state
+- last run time
+- last result
+- next run time
+- pause reason or failure reason
+
+The view must highlight drift, especially cases where:
+- config says the job exists but the runner is absent
+- a job is paused and nobody noticed
+- a job is overdue relative to its schedule
+- the runner is alive but the job has stopped producing successful runs
+
+### Task board
+
+The task board is not a hand-maintained kanban.
+It is a projection of Gitea truth.
+
+Minimum board lanes for MVP:
+- unassigned
+- assigned
+- in progress
+- blocked
+- in review
+
+Lane membership must come from explicit source-of-truth signals such as assignees, labels, linked PRs, and issue state.
+If the mapping is ambiguous, the card must say so rather than invent certainty.
+
+## Read-only versus mutating surfaces
+
+### Read-only for MVP
+
+The following are read-only in MVP:
+- brief status
+- agent health
+- review queue
+- cron state
+- task board
+- all filtering, sorting, searching, and drill-down behavior
+
+### May mutate later, but only as explicit controls
+
+The following are acceptable future mutation classes if they are isolated behind explicit controls and audit:
+- pause or resume a cron job
+- dispatch, assign, unassign, or requeue a task in Gitea
+- post a review action or merge action to a PR
+- restart or stop a named operator-managed agent/service
+
+These controls must never be mixed invisibly into passive status polling.
+The operator must always know when a click is about to change world state.
+
+## Truth versus theater rules
+
+The command center must follow these rules:
+
+1. No hidden side effects on read.
+2. No green status without a timestamped source.
+3. No second queue that disagrees with Gitea.
+4. No synthetic task board curated by hand.
+5. No stale cache presented as live truth.
+6. No public-facing polish requirements allowed to override operator clarity.
+7. No fallback to personal human tokens when machine identity is missing.
+8. No developer-specific local absolute paths in requirements, config examples, or UI copy.
+
+## Credential and identity requirements
+
+The surface must use machine-scoped or service-scoped credentials for any source it reads or writes.
+
+It must not rely on:
+- a principal's browser session as the only auth story
+- a hidden file lookup chain for a human token
+- a personal access token copied into client-side code
+- ambiguous fallback identity that changes behavior depending on who launched the process
+
+Remote operator access is granted by Tailscale identity and network reachability, not by making the surface public and adding a thin password prompt later.
+
+## Recommended implementation stance for v1
+
+- implement the operator command center as a sidecar-owned surface under `timmy-config`
+- keep the first version read-only
+- prefer direct reads from Gitea, Hermes cron state, Huey/runtime state, and service health endpoints
+- attach freshness metadata to every view
+- treat drill-through links to source objects as mandatory, not optional
+- postpone write controls until audit, identity, and source-of-truth mapping are explicit
+
+## Acceptance criteria for this requirement set
+
+- the minimum viable views are fixed as: agent health, review queue, cron state, task board, brief status
+- the access model is explicitly local or Tailscale only
+- operator truth is defined and separated from demo/UI theater
+- read-only versus mutating behavior is explicitly separated
+- repo ownership is decided: `timmy-config` owns v1 requirements and implementation boundary
+- no local absolute paths are required by this design
+- no human-token fallback pattern is allowed by this design
-- 
2.43.0