[CONTROL SURFACE] define Tailscale-only operator command center requirements #172
docs/operator-command-center-requirements.md
# Sovereign Operator Command Center Requirements

Status: requirements for #159
Parent: #154
Decision: v1 ownership stays in `timmy-config`

## Goal

Define the minimum viable operator command center for Timmy: a sovereign control surface that shows real system health, queue pressure, review load, and task state over a trusted network.

This is an operator surface, not a public product surface, not a demo, and not a reboot of the archived dashboard lineage.

## Non-goals

- public internet exposure
- a marketing or presentation dashboard
- hidden queue mutation during polling or page refresh
- a second shadow task database that competes with Gitea or Hermes runtime truth
- personal-token fallback behavior hidden inside the UI or browser session
- developer-specific local absolute paths in requirements, config, or examples

## Hard requirements

### 1. Access model: local or Tailscale only

The operator command center must be reachable only from:
- `localhost`, or
- a Tailscale-bound interface or Tailscale-gated tunnel

It must not:
- bind a public-facing listener by default
- require public DNS or public ingress
- expose a login page to the open internet
- degrade from Tailscale identity to ad hoc password sharing

If trusted-network conditions are missing or ambiguous, the surface must fail closed.
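
One way to make the fail-closed rule concrete is a startup check on the bind address. The sketch below is illustrative only: the function names are hypothetical. It assumes the listener is configured with a literal IP rather than a hostname, and that Tailscale node addresses fall in `100.64.0.0/10` (IPv4) or `fd7a:115c:a1e0::/48` (IPv6). An address in the shared CGNAT range is not proof of a Tailscale interface by itself, so a real check would likely also confirm the interface with the local Tailscale daemon.

```python
import ipaddress

# Loopback plus the ranges Tailscale assigns to nodes (assumption stated above).
TRUSTED_NETWORKS = [
    ipaddress.ip_network("127.0.0.0/8"),
    ipaddress.ip_network("::1/128"),
    ipaddress.ip_network("100.64.0.0/10"),
    ipaddress.ip_network("fd7a:115c:a1e0::/48"),
]


def is_trusted_bind_address(raw: str) -> bool:
    """Return True only for a parseable loopback or Tailscale-range IP."""
    try:
        addr = ipaddress.ip_address(raw.strip())
    except ValueError:
        # Hostnames, wildcards, and garbage are ambiguous, so they fail closed.
        return False
    return any(addr in net for net in TRUSTED_NETWORKS)


def require_trusted_bind(raw: str) -> str:
    """Abort startup instead of binding to an untrusted or ambiguous address."""
    if not is_trusted_bind_address(raw):
        raise SystemExit(f"refusing to bind {raw!r}: not loopback or Tailscale")
    return raw.strip()
```

Under these assumptions `0.0.0.0` and unparseable values are rejected rather than silently widened to a public listener.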

### 2. Truth model: operator truth beats UI theater

The command center exists to expose operator truth. That means every status tile, counter, and row must be backed by a named authoritative source and a freshness signal.

Authoritative sources for v1 are:
- Gitea for issue, PR, review, assignee, and repo state
- Hermes cron state and Huey runtime state for scheduled work
- live runtime health checks, process state, and explicit agent heartbeat artifacts for agent liveness
- direct model or service health endpoints for local inference and operator-facing services

Non-authoritative signals must never be treated as truth on their own. Examples:
- pane color
- old dashboard screenshots
- manually curated status notes
- stale cached summaries without source timestamps
- synthetic green badges produced when the underlying source is unavailable

If a source is unavailable, the UI must say `unknown`, `stale`, or `degraded`.
It must never silently substitute optimism.
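
As a rough illustration of "named source plus freshness signal", a tile could be derived from a reading that carries its source name and fetch time. The `SourceReading` type, `tile_status` function, and `max_age` threshold are hypothetical names for this sketch, not part of any existing Timmy code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class SourceReading:
    source: str                     # named authoritative source, e.g. "gitea"
    value: Optional[str]            # status reported by the source, None if unreachable
    fetched_at: Optional[datetime]  # when the source last answered, None if never


def tile_status(reading: SourceReading, now: datetime, max_age: timedelta) -> str:
    """Never report the source's value without a usable timestamp."""
    if reading.value is None or reading.fetched_at is None:
        return "unknown"
    if now - reading.fetched_at > max_age:
        return "stale"
    return reading.value
```

The point of the shape is that a green value cannot be produced unless the reading carries both a value and a recent timestamp.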

### 3. Mutation model: read-first, explicit writes only

The default operator surface is read-only.

For MVP, the five required views below are read-only views.
They may link the operator to the underlying source-of-truth object, but they must not mutate state merely by rendering, refreshing, filtering, or opening detail drawers.

If write actions are added later, they must live in a separate, explicit control surface with all of the following:
- an intentional operator action
- a confirmation step for destructive or queue-changing actions
- a single named source-of-truth target
- an audit trail tied to the action
- idempotent behavior where practical
- machine-scoped credentials, not a hidden fallback to a human personal token
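
If write controls are ever added, the conditions above could be enforced by a single gate that every write passes through. This is a sketch under assumptions: `perform_write`, its parameters, and the in-memory audit list are hypothetical stand-ins, and idempotency is left to the `apply` callable.

```python
from typing import Callable, Dict, List, Optional


def perform_write(
    action: str,
    target: str,                      # single named source-of-truth object
    machine_identity: Optional[str],  # service identity, never a human token
    destructive: bool,
    confirmed: bool,
    apply: Callable[[], None],
    audit: List[Dict[str, str]],
) -> None:
    """Run one explicit write, or refuse before anything changes."""
    if not machine_identity:
        raise PermissionError("no machine identity; refusing to fall back to a human token")
    if destructive and not confirmed:
        raise PermissionError(f"{action} on {target} needs explicit confirmation")
    # Record the intent first so the trail exists even if apply() fails midway.
    audit.append({"action": action, "target": target, "identity": machine_identity})
    apply()
```

Because the gate is an ordinary function call, passive polling code has no path that reaches `apply` by accident.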

### 4. Repo boundary: visible world is not operator truth

`the-nexus` is the visible world. It may eventually project summarized status outward, but it must not own the operator control surface.

The operator command center belongs with the sidecar/control-plane boundary, where Timmy already owns:
- orchestration policy
- cron definitions
- playbooks
- sidecar scripts
- deployment and runtime governance

That makes the v1 ownership decision:
- `timmy-config` owns the requirements and first implementation shape

Allowed future extraction:
- if the command center becomes large enough to deserve its own release cycle, implementation code may later move into a dedicated control-plane repo
- if that happens, `timmy-config` still remains the source of truth for policy, access requirements, and operator doctrine

Rejected owner for v1:
- `the-nexus`, because it is the wrong boundary for an operator-only surface and invites demo/UI theater to masquerade as truth

## Minimum viable views

Every view must show freshness and expose drill-through links or identifiers back to the source object.

| View | Must answer | Authoritative sources | MVP mutation status |
|------|-------------|-----------------------|---------------------|
| Brief status | What is red right now, what is degraded, and what needs operator attention first? | Derived rollup from the four views below; no standalone shadow state | Read-only |
| Agent health | Which agents or loops are alive, stalled, rate-limited, missing, or working the wrong thing? | Runtime health checks, process state, agent heartbeats, active claim/assignment state, model/provider health | Read-only |
| Review queue | Which PRs are waiting, blocked, risky, stale, or ready for review/merge? | Gitea PR state, review comments, checks, mergeability, labels, assignees | Read-only |
| Cron state | Which scheduled jobs are enabled, paused, stale, failing, or drifting from intended schedule? | Hermes cron registry, Huey consumer health, last-run status, next-run schedule | Read-only |
| Task board | What work is unassigned, assigned, in progress, blocked, or waiting on review across the active repos? | Gitea issues, labels, assignees, milestones, linked PRs, issue state | Read-only |

## View requirements in detail

### Brief status

The brief status view is the operator's first screen.
It must provide a compact summary of:
- overall health state
- current review pressure
- current queue pressure
- cron failures or paused jobs that matter
- stale agent or service conditions

It must be computed from the authoritative views below, not from a separate private cache.
A red item in brief status must point to the exact underlying object that caused it.
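
The rollup rule can be sketched as a pure function over items already produced by the four source views, so there is no separate cache to drift. The `ViewItem` type, the `SEVERITY` ordering, and the object reference strings are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical severity ordering; higher means more urgent.
SEVERITY: Dict[str, int] = {"ok": 0, "unknown": 1, "stale": 1, "degraded": 2, "red": 3}


@dataclass(frozen=True)
class ViewItem:
    view: str        # which source view produced this item
    object_ref: str  # drill-through identifier for the underlying object
    state: str


def brief_status(items: List[ViewItem]) -> Dict[str, object]:
    """Derive the first-screen summary from the other views' items only."""
    if not items:
        # Nothing reported is not the same as healthy.
        return {"overall": "unknown", "attention": []}
    normalised = [
        i if i.state in SEVERITY else ViewItem(i.view, i.object_ref, "unknown")
        for i in items
    ]
    flagged = [i for i in normalised if SEVERITY[i.state] > 0]
    flagged.sort(key=lambda i: SEVERITY[i.state], reverse=True)
    overall = flagged[0].state if flagged else "ok"
    return {"overall": overall, "attention": flagged}
```

Each entry in `attention` keeps its `object_ref`, which is what lets a red item point back to the object that caused it.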

### Agent health

Minimum fields per agent or loop:
- agent name
- current state: up, down, degraded, idle, busy, rate-limited, unknown
- last successful activity time
- current task or claim, if any
- model/provider or service dependency in use
- failure mode when degraded

The view must distinguish between:
- process missing
- process present but unhealthy
- healthy but idle
- healthy and actively working
- active but stale on one issue for too long

This view must reflect real operator concerns, not just whether a shell process exists.
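
The five conditions above can be told apart with a small decision function. This is a sketch only: the parameter names and the `max_claim_age` threshold are hypothetical, and the inputs are assumed to come from the runtime health checks and claim state named earlier.

```python
from datetime import datetime, timedelta
from typing import Optional


def classify_agent(
    process_present: bool,
    health_ok: bool,
    claim: Optional[str],                  # current task or claim, if any
    claim_started_at: Optional[datetime],  # when the agent took that claim
    now: datetime,
    max_claim_age: timedelta,
) -> str:
    """Map raw liveness signals onto the conditions the view must distinguish."""
    if not process_present:
        return "process missing"
    if not health_ok:
        return "process present but unhealthy"
    if claim is None:
        return "healthy but idle"
    if claim_started_at is None:
        # A claim without a start time cannot be shown as fresh work.
        return "unknown"
    if now - claim_started_at > max_claim_age:
        return "active but stale"
    return "healthy and actively working"
```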

### Review queue

Minimum fields per PR row:
- repo
- PR number and title
- author
- age
- review state
- mergeability or blocking condition
- sensitive-surface flag when applicable

The queue must make it obvious which PRs require Timmy judgment versus routine review.
It must not collapse all open PRs into a vanity count.

### Cron state

Minimum fields per scheduled job:
- job name
- desired state
- actual state
- last run time
- last result
- next run time
- pause reason or failure reason

The view must highlight drift, especially cases where:
- config says the job exists but the runner is absent
- a job is paused and nobody noticed
- a job is overdue relative to its schedule
- the runner is alive but the job has stopped producing successful runs
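
The four drift cases can be expressed as independent checks over one observation of a job. The `JobObservation` fields, the `grace` window, and the `max_success_gap` threshold are hypothetical. How the values are read from Hermes cron state or Huey is deliberately left out of the sketch.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class JobObservation:
    name: str
    desired_enabled: bool             # what config says should be true
    runner_present: bool              # whether a runner for the job was found
    paused: bool
    next_run_due: Optional[datetime]  # when the schedule says the next run was due
    last_success: Optional[datetime]


def drift_findings(
    job: JobObservation, now: datetime, grace: timedelta, max_success_gap: timedelta
) -> List[str]:
    """Return every drift condition that applies; an empty list means no drift seen."""
    findings: List[str] = []
    if job.desired_enabled and not job.runner_present:
        findings.append("runner absent")
    if job.desired_enabled and job.paused:
        findings.append("paused")
    active = job.runner_present and not job.paused
    if active and job.next_run_due is not None and now - job.next_run_due > grace:
        findings.append("overdue")
    if active and (job.last_success is None or now - job.last_success > max_success_gap):
        findings.append("no recent success")
    return findings
```

Returning a list rather than a single state keeps overlapping problems visible, for example a job that is both overdue and without a recent success.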

### Task board

The task board is not a hand-maintained kanban.
It is a projection of Gitea truth.

Minimum board lanes for MVP:
- unassigned
- assigned
- in progress
- blocked
- in review

Lane membership must come from explicit source-of-truth signals such as assignees, labels, linked PRs, and issue state.
If the mapping is ambiguous, the card must say so rather than invent certainty.
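
A lane projection along these lines might look like the sketch below. The label names `blocked` and `in-progress`, and the rule that an open linked PR means "in review", are assumptions for illustration. The actual label scheme is not defined in this document.

```python
from typing import List, Optional


def lane_for_issue(
    state: str,            # Gitea issue state: "open" or "closed"
    assignees: List[str],
    labels: List[str],
    open_linked_prs: int,
) -> Optional[str]:
    """Project one issue onto a board lane using explicit signals only."""
    if state != "open":
        return None  # closed issues are not on the board
    explicit: List[str] = []
    if "blocked" in labels:
        explicit.append("blocked")
    if open_linked_prs > 0:
        explicit.append("in review")
    if "in-progress" in labels:
        explicit.append("in progress")
    if len(explicit) > 1:
        # Conflicting signals are surfaced, not resolved by guesswork.
        return "ambiguous: " + ", ".join(explicit)
    if explicit:
        return explicit[0]
    return "assigned" if assignees else "unassigned"
```

Returning an explicit `ambiguous: ...` value keeps the conflict on the card instead of hiding it behind a priority rule.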

## Read-only versus mutating surfaces

### Read-only for MVP

The following are read-only in MVP:
- brief status
- agent health
- review queue
- cron state
- task board
- all filtering, sorting, searching, and drill-down behavior

### May mutate later, but only as explicit controls

The following are acceptable future mutation classes if they are isolated behind explicit controls and audit:
- pause or resume a cron job
- dispatch, assign, unassign, or requeue a task in Gitea
- post a review action or merge action to a PR
- restart or stop a named operator-managed agent/service

These controls must never be mixed invisibly into passive status polling.
The operator must always know when a click is about to change world state.

## Truth versus theater rules

The command center must follow these rules:

1. No hidden side effects on read.
2. No green status without a timestamped source.
3. No second queue that disagrees with Gitea.
4. No synthetic task board curated by hand.
5. No stale cache presented as live truth.
6. No public-facing polish requirements allowed to override operator clarity.
7. No fallback to personal human tokens when machine identity is missing.
8. No developer-specific local absolute paths in requirements, config examples, or UI copy.

## Credential and identity requirements

The surface must use machine-scoped or service-scoped credentials for any source it reads or writes.

It must not rely on:
- a principal's browser session as the only auth story
- a hidden file lookup chain for a human token
- a personal access token copied into client-side code
- ambiguous fallback identity that changes behavior depending on who launched the process

Remote operator access is granted by Tailscale identity and network reachability, not by making the surface public and adding a thin password prompt later.
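
The "no fallback" rule is easiest to keep when credential loading has exactly one source and fails loudly. In this sketch the environment variable name `COMMAND_CENTER_GITEA_TOKEN` and the exception type are hypothetical. The only point is that no second lookup path exists.

```python
from typing import Mapping


class MissingMachineCredential(RuntimeError):
    """Raised when the service credential is absent; there is no fallback."""


def load_machine_token(
    env: Mapping[str, str], var: str = "COMMAND_CENTER_GITEA_TOKEN"
) -> str:
    """Read the service token from one named variable and nowhere else."""
    token = env.get(var, "").strip()
    if not token:
        # Deliberately no lookup in home directories, browser state, or user config.
        raise MissingMachineCredential(f"{var} is not set; refusing to start")
    return token
```

In a real process the mapping would typically be `os.environ`; taking it as a parameter keeps the function testable and makes the single source explicit.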

## Recommended implementation stance for v1

- implement the operator command center as a sidecar-owned surface under `timmy-config`
- keep the first version read-only
- prefer direct reads from Gitea, Hermes cron state, Huey/runtime state, and service health endpoints
- attach freshness metadata to every view
- treat drill-through links to source objects as mandatory, not optional
- postpone write controls until audit, identity, and source-of-truth mapping are explicit

## Acceptance criteria for this requirement set

- the minimum viable views are fixed as: agent health, review queue, cron state, task board, brief status
- the access model is explicitly local or Tailscale only
- operator truth is defined and separated from demo/UI theater
- read-only versus mutating behavior is explicitly separated
- repo ownership is decided: `timmy-config` owns v1 requirements and implementation boundary
- no local absolute paths are required by this design
- no human-token fallback pattern is allowed by this design