[RESILIENCE] Define per-agent fallback portfolios and routing doctrine (#170)

2026-04-04 21:40:36 +00:00
parent 2142d20129
commit ff7e22dcc8
3 changed files with 537 additions and 3 deletions
--- a/docs/fallback-portfolios.md
+++ b/docs/fallback-portfolios.md
@@ -0,0 +1,248 @@
+# Per-Agent Fallback Portfolios and Task-Class Routing
+
+Status: proposed doctrine for issue #155
+Scope: policy and sidecar structure only; no runtime wiring in `tasks.py` or live loops yet
+
+## Why this exists
+
+Timmy already has multiple model paths declared in `config.yaml`, multiple task surfaces in `playbooks/`, and multiple live automation lanes documented in `docs/automation-inventory.md`.
+
+What is missing is a declared resilience doctrine for how specific agents degrade when a provider goes down, a quota runs out, or a model family fails. Without that doctrine, the whole fleet tends to collapse onto the same fallback chain, which means one outage turns into synchronized fleet degradation.
+
+This spec makes the fallback graph explicit before runtime wiring lands.
+
+## Timmy ownership boundary
+
+`timmy-config` owns:
+- routing doctrine for Timmy-side task classes
+- sidecar-readable fallback portfolio declarations
+- capability floors and degraded-mode authority restrictions
+- the mapping between current playbooks and future resilient agent lanes
+
+`timmy-config` does not own:
+- live queue state or issue truth outside Gitea
+- launchd state, loop resurrection, or stale runtime reuse
+- ad hoc worktree history or hidden queue mutation
+
+That split matters. This repo should declare how routing is supposed to work. Runtime surfaces should consume that declaration instead of inventing their own fallback orderings.
+
+## Non-goals
+
+This issue does not:
+- fully wire portfolio selection into `tasks.py`, launch agents, or live loops
+- bless human-token or operator-token fallbacks as part of an automated chain
+- allow degraded agents to keep full authority just because they are still producing output
+
+## Role classes
+
+### 1. Judgment
+
+Use for work where the main risk is a bad decision, not a missing patch.
+
+Current Timmy surfaces:
+- `playbooks/issue-triager.yaml`
+- `playbooks/pr-reviewer.yaml`
+- `playbooks/verified-logic.yaml`
+
+Typical task classes:
+- issue triage
+- queue routing
+- PR review
+- proof / consistency checks
+- governance-sensitive review
+
+Judgment lanes may read broadly, but they lose authority earlier than builder lanes when degraded.
+
+### 2. Builder
+
+Use for work where the main risk is producing or verifying a change.
+
+Current Timmy surfaces:
+- `playbooks/bug-fixer.yaml`
+- `playbooks/test-writer.yaml`
+- `playbooks/refactor-specialist.yaml`
+
+Typical task classes:
+- bug fixes
+- test writing
+- bounded refactors
+- narrow docs or code repairs with verification
+
+Builder lanes keep patch-producing usefulness longer than judgment lanes, but they must lose control-plane authority as they degrade.
+
+### 3. Wolf / bulk
+
+Use for repetitive, high-volume, bounded, reversible work.
+
+Current Timmy world-state:
+- bulk and sweep behavior is still described mainly by the live operations recorded in `docs/automation-inventory.md`, not by a dedicated sidecar playbook
+- this class covers the work shape currently associated with queue hygiene, inventory refresh, docs sweeps, log summarization, and repetitive small-diff passes
+
+Typical task classes:
+- docs inventory refresh
+- log summarization
+- queue hygiene
+- repetitive small diffs
+- research or extraction sweeps
+
+Wolf / bulk lanes are throughput-first and deliberately lower-authority.
+
+## Routing policy
+
+1. If the task touches a sensitive control surface, route to judgment first even if the edit is small.
+2. If the task is primarily about merge authority, routing authority, proof, or governance, route to judgment.
+3. If the task is primarily about producing a patch with local verification, route to builder.
+4. If the task is repetitive, bounded, reversible, and low-authority, route to wolf / bulk.
+5. If a wolf / bulk task expands beyond its size or authority envelope, promote it upward; do not let it keep grinding forward through scope creep.
+6. If a builder task becomes architecture, multi-repo coordination, or control-plane review, promote it to judgment.
+7. If a lane reaches terminal fallback, it must still land in a usable degraded mode. Dead silence is not an acceptable terminal state.
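The routing rules above can be sketched as a small classifier. This is a minimal illustration, not the runtime selector: `RoleClass`, `Task`, its `shape` values, and `route` are hypothetical names. The one-step promotion used for rule 5 (wolf / bulk to builder) is an assumption, because the doctrine only says "promote it upward", and the conservative default for unclassified work is also an assumption.

```python
from dataclasses import dataclass
from enum import IntEnum


class RoleClass(IntEnum):
    """Role classes ordered by authority, so promotion is a +1 step (sketch assumption)."""
    WOLF_BULK = 0
    BUILDER = 1
    JUDGMENT = 2


@dataclass
class Task:
    """Hypothetical task descriptor; the doctrine does not define one."""
    shape: str                               # "governance", "patch", or "bulk"
    touches_sensitive: bool = False
    architecture_or_multi_repo: bool = False
    exceeds_bulk_envelope: bool = False


def promote(role: RoleClass) -> RoleClass:
    """Move one class upward; judgment is the ceiling."""
    return RoleClass(min(role + 1, RoleClass.JUDGMENT))


def route(task: Task) -> RoleClass:
    if task.touches_sensitive:               # rule 1: sensitive surface wins, even for small edits
        return RoleClass.JUDGMENT
    if task.shape == "governance":           # rule 2: merge/routing authority, proof, governance
        return RoleClass.JUDGMENT
    if task.shape == "patch":                # rule 3, with rule 6 promotion
        if task.architecture_or_multi_repo:
            return RoleClass.JUDGMENT
        return RoleClass.BUILDER
    if task.shape == "bulk":                 # rule 4, with rule 5 promotion on scope creep
        role = RoleClass.WOLF_BULK
        return promote(role) if task.exceeds_bulk_envelope else role
    # Assumption: anything unclassified goes to the most conservative lane.
    return RoleClass.JUDGMENT
```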
+
+## Sensitive control surfaces
+
+These paths stay judgment-routed unless explicitly reviewed otherwise:
+- `SOUL.md`
+- `config.yaml`
+- `deploy.sh`
+- `tasks.py`
+- `playbooks/`
+- `cron/`
+- `memories/`
+- `skins/`
+- `training/`
+
+This mirrors the current PR-review doctrine and keeps degraded builder or bulk lanes away from Timmy's control plane.
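A path check for these surfaces might look like the following sketch. It assumes paths are repo-relative with `/` separators, treats the four named files as sensitive only at the repository root, and treats the five directories as sensitive at any depth below them. `is_sensitive` and `touches_sensitive_surface` are hypothetical helpers.

```python
from pathlib import PurePosixPath

# Mirrors the "Sensitive control surfaces" list; paths are assumed repo-relative.
SENSITIVE_FILES = {"SOUL.md", "config.yaml", "deploy.sh", "tasks.py"}
SENSITIVE_DIRS = {"playbooks", "cron", "memories", "skins", "training"}


def is_sensitive(path: str) -> bool:
    """True if a repo-relative path is, or sits under, a sensitive control surface."""
    parts = PurePosixPath(path).parts
    if not parts:
        return False
    # Named files count only at the repo root, so docs/config.yaml is not the control file.
    if len(parts) == 1 and parts[0] in SENSITIVE_FILES:
        return True
    # Directories match on the first component, so playbooks/bug-fixer.yaml counts.
    return parts[0] in SENSITIVE_DIRS


def touches_sensitive_surface(changed_paths) -> bool:
    """Rule 1 trigger: any sensitive path routes the whole task to judgment."""
    return any(is_sensitive(p) for p in changed_paths)
```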
+
+## Portfolio design rules
+
+The sidecar portfolio declaration in `fallback-portfolios.yaml` follows these rules:
+
+1. Every critical agent gets four slots:
+   - primary
+   - fallback1
+   - fallback2
+   - terminal fallback
+2. No two critical agents may share the same `primary + fallback1` pair.
+3. Provider families should be anti-correlated across critical lanes whenever practical.
+4. Terminal fallbacks must end in a usable degraded lane, not a null lane.
+5. At least one critical lane must end on a local-capable path.
+6. No human-token fallback patterns are allowed in automated chains.
+7. Degraded mode reduces authority before it removes usefulness.
+8. A terminal lane that cannot safely produce an artifact is not a valid terminal lane.
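Several of these rules are mechanically checkable. The sketch below validates rules 1, 2, 5 and 6 against an in-memory structure. The `Slot` record, its `local` and `human_token` flags, and any `vendor-*` provider names are hypothetical and do not describe the actual schema of `fallback-portfolios.yaml`. Rules 3, 4, 7 and 8 need judgment or runtime evidence and are left out.

```python
from dataclasses import dataclass

SLOTS = ("primary", "fallback1", "fallback2", "terminal")


@dataclass(frozen=True)
class Slot:
    """Hypothetical slot record; field names are illustrative, not the sidecar schema."""
    provider: str               # provider family, e.g. "vendor-a" or "local"
    model: str
    local: bool = False         # runs on a local-capable path
    human_token: bool = False   # authenticates with a human or operator token


def validate(portfolios: dict) -> list:
    """Check rules 1, 2, 5 and 6 for a {agent: {slot_name: Slot}} mapping."""
    errors = []
    seen_pairs = {}
    for agent, slots in portfolios.items():
        missing = [name for name in SLOTS if name not in slots]
        if missing:  # rule 1: four slots per critical agent
            errors.append(f"{agent}: missing slots {missing}")
            continue
        pair = tuple((slots[n].provider, slots[n].model) for n in ("primary", "fallback1"))
        if pair in seen_pairs:  # rule 2: no shared primary + fallback1 pair
            errors.append(f"{agent}: same primary+fallback1 as {seen_pairs[pair]}")
        else:
            seen_pairs[pair] = agent
        for name in SLOTS:  # rule 6: no human-token fallbacks in automated chains
            if slots[name].human_token:
                errors.append(f"{agent}: {name} uses a human token")
    complete = [s for s in portfolios.values() if all(n in s for n in SLOTS)]
    if not any(s["terminal"].local for s in complete):  # rule 5
        errors.append("no critical lane ends on a local-capable path")
    return errors
```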
+
+## Explicit ban: synchronized fleet degradation
+
+Synchronized fleet degradation is forbidden.
+
+That means:
+- do not point every critical agent at the same fallback stack
+- do not let all judgment agents converge on the same first backup if avoidable
+- do not let all builder agents collapse onto the same weak terminal lane
+- do not treat "everyone fell back to the cheapest thing" as resilience
+
+A resilient fleet degrades unevenly on purpose. Some lanes should stay sharp while others become slower or narrower.
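One way to flag synchronized degradation before it happens is to compare lanes slot by slot. This sketch works at provider-family granularity (an assumption; comparing exact models would be a looser check) and takes a hypothetical `{agent: (role_class, [family per slot])}` input.

```python
from collections import defaultdict

SLOTS = ("primary", "fallback1", "fallback2", "terminal")


def _all_same(values) -> bool:
    """True when at least two lanes are present and they all share one value."""
    return len(values) > 1 and len(set(values)) == 1


def synchronization_warnings(lanes: dict) -> list:
    """lanes maps agent -> (role_class, [provider family for each slot in SLOTS order])."""
    warnings = []

    # Lanes that share the whole stack behind their primary degrade in lockstep.
    stacks = defaultdict(list)
    for agent, (_, families) in lanes.items():
        stacks[tuple(families[1:])].append(agent)
    for stack, agents in stacks.items():
        if len(agents) > 1:
            warnings.append(f"same fallback stack {list(stack)}: {sorted(agents)}")

    # All judgment lanes converging on one first backup.
    judgment_fb1 = [f[1] for role, f in lanes.values() if role == "judgment"]
    if _all_same(judgment_fb1):
        warnings.append(f"all judgment lanes fall back to {judgment_fb1[0]} first")

    # All builder lanes collapsing onto one terminal.
    builder_terminal = [f[3] for role, f in lanes.values() if role == "builder"]
    if _all_same(builder_terminal):
        warnings.append(f"all builder lanes end on {builder_terminal[0]}")

    # The whole fleet on a single provider family at any slot depth.
    for i, slot in enumerate(SLOTS):
        at_slot = [f[i] for _, f in lanes.values()]
        if _all_same(at_slot):
            warnings.append(f"whole fleet on {at_slot[0]} at {slot}")
    return warnings
```

An empty result means no two lanes share a full fallback stack, the judgment lanes do not all share one first backup, the builder lanes do not all share one terminal, and no slot depth is held by a single provider family across the fleet.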
+
+## Capability floors and degraded authority
+
+### Shared slot semantics
+
+- `primary`: full role-class authority
+- `fallback1`: full task authority for normal work, but no silent broadening of scope
+- `fallback2`: bounded and reversible work only; no irreversible control-plane action
+- `terminal`: usable degraded lane only; must produce a machine-usable artifact but must not impersonate full authority
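The shared slot semantics form an ordered ladder, which a sketch can encode as an `IntEnum` so that "`fallback2` and below" becomes a plain comparison. `Slot`, `Action`, and `shared_slot_allows` are hypothetical names; role-specific floors are assumed to be checked separately on top of this.

```python
from dataclasses import dataclass
from enum import IntEnum


class Slot(IntEnum):
    """Portfolio slots; a higher value means deeper degradation."""
    PRIMARY = 0
    FALLBACK1 = 1
    FALLBACK2 = 2
    TERMINAL = 3


@dataclass(frozen=True)
class Action:
    """Hypothetical description of what a lane is about to do."""
    broadens_scope: bool = False
    reversible: bool = True
    bounded: bool = True
    produces_artifact: bool = True
    claims_full_authority: bool = False


def shared_slot_allows(slot: Slot, action: Action) -> bool:
    """Apply only the shared slot semantics; role-class floors are checked elsewhere."""
    if slot == Slot.PRIMARY:
        return True   # full role-class authority
    if action.broadens_scope:
        return False  # fallback1 and deeper: no silent broadening of scope
    if slot >= Slot.FALLBACK2 and not (action.bounded and action.reversible):
        return False  # bounded and reversible work only
    if slot == Slot.TERMINAL and (action.claims_full_authority or not action.produces_artifact):
        return False  # must yield a machine-usable artifact without impersonating full authority
    return True
```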
+
+### Judgment floors
+
+Judgment agents lose authority earliest.
+
+At `fallback2` and below, judgment lanes must not:
+- merge PRs
+- close or rewrite governing issues or PRs
+- mutate sensitive control surfaces
+- bulk-reassign the fleet
+- silently change routing policy
+
+Their degraded usefulness is still real:
+- classify backlog
+- produce draft routing plans
+- summarize risk
+- leave bounded labels or comments with explicit evidence
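A judgment-lane floor check could look like this sketch. The action identifiers are hypothetical stand-ins for the behaviors listed above. The degraded path is default-deny: at `fallback2` and `terminal`, anything not on the allowlist is refused, which is a stricter reading than the doctrine's explicit "must not" list.

```python
# Hypothetical action identifiers for the behaviors named in the doctrine.
FORBIDDEN_FROM_FALLBACK2 = frozenset({
    "merge_pr",
    "close_governing_item",
    "rewrite_governing_item",
    "mutate_sensitive_surface",
    "bulk_reassign_fleet",
    "change_routing_policy",
})
EVIDENCE_REQUIRED = frozenset({"label_with_evidence", "comment_with_evidence"})
DEGRADED_ALLOWED = frozenset({
    "classify_backlog", "draft_routing_plan", "summarize_risk",
}) | EVIDENCE_REQUIRED

SLOT_DEPTH = {"primary": 0, "fallback1": 1, "fallback2": 2, "terminal": 3}


def judgment_allows(slot: str, action: str, evidence: str = "") -> bool:
    """Floor check for judgment lanes; default-deny once degraded to fallback2."""
    if SLOT_DEPTH[slot] < SLOT_DEPTH["fallback2"]:
        return True  # primary and fallback1 keep normal judgment authority
    if action in FORBIDDEN_FROM_FALLBACK2:
        return False
    if action in EVIDENCE_REQUIRED:
        return bool(evidence.strip())  # labels and comments must carry explicit evidence
    return action in DEGRADED_ALLOWED  # anything unlisted is refused
```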
+
+### Builder floors
+
+Builder agents may continue doing useful narrow work deeper into degradation, but only inside a tighter box.
+
+At `fallback2`, builder lanes must be limited to:
+- single-issue work
+- reversible patches
+- narrow docs or test scaffolds
+- bounded file counts and small diff sizes
+
+At `terminal`, builder lanes must not:
+- touch sensitive control surfaces
+- merge or release
+- do multi-repo or architecture work
+- claim verification they did not run
+
+Their terminal usefulness may still include:
+- a small patch
+- a reproducer test
+- a docs fix
+- a draft branch or artifact for later review
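The builder box can be expressed as a list of violations. The numeric caps are placeholders, since the doctrine says "bounded" and "small" without fixing numbers. The `Patch` summary is hypothetical, and applying the `fallback2` limits at `terminal` as well is an assumption that degradation is cumulative.

```python
from dataclasses import dataclass, field

# Placeholder caps; the doctrine does not fix these numbers.
MAX_FILES = 5
MAX_DIFF_LINES = 200

SENSITIVE_FILES = {"SOUL.md", "config.yaml", "deploy.sh", "tasks.py"}
SENSITIVE_DIRS = {"playbooks", "cron", "memories", "skins", "training"}


def _is_sensitive(path: str) -> bool:
    return path in SENSITIVE_FILES or path.split("/", 1)[0] in SENSITIVE_DIRS


@dataclass
class Patch:
    """Hypothetical summary of what a builder lane wants to ship."""
    issues: list
    files: list
    diff_lines: int
    repos: set
    reversible: bool = True
    merges_or_releases: bool = False
    architecture: bool = False
    claimed_checks: set = field(default_factory=set)
    ran_checks: set = field(default_factory=set)


def builder_violations(slot: str, patch: Patch) -> list:
    """Return floor violations; an empty list means the patch fits the slot's box."""
    problems = []
    if slot in ("fallback2", "terminal"):  # assumption: terminal inherits the fallback2 box
        if len(patch.issues) != 1:
            problems.append("must be single-issue")
        if not patch.reversible:
            problems.append("must be reversible")
        if len(patch.files) > MAX_FILES or patch.diff_lines > MAX_DIFF_LINES:
            problems.append("exceeds file-count or diff-size cap")
    if slot == "terminal":
        if any(_is_sensitive(f) for f in patch.files):
            problems.append("touches a sensitive control surface")
        if patch.merges_or_releases:
            problems.append("merges or releases")
        if len(patch.repos) > 1 or patch.architecture:
            problems.append("multi-repo or architecture work")
        unverified = patch.claimed_checks - patch.ran_checks
        if unverified:
            problems.append(f"claims checks it did not run: {sorted(unverified)}")
    return problems
```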
+
+### Wolf / bulk floors
+
+Wolf / bulk lanes stay useful as summarizers and sweepers, not as governors.
+
+At `fallback2` and `terminal`, wolf / bulk lanes must not:
+- fan out branch creation across repos
+- mass-assign agents
+- edit sensitive control surfaces
+- perform irreversible queue mutation
+
+Their degraded usefulness may still include:
+- gathering evidence
+- refreshing inventories
+- summarizing logs
+- proposing labels or routes
+- producing repetitive, low-risk artifacts inside explicit caps
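For wolf / bulk lanes, a degraded sweep can filter its own batch before acting. In this sketch `BulkAction`, the action kinds, and the cap of 20 items are hypothetical, and treating more than one assignment in a batch as "mass assignment" is an assumption.

```python
from dataclasses import dataclass

# Placeholder cap; the doctrine asks for explicit caps without fixing a number.
MAX_ITEMS_PER_SWEEP = 20

SENSITIVE_FILES = {"SOUL.md", "config.yaml", "deploy.sh", "tasks.py"}
SENSITIVE_DIRS = {"playbooks", "cron", "memories", "skins", "training"}


@dataclass(frozen=True)
class BulkAction:
    """Hypothetical unit of wolf / bulk work."""
    kind: str                 # e.g. "summarize_log", "propose_label", "create_branch", "edit_file"
    target: str               # repo-relative path or queue item id
    repo: str = "timmy-config"
    reversible: bool = True


def _is_sensitive(path: str) -> bool:
    return path in SENSITIVE_FILES or path.split("/", 1)[0] in SENSITIVE_DIRS


def degraded_sweep(actions):
    """Split a batch into (kept, rejected) for a wolf / bulk lane at fallback2 or terminal."""
    actions = list(actions)
    branch_repos = {a.repo for a in actions if a.kind == "create_branch"}
    assignments = sum(a.kind == "assign_agent" for a in actions)
    kept, rejected = [], []
    for a in actions:
        if a.kind == "create_branch" and len(branch_repos) > 1:
            rejected.append((a, "branch fan-out across repos"))
        elif a.kind == "assign_agent" and assignments > 1:
            rejected.append((a, "mass assignment"))  # assumption: >1 per batch is "mass"
        elif a.kind == "edit_file" and _is_sensitive(a.target):
            rejected.append((a, "sensitive control surface"))
        elif a.kind.startswith("queue_") and not a.reversible:
            rejected.append((a, "irreversible queue mutation"))
        elif len(kept) >= MAX_ITEMS_PER_SWEEP:
            rejected.append((a, "over sweep cap"))
        else:
            kept.append(a)
    return kept, rejected
```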
+
+## Usable terminal lanes
+
+A terminal fallback is only valid if it still does at least one of these safely:
+- classify and summarize a backlog
+- produce a bounded patch or test artifact
+- summarize a diff with explicit uncertainty
+- refresh an inventory or evidence bundle
+
+If the terminal lane can only say "model unavailable" and stop, the portfolio is incomplete.
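The terminal-lane test reduces to a set intersection. The capability names below are hypothetical labels for the four outputs listed above.

```python
# Hypothetical labels for the four safe terminal outputs named by the doctrine.
USABLE_TERMINAL_OUTPUTS = frozenset({
    "backlog_classification",
    "bounded_patch_or_test",
    "diff_summary_with_uncertainty",
    "inventory_or_evidence_refresh",
})


def terminal_is_usable(capabilities) -> bool:
    """A terminal lane is valid only if it can safely produce at least one usable output."""
    return bool(USABLE_TERMINAL_OUTPUTS & set(capabilities))


def incomplete_portfolios(terminal_caps: dict) -> list:
    """Agents whose terminal lane can only report failure, i.e. incomplete portfolios."""
    return sorted(agent for agent, caps in terminal_caps.items() if not terminal_is_usable(caps))
```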
+
+## Current sidecar reference lanes
+
+`fallback-portfolios.yaml` defines the initial implementation-ready structure for four named lanes:
+- `triage-coordinator` — judgment
+- `pr-reviewer` — judgment
+- `builder-main` — builder
+- `wolf-sweeper` — wolf / bulk
+
+These are the canonical resilience lanes for the current Timmy world-state.
+
+Current playbooks should eventually map onto them like this:
+- `playbooks/issue-triager.yaml` -> `triage-coordinator`
+- `playbooks/pr-reviewer.yaml` -> `pr-reviewer`
+- `playbooks/verified-logic.yaml` -> judgment lane family, pending a dedicated proof profile if needed
+- `playbooks/bug-fixer.yaml`, `playbooks/test-writer.yaml`, and `playbooks/refactor-specialist.yaml` -> `builder-main`
+- future sidecar bulk playbooks should inherit from `wolf-sweeper` instead of inventing independent fallback chains
+
+Until runtime wiring lands, unmapped playbooks should be treated as policy-incomplete rather than inheriting an implicit fallback chain.
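The mapping above can be held as a lookup that fails loudly for unmapped playbooks instead of falling through to a default chain. `PLAYBOOK_LANES`, `PolicyIncomplete`, and `lane_for` are hypothetical names. `verified-logic.yaml` is recorded with a role class but no named lane, matching its "judgment lane family" status.

```python
# (role class, named lane); None means "role family only, no dedicated lane yet".
PLAYBOOK_LANES = {
    "playbooks/issue-triager.yaml": ("judgment", "triage-coordinator"),
    "playbooks/pr-reviewer.yaml": ("judgment", "pr-reviewer"),
    "playbooks/verified-logic.yaml": ("judgment", None),  # pending a dedicated proof profile
    "playbooks/bug-fixer.yaml": ("builder", "builder-main"),
    "playbooks/test-writer.yaml": ("builder", "builder-main"),
    "playbooks/refactor-specialist.yaml": ("builder", "builder-main"),
}


class PolicyIncomplete(LookupError):
    """Raised instead of silently inheriting an implicit fallback chain."""


def lane_for(playbook: str) -> tuple:
    try:
        return PLAYBOOK_LANES[playbook]
    except KeyError:
        raise PolicyIncomplete(f"{playbook} has no declared resilience lane") from None
```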
+
+## Wiring contract for later implementation
+
+When this is wired into runtime selection, the selector should:
+- classify the incoming task into a role class
+- check whether the task touches a sensitive control surface
+- choose the named agent lane for that class
+- step through the declared portfolio slots in order
+- enforce the capability floor of the active slot before taking action
+- record when a fallback transition happened and what authority was still allowed
+
+The important part is not just choosing a different model. It is choosing a different authority envelope as the lane degrades.
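Putting the contract together, a selector might look like the sketch below. Everything in it is hypothetical: the `TASK_LANES` table, the `is_up` availability probe, the shape of `floors`, and especially the choice to reroute sensitive non-judgment work to `pr-reviewer`, since the doctrine says "route to judgment" without naming a lane. The `transitions` list stands in for the record of fallback transitions and the authority still allowed.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set

SLOTS = ("primary", "fallback1", "fallback2", "terminal")

# Hypothetical task-class routing built from the four reference lanes.
TASK_LANES = {
    "issue_triage": "triage-coordinator",
    "pr_review": "pr-reviewer",
    "bug_fix": "builder-main",
    "log_summary": "wolf-sweeper",
}
LANE_ROLE = {
    "triage-coordinator": "judgment",
    "pr-reviewer": "judgment",
    "builder-main": "builder",
    "wolf-sweeper": "wolf/bulk",
}


@dataclass
class Decision:
    lane: str
    slot: Optional[str] = None
    model: Optional[str] = None
    allowed: bool = False
    transitions: List[str] = field(default_factory=list)


def select(task_class: str, touches_sensitive: bool, action: str,
           portfolios: Dict[str, Dict[str, str]],
           floors: Dict[str, Dict[str, Set[str]]],
           is_up: Callable[[str], bool]) -> Decision:
    lane = TASK_LANES[task_class]  # classify the task into a named lane
    if touches_sensitive and LANE_ROLE[lane] != "judgment":
        lane = "pr-reviewer"       # assumption: sensitive work goes to the PR-review lane
    decision = Decision(lane=lane)
    for slot in SLOTS:             # step through the declared slots in order
        model = portfolios[lane][slot]
        if not is_up(model):
            decision.transitions.append(f"{lane}: {slot} ({model}) unavailable")
            continue
        decision.slot, decision.model = slot, model
        # Enforce the active slot's capability floor before acting.
        decision.allowed = action in floors[LANE_ROLE[lane]][slot]
        verdict = "allowed" if decision.allowed else "denied"
        decision.transitions.append(f"{lane}: active {slot}; '{action}' {verdict}")
        return decision
    return decision  # every slot down: the portfolio is incomplete under this doctrine
```

If every slot is unavailable, `slot` stays `None`; under the doctrine that case signals an incomplete portfolio rather than an acceptable terminal state.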