[RESILIENCE] Define per-agent fallback portfolios and routing doctrine (#170)
This commit was merged in pull request #170.
This commit is contained in:
248
docs/fallback-portfolios.md
Normal file
248
docs/fallback-portfolios.md
Normal file
@@ -0,0 +1,248 @@
|
||||
# Per-Agent Fallback Portfolios and Task-Class Routing
|
||||
|
||||
Status: proposed doctrine for issue #155
|
||||
Scope: policy and sidecar structure only; no runtime wiring in `tasks.py` or live loops yet
|
||||
|
||||
## Why this exists
|
||||
|
||||
Timmy already has multiple model paths declared in `config.yaml`, multiple task surfaces in `playbooks/`, and multiple live automation lanes documented in `docs/automation-inventory.md`.
|
||||
|
||||
What is missing is a declared resilience doctrine for how specific agents degrade when a provider, quota, or model family fails. Without that doctrine, the whole fleet tends to collapse onto the same fallback chain, which means one outage turns into synchronized fleet degradation.
|
||||
|
||||
This spec makes the fallback graph explicit before runtime wiring lands.
|
||||
|
||||
## Timmy ownership boundary
|
||||
|
||||
`timmy-config` owns:
|
||||
- routing doctrine for Timmy-side task classes
|
||||
- sidecar-readable fallback portfolio declarations
|
||||
- capability floors and degraded-mode authority restrictions
|
||||
- the mapping between current playbooks and future resilient agent lanes
|
||||
|
||||
`timmy-config` does not own:
|
||||
- live queue state or issue truth outside Gitea
|
||||
- launchd state, loop resurrection, or stale runtime reuse
|
||||
- ad hoc worktree history or hidden queue mutation
|
||||
|
||||
That split matters. This repo should declare how routing is supposed to work. Runtime surfaces should consume that declaration instead of inventing their own fallback orderings.
|
||||
|
||||
## Non-goals
|
||||
|
||||
This issue does not:
|
||||
- fully wire portfolio selection into `tasks.py`, launch agents, or live loops
|
||||
- bless human-token or operator-token fallbacks as part of an automated chain
|
||||
- allow degraded agents to keep full authority just because they are still producing output
|
||||
|
||||
## Role classes
|
||||
|
||||
### 1. Judgment
|
||||
|
||||
Use for work where the main risk is a bad decision, not a missing patch.
|
||||
|
||||
Current Timmy surfaces:
|
||||
- `playbooks/issue-triager.yaml`
|
||||
- `playbooks/pr-reviewer.yaml`
|
||||
- `playbooks/verified-logic.yaml`
|
||||
|
||||
Typical task classes:
|
||||
- issue triage
|
||||
- queue routing
|
||||
- PR review
|
||||
- proof / consistency checks
|
||||
- governance-sensitive review
|
||||
|
||||
Judgment lanes may read broadly, but they lose authority earlier than builder lanes when degraded.
|
||||
|
||||
### 2. Builder
|
||||
|
||||
Use for work where the main risk is producing or verifying a change.
|
||||
|
||||
Current Timmy surfaces:
|
||||
- `playbooks/bug-fixer.yaml`
|
||||
- `playbooks/test-writer.yaml`
|
||||
- `playbooks/refactor-specialist.yaml`
|
||||
|
||||
Typical task classes:
|
||||
- bug fixes
|
||||
- test writing
|
||||
- bounded refactors
|
||||
- narrow docs or code repairs with verification
|
||||
|
||||
Builder lanes keep patch-producing usefulness longer than judgment lanes, but they must lose control-plane authority as they degrade.
|
||||
|
||||
### 3. Wolf / bulk
|
||||
|
||||
Use for repetitive, high-volume, bounded, reversible work.
|
||||
|
||||
Current Timmy world-state:
|
||||
- bulk and sweep behavior is still represented more by live ops reality in `docs/automation-inventory.md` than by a dedicated sidecar playbook
|
||||
- this class covers the work shape currently associated with queue hygiene, inventory refresh, docs sweeps, log summarization, and repetitive small-diff passes
|
||||
|
||||
Typical task classes:
|
||||
- docs inventory refresh
|
||||
- log summarization
|
||||
- queue hygiene
|
||||
- repetitive small diffs
|
||||
- research or extraction sweeps
|
||||
|
||||
Wolf / bulk lanes are throughput-first and deliberately lower-authority.
|
||||
|
||||
## Routing policy
|
||||
|
||||
1. If the task touches a sensitive control surface, route to judgment first even if the edit is small.
|
||||
2. If the task is primarily about merge authority, routing authority, proof, or governance, route to judgment.
|
||||
3. If the task is primarily about producing a patch with local verification, route to builder.
|
||||
4. If the task is repetitive, bounded, reversible, and low-authority, route to wolf / bulk.
|
||||
5. If a wolf / bulk task expands beyond its size or authority envelope, promote it upward; do not let it keep grinding forward through scope creep.
|
||||
6. If a builder task becomes architecture, multi-repo coordination, or control-plane review, promote it to judgment.
|
||||
7. If a lane reaches terminal fallback, it must still land in a usable degraded mode. Dead silence is not an acceptable terminal state.
|
||||
|
||||
## Sensitive control surfaces
|
||||
|
||||
These paths stay judgment-routed unless explicitly reviewed otherwise:
|
||||
- `SOUL.md`
|
||||
- `config.yaml`
|
||||
- `deploy.sh`
|
||||
- `tasks.py`
|
||||
- `playbooks/`
|
||||
- `cron/`
|
||||
- `memories/`
|
||||
- `skins/`
|
||||
- `training/`
|
||||
|
||||
This mirrors the current PR-review doctrine and keeps degraded builder or bulk lanes away from Timmy's control plane.
|
||||
|
||||
## Portfolio design rules
|
||||
|
||||
The sidecar portfolio declaration in `fallback-portfolios.yaml` follows these rules:
|
||||
|
||||
1. Every critical agent gets four slots:
|
||||
- primary
|
||||
- fallback1
|
||||
- fallback2
|
||||
- terminal fallback
|
||||
2. No two critical agents may share the same `primary + fallback1` pair.
|
||||
3. Provider families should be anti-correlated across critical lanes whenever practical.
|
||||
4. Terminal fallbacks must end in a usable degraded lane, not a null lane.
|
||||
5. At least one critical lane must end on a local-capable path.
|
||||
6. No human-token fallback patterns are allowed in automated chains.
|
||||
7. Degraded mode reduces authority before it removes usefulness.
|
||||
8. A terminal lane that cannot safely produce an artifact is not a valid terminal lane.
|
||||
|
||||
## Explicit ban: synchronized fleet degradation
|
||||
|
||||
Synchronized fleet degradation is forbidden.
|
||||
|
||||
That means:
|
||||
- do not point every critical agent at the same fallback stack
|
||||
- do not let all judgment agents converge on the same first backup if avoidable
|
||||
- do not let all builder agents collapse onto the same weak terminal lane
|
||||
- do not treat "everyone fell back to the cheapest thing" as resilience
|
||||
|
||||
A resilient fleet degrades unevenly on purpose. Some lanes should stay sharp while others become slower or narrower.
|
||||
|
||||
## Capability floors and degraded authority
|
||||
|
||||
### Shared slot semantics
|
||||
|
||||
- `primary`: full role-class authority
|
||||
- `fallback1`: full task authority for normal work, but no silent broadening of scope
|
||||
- `fallback2`: bounded and reversible work only; no irreversible control-plane action
|
||||
- `terminal`: usable degraded lane only; must produce a machine-usable artifact but must not impersonate full authority
|
||||
|
||||
### Judgment floors
|
||||
|
||||
Judgment agents lose authority earliest.
|
||||
|
||||
At `fallback2` and below, judgment lanes must not:
|
||||
- merge PRs
|
||||
- close or rewrite governing issues or PRs
|
||||
- mutate sensitive control surfaces
|
||||
- bulk-reassign the fleet
|
||||
- silently change routing policy
|
||||
|
||||
Their degraded usefulness is still real:
|
||||
- classify backlog
|
||||
- produce draft routing plans
|
||||
- summarize risk
|
||||
- leave bounded labels or comments with explicit evidence
|
||||
|
||||
### Builder floors
|
||||
|
||||
Builder agents may continue doing useful narrow work deeper into degradation, but only inside a tighter box.
|
||||
|
||||
At `fallback2`, builder lanes must be limited to:
|
||||
- single-issue work
|
||||
- reversible patches
|
||||
- narrow docs or test scaffolds
|
||||
- bounded file counts and small diff sizes
|
||||
|
||||
At `terminal`, builder lanes must not:
|
||||
- touch sensitive control surfaces
|
||||
- merge or release
|
||||
- do multi-repo or architecture work
|
||||
- claim verification they did not run
|
||||
|
||||
Their terminal usefulness may still include:
|
||||
- a small patch
|
||||
- a reproducer test
|
||||
- a docs fix
|
||||
- a draft branch or artifact for later review
|
||||
|
||||
### Wolf / bulk floors
|
||||
|
||||
Wolf / bulk lanes stay useful as summarizers and sweepers, not as governors.
|
||||
|
||||
At `fallback2` and `terminal`, wolf / bulk lanes must not:
|
||||
- fan out branch creation across repos
|
||||
- mass-assign agents
|
||||
- edit sensitive control surfaces
|
||||
- perform irreversible queue mutation
|
||||
|
||||
Their degraded usefulness may still include:
|
||||
- gathering evidence
|
||||
- refreshing inventories
|
||||
- summarizing logs
|
||||
- proposing labels or routes
|
||||
- producing repetitive, low-risk artifacts inside explicit caps
|
||||
|
||||
## Usable terminal lanes
|
||||
|
||||
A terminal fallback is only valid if it still does at least one of these safely:
|
||||
- classify and summarize a backlog
|
||||
- produce a bounded patch or test artifact
|
||||
- summarize a diff with explicit uncertainty
|
||||
- refresh an inventory or evidence bundle
|
||||
|
||||
If the terminal lane can only say "model unavailable" and stop, the portfolio is incomplete.
|
||||
|
||||
## Current sidecar reference lanes
|
||||
|
||||
`fallback-portfolios.yaml` defines the initial implementation-ready structure for four named lanes:
|
||||
- `triage-coordinator` — judgment
|
||||
- `pr-reviewer` — judgment
|
||||
- `builder-main` — builder
|
||||
- `wolf-sweeper` — wolf / bulk
|
||||
|
||||
These are the canonical resilience lanes for the current Timmy world-state.
|
||||
|
||||
Current playbooks should eventually map onto them like this:
|
||||
- `playbooks/issue-triager.yaml` -> `triage-coordinator`
|
||||
- `playbooks/pr-reviewer.yaml` -> `pr-reviewer`
|
||||
- `playbooks/verified-logic.yaml` -> judgment lane family, pending a dedicated proof profile if needed
|
||||
- `playbooks/bug-fixer.yaml`, `playbooks/test-writer.yaml`, and `playbooks/refactor-specialist.yaml` -> `builder-main`
|
||||
- future sidecar bulk playbooks should inherit from `wolf-sweeper` instead of inventing independent fallback chains
|
||||
|
||||
Until runtime wiring lands, unmapped playbooks should be treated as policy-incomplete rather than inheriting an implicit fallback chain.
|
||||
|
||||
## Wiring contract for later implementation
|
||||
|
||||
When this is wired into runtime selection, the selector should:
|
||||
- classify the incoming task into a role class
|
||||
- check whether the task touches a sensitive control surface
|
||||
- choose the named agent lane for that class
|
||||
- step through the declared portfolio slots in order
|
||||
- enforce the capability floor of the active slot before taking action
|
||||
- record when a fallback transition happened and what authority was still allowed
|
||||
|
||||
The important part is not just choosing a different model. It is choosing a different authority envelope as the lane degrades.
|
||||
Reference in New Issue
Block a user