[RESILIENCE] Define per-agent fallback portfolios and routing doctrine #170

Merged
Timmy merged 1 commits from timmy/issue-155-fallback-portfolios into main 2026-04-04 21:40:37 +00:00
3 changed files with 537 additions and 3 deletions

View File

@@ -13,6 +13,7 @@ timmy-config/
├── FALSEWORK.md ← API cost management strategy
├── DEPRECATED.md ← What was removed and why
├── config.yaml ← Hermes harness configuration
├── fallback-portfolios.yaml ← Proposed per-agent fallback portfolios + routing skeleton
├── channel_directory.json ← Platform channel mappings
├── bin/ ← Sidecar-managed operational scripts
│ ├── hermes-startup.sh ← Dormant startup path (audit before enabling)
@@ -28,14 +29,15 @@ timmy-config/
├── docs/
│ ├── automation-inventory.md ← Live automation + stale-state inventory
│ ├── ipc-hub-and-spoke-doctrine.md ← Coordinator-first, transport-agnostic fleet IPC doctrine
── coordinator-first-protocol.md ← Coordinator doctrine: intake → triage → route → track → verify → report
── coordinator-first-protocol.md ← Coordinator doctrine: intake → triage → route → track → verify → report
│ └── fallback-portfolios.md ← Routing and degraded-authority doctrine
└── training/ ← Transitional training recipes, not canonical lived data
```
## Boundary
`timmy-config` owns identity, conscience, memories, skins, playbooks, channel
maps, and harness-side orchestration glue.
`timmy-config` owns identity, conscience, memories, skins, playbooks, routing doctrine,
channel maps, fallback portfolio declarations, and harness-side orchestration glue.
`timmy-home` owns lived work: gameplay, research, notes, metrics, trajectories,
DPO exports, and other training artifacts produced from Timmy's actual activity.

248
docs/fallback-portfolios.md Normal file
View File

@@ -0,0 +1,248 @@
# Per-Agent Fallback Portfolios and Task-Class Routing
Status: proposed doctrine for issue #155
Scope: policy and sidecar structure only; no runtime wiring in `tasks.py` or live loops yet
## Why this exists
Timmy already has multiple model paths declared in `config.yaml`, multiple task surfaces in `playbooks/`, and multiple live automation lanes documented in `docs/automation-inventory.md`.
What is missing is a declared resilience doctrine for how specific agents degrade when a provider, quota, or model family fails. Without that doctrine, the whole fleet tends to collapse onto the same fallback chain, which means one outage turns into synchronized fleet degradation.
This spec makes the fallback graph explicit before runtime wiring lands.
## Timmy ownership boundary
`timmy-config` owns:
- routing doctrine for Timmy-side task classes
- sidecar-readable fallback portfolio declarations
- capability floors and degraded-mode authority restrictions
- the mapping between current playbooks and future resilient agent lanes
`timmy-config` does not own:
- live queue state or issue truth outside Gitea
- launchd state, loop resurrection, or stale runtime reuse
- ad hoc worktree history or hidden queue mutation
That split matters. This repo should declare how routing is supposed to work. Runtime surfaces should consume that declaration instead of inventing their own fallback orderings.
## Non-goals
This issue does not:
- fully wire portfolio selection into `tasks.py`, launch agents, or live loops
- bless human-token or operator-token fallbacks as part of an automated chain
- allow degraded agents to keep full authority just because they are still producing output
## Role classes
### 1. Judgment
Use for work where the main risk is a bad decision, not a missing patch.
Current Timmy surfaces:
- `playbooks/issue-triager.yaml`
- `playbooks/pr-reviewer.yaml`
- `playbooks/verified-logic.yaml`
Typical task classes:
- issue triage
- queue routing
- PR review
- proof / consistency checks
- governance-sensitive review
Judgment lanes may read broadly, but they lose authority earlier than builder lanes when degraded.
### 2. Builder
Use for work where the main risk is producing or verifying a change.
Current Timmy surfaces:
- `playbooks/bug-fixer.yaml`
- `playbooks/test-writer.yaml`
- `playbooks/refactor-specialist.yaml`
Typical task classes:
- bug fixes
- test writing
- bounded refactors
- narrow docs or code repairs with verification
Builder lanes keep patch-producing usefulness longer than judgment lanes, but they must lose control-plane authority as they degrade.
### 3. Wolf / bulk
Use for repetitive, high-volume, bounded, reversible work.
Current Timmy world-state:
- bulk and sweep behavior is still represented more by live ops reality in `docs/automation-inventory.md` than by a dedicated sidecar playbook
- this class covers the work shape currently associated with queue hygiene, inventory refresh, docs sweeps, log summarization, and repetitive small-diff passes
Typical task classes:
- docs inventory refresh
- log summarization
- queue hygiene
- repetitive small diffs
- research or extraction sweeps
Wolf / bulk lanes are throughput-first and deliberately lower-authority.
## Routing policy
1. If the task touches a sensitive control surface, route to judgment first even if the edit is small.
2. If the task is primarily about merge authority, routing authority, proof, or governance, route to judgment.
3. If the task is primarily about producing a patch with local verification, route to builder.
4. If the task is repetitive, bounded, reversible, and low-authority, route to wolf / bulk.
5. If a wolf / bulk task expands beyond its size or authority envelope, promote it upward; do not let it keep grinding forward through scope creep.
6. If a builder task becomes architecture, multi-repo coordination, or control-plane review, promote it to judgment.
7. If a lane reaches terminal fallback, it must still land in a usable degraded mode. Dead silence is not an acceptable terminal state.
## Sensitive control surfaces
These paths stay judgment-routed unless explicitly reviewed otherwise:
- `SOUL.md`
- `config.yaml`
- `deploy.sh`
- `tasks.py`
- `playbooks/`
- `cron/`
- `memories/`
- `skins/`
- `training/`
This mirrors the current PR-review doctrine and keeps degraded builder or bulk lanes away from Timmy's control plane.
## Portfolio design rules
The sidecar portfolio declaration in `fallback-portfolios.yaml` follows these rules:
1. Every critical agent gets four slots:
- primary
- fallback1
- fallback2
- terminal fallback
2. No two critical agents may share the same `primary + fallback1` pair.
3. Provider families should be anti-correlated across critical lanes whenever practical.
4. Terminal fallbacks must end in a usable degraded lane, not a null lane.
5. At least one critical lane must end on a local-capable path.
6. No human-token fallback patterns are allowed in automated chains.
7. Degraded mode reduces authority before it removes usefulness.
8. A terminal lane that cannot safely produce an artifact is not a valid terminal lane.
## Explicit ban: synchronized fleet degradation
Synchronized fleet degradation is forbidden.
That means:
- do not point every critical agent at the same fallback stack
- do not let all judgment agents converge on the same first backup if avoidable
- do not let all builder agents collapse onto the same weak terminal lane
- do not treat "everyone fell back to the cheapest thing" as resilience
A resilient fleet degrades unevenly on purpose. Some lanes should stay sharp while others become slower or narrower.
## Capability floors and degraded authority
### Shared slot semantics
- `primary`: full role-class authority
- `fallback1`: full task authority for normal work, but no silent broadening of scope
- `fallback2`: bounded and reversible work only; no irreversible control-plane action
- `terminal`: usable degraded lane only; must produce a machine-usable artifact but must not impersonate full authority
### Judgment floors
Judgment agents lose authority earliest.
At `fallback2` and below, judgment lanes must not:
- merge PRs
- close or rewrite governing issues or PRs
- mutate sensitive control surfaces
- bulk-reassign the fleet
- silently change routing policy
Their degraded usefulness is still real:
- classify backlog
- produce draft routing plans
- summarize risk
- leave bounded labels or comments with explicit evidence
### Builder floors
Builder agents may continue doing useful narrow work deeper into degradation, but only inside a tighter box.
At `fallback2`, builder lanes must be limited to:
- single-issue work
- reversible patches
- narrow docs or test scaffolds
- bounded file counts and small diff sizes
At `terminal`, builder lanes must not:
- touch sensitive control surfaces
- merge or release
- do multi-repo or architecture work
- claim verification they did not run
Their terminal usefulness may still include:
- a small patch
- a reproducer test
- a docs fix
- a draft branch or artifact for later review
### Wolf / bulk floors
Wolf / bulk lanes stay useful as summarizers and sweepers, not as governors.
At `fallback2` and `terminal`, wolf / bulk lanes must not:
- fan out branch creation across repos
- mass-assign agents
- edit sensitive control surfaces
- perform irreversible queue mutation
Their degraded usefulness may still include:
- gathering evidence
- refreshing inventories
- summarizing logs
- proposing labels or routes
- producing repetitive, low-risk artifacts inside explicit caps
## Usable terminal lanes
A terminal fallback is only valid if it still does at least one of these safely:
- classify and summarize a backlog
- produce a bounded patch or test artifact
- summarize a diff with explicit uncertainty
- refresh an inventory or evidence bundle
If the terminal lane can only say "model unavailable" and stop, the portfolio is incomplete.
## Current sidecar reference lanes
`fallback-portfolios.yaml` defines the initial implementation-ready structure for four named lanes:
- `triage-coordinator` — judgment
- `pr-reviewer` — judgment
- `builder-main` — builder
- `wolf-sweeper` — wolf / bulk
These are the canonical resilience lanes for the current Timmy world-state.
Current playbooks should eventually map onto them like this:
- `playbooks/issue-triager.yaml` -> `triage-coordinator`
- `playbooks/pr-reviewer.yaml` -> `pr-reviewer`
- `playbooks/verified-logic.yaml` -> judgment lane family, pending a dedicated proof profile if needed
- `playbooks/bug-fixer.yaml`, `playbooks/test-writer.yaml`, and `playbooks/refactor-specialist.yaml` -> `builder-main`
- future sidecar bulk playbooks should inherit from `wolf-sweeper` instead of inventing independent fallback chains
Until runtime wiring lands, unmapped playbooks should be treated as policy-incomplete rather than inheriting an implicit fallback chain.
## Wiring contract for later implementation
When this is wired into runtime selection, the selector should:
- classify the incoming task into a role class
- check whether the task touches a sensitive control surface
- choose the named agent lane for that class
- step through the declared portfolio slots in order
- enforce the capability floor of the active slot before taking action
- record when a fallback transition happened and what authority was still allowed
The important part is not just choosing a different model. It is choosing a different authority envelope as the lane degrades.

284
fallback-portfolios.yaml Normal file
View File

@@ -0,0 +1,284 @@
schema_version: 1
status: proposed
runtime_wiring: false
owner: timmy-config
ownership:
owns:
- routing doctrine for task classes
- sidecar-readable per-agent fallback portfolios
- degraded-mode capability floors
does_not_own:
- live queue state outside Gitea truth
- launchd or loop process state
- ad hoc worktree history
policy:
require_four_slots_for_critical_agents: true
terminal_fallback_must_be_usable: true
forbid_synchronized_fleet_degradation: true
forbid_human_token_fallbacks: true
anti_correlation_rule: no two critical agents may share the same primary+fallback1 pair
sensitive_control_surfaces:
- SOUL.md
- config.yaml
- deploy.sh
- tasks.py
- playbooks/
- cron/
- memories/
- skins/
- training/
role_classes:
judgment:
current_surfaces:
- playbooks/issue-triager.yaml
- playbooks/pr-reviewer.yaml
- playbooks/verified-logic.yaml
task_classes:
- issue-triage
- queue-routing
- pr-review
- proof-check
- governance-review
degraded_mode:
fallback2:
allowed:
- classify backlog
- summarize risk
- produce draft routing plans
- leave bounded labels or comments with evidence
denied:
- merge pull requests
- close or rewrite governing issues or PRs
- mutate sensitive control surfaces
- bulk-reassign the fleet
- silently change routing policy
terminal:
lane: report-and-route
allowed:
- classify backlog
- summarize risk
- produce draft routing artifacts
denied:
- merge pull requests
- bulk-reassign the fleet
- mutate sensitive control surfaces
builder:
current_surfaces:
- playbooks/bug-fixer.yaml
- playbooks/test-writer.yaml
- playbooks/refactor-specialist.yaml
task_classes:
- bug-fix
- test-writing
- refactor
- bounded-docs-change
degraded_mode:
fallback2:
allowed:
- reversible single-issue changes
- narrow docs fixes
- test scaffolds and reproducers
denied:
- cross-repo changes
- sensitive control-surface edits
- merge or release actions
terminal:
lane: narrow-patch
allowed:
- single-issue small patch
- reproducer test
- docs-only repair
denied:
- sensitive control-surface edits
- multi-file architecture work
- irreversible actions
wolf_bulk:
current_surfaces:
- docs/automation-inventory.md
- FALSEWORK.md
task_classes:
- docs-inventory
- log-summarization
- queue-hygiene
- repetitive-small-diff
- research-sweep
degraded_mode:
fallback2:
allowed:
- gather evidence
- refresh inventories
- summarize logs
- propose labels or routes
denied:
- multi-repo branch fanout
- mass agent assignment
- sensitive control-surface edits
- irreversible queue mutation
terminal:
lane: gather-and-summarize
allowed:
- inventory refresh
- evidence bundles
- summaries
denied:
- multi-repo branch fanout
- mass agent assignment
- sensitive control-surface edits
routing:
issue-triage: judgment
queue-routing: judgment
pr-review: judgment
proof-check: judgment
governance-review: judgment
bug-fix: builder
test-writing: builder
refactor: builder
bounded-docs-change: builder
docs-inventory: wolf_bulk
log-summarization: wolf_bulk
queue-hygiene: wolf_bulk
repetitive-small-diff: wolf_bulk
research-sweep: wolf_bulk
promotion_rules:
- If a wolf/bulk task touches a sensitive control surface, promote it to judgment.
- If a builder task expands beyond 5 files, architecture review, or multi-repo coordination, promote it to judgment.
- If a terminal lane cannot produce a usable artifact, the portfolio is invalid and must be redesigned before wiring.
agents:
triage-coordinator:
role_class: judgment
critical: true
current_playbooks:
- playbooks/issue-triager.yaml
portfolio:
primary:
provider: anthropic
model: claude-opus-4-6
lane: full-judgment
fallback1:
provider: openai-codex
model: codex
lane: high-judgment
fallback2:
provider: gemini
model: gemini-2.5-pro
lane: bounded-judgment
terminal:
provider: ollama
model: hermes3:latest
lane: report-and-route
local_capable: true
usable_output:
- backlog classification
- routing draft
- risk summary
pr-reviewer:
role_class: judgment
critical: true
current_playbooks:
- playbooks/pr-reviewer.yaml
portfolio:
primary:
provider: anthropic
model: claude-opus-4-6
lane: full-review
fallback1:
provider: gemini
model: gemini-2.5-pro
lane: high-review
fallback2:
provider: grok
model: grok-3-mini-fast
lane: comment-only-review
terminal:
provider: openrouter
model: openai/gpt-4.1-mini
lane: low-stakes-diff-summary
local_capable: false
usable_output:
- diff risk summary
- explicit uncertainty notes
- merge-block recommendation
builder-main:
role_class: builder
critical: true
current_playbooks:
- playbooks/bug-fixer.yaml
- playbooks/test-writer.yaml
- playbooks/refactor-specialist.yaml
portfolio:
primary:
provider: openai-codex
model: codex
lane: full-builder
fallback1:
provider: kimi-coding
model: kimi-k2.5
lane: bounded-builder
fallback2:
provider: groq
model: llama-3.3-70b-versatile
lane: small-patch-builder
terminal:
provider: custom_provider
provider_name: Local llama.cpp
model: hermes4:14b
lane: narrow-patch
local_capable: true
usable_output:
- small patch
- reproducer test
- docs repair
wolf-sweeper:
role_class: wolf_bulk
critical: true
current_world_state:
- docs/automation-inventory.md
portfolio:
primary:
provider: gemini
model: gemini-2.5-flash
lane: fast-bulk
fallback1:
provider: groq
model: llama-3.3-70b-versatile
lane: fast-bulk-backup
fallback2:
provider: openrouter
model: openai/gpt-4.1-mini
lane: bounded-bulk-summary
terminal:
provider: ollama
model: hermes3:latest
lane: gather-and-summarize
local_capable: true
usable_output:
- inventory refresh
- evidence bundle
- summary comment
cross_checks:
unique_primary_fallback1_pairs:
triage-coordinator:
- anthropic/claude-opus-4-6
- openai-codex/codex
pr-reviewer:
- anthropic/claude-opus-4-6
- gemini/gemini-2.5-pro
builder-main:
- openai-codex/codex
- kimi-coding/kimi-k2.5
wolf-sweeper:
- gemini/gemini-2.5-flash
- groq/llama-3.3-70b-versatile