docs: define fallback portfolios and routing policy

2026-04-04 17:24:33 -04:00
parent 6a71dfb5c7
commit 525af930cc
3 changed files with 536 additions and 2 deletions
--- a/README.md
+++ b/README.md
@@ -13,6 +13,7 @@ timmy-config/
 ├── FALSEWORK.md               ← API cost management strategy
 ├── DEPRECATED.md              ← What was removed and why
 ├── config.yaml                ← Hermes harness configuration
+├── fallback-portfolios.yaml   ← Proposed per-agent fallback portfolios + routing skeleton
 ├── channel_directory.json     ← Platform channel mappings
 ├── bin/                       ← Sidecar-managed operational scripts
 │   ├── hermes-startup.sh      ← Dormant startup path (audit before enabling)
@@ -26,13 +27,14 @@ timmy-config/
 ├── playbooks/                 ← Agent playbooks (YAML)
 ├── cron/                      ← Cron job definitions
 ├── docs/automation-inventory.md ← Live automation + stale-state inventory
+├── docs/fallback-portfolios.md ← Routing and degraded-authority doctrine
 └── training/                  ← Transitional training recipes, not canonical lived data
 ```

 ## Boundary

-`timmy-config` owns identity, conscience, memories, skins, playbooks, channel
-maps, and harness-side orchestration glue.
+`timmy-config` owns identity, conscience, memories, skins, playbooks, routing doctrine,
+channel maps, fallback portfolio declarations, and harness-side orchestration glue.

 `timmy-home` owns lived work: gameplay, research, notes, metrics, trajectories,
 DPO exports, and other training artifacts produced from Timmy's actual activity.
--- a/docs/fallback-portfolios.md
+++ b/docs/fallback-portfolios.md
@@ -0,0 +1,248 @@
+# Per-Agent Fallback Portfolios and Task-Class Routing
+
+Status: proposed doctrine for issue #155
+Scope: policy and sidecar structure only; no runtime wiring in `tasks.py` or live loops yet
+
+## Why this exists
+
+Timmy already has multiple model paths declared in `config.yaml`, multiple task surfaces in `playbooks/`, and multiple live automation lanes documented in `docs/automation-inventory.md`.
+
+What is missing is a declared resilience doctrine for how specific agents degrade when a provider, quota, or model family fails. Without that doctrine, the whole fleet tends to collapse onto the same fallback chain, which means one outage turns into synchronized fleet degradation.
+
+This spec makes the fallback graph explicit before runtime wiring lands.
+
+## Timmy ownership boundary
+
+`timmy-config` owns:
+- routing doctrine for Timmy-side task classes
+- sidecar-readable fallback portfolio declarations
+- capability floors and degraded-mode authority restrictions
+- the mapping between current playbooks and future resilient agent lanes
+
+`timmy-config` does not own:
+- live queue state or issue truth outside Gitea
+- launchd state, loop resurrection, or stale runtime reuse
+- ad hoc worktree history or hidden queue mutation
+
+That split matters. This repo should declare how routing is supposed to work. Runtime surfaces should consume that declaration instead of inventing their own fallback orderings.
+
+## Non-goals
+
+This issue does not:
+- fully wire portfolio selection into `tasks.py`, launch agents, or live loops
+- bless human-token or operator-token fallbacks as part of an automated chain
+- allow degraded agents to keep full authority just because they are still producing output
+
+## Role classes
+
+### 1. Judgment
+
+Use for work where the main risk is a bad decision, not a missing patch.
+
+Current Timmy surfaces:
+- `playbooks/issue-triager.yaml`
+- `playbooks/pr-reviewer.yaml`
+- `playbooks/verified-logic.yaml`
+
+Typical task classes:
+- issue triage
+- queue routing
+- PR review
+- proof / consistency checks
+- governance-sensitive review
+
+Judgment lanes may read broadly, but they lose authority earlier than builder lanes when degraded.
+
+### 2. Builder
+
+Use for work where the main risk is producing or verifying a change.
+
+Current Timmy surfaces:
+- `playbooks/bug-fixer.yaml`
+- `playbooks/test-writer.yaml`
+- `playbooks/refactor-specialist.yaml`
+
+Typical task classes:
+- bug fixes
+- test writing
+- bounded refactors
+- narrow docs or code repairs with verification
+
+Builder lanes keep patch-producing usefulness longer than judgment lanes, but they must lose control-plane authority as they degrade.
+
+### 3. Wolf / bulk
+
+Use for repetitive, high-volume, bounded, reversible work.
+
+Current Timmy world-state:
+- bulk and sweep behavior has no dedicated sidecar playbook yet; it is documented mainly as live ops reality in `docs/automation-inventory.md`
+- this class covers the work shape currently associated with queue hygiene, inventory refresh, docs sweeps, log summarization, and repetitive small-diff passes
+
+Typical task classes:
+- docs inventory refresh
+- log summarization
+- queue hygiene
+- repetitive small diffs
+- research or extraction sweeps
+
+Wolf / bulk lanes are throughput-first and deliberately lower-authority.
+
+## Routing policy
+
+1. If the task touches a sensitive control surface, route to judgment first even if the edit is small.
+2. If the task is primarily about merge authority, routing authority, proof, or governance, route to judgment.
+3. If the task is primarily about producing a patch with local verification, route to builder.
+4. If the task is repetitive, bounded, reversible, and low-authority, route to wolf / bulk.
+5. If a wolf / bulk task expands beyond its size or authority envelope, promote it upward; do not let it keep grinding forward through scope creep.
+6. If a builder task grows into architecture work, multi-repo coordination, or control-plane review, promote it to judgment.
+7. If a lane reaches terminal fallback, it must still land in a usable degraded mode. Dead silence is not an acceptable terminal state.
+
+## Sensitive control surfaces
+
+These paths stay judgment-routed unless explicitly reviewed otherwise:
+- `SOUL.md`
+- `config.yaml`
+- `deploy.sh`
+- `tasks.py`
+- `playbooks/`
+- `cron/`
+- `memories/`
+- `skins/`
+- `training/`
+
+This mirrors the current PR-review doctrine and keeps degraded builder or bulk lanes away from Timmy's control plane.
+
+## Portfolio design rules
+
+The sidecar portfolio declaration in `fallback-portfolios.yaml` follows these rules:
+
+1. Every critical agent gets four slots:
+   - primary
+   - fallback1
+   - fallback2
+   - terminal (the terminal fallback)
+2. No two critical agents may share the same `primary + fallback1` pair.
+3. Provider families should be anti-correlated across critical lanes whenever practical.
+4. Terminal fallbacks must end in a usable degraded lane, not a null lane.
+5. At least one critical lane must end on a local-capable path.
+6. No human-token fallback patterns are allowed in automated chains.
+7. Degraded mode reduces authority before it removes usefulness.
+8. A terminal lane that cannot safely produce an artifact is not a valid terminal lane.
+
+## Explicit ban: synchronized fleet degradation
+
+Synchronized fleet degradation is forbidden.
+
+That means:
+- do not point every critical agent at the same fallback stack
+- do not let all judgment agents converge on the same first backup if avoidable
+- do not let all builder agents collapse onto the same weak terminal lane
+- do not treat "everyone fell back to the cheapest thing" as resilience
+
+A resilient fleet degrades unevenly on purpose. Some lanes should stay sharp while others become slower or narrower.
+
+## Capability floors and degraded authority
+
+### Shared slot semantics
+
+- `primary`: full role-class authority
+- `fallback1`: full task authority for normal work, but no silent broadening of scope
+- `fallback2`: bounded and reversible work only; no irreversible control-plane action
+- `terminal`: usable degraded lane only; must produce a machine-usable artifact but must not impersonate full authority
+
+### Judgment floors
+
+Judgment agents lose authority earliest.
+
+At `fallback2` and `terminal`, judgment lanes must not:
+- merge PRs
+- close or rewrite governing issues or PRs
+- mutate sensitive control surfaces
+- bulk-reassign the fleet
+- silently change routing policy
+
+Their degraded usefulness is still real:
+- classify backlog
+- produce draft routing plans
+- summarize risk
+- leave bounded labels or comments with explicit evidence
+
+### Builder floors
+
+Builder agents may continue doing useful narrow work deeper into degradation, but only inside a tighter box.
+
+At `fallback2`, builder lanes must be limited to:
+- single-issue work
+- reversible patches
+- narrow docs or test scaffolds
+- bounded file counts and small diff sizes
+
+At `terminal`, builder lanes must not:
+- touch sensitive control surfaces
+- merge or release
+- do multi-repo or architecture work
+- claim verification they did not run
+
+Their terminal usefulness may still include:
+- a small patch
+- a reproducer test
+- a docs fix
+- a draft branch or artifact for later review
+
+### Wolf / bulk floors
+
+Wolf / bulk lanes stay useful as summarizers and sweepers, not as governors.
+
+At `fallback2` and `terminal`, wolf / bulk lanes must not:
+- fan out branch creation across repos
+- mass-assign agents
+- edit sensitive control surfaces
+- perform irreversible queue mutation
+
+Their degraded usefulness may still include:
+- gathering evidence
+- refreshing inventories
+- summarizing logs
+- proposing labels or routes
+- producing repetitive, low-risk artifacts inside explicit caps
+
+## Usable terminal lanes
+
+A terminal fallback is only valid if it still does at least one of these safely:
+- classify and summarize a backlog
+- produce a bounded patch or test artifact
+- summarize a diff with explicit uncertainty
+- refresh an inventory or evidence bundle
+
+If the terminal lane can only say "model unavailable" and stop, the portfolio is incomplete.
+
+## Current sidecar reference lanes
+
+`fallback-portfolios.yaml` defines the initial implementation-ready structure for four named lanes:
+- `triage-coordinator` — judgment
+- `pr-reviewer` — judgment
+- `builder-main` — builder
+- `wolf-sweeper` — wolf / bulk
+
+These are the canonical resilience lanes for the current Timmy world-state.
+
+Current playbooks should eventually map onto them like this:
+- `playbooks/issue-triager.yaml` -> `triage-coordinator`
+- `playbooks/pr-reviewer.yaml` -> `pr-reviewer`
+- `playbooks/verified-logic.yaml` -> judgment lane family, pending a dedicated proof profile if needed
+- `playbooks/bug-fixer.yaml`, `playbooks/test-writer.yaml`, and `playbooks/refactor-specialist.yaml` -> `builder-main`
+- future sidecar bulk playbooks should inherit from `wolf-sweeper` instead of inventing independent fallback chains
+
+Until runtime wiring lands, unmapped playbooks should be treated as policy-incomplete rather than inheriting an implicit fallback chain.
+
+## Wiring contract for later implementation
+
+When this is wired into runtime selection, the selector should:
+- classify the incoming task into a role class
+- check whether the task touches a sensitive control surface
+- choose the named agent lane for that class
+- step through the declared portfolio slots in order
+- enforce the capability floor of the active slot before taking action
+- record when a fallback transition happened and what authority was still allowed
+
+The important part is not just choosing a different model. It is choosing a different authority envelope as the lane degrades.
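
The wiring contract above can be sketched as a small selector. This is only an illustration under stated assumptions, not the planned `tasks.py` wiring. The names `select`, `may`, `Selection`, `LANE_FOR_TASK`, `SENSITIVE_LANE`, and the `is_available` callback are hypothetical; sending sensitive-surface tasks to `pr-reviewer` is an assumption; and the `DENIED` table copies only a subset of the denied actions from `fallback-portfolios.yaml`.

```python
from dataclasses import dataclass

SLOTS = ("primary", "fallback1", "fallback2", "terminal")
SENSITIVE = ("SOUL.md", "config.yaml", "deploy.sh", "tasks.py",
             "playbooks/", "cron/", "memories/", "skins/", "training/")

# Hypothetical task-class -> lane table (a subset of the declared task classes).
LANE_FOR_TASK = {
    "issue-triage": "triage-coordinator",
    "pr-review": "pr-reviewer",
    "bug-fix": "builder-main",
    "log-summarization": "wolf-sweeper",
}
ROLE_OF_LANE = {
    "triage-coordinator": "judgment",
    "pr-reviewer": "judgment",
    "builder-main": "builder",
    "wolf-sweeper": "wolf_bulk",
}
SENSITIVE_LANE = "pr-reviewer"  # assumption: which judgment lane takes sensitive work

# Subset of the capability floors, keyed by (role class, slot).
DENIED = {
    ("judgment", "fallback2"): {"merge pull requests", "mutate sensitive control surfaces"},
    ("judgment", "terminal"): {"merge pull requests", "mutate sensitive control surfaces"},
    ("builder", "fallback2"): {"cross-repo changes", "merge or release actions"},
    ("builder", "terminal"): {"sensitive control-surface edits", "irreversible actions"},
}


@dataclass
class Selection:
    lane: str
    role: str
    slot: str
    skipped: list  # slots that were unavailable, kept for the transition record


def touches_sensitive(paths):
    """True if any path is a sensitive file or sits under a sensitive directory."""
    return any(
        p == s or (s.endswith("/") and p.startswith(s))
        for p in paths
        for s in SENSITIVE
    )


def select(task_class, paths, is_available):
    """Pick a lane, then step through its portfolio slots in declared order."""
    lane = SENSITIVE_LANE if touches_sensitive(paths) else LANE_FOR_TASK[task_class]
    skipped = []
    for slot in SLOTS:
        if is_available(lane, slot):
            return Selection(lane, ROLE_OF_LANE[lane], slot, skipped)
        skipped.append(slot)
    # A lane with no usable terminal slot means the portfolio is incomplete.
    raise RuntimeError(f"{lane}: no usable slot; portfolio is incomplete")


def may(selection, action):
    """Enforce the capability floor of the active slot before acting."""
    return action not in DENIED.get((selection.role, selection.slot), set())
```

In this sketch the model choice and the authority check are separate steps: `select` only decides which slot is active, and every action is then gated through `may`, so a degraded lane keeps producing output without keeping full authority.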
--- a/fallback-portfolios.yaml
+++ b/fallback-portfolios.yaml
@@ -0,0 +1,284 @@
+schema_version: 1
+status: proposed
+runtime_wiring: false
+owner: timmy-config
+
+ownership:
+  owns:
+    - routing doctrine for task classes
+    - sidecar-readable per-agent fallback portfolios
+    - degraded-mode capability floors
+  does_not_own:
+    - live queue state outside Gitea truth
+    - launchd or loop process state
+    - ad hoc worktree history
+
+policy:
+  require_four_slots_for_critical_agents: true
+  terminal_fallback_must_be_usable: true
+  forbid_synchronized_fleet_degradation: true
+  forbid_human_token_fallbacks: true
+  anti_correlation_rule: no two critical agents may share the same primary+fallback1 pair
+
+sensitive_control_surfaces:
+  - SOUL.md
+  - config.yaml
+  - deploy.sh
+  - tasks.py
+  - playbooks/
+  - cron/
+  - memories/
+  - skins/
+  - training/
+
+role_classes:
+  judgment:
+    current_surfaces:
+      - playbooks/issue-triager.yaml
+      - playbooks/pr-reviewer.yaml
+      - playbooks/verified-logic.yaml
+    task_classes:
+      - issue-triage
+      - queue-routing
+      - pr-review
+      - proof-check
+      - governance-review
+    degraded_mode:
+      fallback2:
+        allowed:
+          - classify backlog
+          - summarize risk
+          - produce draft routing plans
+          - leave bounded labels or comments with evidence
+        denied:
+          - merge pull requests
+          - close or rewrite governing issues or PRs
+          - mutate sensitive control surfaces
+          - bulk-reassign the fleet
+          - silently change routing policy
+      terminal:
+        lane: report-and-route
+        allowed:
+          - classify backlog
+          - summarize risk
+          - produce draft routing artifacts
+        denied:
+          - merge pull requests
+          - bulk-reassign the fleet
+          - mutate sensitive control surfaces
+
+  builder:
+    current_surfaces:
+      - playbooks/bug-fixer.yaml
+      - playbooks/test-writer.yaml
+      - playbooks/refactor-specialist.yaml
+    task_classes:
+      - bug-fix
+      - test-writing
+      - refactor
+      - bounded-docs-change
+    degraded_mode:
+      fallback2:
+        allowed:
+          - reversible single-issue changes
+          - narrow docs fixes
+          - test scaffolds and reproducers
+        denied:
+          - cross-repo changes
+          - sensitive control-surface edits
+          - merge or release actions
+      terminal:
+        lane: narrow-patch
+        allowed:
+          - single-issue small patch
+          - reproducer test
+          - docs-only repair
+        denied:
+          - sensitive control-surface edits
+          - multi-file architecture work
+          - irreversible actions
+
+  wolf_bulk:
+    current_surfaces:
+      - docs/automation-inventory.md
+      - FALSEWORK.md
+    task_classes:
+      - docs-inventory
+      - log-summarization
+      - queue-hygiene
+      - repetitive-small-diff
+      - research-sweep
+    degraded_mode:
+      fallback2:
+        allowed:
+          - gather evidence
+          - refresh inventories
+          - summarize logs
+          - propose labels or routes
+        denied:
+          - multi-repo branch fanout
+          - mass agent assignment
+          - sensitive control-surface edits
+          - irreversible queue mutation
+      terminal:
+        lane: gather-and-summarize
+        allowed:
+          - inventory refresh
+          - evidence bundles
+          - summaries
+        denied:
+          - multi-repo branch fanout
+          - mass agent assignment
+          - sensitive control-surface edits
+
+routing:
+  issue-triage: judgment
+  queue-routing: judgment
+  pr-review: judgment
+  proof-check: judgment
+  governance-review: judgment
+  bug-fix: builder
+  test-writing: builder
+  refactor: builder
+  bounded-docs-change: builder
+  docs-inventory: wolf_bulk
+  log-summarization: wolf_bulk
+  queue-hygiene: wolf_bulk
+  repetitive-small-diff: wolf_bulk
+  research-sweep: wolf_bulk
+
+promotion_rules:
+  - If a wolf/bulk task touches a sensitive control surface, promote it to judgment.
+  - If a builder task expands beyond 5 files, or into architecture review or multi-repo coordination, promote it to judgment.
+  - If a terminal lane cannot produce a usable artifact, the portfolio is invalid and must be redesigned before wiring.
+
+agents:
+  triage-coordinator:
+    role_class: judgment
+    critical: true
+    current_playbooks:
+      - playbooks/issue-triager.yaml
+    portfolio:
+      primary:
+        provider: anthropic
+        model: claude-opus-4-6
+        lane: full-judgment
+      fallback1:
+        provider: openai-codex
+        model: codex
+        lane: high-judgment
+      fallback2:
+        provider: gemini
+        model: gemini-2.5-pro
+        lane: bounded-judgment
+      terminal:
+        provider: ollama
+        model: hermes3:latest
+        lane: report-and-route
+        local_capable: true
+        usable_output:
+          - backlog classification
+          - routing draft
+          - risk summary
+
+  pr-reviewer:
+    role_class: judgment
+    critical: true
+    current_playbooks:
+      - playbooks/pr-reviewer.yaml
+    portfolio:
+      primary:
+        provider: anthropic
+        model: claude-opus-4-6
+        lane: full-review
+      fallback1:
+        provider: gemini
+        model: gemini-2.5-pro
+        lane: high-review
+      fallback2:
+        provider: grok
+        model: grok-3-mini-fast
+        lane: comment-only-review
+      terminal:
+        provider: openrouter
+        model: openai/gpt-4.1-mini
+        lane: low-stakes-diff-summary
+        local_capable: false
+        usable_output:
+          - diff risk summary
+          - explicit uncertainty notes
+          - merge-block recommendation
+
+  builder-main:
+    role_class: builder
+    critical: true
+    current_playbooks:
+      - playbooks/bug-fixer.yaml
+      - playbooks/test-writer.yaml
+      - playbooks/refactor-specialist.yaml
+    portfolio:
+      primary:
+        provider: openai-codex
+        model: codex
+        lane: full-builder
+      fallback1:
+        provider: kimi-coding
+        model: kimi-k2.5
+        lane: bounded-builder
+      fallback2:
+        provider: groq
+        model: llama-3.3-70b-versatile
+        lane: small-patch-builder
+      terminal:
+        provider: custom_provider
+        provider_name: Local llama.cpp
+        model: hermes4:14b
+        lane: narrow-patch
+        local_capable: true
+        usable_output:
+          - small patch
+          - reproducer test
+          - docs repair
+
+  wolf-sweeper:
+    role_class: wolf_bulk
+    critical: true
+    current_world_state:
+      - docs/automation-inventory.md
+    portfolio:
+      primary:
+        provider: gemini
+        model: gemini-2.5-flash
+        lane: fast-bulk
+      fallback1:
+        provider: groq
+        model: llama-3.3-70b-versatile
+        lane: fast-bulk-backup
+      fallback2:
+        provider: openrouter
+        model: openai/gpt-4.1-mini
+        lane: bounded-bulk-summary
+      terminal:
+        provider: ollama
+        model: hermes3:latest
+        lane: gather-and-summarize
+        local_capable: true
+        usable_output:
+          - inventory refresh
+          - evidence bundle
+          - summary comment
+
+cross_checks:
+  unique_primary_fallback1_pairs:
+    triage-coordinator:
+      - anthropic/claude-opus-4-6
+      - openai-codex/codex
+    pr-reviewer:
+      - anthropic/claude-opus-4-6
+      - gemini/gemini-2.5-pro
+    builder-main:
+      - openai-codex/codex
+      - kimi-coding/kimi-k2.5
+    wolf-sweeper:
+      - gemini/gemini-2.5-flash
+      - groq/llama-3.3-70b-versatile
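
The `cross_checks` block above restates the anti-correlation rule as data, so it can be verified mechanically. The following is a minimal sketch, assuming the portfolio has already been parsed into a plain dict. The `AGENTS` literal mirrors the declared `primary`, `fallback1`, and terminal `local_capable` values, while the key `terminal_local` and the helpers `shared_pairs` and `has_local_terminal` are hypothetical names, not part of the sidecar schema.

```python
from collections import defaultdict

# Hand-copied from the agents section; a real check would load the YAML file instead.
AGENTS = {
    "triage-coordinator": {
        "primary": "anthropic/claude-opus-4-6",
        "fallback1": "openai-codex/codex",
        "terminal_local": True,
    },
    "pr-reviewer": {
        "primary": "anthropic/claude-opus-4-6",
        "fallback1": "gemini/gemini-2.5-pro",
        "terminal_local": False,
    },
    "builder-main": {
        "primary": "openai-codex/codex",
        "fallback1": "kimi-coding/kimi-k2.5",
        "terminal_local": True,
    },
    "wolf-sweeper": {
        "primary": "gemini/gemini-2.5-flash",
        "fallback1": "groq/llama-3.3-70b-versatile",
        "terminal_local": True,
    },
}


def shared_pairs(agents):
    """Return every primary+fallback1 pair used by more than one agent."""
    by_pair = defaultdict(list)
    for name, portfolio in agents.items():
        by_pair[(portfolio["primary"], portfolio["fallback1"])].append(name)
    return {pair: names for pair, names in by_pair.items() if len(names) > 1}


def has_local_terminal(agents):
    """At least one critical lane must end on a local-capable path."""
    return any(portfolio["terminal_local"] for portfolio in agents.values())
```

An empty result from `shared_pairs` means no two critical agents share a `primary + fallback1` pair. A check like this could run before any runtime wiring consumes the file, though where it would live is left open here.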