docs: define fallback portfolios and routing policy
@@ -13,6 +13,7 @@ timmy-config/
 ├── FALSEWORK.md ← API cost management strategy
 ├── DEPRECATED.md ← What was removed and why
 ├── config.yaml ← Hermes harness configuration
+├── fallback-portfolios.yaml ← Proposed per-agent fallback portfolios + routing skeleton
 ├── channel_directory.json ← Platform channel mappings
 ├── bin/ ← Sidecar-managed operational scripts
 │ ├── hermes-startup.sh ← Dormant startup path (audit before enabling)
@@ -26,13 +27,14 @@ timmy-config/
 ├── playbooks/ ← Agent playbooks (YAML)
 ├── cron/ ← Cron job definitions
 ├── docs/automation-inventory.md ← Live automation + stale-state inventory
+├── docs/fallback-portfolios.md ← Routing and degraded-authority doctrine
 └── training/ ← Transitional training recipes, not canonical lived data
 ```

 ## Boundary

-`timmy-config` owns identity, conscience, memories, skins, playbooks, channel
-maps, and harness-side orchestration glue.
+`timmy-config` owns identity, conscience, memories, skins, playbooks, routing doctrine,
+channel maps, fallback portfolio declarations, and harness-side orchestration glue.

 `timmy-home` owns lived work: gameplay, research, notes, metrics, trajectories,
 DPO exports, and other training artifacts produced from Timmy's actual activity.

248
docs/fallback-portfolios.md
Normal file
@@ -0,0 +1,248 @@
# Per-Agent Fallback Portfolios and Task-Class Routing

Status: proposed doctrine for issue #155
Scope: policy and sidecar structure only; no runtime wiring in `tasks.py` or live loops yet

## Why this exists

Timmy already has multiple model paths declared in `config.yaml`, multiple task surfaces in `playbooks/`, and multiple live automation lanes documented in `docs/automation-inventory.md`.

What is missing is a declared resilience doctrine for how specific agents degrade when a provider, quota, or model family fails. Without that doctrine, the whole fleet tends to collapse onto the same fallback chain, which means one outage turns into synchronized fleet degradation.

This spec makes the fallback graph explicit before runtime wiring lands.

## Timmy ownership boundary

`timmy-config` owns:
- routing doctrine for Timmy-side task classes
- sidecar-readable fallback portfolio declarations
- capability floors and degraded-mode authority restrictions
- the mapping between current playbooks and future resilient agent lanes

`timmy-config` does not own:
- live queue state or issue truth outside Gitea
- launchd state, loop resurrection, or stale runtime reuse
- ad hoc worktree history or hidden queue mutation

That split matters. This repo should declare how routing is supposed to work. Runtime surfaces should consume that declaration instead of inventing their own fallback orderings.

## Non-goals

This issue does not:
- fully wire portfolio selection into `tasks.py`, launch agents, or live loops
- bless human-token or operator-token fallbacks as part of an automated chain
- allow degraded agents to keep full authority just because they are still producing output

## Role classes

### 1. Judgment

Use for work where the main risk is a bad decision, not a missing patch.

Current Timmy surfaces:
- `playbooks/issue-triager.yaml`
- `playbooks/pr-reviewer.yaml`
- `playbooks/verified-logic.yaml`

Typical task classes:
- issue triage
- queue routing
- PR review
- proof / consistency checks
- governance-sensitive review

Judgment lanes may read broadly, but they lose authority earlier than builder lanes when degraded.

### 2. Builder

Use for work where the main risk is producing or verifying a change.

Current Timmy surfaces:
- `playbooks/bug-fixer.yaml`
- `playbooks/test-writer.yaml`
- `playbooks/refactor-specialist.yaml`

Typical task classes:
- bug fixes
- test writing
- bounded refactors
- narrow docs or code repairs with verification

Builder lanes keep patch-producing usefulness longer than judgment lanes, but they must lose control-plane authority as they degrade.

### 3. Wolf / bulk

Use for repetitive, high-volume, bounded, reversible work.

Current Timmy world-state:
- bulk and sweep behavior is still represented more by live ops reality in `docs/automation-inventory.md` than by a dedicated sidecar playbook
- this class covers the work shape currently associated with queue hygiene, inventory refresh, docs sweeps, log summarization, and repetitive small-diff passes

Typical task classes:
- docs inventory refresh
- log summarization
- queue hygiene
- repetitive small diffs
- research or extraction sweeps

Wolf / bulk lanes are throughput-first and deliberately lower-authority.

## Routing policy

1. If the task touches a sensitive control surface, route to judgment first even if the edit is small.
2. If the task is primarily about merge authority, routing authority, proof, or governance, route to judgment.
3. If the task is primarily about producing a patch with local verification, route to builder.
4. If the task is repetitive, bounded, reversible, and low-authority, route to wolf / bulk.
5. If a wolf / bulk task expands beyond its size or authority envelope, promote it upward; do not let it keep grinding forward through scope creep.
6. If a builder task becomes architecture, multi-repo coordination, or control-plane review, promote it to judgment.
7. If a lane reaches terminal fallback, it must still land in a usable degraded mode. Dead silence is not an acceptable terminal state.
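
Rules 1 through 6 can be read as a small pure function. The following is a minimal sketch, not the runtime selector: the `Task` record, its two boolean flags, and the `route` function are hypothetical names, the task-class slugs follow the ones used in `fallback-portfolios.yaml`, and the one-step promotion ladder is an assumption about what "promote it upward" means for wolf / bulk work. Rule 7 concerns portfolios rather than routing and is not covered here.

```python
from dataclasses import dataclass

JUDGMENT, BUILDER, WOLF_BULK = "judgment", "builder", "wolf_bulk"

# Default role class per task class (rules 2-4).
DEFAULT_ROLE = {
    "issue-triage": JUDGMENT,
    "queue-routing": JUDGMENT,
    "pr-review": JUDGMENT,
    "proof-check": JUDGMENT,
    "governance-review": JUDGMENT,
    "bug-fix": BUILDER,
    "test-writing": BUILDER,
    "refactor": BUILDER,
    "bounded-docs-change": BUILDER,
    "docs-inventory": WOLF_BULK,
    "log-summarization": WOLF_BULK,
    "queue-hygiene": WOLF_BULK,
    "repetitive-small-diff": WOLF_BULK,
    "research-sweep": WOLF_BULK,
}

# Assumption: "promote upward" moves one step up this ladder.
PROMOTION = {WOLF_BULK: BUILDER, BUILDER: JUDGMENT, JUDGMENT: JUDGMENT}


@dataclass
class Task:
    """Hypothetical task record; the doctrine does not define these fields."""
    task_class: str
    touches_sensitive_surface: bool = False
    outgrew_envelope: bool = False  # size or authority envelope exceeded


def route(task: Task) -> str:
    """Return the role class that should handle the task."""
    # Rule 1: sensitive control surfaces go to judgment, however small the edit.
    if task.touches_sensitive_surface:
        return JUDGMENT
    role = DEFAULT_ROLE.get(task.task_class)
    if role is None:
        raise ValueError(f"unknown task class: {task.task_class!r}")
    # Rules 5 and 6: scope creep promotes the task instead of grinding forward.
    if task.outgrew_envelope:
        return PROMOTION[role]
    return role
```

Checking the sensitive-surface flag before the table lookup keeps rule 1 unconditional, so a task class added later cannot bypass it.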

## Sensitive control surfaces

These paths stay judgment-routed unless explicitly reviewed otherwise:
- `SOUL.md`
- `config.yaml`
- `deploy.sh`
- `tasks.py`
- `playbooks/`
- `cron/`
- `memories/`
- `skins/`
- `training/`

This mirrors the current PR-review doctrine and keeps degraded builder or bulk lanes away from Timmy's control plane.
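
A path check for this list might look like the sketch below. It assumes changed paths are given relative to the repository root, that entries ending in `/` cover everything beneath that directory, and that the other entries match only the root-level file; the function name `touches_sensitive_surface` is hypothetical.

```python
from pathlib import PurePosixPath

SENSITIVE_SURFACES = (
    "SOUL.md",
    "config.yaml",
    "deploy.sh",
    "tasks.py",
    "playbooks/",
    "cron/",
    "memories/",
    "skins/",
    "training/",
)


def touches_sensitive_surface(changed_paths):
    """Return True if any repo-relative path falls under a sensitive surface."""
    for raw in changed_paths:
        path = PurePosixPath(raw).as_posix()
        for surface in SENSITIVE_SURFACES:
            if surface.endswith("/"):
                # Directory entry: match anything beneath it.
                if path.startswith(surface):
                    return True
            elif path == surface:
                # File entry: match the root-level file only.
                return True
    return False
```

For example, a change set containing `playbooks/bug-fixer.yaml` would be judgment-routed, while one containing only `docs/fallback-portfolios.md` would not.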

## Portfolio design rules

The sidecar portfolio declaration in `fallback-portfolios.yaml` follows these rules:

1. Every critical agent gets four slots:
   - primary
   - fallback1
   - fallback2
   - terminal fallback
2. No two critical agents may share the same `primary + fallback1` pair.
3. Provider families should be anti-correlated across critical lanes whenever practical.
4. Terminal fallbacks must end in a usable degraded lane, not a null lane.
5. At least one critical lane must end on a local-capable path.
6. No human-token fallback patterns are allowed in automated chains.
7. Degraded mode reduces authority before it removes usefulness.
8. A terminal lane that cannot safely produce an artifact is not a valid terminal lane.
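
Rules 1, 2, 4, and 5 are mechanical enough to lint. The sketch below assumes the `agents` mapping from `fallback-portfolios.yaml` has already been loaded into plain dictionaries, and that `local_capable` and `usable_output` sit on the `terminal` slot; `validate_portfolios` and the `slot` helper are hypothetical names. Rules 3, 6, and 7 need human judgment and are not checked.

```python
REQUIRED_SLOTS = ("primary", "fallback1", "fallback2", "terminal")


def slot(provider, model, **extra):
    """Build one portfolio slot as a plain dict (illustrative helper)."""
    return {"provider": provider, "model": model, **extra}


def validate_portfolios(agents):
    """Return a list of human-readable rule violations (empty if none)."""
    problems = []
    seen_pairs = {}
    any_local_terminal = False
    for name, agent in agents.items():
        if not agent.get("critical"):
            continue
        portfolio = agent.get("portfolio", {})
        # Rule 1: all four slots must be declared.
        missing = [s for s in REQUIRED_SLOTS if s not in portfolio]
        if missing:
            problems.append(f"{name}: missing slots {missing}")
            continue
        # Rule 2: the primary + fallback1 pair must be unique.
        pair = tuple(
            (portfolio[s]["provider"], portfolio[s]["model"])
            for s in ("primary", "fallback1")
        )
        if pair in seen_pairs:
            problems.append(
                f"{name}: shares primary+fallback1 with {seen_pairs[pair]}"
            )
        else:
            seen_pairs[pair] = name
        # Rule 4: the terminal lane must declare some usable output.
        terminal = portfolio["terminal"]
        if not terminal.get("usable_output"):
            problems.append(f"{name}: terminal lane declares no usable output")
        any_local_terminal = any_local_terminal or bool(terminal.get("local_capable"))
    # Rule 5: at least one critical lane ends on a local-capable path.
    if not any_local_terminal:
        problems.append("no critical lane ends on a local-capable path")
    return problems
```

Returning a list of problems rather than raising on the first one lets a reviewer see every violation in a proposed portfolio at once.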

## Explicit ban: synchronized fleet degradation

Synchronized fleet degradation is forbidden.

That means:
- do not point every critical agent at the same fallback stack
- do not let all judgment agents converge on the same first backup if avoidable
- do not let all builder agents collapse onto the same weak terminal lane
- do not treat "everyone fell back to the cheapest thing" as resilience

A resilient fleet degrades unevenly on purpose. Some lanes should stay sharp while others become slower or narrower.

## Capability floors and degraded authority

### Shared slot semantics

- `primary`: full role-class authority
- `fallback1`: full task authority for normal work, but no silent broadening of scope
- `fallback2`: bounded and reversible work only; no irreversible control-plane action
- `terminal`: usable degraded lane only; must produce a machine-usable artifact but must not impersonate full authority

### Judgment floors

Judgment agents lose authority earliest.

At `fallback2` and below, judgment lanes must not:
- merge PRs
- close or rewrite governing issues or PRs
- mutate sensitive control surfaces
- bulk-reassign the fleet
- silently change routing policy

Their degraded usefulness is still real:
- classify backlog
- produce draft routing plans
- summarize risk
- leave bounded labels or comments with explicit evidence

### Builder floors

Builder agents may continue doing useful narrow work deeper into degradation, but only inside a tighter box.

At `fallback2`, builder lanes must be limited to:
- single-issue work
- reversible patches
- narrow docs or test scaffolds
- bounded file counts and small diff sizes

At `terminal`, builder lanes must not:
- touch sensitive control surfaces
- merge or release
- do multi-repo or architecture work
- claim verification they did not run

Their terminal usefulness may still include:
- a small patch
- a reproducer test
- a docs fix
- a draft branch or artifact for later review

### Wolf / bulk floors

Wolf / bulk lanes stay useful as summarizers and sweepers, not as governors.

At `fallback2` and `terminal`, wolf / bulk lanes must not:
- fan out branch creation across repos
- mass-assign agents
- edit sensitive control surfaces
- perform irreversible queue mutation

Their degraded usefulness may still include:
- gathering evidence
- refreshing inventories
- summarizing logs
- proposing labels or routes
- producing repetitive, low-risk artifacts inside explicit caps
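
The "must not" lists above can be encoded as a deny table keyed by role class and slot. This is a rough sketch under stated assumptions: the action identifiers (`merge-pr`, `mass-assign-agents`, and so on) and the `is_allowed` function are invented for illustration, and only the explicit prohibitions are modeled. The builder `fallback2` allowlist and the ban on claiming unrun verification would need separate handling.

```python
JUDGMENT_DENIED = frozenset({
    "merge-pr",
    "close-or-rewrite-governing-issue",
    "mutate-sensitive-surface",
    "bulk-reassign-fleet",
    "change-routing-policy",
})
BUILDER_TERMINAL_DENIED = frozenset({
    "mutate-sensitive-surface",
    "merge-or-release",
    "multi-repo-or-architecture",
})
WOLF_DENIED = frozenset({
    "multi-repo-branch-fanout",
    "mass-assign-agents",
    "mutate-sensitive-surface",
    "irreversible-queue-mutation",
})

# (role class, slot) -> actions that slot must refuse.
DENIED = {
    ("judgment", "fallback2"): JUDGMENT_DENIED,
    ("judgment", "terminal"): JUDGMENT_DENIED,
    ("builder", "terminal"): BUILDER_TERMINAL_DENIED,
    ("wolf_bulk", "fallback2"): WOLF_DENIED,
    ("wolf_bulk", "terminal"): WOLF_DENIED,
}


def is_allowed(role_class, slot, action):
    """Return False if the active slot's floor forbids the action."""
    return action not in DENIED.get((role_class, slot), frozenset())
```

With this table, `is_allowed("judgment", "terminal", "merge-pr")` is false while `is_allowed("judgment", "primary", "merge-pr")` is true, which is the intended shape: the same lane keeps working but with a narrower authority envelope.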

## Usable terminal lanes

A terminal fallback is only valid if it still does at least one of these safely:
- classify and summarize a backlog
- produce a bounded patch or test artifact
- summarize a diff with explicit uncertainty
- refresh an inventory or evidence bundle

If the terminal lane can only say "model unavailable" and stop, the portfolio is incomplete.

## Current sidecar reference lanes

`fallback-portfolios.yaml` defines the initial implementation-ready structure for four named lanes:
- `triage-coordinator`: judgment
- `pr-reviewer`: judgment
- `builder-main`: builder
- `wolf-sweeper`: wolf / bulk

These are the canonical resilience lanes for the current Timmy world-state.

Current playbooks should eventually map onto them like this:
- `playbooks/issue-triager.yaml` -> `triage-coordinator`
- `playbooks/pr-reviewer.yaml` -> `pr-reviewer`
- `playbooks/verified-logic.yaml` -> judgment lane family, pending a dedicated proof profile if needed
- `playbooks/bug-fixer.yaml`, `playbooks/test-writer.yaml`, and `playbooks/refactor-specialist.yaml` -> `builder-main`
- future sidecar bulk playbooks should inherit from `wolf-sweeper` instead of inventing independent fallback chains

Until runtime wiring lands, unmapped playbooks should be treated as policy-incomplete rather than inheriting an implicit fallback chain.

## Wiring contract for later implementation

When this is wired into runtime selection, the selector should:
- classify the incoming task into a role class
- check whether the task touches a sensitive control surface
- choose the named agent lane for that class
- step through the declared portfolio slots in order
- enforce the capability floor of the active slot before taking action
- record when a fallback transition happened and what authority was still allowed

The important part is not just choosing a different model. It is choosing a different authority envelope as the lane degrades.
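
The slot-stepping and transition-recording parts of this contract could be sketched as follows. Nothing here is wired into `tasks.py`; `select_slot`, the `is_available` callback, and the shape of the transition records are assumptions made for illustration, and the sketch treats each slot's `lane` name as the authority still allowed. Task classification and floor enforcement are left out.

```python
import time

SLOT_ORDER = ("primary", "fallback1", "fallback2", "terminal")


def select_slot(portfolio, is_available, transitions):
    """Return the first slot whose provider is reachable.

    `is_available` is a caller-supplied callable taking a slot dict and
    returning a bool. Each fallback step is appended to `transitions`.
    """
    for index, slot_name in enumerate(SLOT_ORDER):
        if is_available(portfolio[slot_name]):
            return slot_name
        if index + 1 < len(SLOT_ORDER):
            next_name = SLOT_ORDER[index + 1]
            transitions.append({
                "at": time.time(),
                "from": slot_name,
                "to": next_name,
                # Authority envelope of the slot being fallen back to.
                "authority": portfolio[next_name]["lane"],
            })
    # A terminal lane that cannot run means the portfolio is incomplete.
    raise RuntimeError("portfolio exhausted: terminal lane unavailable")
```

Raising when the terminal slot is unreachable, instead of returning `None`, reflects the rule that dead silence is not an acceptable terminal state: the caller has to surface the failure.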
284
fallback-portfolios.yaml
Normal file
@@ -0,0 +1,284 @@
schema_version: 1
status: proposed
runtime_wiring: false
owner: timmy-config

ownership:
  owns:
    - routing doctrine for task classes
    - sidecar-readable per-agent fallback portfolios
    - degraded-mode capability floors
  does_not_own:
    - live queue state outside Gitea truth
    - launchd or loop process state
    - ad hoc worktree history

policy:
  require_four_slots_for_critical_agents: true
  terminal_fallback_must_be_usable: true
  forbid_synchronized_fleet_degradation: true
  forbid_human_token_fallbacks: true
  anti_correlation_rule: no two critical agents may share the same primary+fallback1 pair

sensitive_control_surfaces:
  - SOUL.md
  - config.yaml
  - deploy.sh
  - tasks.py
  - playbooks/
  - cron/
  - memories/
  - skins/
  - training/

role_classes:
  judgment:
    current_surfaces:
      - playbooks/issue-triager.yaml
      - playbooks/pr-reviewer.yaml
      - playbooks/verified-logic.yaml
    task_classes:
      - issue-triage
      - queue-routing
      - pr-review
      - proof-check
      - governance-review
    degraded_mode:
      fallback2:
        allowed:
          - classify backlog
          - summarize risk
          - produce draft routing plans
          - leave bounded labels or comments with evidence
        denied:
          - merge pull requests
          - close or rewrite governing issues or PRs
          - mutate sensitive control surfaces
          - bulk-reassign the fleet
          - silently change routing policy
      terminal:
        lane: report-and-route
        allowed:
          - classify backlog
          - summarize risk
          - produce draft routing artifacts
        denied:
          - merge pull requests
          - bulk-reassign the fleet
          - mutate sensitive control surfaces

  builder:
    current_surfaces:
      - playbooks/bug-fixer.yaml
      - playbooks/test-writer.yaml
      - playbooks/refactor-specialist.yaml
    task_classes:
      - bug-fix
      - test-writing
      - refactor
      - bounded-docs-change
    degraded_mode:
      fallback2:
        allowed:
          - reversible single-issue changes
          - narrow docs fixes
          - test scaffolds and reproducers
        denied:
          - cross-repo changes
          - sensitive control-surface edits
          - merge or release actions
      terminal:
        lane: narrow-patch
        allowed:
          - single-issue small patch
          - reproducer test
          - docs-only repair
        denied:
          - sensitive control-surface edits
          - multi-file architecture work
          - irreversible actions

  wolf_bulk:
    current_surfaces:
      - docs/automation-inventory.md
      - FALSEWORK.md
    task_classes:
      - docs-inventory
      - log-summarization
      - queue-hygiene
      - repetitive-small-diff
      - research-sweep
    degraded_mode:
      fallback2:
        allowed:
          - gather evidence
          - refresh inventories
          - summarize logs
          - propose labels or routes
        denied:
          - multi-repo branch fanout
          - mass agent assignment
          - sensitive control-surface edits
          - irreversible queue mutation
      terminal:
        lane: gather-and-summarize
        allowed:
          - inventory refresh
          - evidence bundles
          - summaries
        denied:
          - multi-repo branch fanout
          - mass agent assignment
          - sensitive control-surface edits

routing:
  issue-triage: judgment
  queue-routing: judgment
  pr-review: judgment
  proof-check: judgment
  governance-review: judgment
  bug-fix: builder
  test-writing: builder
  refactor: builder
  bounded-docs-change: builder
  docs-inventory: wolf_bulk
  log-summarization: wolf_bulk
  queue-hygiene: wolf_bulk
  repetitive-small-diff: wolf_bulk
  research-sweep: wolf_bulk

promotion_rules:
  - If a wolf/bulk task touches a sensitive control surface, promote it to judgment.
  - If a builder task expands beyond 5 files, architecture review, or multi-repo coordination, promote it to judgment.
  - If a terminal lane cannot produce a usable artifact, the portfolio is invalid and must be redesigned before wiring.

agents:
  triage-coordinator:
    role_class: judgment
    critical: true
    current_playbooks:
      - playbooks/issue-triager.yaml
    portfolio:
      primary:
        provider: anthropic
        model: claude-opus-4-6
        lane: full-judgment
      fallback1:
        provider: openai-codex
        model: codex
        lane: high-judgment
      fallback2:
        provider: gemini
        model: gemini-2.5-pro
        lane: bounded-judgment
      terminal:
        provider: ollama
        model: hermes3:latest
        lane: report-and-route
        local_capable: true
        usable_output:
          - backlog classification
          - routing draft
          - risk summary

  pr-reviewer:
    role_class: judgment
    critical: true
    current_playbooks:
      - playbooks/pr-reviewer.yaml
    portfolio:
      primary:
        provider: anthropic
        model: claude-opus-4-6
        lane: full-review
      fallback1:
        provider: gemini
        model: gemini-2.5-pro
        lane: high-review
      fallback2:
        provider: grok
        model: grok-3-mini-fast
        lane: comment-only-review
      terminal:
        provider: openrouter
        model: openai/gpt-4.1-mini
        lane: low-stakes-diff-summary
        local_capable: false
        usable_output:
          - diff risk summary
          - explicit uncertainty notes
          - merge-block recommendation

  builder-main:
    role_class: builder
    critical: true
    current_playbooks:
      - playbooks/bug-fixer.yaml
      - playbooks/test-writer.yaml
      - playbooks/refactor-specialist.yaml
    portfolio:
      primary:
        provider: openai-codex
        model: codex
        lane: full-builder
      fallback1:
        provider: kimi-coding
        model: kimi-k2.5
        lane: bounded-builder
      fallback2:
        provider: groq
        model: llama-3.3-70b-versatile
        lane: small-patch-builder
      terminal:
        provider: custom_provider
        provider_name: Local llama.cpp
        model: hermes4:14b
        lane: narrow-patch
        local_capable: true
        usable_output:
          - small patch
          - reproducer test
          - docs repair

  wolf-sweeper:
    role_class: wolf_bulk
    critical: true
    current_world_state:
      - docs/automation-inventory.md
    portfolio:
      primary:
        provider: gemini
        model: gemini-2.5-flash
        lane: fast-bulk
      fallback1:
        provider: groq
        model: llama-3.3-70b-versatile
        lane: fast-bulk-backup
      fallback2:
        provider: openrouter
        model: openai/gpt-4.1-mini
        lane: bounded-bulk-summary
      terminal:
        provider: ollama
        model: hermes3:latest
        lane: gather-and-summarize
        local_capable: true
        usable_output:
          - inventory refresh
          - evidence bundle
          - summary comment

cross_checks:
  unique_primary_fallback1_pairs:
    triage-coordinator:
      - anthropic/claude-opus-4-6
      - openai-codex/codex
    pr-reviewer:
      - anthropic/claude-opus-4-6
      - gemini/gemini-2.5-pro
    builder-main:
      - openai-codex/codex
      - kimi-coding/kimi-k2.5
    wolf-sweeper:
      - gemini/gemini-2.5-flash
      - groq/llama-3.3-70b-versatile