Per-Agent Fallback Portfolios and Task-Class Routing

Status: proposed doctrine for issue #155

Scope: policy and sidecar structure only; no runtime wiring in tasks.py or live loops yet

Why this exists

Timmy already has multiple model paths declared in config.yaml, multiple task surfaces in playbooks/, and multiple live automation lanes documented in docs/automation-inventory.md.

What is missing is a declared resilience doctrine for how specific agents degrade when a provider, quota, or model family fails. Without that doctrine, the whole fleet tends to collapse onto the same fallback chain, which means one outage turns into synchronized fleet degradation.

This spec makes the fallback graph explicit before runtime wiring lands.

Timmy ownership boundary

timmy-config owns:

  • routing doctrine for Timmy-side task classes
  • sidecar-readable fallback portfolio declarations
  • capability floors and degraded-mode authority restrictions
  • the mapping between current playbooks and future resilient agent lanes

timmy-config does not own:

  • live queue state or issue truth outside Gitea
  • launchd state, loop resurrection, or stale runtime reuse
  • ad hoc worktree history or hidden queue mutation

That split matters. This repo should declare how routing is supposed to work. Runtime surfaces should consume that declaration instead of inventing their own fallback orderings.

Non-goals

This issue does not:

  • fully wire portfolio selection into tasks.py, launch agents, or live loops
  • bless human-token or operator-token fallbacks as part of an automated chain
  • allow degraded agents to keep full authority just because they are still producing output

Role classes

1. Judgment

Use for work where the main risk is a bad decision, not a missing patch.

Current Timmy surfaces:

  • playbooks/issue-triager.yaml
  • playbooks/pr-reviewer.yaml
  • playbooks/verified-logic.yaml

Typical task classes:

  • issue triage
  • queue routing
  • PR review
  • proof / consistency checks
  • governance-sensitive review

Judgment lanes may read broadly, but they lose authority earlier than builder lanes when degraded.

2. Builder

Use for work where the main risk is producing or verifying a change.

Current Timmy surfaces:

  • playbooks/bug-fixer.yaml
  • playbooks/test-writer.yaml
  • playbooks/refactor-specialist.yaml

Typical task classes:

  • bug fixes
  • test writing
  • bounded refactors
  • narrow docs or code repairs with verification

Builder lanes keep patch-producing usefulness longer than judgment lanes, but they must lose control-plane authority as they degrade.

3. Wolf / bulk

Use for repetitive, high-volume, bounded, reversible work.

Current Timmy world-state:

  • bulk and sweep behavior is still represented more by live ops reality in docs/automation-inventory.md than by a dedicated sidecar playbook
  • this class covers the work shape currently associated with queue hygiene, inventory refresh, docs sweeps, log summarization, and repetitive small-diff passes

Typical task classes:

  • docs inventory refresh
  • log summarization
  • queue hygiene
  • repetitive small diffs
  • research or extraction sweeps

Wolf / bulk lanes are throughput-first and deliberately lower-authority.

Routing policy

  1. If the task touches a sensitive control surface, route to judgment first even if the edit is small.
  2. If the task is primarily about merge authority, routing authority, proof, or governance, route to judgment.
  3. If the task is primarily about producing a patch with local verification, route to builder.
  4. If the task is repetitive, bounded, reversible, and low-authority, route to wolf / bulk.
  5. If a wolf / bulk task expands beyond its size or authority envelope, promote it upward; do not let it keep grinding forward through scope creep.
  6. If a builder task becomes architecture, multi-repo coordination, or control-plane review, promote it to judgment.
  7. If a lane reaches terminal fallback, it must still land in a usable degraded mode. Dead silence is not an acceptable terminal state.
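
The ordering of rules 1 through 6 can be sketched as a small classifier. This is a minimal illustration, not the selector that will live in tasks.py: the `Task` fields and the `route` function are hypothetical names, the choice to promote an over-envelope wolf task to builder is one reading of "promote it upward", and defaulting unclassified work to judgment is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical task descriptor; field names are illustrative only."""
    touches_sensitive_surface: bool = False
    about_authority_or_governance: bool = False    # merge/routing authority, proof, governance
    produces_verified_patch: bool = False          # patch with local verification
    repetitive_bounded_reversible: bool = False    # low-authority sweep work
    exceeds_envelope: bool = False                 # grew past its size or authority caps
    needs_architecture_or_multi_repo: bool = False


def route(task: Task) -> str:
    """Return 'judgment', 'builder', or 'wolf' by applying the rules in order."""
    # Rules 1 and 2: sensitive surfaces and authority questions go to judgment.
    if task.touches_sensitive_surface or task.about_authority_or_governance:
        return "judgment"
    # Rule 6: architecture, multi-repo, or control-plane work is promoted to judgment.
    if task.needs_architecture_or_multi_repo:
        return "judgment"
    # Rule 3: patch-producing work with local verification goes to builder.
    if task.produces_verified_patch:
        return "builder"
    if task.repetitive_bounded_reversible:
        # Rule 5 (assumed reading): an over-envelope bulk task moves up one class.
        # Rule 4: otherwise it stays in wolf / bulk.
        return "builder" if task.exceeds_envelope else "wolf"
    # Assumption: anything unclassified is treated as a judgment call.
    return "judgment"
```

Checking the judgment conditions first means a one-line edit to a sensitive path can never be routed as a routine builder or bulk task.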

Sensitive control surfaces

These paths stay judgment-routed unless explicitly reviewed otherwise:

  • SOUL.md
  • config.yaml
  • deploy.sh
  • tasks.py
  • playbooks/
  • cron/
  • memories/
  • skins/
  • training/

This mirrors the current PR-review doctrine and keeps degraded builder or bulk lanes away from Timmy's control plane.
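
A path check for this list might look like the sketch below. It assumes changed paths are repo-relative with forward slashes, and that the four listed files live at the repo root; both are assumptions, and `touches_sensitive_surface` is a hypothetical helper name.

```python
from pathlib import PurePosixPath

# Taken from the list above; root-level placement of the files is an assumption.
SENSITIVE_FILES = {"SOUL.md", "config.yaml", "deploy.sh", "tasks.py"}
SENSITIVE_DIRS = {"playbooks", "cron", "memories", "skins", "training"}


def touches_sensitive_surface(changed_paths):
    """Return True if any repo-relative path falls on a sensitive control surface."""
    for raw in changed_paths:
        parts = PurePosixPath(raw).parts
        if not parts:
            continue
        # A listed file at the repo root.
        if len(parts) == 1 and parts[0] in SENSITIVE_FILES:
            return True
        # Anything inside (or equal to) a listed top-level directory.
        if parts[0] in SENSITIVE_DIRS:
            return True
    return False
```

For example, `touches_sensitive_surface(["docs/notes.md", "playbooks/bug-fixer.yaml"])` is true because one of the two paths sits under playbooks/.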

Portfolio design rules

The sidecar portfolio declaration in fallback-portfolios.yaml follows these rules:

  1. Every critical agent gets four slots:
    • primary
    • fallback1
    • fallback2
    • terminal (the terminal fallback)
  2. No two critical agents may share the same primary + fallback1 pair.
  3. Provider families should be anti-correlated across critical lanes whenever practical.
  4. Terminal fallbacks must end in a usable degraded lane, not a null lane.
  5. At least one critical lane must end on a local-capable path.
  6. No human-token fallback patterns are allowed in automated chains.
  7. Degraded mode reduces authority before it removes usefulness.
  8. A terminal lane that cannot safely produce an artifact is not a valid terminal lane.
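
Several of these rules are mechanically checkable. The sketch below covers rules 1, 2, 4, 5, and 6 only; rule 3 (provider anti-correlation) and rules 7 and 8 need judgment that a lint cannot supply. The slot shape (`model`, `local`, `human_token` keys) is a hypothetical stand-in, not the actual schema of fallback-portfolios.yaml.

```python
from itertools import combinations

REQUIRED_SLOTS = ("primary", "fallback1", "fallback2", "terminal")


def validate_portfolios(portfolios):
    """Return a list of rule violations for a {agent: {slot: slot_dict}} mapping.

    Assumed slot_dict shape: {"model": str, "local": bool, "human_token": bool}.
    """
    errors = []
    for agent, slots in portfolios.items():
        for name in REQUIRED_SLOTS:
            slot = slots.get(name)
            if not slot:
                # Rule 1 (four slots) and rule 4 (no null terminal lane).
                errors.append(f"{agent}: missing or null slot '{name}'")
            elif slot.get("human_token"):
                # Rule 6: no human-token fallbacks in automated chains.
                errors.append(f"{agent}: slot '{name}' relies on a human token")

    def leading_pair(slots):
        primary, fallback1 = slots.get("primary"), slots.get("fallback1")
        if primary and fallback1:
            return (primary["model"], fallback1["model"])
        return None

    # Rule 2: no two agents may share the same primary + fallback1 pair.
    for (a, slots_a), (b, slots_b) in combinations(portfolios.items(), 2):
        pair = leading_pair(slots_a)
        if pair is not None and pair == leading_pair(slots_b):
            errors.append(f"{a} and {b}: share primary + fallback1 pair {pair}")

    # Rule 5: at least one lane must end on a local-capable path.
    if not any((s.get("terminal") or {}).get("local") for s in portfolios.values()):
        errors.append("no lane ends on a local-capable path")
    return errors
```

A check like this could run in CI against the sidecar file, so that a portfolio edit that reintroduces a shared fallback stack fails before any runtime consumes it.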

Explicit ban: synchronized fleet degradation

Synchronized fleet degradation is forbidden.

That means:

  • do not point every critical agent at the same fallback stack
  • do not let all judgment agents converge on the same first backup if avoidable
  • do not let all builder agents collapse onto the same weak terminal lane
  • do not treat "everyone fell back to the cheapest thing" as resilience

A resilient fleet degrades unevenly on purpose. Some lanes should stay sharp while others become slower or narrower.

Capability floors and degraded authority

Shared slot semantics

  • primary: full role-class authority
  • fallback1: full task authority for normal work, but no silent broadening of scope
  • fallback2: bounded and reversible work only; no irreversible control-plane action
  • terminal: usable degraded lane only; must produce a machine-usable artifact but must not impersonate full authority

Judgment floors

Judgment agents lose authority earliest.

At fallback2 and terminal, judgment lanes must not:

  • merge PRs
  • close or rewrite governing issues or PRs
  • mutate sensitive control surfaces
  • bulk-reassign the fleet
  • silently change routing policy

Their degraded usefulness is still real:

  • classify backlog
  • produce draft routing plans
  • summarize risk
  • leave bounded labels or comments with explicit evidence

Builder floors

Builder agents may continue doing useful narrow work deeper into degradation, but only inside a tighter box.

At fallback2, builder lanes must be limited to:

  • single-issue work
  • reversible patches
  • narrow docs or test scaffolds
  • bounded file counts and small diff sizes

At terminal, builder lanes must not:

  • touch sensitive control surfaces
  • merge or release
  • do multi-repo or architecture work
  • claim verification they did not run

Their terminal usefulness may still include:

  • a small patch
  • a reproducer test
  • a docs fix
  • a draft branch or artifact for later review

Wolf / bulk floors

Wolf / bulk lanes stay useful as summarizers and sweepers, not as governors.

At fallback2 and terminal, wolf / bulk lanes must not:

  • fan out branch creation across repos
  • mass-assign agents
  • edit sensitive control surfaces
  • perform irreversible queue mutation

Their degraded usefulness may still include:

  • gathering evidence
  • refreshing inventories
  • summarizing logs
  • proposing labels or routes
  • producing repetitive, low-risk artifacts inside explicit caps
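
The explicit "must not" lists above can be encoded as a lookup keyed by role class and the slot at which each ban begins. This is a sketch under stated assumptions: the action identifiers are hypothetical labels for the bullets above, and the function only reports the explicit deny lists. It does not model the builder fallback2 allowlist or the shared slot semantics, so a False result does not by itself mean an action is permitted.

```python
SLOT_ORDER = ("primary", "fallback1", "fallback2", "terminal")

# role class -> (first slot where the deny list applies, denied actions)
# Action names are hypothetical labels for the bullets in this section.
DENY_LISTS = {
    "judgment": ("fallback2", {
        "merge_pr",
        "close_or_rewrite_governing_item",
        "mutate_sensitive_surface",
        "bulk_reassign_fleet",
        "change_routing_policy",
    }),
    "builder": ("terminal", {
        "touch_sensitive_surface",
        "merge_or_release",
        "multi_repo_or_architecture",
        "claim_unrun_verification",
    }),
    "wolf": ("fallback2", {
        "fan_out_branches",
        "mass_assign_agents",
        "edit_sensitive_surface",
        "irreversible_queue_mutation",
    }),
}


def explicitly_banned(role, slot, action):
    """True if the action is on the role's deny list at this slot or deeper."""
    start_slot, denied = DENY_LISTS[role]
    degraded_enough = SLOT_ORDER.index(slot) >= SLOT_ORDER.index(start_slot)
    return degraded_enough and action in denied
```

Keeping the floors as data rather than scattered conditionals makes it easier to review them against this document when either side changes.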

Usable terminal lanes

A terminal fallback is only valid if it still does at least one of these safely:

  • classify and summarize a backlog
  • produce a bounded patch or test artifact
  • summarize a diff with explicit uncertainty
  • refresh an inventory or evidence bundle

If the terminal lane can only say "model unavailable" and stop, the portfolio is incomplete.

Current sidecar reference lanes

fallback-portfolios.yaml defines the initial implementation-ready structure for four named lanes:

  • triage-coordinator — judgment
  • pr-reviewer — judgment
  • builder-main — builder
  • wolf-sweeper — wolf / bulk

These are the canonical resilience lanes for the current Timmy world-state.

Current playbooks should eventually map onto them like this:

  • playbooks/issue-triager.yaml -> triage-coordinator
  • playbooks/pr-reviewer.yaml -> pr-reviewer
  • playbooks/verified-logic.yaml -> judgment lane family, pending a dedicated proof profile if needed
  • playbooks/bug-fixer.yaml, playbooks/test-writer.yaml, and playbooks/refactor-specialist.yaml -> builder-main
  • future sidecar bulk playbooks should inherit from wolf-sweeper instead of inventing independent fallback chains

Until runtime wiring lands, unmapped playbooks should be treated as policy-incomplete rather than inheriting an implicit fallback chain.
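
One way to make "policy-incomplete" concrete is to fail loudly on an unmapped playbook instead of returning a default lane. In this sketch `PolicyIncomplete` and `lane_for` are hypothetical names, and playbooks/verified-logic.yaml is deliberately left out of the table because the mapping above assigns it to the judgment lane family without naming a specific lane.

```python
# Mapping copied from the list above; only playbooks with a named lane appear.
PLAYBOOK_TO_LANE = {
    "playbooks/issue-triager.yaml": "triage-coordinator",
    "playbooks/pr-reviewer.yaml": "pr-reviewer",
    "playbooks/bug-fixer.yaml": "builder-main",
    "playbooks/test-writer.yaml": "builder-main",
    "playbooks/refactor-specialist.yaml": "builder-main",
}


class PolicyIncomplete(LookupError):
    """Raised when a playbook has no declared resilience lane."""


def lane_for(playbook):
    """Return the named lane for a playbook, or raise instead of guessing."""
    try:
        return PLAYBOOK_TO_LANE[playbook]
    except KeyError:
        raise PolicyIncomplete(
            f"{playbook} has no declared lane; refusing to inherit an implicit chain"
        ) from None
```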

Wiring contract for later implementation

When this is wired into runtime selection, the selector should:

  • classify the incoming task into a role class
  • check whether the task touches a sensitive control surface
  • choose the named agent lane for that class
  • step through the declared portfolio slots in order
  • enforce the capability floor of the active slot before taking action
  • record when a fallback transition happened and what authority was still allowed

The important part is not just choosing a different model. It is choosing a different authority envelope as the lane degrades.
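
The last three steps of the contract (step through slots in order, carry the active slot's authority into the attempt, record each transition) can be sketched as follows. Everything here is illustrative: `run_lane`, the `attempt` callback signature, and the use of `RuntimeError` to signal a provider failure are assumptions, and the authority strings simply paraphrase the shared slot semantics.

```python
SLOT_ORDER = ("primary", "fallback1", "fallback2", "terminal")

# Paraphrased from the shared slot semantics section.
AUTHORITY = {
    "primary": "full role-class authority",
    "fallback1": "full task authority, no silent scope broadening",
    "fallback2": "bounded and reversible work only",
    "terminal": "usable degraded lane, machine-usable artifact only",
}


def run_lane(portfolio, attempt, transitions):
    """Try each slot in order and return (slot, artifact) from the first success.

    portfolio:   {slot_name: model_name} with all four slots present.
    attempt:     hypothetical callable(model, authority) that returns an
                 artifact or raises RuntimeError when the model path is down.
    transitions: list that receives one record per fallback step.
    """
    previous = None
    for slot in SLOT_ORDER:
        if previous is not None:
            # Record the transition and the authority still allowed after it.
            transitions.append(
                {"from": previous, "to": slot, "authority": AUTHORITY[slot]}
            )
        try:
            return slot, attempt(portfolio[slot], AUTHORITY[slot])
        except RuntimeError:
            previous = slot
    # Reaching this point means the terminal lane produced nothing usable.
    raise RuntimeError("portfolio incomplete: terminal slot produced no artifact")
```

Passing the authority envelope into `attempt` alongside the model name keeps the two coupled, so a caller cannot switch models without also receiving the reduced authority that goes with the new slot.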