Timmy Time ff7e22dcc8 [RESILIENCE] Define per-agent fallback portfolios and routing doctrine (#170)

2026-04-04 21:40:36 +00:00

9.3 KiB


Per-Agent Fallback Portfolios and Task-Class Routing

Status: proposed doctrine for issue #155
Scope: policy and sidecar structure only; no runtime wiring in tasks.py or live loops yet

Why this exists

Timmy already has multiple model paths declared in config.yaml, multiple task surfaces in playbooks/, and multiple live automation lanes documented in docs/automation-inventory.md.

What is missing is a declared resilience doctrine for how specific agents degrade when a provider, quota, or model family fails. Without that doctrine, the whole fleet tends to collapse onto the same fallback chain, which means one outage turns into synchronized fleet degradation.

This spec makes the fallback graph explicit before runtime wiring lands.

Timmy ownership boundary

timmy-config owns:

routing doctrine for Timmy-side task classes
sidecar-readable fallback portfolio declarations
capability floors and degraded-mode authority restrictions
the mapping between current playbooks and future resilient agent lanes

timmy-config does not own:

live queue state or issue truth outside Gitea
launchd state, loop resurrection, or stale runtime reuse
ad hoc worktree history or hidden queue mutation

That split matters. This repo should declare how routing is supposed to work. Runtime surfaces should consume that declaration instead of inventing their own fallback orderings.

Non-goals

This issue does not:

fully wire portfolio selection into tasks.py, launch agents, or live loops
bless human-token or operator-token fallbacks as part of an automated chain
allow degraded agents to keep full authority just because they are still producing output

Role classes

1. Judgment

Use for work where the main risk is a bad decision, not a missing patch.

Current Timmy surfaces:

playbooks/issue-triager.yaml
playbooks/pr-reviewer.yaml
playbooks/verified-logic.yaml

Typical task classes:

issue triage
queue routing
PR review
proof / consistency checks
governance-sensitive review

Judgment lanes may read broadly, but they lose authority earlier than builder lanes when degraded.

2. Builder

Use for work where the main risk is producing or verifying a change.

Current Timmy surfaces:

playbooks/bug-fixer.yaml
playbooks/test-writer.yaml
playbooks/refactor-specialist.yaml

Typical task classes:

bug fixes
test writing
bounded refactors
narrow docs or code repairs with verification

Builder lanes keep patch-producing usefulness longer than judgment lanes, but they must lose control-plane authority as they degrade.

3. Wolf / bulk

Use for repetitive, high-volume, bounded, reversible work.

Current Timmy world-state:

bulk and sweep behavior is still documented mainly as live operations in docs/automation-inventory.md rather than in a dedicated sidecar playbook
this class covers the work shape currently associated with queue hygiene, inventory refresh, docs sweeps, log summarization, and repetitive small-diff passes

Typical task classes:

docs inventory refresh
log summarization
queue hygiene
repetitive small diffs
research or extraction sweeps

Wolf / bulk lanes are throughput-first and deliberately lower-authority.

Routing policy

If the task touches a sensitive control surface, route to judgment first even if the edit is small.
If the task is primarily about merge authority, routing authority, proof, or governance, route to judgment.
If the task is primarily about producing a patch with local verification, route to builder.
If the task is repetitive, bounded, reversible, and low-authority, route to wolf / bulk.
If a wolf / bulk task expands beyond its size or authority envelope, promote it upward; do not let it keep grinding forward through scope creep.
If a builder task becomes architecture, multi-repo coordination, or control-plane review, promote it to judgment.
If a lane reaches terminal fallback, it must still land in a usable degraded mode. Dead silence is not an acceptable terminal state.
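The routing rules above can be sketched as a small Python function. This is a hypothetical illustration, not an existing tasks.py interface. The `Task` record, its fields (`kind`, `touches_sensitive_surface`, `repetitive`, `bounded`, `reversible`, `low_authority`) and the kind labels are all assumptions. Rule order matters. The sensitive-surface check runs first so that a small edit to a control surface cannot slip into a builder or bulk lane.

```python
from dataclasses import dataclass
from enum import Enum


class RoleClass(Enum):
    JUDGMENT = "judgment"
    BUILDER = "builder"
    WOLF = "wolf/bulk"


# Hypothetical task kinds that this sketch always sends to judgment.
JUDGMENT_KINDS = {"merge", "routing", "proof", "governance",
                  "architecture", "multi-repo", "control-plane-review"}


@dataclass
class Task:
    kind: str                               # illustrative label, e.g. "patch" or "sweep"
    touches_sensitive_surface: bool = False
    repetitive: bool = False
    bounded: bool = True
    reversible: bool = True
    low_authority: bool = True


def route(task: Task) -> RoleClass:
    # Sensitive control surfaces route to judgment even when the edit is small.
    if task.touches_sensitive_surface:
        return RoleClass.JUDGMENT
    # Merge, routing, proof, governance, and architecture-shaped work is judgment work.
    if task.kind in JUDGMENT_KINDS:
        return RoleClass.JUDGMENT
    # Wolf / bulk only while the task stays inside its whole envelope.
    if task.repetitive and task.bounded and task.reversible and task.low_authority:
        return RoleClass.WOLF
    # Anything else is treated as patch production with local verification.
    return RoleClass.BUILDER
```

In this sketch, calling `route` again whenever a task's shape changes covers the promotion rules. A sweep that stops being bounded falls through to builder. A builder task relabeled as architecture lands in judgment.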

Sensitive control surfaces

These paths stay judgment-routed unless explicitly reviewed otherwise:

SOUL.md
config.yaml
deploy.sh
tasks.py
playbooks/
cron/
memories/
skins/
training/

This mirrors the current PR-review doctrine and keeps degraded builder or bulk lanes away from Timmy's control plane.
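A runtime could apply this list as a repo-relative path check. The sketch below makes two assumptions. First, paths are relative to the timmy-config root. Second, top-level files such as config.yaml match only at the root, so a same-named file deeper in the tree does not match. The function names are hypothetical.

```python
from pathlib import PurePosixPath

# Surfaces from the list above. Entries ending in "/" are directory prefixes;
# the rest are exact top-level file matches.
SENSITIVE_SURFACES = (
    "SOUL.md", "config.yaml", "deploy.sh", "tasks.py",
    "playbooks/", "cron/", "memories/", "skins/", "training/",
)


def is_sensitive(path: str) -> bool:
    """Return True if a repo-relative path falls on a sensitive control surface."""
    normalized = PurePosixPath(path).as_posix()  # collapses "./" and doubled slashes
    for surface in SENSITIVE_SURFACES:
        if surface.endswith("/"):
            if normalized.startswith(surface):
                return True
        elif normalized == surface:
            return True
    return False


def touches_sensitive_surface(changed_paths) -> bool:
    """True if any changed path needs judgment routing."""
    return any(is_sensitive(p) for p in changed_paths)
```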

Portfolio design rules

The sidecar portfolio declaration in fallback-portfolios.yaml follows these rules:

Every critical agent gets four slots:
- primary
- fallback1
- fallback2
- terminal fallback
No two critical agents may share the same primary + fallback1 pair.
Provider families should be anti-correlated across critical lanes whenever practical.
Terminal fallbacks must end in a usable degraded lane, not a null lane.
At least one critical lane must end on a local-capable path.
No human-token fallback patterns are allowed in automated chains.
Degraded mode reduces authority before it removes usefulness.
A terminal lane that cannot safely produce an artifact is not a valid terminal lane.
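The mechanical rules can be checked before a declaration is accepted. The sketch below assumes a hypothetical in-memory shape: each lane name maps to four `Slot` records, and each record carries `model`, `provider`, `local`, and `human_token` fields. The actual layout of fallback-portfolios.yaml is not shown here. The sketch covers four rules: the slot count, unique primary + fallback1 pairs, a local-capable terminal somewhere in the fleet, and the human-token ban. Anti-correlation and terminal usability need separate checks.

```python
from dataclasses import dataclass

SLOT_NAMES = ("primary", "fallback1", "fallback2", "terminal")


@dataclass(frozen=True)
class Slot:
    model: str
    provider: str
    local: bool = False          # runs on a local-capable path
    human_token: bool = False    # authenticates with a human or operator token


def validate_portfolios(portfolios: dict) -> list:
    """Return rule violations; an empty list means the declaration passes.

    `portfolios` maps a critical lane name to its slots in SLOT_NAMES order.
    """
    errors = []
    seen_pairs = {}
    has_local_terminal = False
    for lane, slots in portfolios.items():
        if len(slots) != len(SLOT_NAMES):
            errors.append(f"{lane}: expected {len(SLOT_NAMES)} slots, got {len(slots)}")
            continue
        for name, slot in zip(SLOT_NAMES, slots):
            if slot.human_token:
                errors.append(f"{lane}.{name}: human-token fallback in automated chain")
        pair = (slots[0].model, slots[1].model)
        if pair in seen_pairs:
            errors.append(f"{lane}: primary+fallback1 pair {pair} duplicates {seen_pairs[pair]}")
        else:
            seen_pairs[pair] = lane
        has_local_terminal = has_local_terminal or slots[-1].local
    if portfolios and not has_local_terminal:
        errors.append("no critical lane ends on a local-capable path")
    return errors
```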

Explicit ban: synchronized fleet degradation

Synchronized fleet degradation is forbidden.

That means:

do not point every critical agent at the same fallback stack
do not let all judgment agents converge on the same first backup if avoidable
do not let all builder agents collapse onto the same weak terminal lane
do not treat "everyone fell back to the cheapest thing" as resilience

A resilient fleet degrades unevenly on purpose. Some lanes should stay sharp while others become slower or narrower.
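One way to test for the banned pattern is to simulate single-provider outages and see where each lane lands. This sketch makes three assumptions. Each portfolio is an ordered list of hypothetical `(model, provider)` pairs. Only one provider fails at a time. A finding is reported only when every lane collapses onto the same model, or when some lane runs out of slots entirely.

```python
from collections import Counter


def landing_model(chain, failed_provider):
    """First model in the chain not served by the failed provider, else None."""
    for model, provider in chain:
        if provider != failed_provider:
            return model
    return None


def synchronized_outages(portfolios):
    """Map each failing provider to a description of the fleet-wide collapse it causes.

    `portfolios` maps lane name -> ordered list of (model, provider) pairs.
    """
    providers = {provider for chain in portfolios.values() for _, provider in chain}
    findings = {}
    for failed in sorted(providers):
        landings = Counter(landing_model(chain, failed) for chain in portfolios.values())
        if None in landings:
            findings[failed] = "at least one lane has no remaining slot"
        elif len(portfolios) > 1 and len(landings) == 1:
            findings[failed] = f"all lanes collapse onto {next(iter(landings))}"
    return findings
```

Consider two lanes with different primaries on one provider and the same cheap second slot. They pass the primary + fallback1 uniqueness rule, yet a single outage still sends both to the same model. Pair uniqueness alone does not guarantee uneven degradation.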

Capability floors and degraded authority

Shared slot semantics

primary: full role-class authority
fallback1: full task authority for normal work, but no silent broadening of scope
fallback2: bounded and reversible work only; no irreversible control-plane action
terminal: usable degraded lane only; must produce a machine-usable artifact but must not impersonate full authority
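The shared slot semantics can be expressed as checks on a proposed action. The `Action` fields are hypothetical. The sketch also rests on two readings that are assumptions. First, any scope broadening below primary counts as silent, so widening should go through promotion instead. Second, the bounded-and-reversible rule at fallback2 is what rules out irreversible control-plane action.

```python
from dataclasses import dataclass
from enum import IntEnum


class Slot(IntEnum):
    # A higher value means a more degraded slot.
    PRIMARY = 0
    FALLBACK1 = 1
    FALLBACK2 = 2
    TERMINAL = 3


@dataclass
class Action:
    broadens_scope: bool = False
    bounded: bool = True
    reversible: bool = True
    produces_artifact: bool = True
    claims_full_authority: bool = False


def slot_violations(slot: Slot, action: Action) -> list:
    """List the shared slot rules that `action` would break at `slot`."""
    problems = []
    if slot >= Slot.FALLBACK1 and action.broadens_scope:
        problems.append("silent scope broadening below primary")
    if slot >= Slot.FALLBACK2 and not (action.bounded and action.reversible):
        problems.append("only bounded, reversible work at fallback2 and below")
    if slot == Slot.TERMINAL:
        if not action.produces_artifact:
            problems.append("terminal must emit a machine-usable artifact")
        if action.claims_full_authority:
            problems.append("terminal must not impersonate full authority")
    return problems
```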

Judgment floors

Judgment agents lose authority earliest.

At fallback2 and below, judgment lanes must not:

merge PRs
close or rewrite governing issues or PRs
mutate sensitive control surfaces
bulk-reassign the fleet
silently change routing policy

Their degraded usefulness is still real:

classify backlog
produce draft routing plans
summarize risk
leave bounded labels or comments with explicit evidence
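A default-deny reading of the judgment floor is sketched below. At fallback2 and below, only the listed degraded activities remain, and labels or comments must carry evidence. The action names are hypothetical. Anything not on the allowlist is refused. That includes merges, closing or rewriting governing items, control-surface edits, fleet reassignment, and routing-policy changes. The "bounded" limit on how many labels or comments a lane may leave is not modeled here.

```python
SLOT_ORDER = ("primary", "fallback1", "fallback2", "terminal")

# Hypothetical action names for the degraded usefulness listed above.
DEGRADED_JUDGMENT_ALLOWED = {
    "classify_backlog",
    "draft_routing_plan",
    "summarize_risk",
    "apply_label",
    "post_comment",
}
# Labels and comments stay available only with explicit evidence attached.
NEEDS_EVIDENCE = {"apply_label", "post_comment"}


def judgment_allows(slot: str, action: str, evidence: str = "") -> bool:
    """Apply the judgment capability floor to one proposed action."""
    if SLOT_ORDER.index(slot) < SLOT_ORDER.index("fallback2"):
        return True  # primary and fallback1 keep normal judgment authority
    if action not in DEGRADED_JUDGMENT_ALLOWED:
        return False  # default-deny: merges, closes, bulk reassignment, and so on
    if action in NEEDS_EVIDENCE:
        return bool(evidence.strip())
    return True
```

With default-deny, a newly invented action cannot slip through a degraded lane just because nobody listed it as forbidden.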

Builder floors

Builder agents may continue doing useful narrow work deeper into degradation, but only inside a tighter box.

At fallback2, builder lanes must be limited to:

single-issue work
reversible patches
narrow docs or test scaffolds
bounded file counts and small diff sizes

At terminal, builder lanes must not:

touch sensitive control surfaces
merge or release
do multi-repo or architecture work
claim verification they did not run

Their terminal usefulness may still include:

a small patch
a reproducer test
a docs fix
a draft branch or artifact for later review
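The builder floor can be sketched as a check on a proposed patch. Several parts of this sketch are assumptions. The `Patch` fields are hypothetical. The caps (`MAX_FILES`, `MAX_DIFF_LINES`) are invented numbers, since the doctrine asks for bounded file counts and small diffs without setting values. The fallback2 box is also applied at terminal, on the reading that degradation only narrows authority. The "narrow docs or test scaffolds" limit is about content type and is not modeled.

```python
from dataclasses import dataclass
from typing import List, Set

# Hypothetical caps; the doctrine does not fix these numbers.
MAX_FILES = 5
MAX_DIFF_LINES = 200
SENSITIVE = ("SOUL.md", "config.yaml", "deploy.sh", "tasks.py",
             "playbooks/", "cron/", "memories/", "skins/", "training/")


def is_sensitive(path: str) -> bool:
    # Directory entries match by prefix, file entries match exactly.
    return any(path.startswith(s) if s.endswith("/") else path == s for s in SENSITIVE)


@dataclass
class Patch:
    issues: List[int]
    repos: Set[str]
    files: List[str]
    diff_lines: int
    reversible: bool = True
    merges_or_releases: bool = False
    architecture: bool = False
    claims_verified: bool = False
    verification_ran: bool = False


def builder_violations(slot: str, patch: Patch) -> List[str]:
    problems = []
    if slot in ("fallback2", "terminal"):  # assumes the fallback2 box still applies at terminal
        if len(patch.issues) != 1:
            problems.append("not single-issue work")
        if not patch.reversible:
            problems.append("patch is not reversible")
        if len(patch.files) > MAX_FILES or patch.diff_lines > MAX_DIFF_LINES:
            problems.append("exceeds file or diff caps")
    if slot == "terminal":
        if any(is_sensitive(f) for f in patch.files):
            problems.append("touches a sensitive control surface")
        if patch.merges_or_releases:
            problems.append("merge or release")
        if len(patch.repos) > 1 or patch.architecture:
            problems.append("multi-repo or architecture work")
        if patch.claims_verified and not patch.verification_ran:
            problems.append("claims verification that did not run")
    return problems
```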

Wolf / bulk floors

Wolf / bulk lanes stay useful as summarizers and sweepers, not as governors.

At fallback2 and terminal, wolf / bulk lanes must not:

fan out branch creation across repos
mass-assign agents
edit sensitive control surfaces
perform irreversible queue mutation

Their degraded usefulness may still include:

gathering evidence
refreshing inventories
summarizing logs
proposing labels or routes
producing repetitive, low-risk artifacts inside explicit caps
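Wolf / bulk limits are easiest to enforce on a whole planned batch rather than one action at a time, because fan-out and mass assignment only show up in aggregate. In this sketch the operation names, the `(op, repo, target)` tuple shape, and both caps are hypothetical. The doctrine requires explicit caps but does not set their values. The checks apply only at fallback2 and terminal, matching the floor above.

```python
from collections import Counter

# Hypothetical caps and operation names.
MAX_ASSIGNMENTS = 3
MAX_ARTIFACTS = 50
IRREVERSIBLE_QUEUE_OPS = {"delete_issue", "delete_comment"}
SENSITIVE = ("SOUL.md", "config.yaml", "deploy.sh", "tasks.py",
             "playbooks/", "cron/", "memories/", "skins/", "training/")


def wolf_batch_violations(slot, ops):
    """Check a planned batch of (op, repo, target) tuples against the wolf / bulk floor."""
    if slot not in ("fallback2", "terminal"):
        return []
    problems = []
    counts = Counter(op for op, _, _ in ops)
    if len({repo for op, repo, _ in ops if op == "create_branch"}) > 1:
        problems.append("branch creation fans out across repos")
    if counts["assign_agent"] > MAX_ASSIGNMENTS:
        problems.append("mass agent assignment")
    for op, _, target in ops:
        if op == "edit_file" and any(
            target.startswith(s) if s.endswith("/") else target == s for s in SENSITIVE
        ):
            problems.append(f"edits sensitive control surface {target}")
    if any(op in IRREVERSIBLE_QUEUE_OPS for op, _, _ in ops):
        problems.append("irreversible queue mutation")
    if counts["write_artifact"] > MAX_ARTIFACTS:
        problems.append("artifact count exceeds cap")
    return problems
```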

Usable terminal lanes

A terminal fallback is only valid if it still does at least one of these safely:

classify and summarize a backlog
produce a bounded patch or test artifact
summarize a diff with explicit uncertainty
refresh an inventory or evidence bundle

If the terminal lane can only say "model unavailable" and stop, the portfolio is incomplete.
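Two checks follow from this section, and both are sketched below. The first is static: it flags portfolios whose terminal slot declares no usable output. The second runs at runtime: it rejects a result that is only a status message. The output-kind labels and the per-lane declaration are hypothetical. They simply name the four activities listed above.

```python
# Output kinds named after the four safe activities listed above (hypothetical labels).
USABLE_TERMINAL_OUTPUTS = {
    "backlog_summary",         # classify and summarize a backlog
    "bounded_patch_or_test",   # produce a bounded patch or test artifact
    "diff_summary",            # summarize a diff with explicit uncertainty
    "evidence_refresh",        # refresh an inventory or evidence bundle
}


def incomplete_portfolios(terminal_outputs):
    """Lanes whose terminal slot declares none of the usable output kinds."""
    return sorted(
        lane for lane, outputs in terminal_outputs.items()
        if not (set(outputs) & USABLE_TERMINAL_OUTPUTS)
    )


def terminal_result_ok(kind, payload):
    """Reject terminal results that are only a status message such as 'model unavailable'."""
    return kind in USABLE_TERMINAL_OUTPUTS and bool(payload.strip())
```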

Current sidecar reference lanes

fallback-portfolios.yaml defines the initial implementation-ready structure for four named lanes:

triage-coordinator — judgment
pr-reviewer — judgment
builder-main — builder
wolf-sweeper — wolf / bulk

These are the canonical resilience lanes for the current Timmy world-state.

Current playbooks should eventually map onto them like this:

playbooks/issue-triager.yaml -> triage-coordinator
playbooks/pr-reviewer.yaml -> pr-reviewer
playbooks/verified-logic.yaml -> judgment lane family, pending a dedicated proof profile if needed
playbooks/bug-fixer.yaml, playbooks/test-writer.yaml, and playbooks/refactor-specialist.yaml -> builder-main
future sidecar bulk playbooks should inherit from wolf-sweeper instead of inventing independent fallback chains

Until runtime wiring lands, unmapped playbooks should be treated as policy-incomplete rather than inheriting an implicit fallback chain.
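A lookup that refuses to guess could look like the sketch below. The `role_class` argument and the `PolicyIncomplete` exception are hypothetical. The sketch treats verified-logic.yaml as policy-incomplete until a proof profile exists. That is one reading of "pending a dedicated proof profile", not a decision the doctrine makes.

```python
# Lane mapping from the list above. verified-logic.yaml is deliberately absent:
# it belongs to the judgment family but has no named lane yet.
PLAYBOOK_LANES = {
    "playbooks/issue-triager.yaml": "triage-coordinator",
    "playbooks/pr-reviewer.yaml": "pr-reviewer",
    "playbooks/bug-fixer.yaml": "builder-main",
    "playbooks/test-writer.yaml": "builder-main",
    "playbooks/refactor-specialist.yaml": "builder-main",
}


class PolicyIncomplete(LookupError):
    """Raised for a playbook with no declared resilience lane."""


def lane_for(playbook: str, role_class: str = "") -> str:
    if playbook in PLAYBOOK_LANES:
        return PLAYBOOK_LANES[playbook]
    if role_class == "wolf/bulk":
        return "wolf-sweeper"  # new bulk playbooks inherit the wolf-sweeper portfolio
    # No implicit fallback chain: surface the gap instead of guessing a lane.
    raise PolicyIncomplete(playbook)
```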

Wiring contract for later implementation

When this is wired into runtime selection, the selector should:

classify the incoming task into a role class
check whether the task touches a sensitive control surface
choose the named agent lane for that class
step through the declared portfolio slots in order
enforce the capability floor of the active slot before taking action
record when a fallback transition happened and what authority was still allowed

The important part is not just choosing a different model. It is choosing a different authority envelope as the lane degrades.
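Steps four through six of that contract could be sketched as follows, assuming the caller has already classified the task and picked a lane. Three elements are hypothetical: the `Selector` class, the per-slot floors expressed as sets of action names, and the `is_available` health probe. Suppose the first reachable slot's floor forbids the action. The selector then refuses instead of searching further, because every later slot carries less authority, not more.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List, Optional, Tuple

SLOTS = ("primary", "fallback1", "fallback2", "terminal")


@dataclass(frozen=True)
class Transition:
    lane: str
    slot: str
    model: str
    allowed: FrozenSet[str]   # authority still in force after the fallback


@dataclass
class Selector:
    portfolios: Dict[str, Dict[str, str]]   # lane -> slot -> model id
    floors: Dict[str, FrozenSet[str]]       # slot -> actions still permitted
    is_available: Callable[[str], bool]     # provider health probe
    transitions: List[Transition] = field(default_factory=list)

    def select(self, lane: str, action: str) -> Optional[Tuple[str, str]]:
        """Return (slot, model) to run `action`, or None if the active floor forbids it."""
        for slot in SLOTS:
            model = self.portfolios[lane].get(slot)
            if model is None or not self.is_available(model):
                continue  # step to the next declared slot
            allowed = self.floors[slot]
            if slot != "primary":
                # Record the fallback and what authority was still allowed.
                self.transitions.append(Transition(lane, slot, model, allowed))
            if action not in allowed:
                return None  # refuse; lower slots would only have less authority
            return slot, model
        raise RuntimeError(f"{lane}: every slot unavailable; portfolio is incomplete")
```

A caller that receives `None` can downgrade the request to something the active floor still allows, for example a comment or a draft instead of a merge.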
