Files
Timmy-time-dashboard/config/stress_modes.yaml
kimi 919a011cae feat: adapt token rewards based on system stress signals (#714)
Implements adaptive token rewards that respond to system stress:

- StressDetector module (timmy/stress_detector.py):
  - Monitors 4 stress signals: flaky test rate, P1 backlog growth,
    CI failure rate, open bug count
  - Calculates weighted stress score (0-1) and determines mode:
    calm (<0.3), elevated (0.3-0.6), high (>0.6)
  - Applies quest-specific multipliers based on current mode

- Configuration (config/stress_modes.yaml):
  - Thresholds for mode transitions
  - Signal weights and thresholds
  - Multipliers per mode (e.g., test_improve: 1.5x in high stress)

- Quest system integration:
  - Rewards now include stress bonus/penalty in notification
  - Quest status API includes adjusted_reward and multiplier
  - Agent can see current stress mode and why rewards changed

- API endpoints:
  - GET /quests/api/stress - current stress mode and signals
  - POST /quests/api/stress/refresh - force refresh stress detection

Fixes #714
2026-03-21 17:26:40 -04:00

99 lines
3.7 KiB
YAML

# ── System Stress Modes Configuration ────────────────────────────────────────
#
# This configuration defines how token rewards adapt based on system stress.
# When the system detects elevated stress (flaky tests, growing backlog,
# CI failures), quest rewards are adjusted to incentivize agents to focus
# on the most critical areas.
#
# ── How It Works ─────────────────────────────────────────────────────────────
#
# 1. SIGNALS: System metrics are monitored continuously
# 2. SCORE: Weighted contributions from triggered signals create a stress score
# 3. MODE: Score determines the stress mode (calm, elevated, high)
# 4. MULTIPLIERS: Token rewards are multiplied based on the current mode
#
# ── Stress Thresholds ────────────────────────────────────────────────────────
thresholds:
# Minimum score to enter elevated mode (0.0 - 1.0)
elevated_min: 0.3
# Minimum score to enter high stress mode (0.0 - 1.0)
high_min: 0.6
# ── Stress Signals ───────────────────────────────────────────────────────────
#
# Each signal has:
# - threshold: Value at which signal is considered "triggered"
# - weight: Contribution to overall stress score (should sum to ~1.0)
signals:
flaky_test_rate:
threshold: 0.15 # 15% of tests showing flakiness
weight: 0.30
description: "Percentage of test runs that are flaky"
p1_backlog_growth:
threshold: 5 # 5 new P1 issues in lookback period
weight: 0.25
description: "Net growth in P1 priority issues over 7 days"
ci_failure_rate:
threshold: 0.20 # 20% of CI runs failing
weight: 0.25
description: "Percentage of CI runs failing in lookback period"
open_bug_count:
threshold: 20 # 20 open bugs
weight: 0.20
description: "Total open issues labeled as 'bug'"
# ── Token Multipliers ────────────────────────────────────────────────────────
#
# Multipliers are applied to quest rewards based on current stress mode.
# Values > 1.0 increase rewards, < 1.0 decrease rewards.
#
# Quest types:
# - test_improve: Test coverage/quality improvements
# - docs_update: Documentation updates
# - issue_count: Closing specific issue types
# - issue_reduce: Reducing overall issue backlog
# - daily_run: Daily Run session completion
# - custom: Special/manual quests
# - exploration: Exploratory work
# - refactor: Code refactoring
multipliers:
calm:
# Calm periods: incentivize maintenance and exploration
test_improve: 1.0
docs_update: 1.2
issue_count: 1.0
issue_reduce: 1.0
daily_run: 1.0
custom: 1.0
exploration: 1.3
refactor: 1.2
elevated:
# Elevated stress: start emphasizing stability
test_improve: 1.2
docs_update: 1.0
issue_count: 1.1
issue_reduce: 1.1
daily_run: 1.0
custom: 1.0
exploration: 1.0
refactor: 0.9 # Discourage risky changes
high:
# High stress: crisis mode, focus on stabilization
test_improve: 1.5 # Strongly incentivize testing
docs_update: 0.8 # Deprioritize docs
issue_count: 1.3 # Reward closing issues
issue_reduce: 1.4 # Strongly reward reducing backlog
daily_run: 1.1
custom: 1.0
exploration: 0.7 # Discourage exploration
refactor: 0.6 # Discourage refactors during crisis