feat: adapt token rewards based on system stress signals (#714)

Implements adaptive token rewards that respond to system stress:

- StressDetector module (timmy/stress_detector.py):
  - Monitors 4 stress signals: flaky test rate, P1 backlog growth, CI failure rate, open bug count
  - Calculates weighted stress score (0-1) and determines mode: calm (<0.3), elevated (0.3-0.6), high (>0.6)
  - Applies quest-specific multipliers based on current mode
- Configuration (config/stress_modes.yaml):
  - Thresholds for mode transitions
  - Signal weights and thresholds
  - Multipliers per mode (e.g., test_improve: 1.5x in high stress)
- Quest system integration:
  - Rewards now include stress bonus/penalty in notification
  - Quest status API includes adjusted_reward and multiplier
  - Agent can see current stress mode and why rewards changed
- API endpoints:
  - GET /quests/api/stress - current stress mode and signals
  - POST /quests/api/stress/refresh - force refresh stress detection

Fixes #714
2026-03-21 17:26:40 -04:00
parent a95cf806c8
commit 919a011cae
5 changed files with 1081 additions and 3 deletions
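
The scoring scheme described in the commit message can be sketched as follows. This is a minimal illustration, not the code in timmy/stress_detector.py. The function names `stress_score` and `stress_mode` are hypothetical, and the sketch assumes that a signal counts as triggered when its value is at or above its threshold, that a triggered signal contributes its full weight, and that a score exactly on a boundary belongs to the higher mode.

```python
# Signal thresholds and weights copied from config/stress_modes.yaml.
SIGNALS = {
    "flaky_test_rate": {"threshold": 0.15, "weight": 0.30},
    "p1_backlog_growth": {"threshold": 5, "weight": 0.25},
    "ci_failure_rate": {"threshold": 0.20, "weight": 0.25},
    "open_bug_count": {"threshold": 20, "weight": 0.20},
}
ELEVATED_MIN = 0.3  # thresholds.elevated_min
HIGH_MIN = 0.6      # thresholds.high_min


def stress_score(values):
    """Sum the weights of triggered signals (assumption: value >= threshold)."""
    score = sum(
        cfg["weight"]
        for name, cfg in SIGNALS.items()
        if values.get(name, 0) >= cfg["threshold"]
    )
    # Clamp to 1.0 and round away floating-point noise from the summation.
    return round(min(score, 1.0), 6)


def stress_mode(score):
    """Map a score to a mode (assumption: boundaries belong to the higher mode)."""
    if score >= HIGH_MIN:
        return "high"
    if score >= ELEVATED_MIN:
        return "elevated"
    return "calm"
```

For example, with a flaky test rate of 0.2 and a CI failure rate of 0.25, two signals trigger, the score is 0.30 + 0.25 = 0.55, and the mode is elevated.
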
--- a/config/stress_modes.yaml
+++ b/config/stress_modes.yaml
@@ -0,0 +1,98 @@
+# ── System Stress Modes Configuration ────────────────────────────────────────
+#
+# This configuration defines how token rewards adapt based on system stress.
+# When the system detects elevated stress (flaky tests, growing P1 backlog,
+# CI failures, open bugs), quest rewards are adjusted to incentivize agents
+# to focus on the most critical areas.
+#
+# ── How It Works ─────────────────────────────────────────────────────────────
+#
+# 1. SIGNALS: System metrics are monitored continuously
+# 2. SCORE: Weighted contributions from triggered signals create a stress score
+# 3. MODE: Score determines the stress mode (calm, elevated, high)
+# 4. MULTIPLIERS: Token rewards are multiplied based on the current mode
+#
+# ── Stress Thresholds ────────────────────────────────────────────────────────
+
+thresholds:
+  # Minimum score to enter elevated mode (0.0 - 1.0)
+  elevated_min: 0.3
+  
+  # Minimum score to enter high stress mode (0.0 - 1.0)
+  high_min: 0.6
+
+# ── Stress Signals ───────────────────────────────────────────────────────────
+#
+# Each signal has:
+#   - threshold: Value at which signal is considered "triggered"
+#   - weight: Contribution to overall stress score (should sum to ~1.0)
+
+signals:
+  flaky_test_rate:
+    threshold: 0.15  # 15% of tests showing flakiness
+    weight: 0.30
+    description: "Fraction of test runs that are flaky (0.0 - 1.0)"
+    
+  p1_backlog_growth:
+    threshold: 5  # 5 new P1 issues in lookback period
+    weight: 0.25
+    description: "Net growth in P1 priority issues over 7 days"
+    
+  ci_failure_rate:
+    threshold: 0.20  # 20% of CI runs failing
+    weight: 0.25
+    description: "Fraction of CI runs failing in lookback period (0.0 - 1.0)"
+    
+  open_bug_count:
+    threshold: 20  # 20 open bugs
+    weight: 0.20
+    description: "Total open issues labeled as 'bug'"
+
+# ── Token Multipliers ────────────────────────────────────────────────────────
+#
+# Multipliers are applied to quest rewards based on current stress mode.
+# Values > 1.0 increase rewards, < 1.0 decrease rewards.
+#
+# Quest types:
+#   - test_improve: Test coverage/quality improvements
+#   - docs_update: Documentation updates
+#   - issue_count: Closing specific issue types
+#   - issue_reduce: Reducing overall issue backlog
+#   - daily_run: Daily Run session completion
+#   - custom: Special/manual quests
+#   - exploration: Exploratory work
+#   - refactor: Code refactoring
+
+multipliers:
+  calm:
+    # Calm periods: incentivize maintenance and exploration
+    test_improve: 1.0
+    docs_update: 1.2
+    issue_count: 1.0
+    issue_reduce: 1.0
+    daily_run: 1.0
+    custom: 1.0
+    exploration: 1.3
+    refactor: 1.2
+    
+  elevated:
+    # Elevated stress: start emphasizing stability
+    test_improve: 1.2
+    docs_update: 1.0
+    issue_count: 1.1
+    issue_reduce: 1.1
+    daily_run: 1.0
+    custom: 1.0
+    exploration: 1.0
+    refactor: 0.9  # Discourage risky changes
+    
+  high:
+    # High stress: crisis mode, focus on stabilization
+    test_improve: 1.5  # Strongly incentivize testing
+    docs_update: 0.8  # Deprioritize docs
+    issue_count: 1.3  # Reward closing issues
+    issue_reduce: 1.4  # Strongly reward reducing backlog
+    daily_run: 1.1
+    custom: 1.0
+    exploration: 0.7  # Discourage exploration
+    refactor: 0.6  # Discourage refactors during crisis
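
To make the effect of the multipliers above concrete, the sketch below applies one to a base reward. The function name `adjusted_reward`, the fallback of 1.0 for an unknown mode or quest type, and rounding to a whole token are assumptions made for illustration. The commit itself only states that the quest status API exposes adjusted_reward and multiplier.

```python
# Subset of the multipliers in config/stress_modes.yaml (calm and high only).
MULTIPLIERS = {
    "calm": {"test_improve": 1.0, "docs_update": 1.2, "exploration": 1.3, "refactor": 1.2},
    "high": {"test_improve": 1.5, "docs_update": 0.8, "exploration": 0.7, "refactor": 0.6},
}


def adjusted_reward(base_tokens, quest_type, mode):
    """Return (adjusted reward, multiplier) for a quest in the given stress mode.

    Assumptions: unknown modes or quest types fall back to 1.0, and the
    adjusted reward is rounded to a whole number of tokens.
    """
    multiplier = MULTIPLIERS.get(mode, {}).get(quest_type, 1.0)
    return round(base_tokens * multiplier), multiplier
```

Under these assumptions, a 100-token test_improve quest pays 150 tokens in high stress mode, while a 100-token refactor quest pays 60.
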