forked from Rockachopa/Timmy-time-dashboard
feat: adapt token rewards based on system stress signals (#714)
Implements adaptive token rewards that respond to system stress:
- StressDetector module (timmy/stress_detector.py):
  - Monitors 4 stress signals: flaky test rate, P1 backlog growth,
    CI failure rate, open bug count
  - Calculates a weighted stress score (0-1) and determines the mode:
    calm (<0.3), elevated (0.3 to <0.6), high (>=0.6)
  - Applies quest-specific multipliers based on the current mode
- Configuration (config/stress_modes.yaml):
  - Thresholds for mode transitions
  - Signal weights and thresholds
  - Multipliers per mode (e.g., test_improve: 1.5x in high stress)
- Quest system integration:
  - Reward notifications now include the stress bonus/penalty
  - Quest status API includes adjusted_reward and multiplier
  - Agent can see the current stress mode and why rewards changed
- API endpoints:
  - GET /quests/api/stress - current stress mode and signals
  - POST /quests/api/stress/refresh - force a refresh of stress detection
Fixes #714
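The scoring step described above can be sketched as follows. This is a minimal illustration, not the code in timmy/stress_detector.py: the function names `stress_score` and `stress_mode` are hypothetical, and it assumes a triggered signal contributes its full weight (the real module may scale contributions differently). The thresholds and weights are copied from config/stress_modes.yaml in this commit.

```python
# Signal thresholds and weights as listed in config/stress_modes.yaml.
SIGNALS = {
    "flaky_test_rate": {"threshold": 0.15, "weight": 0.30},
    "p1_backlog_growth": {"threshold": 5, "weight": 0.25},
    "ci_failure_rate": {"threshold": 0.20, "weight": 0.25},
    "open_bug_count": {"threshold": 20, "weight": 0.20},
}
ELEVATED_MIN = 0.3  # thresholds.elevated_min
HIGH_MIN = 0.6      # thresholds.high_min


def stress_score(readings: dict[str, float]) -> float:
    """Sum the weights of all signals whose reading meets its threshold.

    Assumption: a signal is "triggered" when reading >= threshold and then
    contributes its whole weight; missing readings count as not triggered.
    """
    score = sum(
        (
            cfg["weight"]
            for name, cfg in SIGNALS.items()
            if readings.get(name, 0) >= cfg["threshold"]
        ),
        0.0,
    )
    # Round away float noise and clamp to the documented 0-1 range.
    return min(1.0, round(score, 6))


def stress_mode(score: float) -> str:
    """Map a score to calm / elevated / high using the configured minimums."""
    if score >= HIGH_MIN:
        return "high"
    if score >= ELEVATED_MIN:
        return "elevated"
    return "calm"


if __name__ == "__main__":
    readings = {"flaky_test_rate": 0.20, "ci_failure_rate": 0.25,
                "p1_backlog_growth": 1, "open_bug_count": 3}
    score = stress_score(readings)
    print(score, stress_mode(score))
```

With these readings only the flaky-test and CI-failure signals are triggered, so the score is the sum of their two weights and the mode is elevated.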
98
config/stress_modes.yaml
Normal file
@@ -0,0 +1,98 @@
# ── System Stress Modes Configuration ────────────────────────────────────────
#
# This configuration defines how token rewards adapt based on system stress.
# When the system detects elevated stress (flaky tests, growing backlog,
# CI failures), quest rewards are adjusted to incentivize agents to focus
# on the most critical areas.
#
# ── How It Works ─────────────────────────────────────────────────────────────
#
# 1. SIGNALS: System metrics are monitored continuously
# 2. SCORE: Weighted contributions from triggered signals create a stress score
# 3. MODE: Score determines the stress mode (calm, elevated, high)
# 4. MULTIPLIERS: Token rewards are multiplied based on the current mode
#
# ── Stress Thresholds ────────────────────────────────────────────────────────

thresholds:
  # Minimum score to enter elevated mode (0.0 - 1.0)
  elevated_min: 0.3

  # Minimum score to enter high stress mode (0.0 - 1.0)
  high_min: 0.6

# ── Stress Signals ───────────────────────────────────────────────────────────
#
# Each signal has:
# - threshold: Value at which signal is considered "triggered"
# - weight: Contribution to overall stress score (should sum to ~1.0)

signals:
  flaky_test_rate:
    threshold: 0.15  # 15% of tests showing flakiness
    weight: 0.30
    description: "Percentage of test runs that are flaky"

  p1_backlog_growth:
    threshold: 5  # 5 new P1 issues in lookback period
    weight: 0.25
    description: "Net growth in P1 priority issues over 7 days"

  ci_failure_rate:
    threshold: 0.20  # 20% of CI runs failing
    weight: 0.25
    description: "Percentage of CI runs failing in lookback period"

  open_bug_count:
    threshold: 20  # 20 open bugs
    weight: 0.20
    description: "Total open issues labeled as 'bug'"

# ── Token Multipliers ────────────────────────────────────────────────────────
#
# Multipliers are applied to quest rewards based on current stress mode.
# Values > 1.0 increase rewards, < 1.0 decrease rewards.
#
# Quest types:
# - test_improve: Test coverage/quality improvements
# - docs_update: Documentation updates
# - issue_count: Closing specific issue types
# - issue_reduce: Reducing overall issue backlog
# - daily_run: Daily Run session completion
# - custom: Special/manual quests
# - exploration: Exploratory work
# - refactor: Code refactoring

multipliers:
  calm:
    # Calm periods: incentivize maintenance and exploration
    test_improve: 1.0
    docs_update: 1.2
    issue_count: 1.0
    issue_reduce: 1.0
    daily_run: 1.0
    custom: 1.0
    exploration: 1.3
    refactor: 1.2

  elevated:
    # Elevated stress: start emphasizing stability
    test_improve: 1.2
    docs_update: 1.0
    issue_count: 1.1
    issue_reduce: 1.1
    daily_run: 1.0
    custom: 1.0
    exploration: 1.0
    refactor: 0.9  # Discourage risky changes

  high:
    # High stress: crisis mode, focus on stabilization
    test_improve: 1.5  # Strongly incentivize testing
    docs_update: 0.8  # Deprioritize docs
    issue_count: 1.3  # Reward closing issues
    issue_reduce: 1.4  # Strongly reward reducing backlog
    daily_run: 1.1
    custom: 1.0
    exploration: 0.7  # Discourage exploration
    refactor: 0.6  # Discourage refactors during crisis
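To make the multiplier table concrete, here is a rough sketch of how a quest reward might be adjusted for the current mode. The helper name `adjusted_reward` is hypothetical, and both the rounding to whole tokens and the fallback to 1.0 for unknown modes or quest types are assumptions rather than documented behavior. Only the non-1.0 entries from the `multipliers` section are repeated here.

```python
# Non-default multipliers from the `multipliers` section of the config;
# every quest type not listed for a mode is treated as 1.0.
MULTIPLIERS = {
    "calm": {"docs_update": 1.2, "exploration": 1.3, "refactor": 1.2},
    "elevated": {
        "test_improve": 1.2,
        "issue_count": 1.1,
        "issue_reduce": 1.1,
        "refactor": 0.9,
    },
    "high": {
        "test_improve": 1.5,
        "docs_update": 0.8,
        "issue_count": 1.3,
        "issue_reduce": 1.4,
        "daily_run": 1.1,
        "exploration": 0.7,
        "refactor": 0.6,
    },
}


def adjusted_reward(base_tokens: int, quest_type: str, mode: str) -> tuple[int, float]:
    """Return (adjusted reward, multiplier) for a quest in the given mode.

    Assumptions: unknown modes or quest types fall back to 1.0, and the
    adjusted reward is rounded to a whole number of tokens.
    """
    multiplier = MULTIPLIERS.get(mode, {}).get(quest_type, 1.0)
    return round(base_tokens * multiplier), multiplier


if __name__ == "__main__":
    for quest in ("test_improve", "refactor", "custom"):
        print(quest, adjusted_reward(100, quest, "high"))
```

Returning the multiplier alongside the adjusted value mirrors the commit message's note that the quest status API exposes both adjusted_reward and multiplier.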