Implements adaptive token rewards that respond to system stress:
- StressDetector module (timmy/stress_detector.py):
  - Monitors 4 stress signals: flaky test rate, P1 backlog growth,
    CI failure rate, open bug count
  - Calculates a weighted stress score (0-1) and determines the mode:
    calm (<0.3), elevated (0.3 to <0.6), high (>=0.6)
  - Applies quest-specific multipliers based on the current mode
- Configuration (config/stress_modes.yaml):
  - Thresholds for mode transitions
  - Signal weights and thresholds
  - Multipliers per mode (e.g., test_improve: 1.5x in high stress)
- Quest system integration:
  - Reward notifications now include the stress bonus/penalty
  - Quest status API includes adjusted_reward and multiplier
  - Agent can see the current stress mode and why rewards changed
- API endpoints:
  - GET /quests/api/stress - current stress mode and signals
  - POST /quests/api/stress/refresh - force a refresh of stress detection
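
For reviewers, the scoring step can be sketched roughly as below. This is an illustration only, not the code in timmy/stress_detector.py: the names `stress_score` and `stress_mode` are hypothetical, and the sketch assumes a signal counts as triggered when its reading is at or above its threshold, at which point it contributes its full weight. Thresholds and weights are copied from config/stress_modes.yaml.

```python
from __future__ import annotations

# Illustrative sketch only; function names are hypothetical, not the
# StressDetector API. Values mirror config/stress_modes.yaml.
SIGNALS = {
    "flaky_test_rate": {"threshold": 0.15, "weight": 0.30},
    "p1_backlog_growth": {"threshold": 5, "weight": 0.25},
    "ci_failure_rate": {"threshold": 0.20, "weight": 0.25},
    "open_bug_count": {"threshold": 20, "weight": 0.20},
}
THRESHOLDS = {"elevated_min": 0.3, "high_min": 0.6}


def stress_score(readings: dict[str, float]) -> float:
    """Sum the weights of all triggered signals, capped at 1.0.

    Assumption: a signal is triggered when reading >= threshold;
    signals missing from `readings` are treated as not triggered.
    """
    total = sum(
        cfg["weight"]
        for name, cfg in SIGNALS.items()
        if name in readings and readings[name] >= cfg["threshold"]
    )
    return min(total, 1.0)


def stress_mode(score: float) -> str:
    """Map a score to a mode; each *_min value is an inclusive lower bound."""
    if score >= THRESHOLDS["high_min"]:
        return "high"
    if score >= THRESHOLDS["elevated_min"]:
        return "elevated"
    return "calm"
```

Under these assumptions, a flaky test rate of 0.2 together with a CI failure rate of 0.25 triggers two signals, for a score of about 0.55 and the elevated mode; adding 30 open bugs pushes the score to about 0.75 and the mode to high.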
Fixes #714
99 lines
3.7 KiB
YAML
# ── System Stress Modes Configuration ────────────────────────────────────────
#
# This configuration defines how token rewards adapt based on system stress.
# When the system detects elevated stress (flaky tests, growing backlog,
# CI failures), quest rewards are adjusted to incentivize agents to focus
# on the most critical areas.
#
# ── How It Works ─────────────────────────────────────────────────────────────
#
# 1. SIGNALS: System metrics are monitored continuously
# 2. SCORE: Weighted contributions from triggered signals create a stress score
# 3. MODE: Score determines the stress mode (calm, elevated, high)
# 4. MULTIPLIERS: Token rewards are multiplied based on the current mode
#
# ── Stress Thresholds ────────────────────────────────────────────────────────

thresholds:
  # Minimum score to enter elevated mode (0.0 - 1.0)
  elevated_min: 0.3

  # Minimum score to enter high stress mode (0.0 - 1.0)
  high_min: 0.6

# ── Stress Signals ───────────────────────────────────────────────────────────
#
# Each signal has:
# - threshold: Value at which signal is considered "triggered"
# - weight: Contribution to overall stress score (should sum to ~1.0)

signals:
  flaky_test_rate:
    threshold: 0.15  # 15% of tests showing flakiness
    weight: 0.30
    description: "Percentage of test runs that are flaky"

  p1_backlog_growth:
    threshold: 5  # 5 new P1 issues in lookback period
    weight: 0.25
    description: "Net growth in P1 priority issues over 7 days"

  ci_failure_rate:
    threshold: 0.20  # 20% of CI runs failing
    weight: 0.25
    description: "Percentage of CI runs failing in lookback period"

  open_bug_count:
    threshold: 20  # 20 open bugs
    weight: 0.20
    description: "Total open issues labeled as 'bug'"

# ── Token Multipliers ────────────────────────────────────────────────────────
#
# Multipliers are applied to quest rewards based on current stress mode.
# Values > 1.0 increase rewards, < 1.0 decrease rewards.
#
# Quest types:
# - test_improve: Test coverage/quality improvements
# - docs_update: Documentation updates
# - issue_count: Closing specific issue types
# - issue_reduce: Reducing overall issue backlog
# - daily_run: Daily Run session completion
# - custom: Special/manual quests
# - exploration: Exploratory work
# - refactor: Code refactoring

multipliers:
  calm:
    # Calm periods: incentivize maintenance and exploration
    test_improve: 1.0
    docs_update: 1.2
    issue_count: 1.0
    issue_reduce: 1.0
    daily_run: 1.0
    custom: 1.0
    exploration: 1.3
    refactor: 1.2

  elevated:
    # Elevated stress: start emphasizing stability
    test_improve: 1.2
    docs_update: 1.0
    issue_count: 1.1
    issue_reduce: 1.1
    daily_run: 1.0
    custom: 1.0
    exploration: 1.0
    refactor: 0.9  # Discourage risky changes

  high:
    # High stress: crisis mode, focus on stabilization
    test_improve: 1.5  # Strongly incentivize testing
    docs_update: 0.8  # Deprioritize docs
    issue_count: 1.3  # Reward closing issues
    issue_reduce: 1.4  # Strongly reward reducing backlog
    daily_run: 1.1
    custom: 1.0
    exploration: 0.7  # Discourage exploration
    refactor: 0.6  # Discourage refactors during crisis
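
To show how the multiplier table above might be consumed, here is a minimal Python sketch. It is not taken from the quest system: `adjust_reward` is a hypothetical helper, rounding the result to a whole number of tokens is an assumption, and so is falling back to a neutral 1.0 multiplier for an unknown mode or quest type. Only the multiplier values come from the configuration.

```python
from __future__ import annotations

# Illustrative sketch only; `adjust_reward` is a hypothetical helper.
# Multiplier values are copied from the `multipliers` section of the config.
MULTIPLIERS = {
    "calm": {
        "test_improve": 1.0, "docs_update": 1.2, "issue_count": 1.0,
        "issue_reduce": 1.0, "daily_run": 1.0, "custom": 1.0,
        "exploration": 1.3, "refactor": 1.2,
    },
    "elevated": {
        "test_improve": 1.2, "docs_update": 1.0, "issue_count": 1.1,
        "issue_reduce": 1.1, "daily_run": 1.0, "custom": 1.0,
        "exploration": 1.0, "refactor": 0.9,
    },
    "high": {
        "test_improve": 1.5, "docs_update": 0.8, "issue_count": 1.3,
        "issue_reduce": 1.4, "daily_run": 1.1, "custom": 1.0,
        "exploration": 0.7, "refactor": 0.6,
    },
}


def adjust_reward(base_reward: int, quest_type: str, mode: str) -> tuple[int, float]:
    """Return (adjusted_reward, multiplier) for a quest in the given mode.

    Assumptions: unknown modes or quest types fall back to 1.0, and the
    adjusted reward is rounded to the nearest whole token.
    """
    multiplier = MULTIPLIERS.get(mode, {}).get(quest_type, 1.0)
    return round(base_reward * multiplier), multiplier


if __name__ == "__main__":
    # A 100-token test_improve quest during high stress.
    print(adjust_reward(100, "test_improve", "high"))
```

With these values, a 100-token test_improve quest pays 150 tokens in high stress, while a 100-token refactor quest pays 60. Returning the multiplier alongside the adjusted amount mirrors the adjusted_reward and multiplier fields that the quest status API is described as exposing.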