Compare commits

3 Commits

Author SHA1 Message Date
1cce28d1bb [claude] Investigate: document paths to resolution for 5 closed PRs (#1219) (#1266)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
Tests / lint (pull_request) Failing after 29s
Tests / test (pull_request) Has been skipped
Co-authored-by: Claude (Opus 4.6) <claude@hermes.local>
Co-committed-by: Claude (Opus 4.6) <claude@hermes.local>
2026-03-24 01:36:06 +00:00
4c6b69885d [claude] feat: Agent Energy Budget Monitoring (#1009) (#1267)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
2026-03-24 01:35:50 +00:00
6b2e6d9e8c [claude] feat: Agent Energy Budget Monitoring (#1009) (#1267)
Some checks failed
Tests / lint (push) Has been cancelled
Tests / test (push) Has been cancelled
2026-03-24 01:35:49 +00:00
7 changed files with 882 additions and 0 deletions

docs/pr-recovery-1219.md

@@ -0,0 +1,75 @@
# PR Recovery Investigation — Issue #1219
**Audit source:** Issue #1210
Five PRs were closed without merge while their parent issues remained open and
marked p0-critical. This document records the investigation findings and the
path to resolution for each.

---
## Root Cause
Per Timmy's comment on #1219: all five PRs were closed due to **merge conflicts
during the mass-merge cleanup cycle** (a rebase storm), not due to code
quality problems or a changed approach. The code in each PR was correct;
the branches simply became stale.

---
## Status Matrix
| PR | Feature | Issue | Close Reason | Issue State | Resolution |
|----|---------|-------|-----------|-------------|------------|
| #1163 | Three-Strike Detector | #962 | Rebase storm | **Closed ✓** | v2 merged via PR #1232 |
| #1162 | Session Sovereignty Report | #957 | Rebase storm | **Open** | PR #1263 (v3 — rebased) |
| #1157 | Qwen3-8B/14B routing | #1065 | Rebase storm | **Closed ✓** | v2 merged via PR #1233 |
| #1156 | Agent Dreaming Mode | #1019 | Rebase storm | **Open** | PR #1264 (v3 — rebased) |
| #1145 | Qwen3-14B config | #1064 | Rebase storm | **Closed ✓** | Code present on main |
---
## Detail: Already Resolved
### PR #1163 → Issue #962 (Three-Strike Detector)
- **Why closed:** merge conflict during rebase storm
- **Resolution:** `src/timmy/sovereignty/three_strike.py` and
`src/dashboard/routes/three_strike.py` are present on `main` (landed via
PR #1232). Issue #962 is closed.
### PR #1157 → Issue #1065 (Qwen3-8B/14B dual-model routing)
- **Why closed:** merge conflict during rebase storm
- **Resolution:** `src/infrastructure/router/classifier.py` and
`src/infrastructure/router/cascade.py` are present on `main` (landed via
PR #1233). Issue #1065 is closed.
### PR #1145 → Issue #1064 (Qwen3-14B config)
- **Why closed:** merge conflict during rebase storm
- **Resolution:** `Modelfile.timmy`, `Modelfile.qwen3-14b`, and the `config.py`
defaults (`ollama_model = "qwen3:14b"`) are present on `main`. Issue #1064
is closed.

---
## Detail: Requiring Action
### PR #1162 → Issue #957 (Session Sovereignty Report Generator)
- **Why closed:** merge conflict during rebase storm
- **Branch preserved:** `claude/issue-957-v2` (one feature commit)
- **Action taken:** Rebased onto current `main`, resolved conflict in
`src/timmy/sovereignty/__init__.py` (both three-strike and session-report
docstrings kept). All 458 unit tests pass.
- **New PR:** #1263 (`claude/issue-957-v3` → `main`)
### PR #1156 → Issue #1019 (Agent Dreaming Mode)
- **Why closed:** merge conflict during rebase storm
- **Branch preserved:** `claude/issue-1019-v2` (one feature commit)
- **Action taken:** Rebased onto current `main`, resolved conflict in
`src/dashboard/app.py` (both `three_strike_router` and `dreaming_router`
registered). All 435 unit tests pass.
- **New PR:** #1264 (`claude/issue-1019-v3` → `main`)


@@ -422,6 +422,14 @@ class Settings(BaseSettings):
# Alert threshold: free disk below this triggers cleanup / alert (GB).
hermes_disk_free_min_gb: float = 10.0
# ── Energy Budget Monitoring ───────────────────────────────────────
# Enable energy budget monitoring (estimates CPU/GPU power draw during inference).
energy_budget_enabled: bool = True
# Estimated watts above which low power mode auto-activates; checked on each
# recorded inference against the live reading or the model-size heuristic.
energy_budget_watts_threshold: float = 15.0
# Model to prefer in low power mode (smaller = more efficient).
energy_low_power_model: str = "qwen3:1b"
# ── Error Logging ─────────────────────────────────────────────────
error_log_enabled: bool = True
error_log_dir: str = "logs"
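When no positive live reading is cached, the monitor falls back to a model-size heuristic (size in GB × 2 W/GB) and compares that estimate against `energy_budget_watts_threshold`. The sketch below reproduces this arithmetic for a subset of the size table in `infrastructure.energy.monitor`, using the defaults shown above. The helper names are hypothetical; only the sizes and constants are copied from the module.

```python
# Sketch of the heuristic power path; helper names are hypothetical.
MODEL_SIZE_GB = {
    "qwen3:1b": 0.8,
    "qwen3:8b": 5.5,
    "qwen3:14b": 9.0,
    "llama3:70b": 45.0,
}
DEFAULT_MODEL_SIZE_GB = 5.0  # fallback for models not in the table
WATTS_PER_GB = 2.0  # rough W/GB heuristic
THRESHOLD_WATTS = 15.0  # energy_budget_watts_threshold default


def model_size_gb(model: str) -> float:
    """Exact key first, then the first key that is a substring of the name."""
    lower = model.lower()
    if lower in MODEL_SIZE_GB:
        return MODEL_SIZE_GB[lower]
    for key, size in MODEL_SIZE_GB.items():
        if key in lower:
            return size
    return DEFAULT_MODEL_SIZE_GB


def heuristic_watts(model: str) -> float:
    return model_size_gb(model) * WATTS_PER_GB


def trips_low_power(model: str, threshold: float = THRESHOLD_WATTS) -> bool:
    # The monitor engages low power mode on a strict "greater than" comparison.
    return heuristic_watts(model) > threshold


for name in ("qwen3:1b", "qwen3:8b", "qwen3:14b", "llama3:70b", "unknown:7b"):
    print(f"{name:12} {heuristic_watts(name):5.1f} W  trips={trips_low_power(name)}")
```

Under these defaults, any model estimated above 7.5 GB (15 W ÷ 2 W/GB) exceeds the threshold on the heuristic path. The first recorded qwen3:14b sample would therefore switch the monitor into low power mode unless a positive live reading is already cached.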


@@ -37,6 +37,7 @@ from dashboard.routes.db_explorer import router as db_explorer_router
from dashboard.routes.discord import router as discord_router
from dashboard.routes.energy import router as energy_router
from dashboard.routes.experiments import router as experiments_router
from dashboard.routes.grok import router as grok_router
from dashboard.routes.health import router as health_router
from dashboard.routes.hermes import router as hermes_router
from dashboard.routes.loop_qa import router as loop_qa_router
@@ -673,6 +674,7 @@ app.include_router(matrix_router)
app.include_router(tower_router)
app.include_router(daily_run_router)
app.include_router(hermes_router)
app.include_router(energy_router)
app.include_router(quests_router)
app.include_router(scorecards_router)
app.include_router(sovereignty_metrics_router)


@@ -0,0 +1,121 @@
"""Energy Budget Monitoring routes.
Exposes the energy budget monitor via REST API so the dashboard and
external tools can query power draw, efficiency scores, and toggle
low power mode.
Refs: #1009
"""
import logging
from fastapi import APIRouter, HTTPException
from pydantic import BaseModel
from config import settings
from infrastructure.energy.monitor import energy_monitor
logger = logging.getLogger(__name__)
router = APIRouter(prefix="/energy", tags=["energy"])
class LowPowerRequest(BaseModel):
"""Request body for toggling low power mode."""
enabled: bool
class InferenceEventRequest(BaseModel):
"""Request body for recording an inference event."""
model: str
tokens_per_second: float
@router.get("/status")
async def energy_status():
"""Return the current energy budget status.
Returns the live power estimate, efficiency score (0–10), recent
inference samples, and whether low power mode is active.
"""
if not getattr(settings, "energy_budget_enabled", True):
return {
"enabled": False,
"message": "Energy budget monitoring is disabled (ENERGY_BUDGET_ENABLED=false)",
}
report = await energy_monitor.get_report()
return {**report.to_dict(), "enabled": True}
@router.get("/report")
async def energy_report():
"""Detailed energy budget report with all recent samples.
Like /energy/status, but includes the full sample history (up to 60
samples) and returns 503 when monitoring is disabled.
"""
if not getattr(settings, "energy_budget_enabled", True):
raise HTTPException(status_code=503, detail="Energy budget monitoring is disabled")
report = await energy_monitor.get_report()
data = report.to_dict()
# Override recent_samples to include the full window (not just last 10)
data["recent_samples"] = [
{
"timestamp": s.timestamp,
"model": s.model,
"tokens_per_second": round(s.tokens_per_second, 1),
"estimated_watts": round(s.estimated_watts, 2),
"efficiency": round(s.efficiency, 3),
"efficiency_score": round(s.efficiency_score, 2),
}
for s in list(energy_monitor._samples)
]
return {**data, "enabled": True}
@router.post("/low-power")
async def set_low_power_mode(body: LowPowerRequest):
"""Enable or disable low power mode.
In low power mode the cascade router is advised to prefer the
configured energy_low_power_model (see settings).
"""
if not getattr(settings, "energy_budget_enabled", True):
raise HTTPException(status_code=503, detail="Energy budget monitoring is disabled")
energy_monitor.set_low_power_mode(body.enabled)
low_power_model = getattr(settings, "energy_low_power_model", "qwen3:1b")
return {
"low_power_mode": body.enabled,
"preferred_model": low_power_model if body.enabled else None,
"message": (
f"Low power mode {'enabled' if body.enabled else 'disabled'}. "
+ (f"Routing to {low_power_model}." if body.enabled else "Routing restored to default.")
),
}
@router.post("/record")
async def record_inference_event(body: InferenceEventRequest):
"""Record an inference event for efficiency tracking.
Called after each LLM inference completes. Updates the rolling
efficiency score and may auto-activate low power mode if watts
exceed the configured threshold.
"""
if not getattr(settings, "energy_budget_enabled", True):
return {"recorded": False, "message": "Energy budget monitoring is disabled"}
if body.tokens_per_second <= 0:
raise HTTPException(status_code=422, detail="tokens_per_second must be positive")
sample = energy_monitor.record_inference(body.model, body.tokens_per_second)
return {
"recorded": True,
"efficiency_score": round(sample.efficiency_score, 2),
"estimated_watts": round(sample.estimated_watts, 2),
"low_power_mode": energy_monitor.low_power_mode,
}
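After each completed inference, a caller reports it by POSTing the `InferenceEventRequest` shape to `/energy/record`. The minimal standard-library client sketch below builds the requests without sending them. It assumes the dashboard is reachable at `http://localhost:8000` (a hypothetical base URL) and that the router is mounted without an extra prefix, as the `include_router` call above suggests.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # hypothetical; adjust to the dashboard's address


def build_record_request(model: str, tokens_per_second: float) -> urllib.request.Request:
    """Build a POST /energy/record request matching InferenceEventRequest."""
    if tokens_per_second <= 0:
        # Fail early on the client; the route itself answers 422 for these values.
        raise ValueError("tokens_per_second must be positive")
    body = json.dumps({"model": model, "tokens_per_second": tokens_per_second}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/energy/record",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_low_power_request(enabled: bool) -> urllib.request.Request:
    """Build a POST /energy/low-power request matching LowPowerRequest."""
    body = json.dumps({"enabled": enabled}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/energy/low-power",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_record_request("qwen3:8b", 42.0)
# Sending it requires a running server, e.g.:
# with urllib.request.urlopen(req, timeout=5) as resp:
#     print(json.load(resp))
```

Validating on the client is optional. It simply avoids a round trip for values the route would reject anyway.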


@@ -0,0 +1,8 @@
"""Energy Budget Monitoring — power-draw estimation for LLM inference.
Refs: #1009
"""
from infrastructure.energy.monitor import EnergyBudgetMonitor, energy_monitor
__all__ = ["EnergyBudgetMonitor", "energy_monitor"]


@@ -0,0 +1,371 @@
"""Energy Budget Monitor — estimates GPU/CPU power draw during LLM inference.
Tracks estimated power consumption to optimize for "metabolic efficiency".
Three estimation strategies, in priority order:
1. Battery discharge via ioreg (macOS; works without sudo, on-battery only)
2. CPU utilisation proxy via top (CPU% scaled against an assumed 40W TDP)
3. Model-size heuristic (model_size_gb × 2W/GB), applied by record_inference() when no live reading is cached
Energy efficiency score (0–10):
efficiency = tokens_per_second / estimated_watts (tok/s per W), scaled so that 5 tok/s per W scores 10, capped at 10.
Low Power Mode:
Activated manually or automatically when draw exceeds the configured
threshold. In low power mode the cascade router is advised to prefer the
configured low_power_model (e.g. qwen3:1b or similar compact model).
Refs: #1009
"""
import asyncio
import logging
import subprocess
import time
from collections import deque
from dataclasses import dataclass, field
from datetime import UTC, datetime
from typing import Any
from config import settings
logger = logging.getLogger(__name__)
# Approximate model-size lookup (GB) used for heuristic power estimate.
# Keys are lowercase substring matches against the model name.
_MODEL_SIZE_GB: dict[str, float] = {
"qwen3:1b": 0.8,
"qwen3:3b": 2.0,
"qwen3:4b": 2.5,
"qwen3:8b": 5.5,
"qwen3:14b": 9.0,
"qwen3:30b": 20.0,
"qwen3:32b": 20.0,
"llama3:8b": 5.5,
"llama3:70b": 45.0,
"mistral:7b": 4.5,
"gemma3:4b": 2.5,
"gemma3:12b": 8.0,
"gemma3:27b": 17.0,
"phi4:14b": 9.0,
}
_DEFAULT_MODEL_SIZE_GB = 5.0 # fallback when model not in table
_WATTS_PER_GB_HEURISTIC = 2.0 # rough W/GB for Apple Silicon unified memory
# Efficiency score normalisation: score 10 at this efficiency (tok/s per W).
_EFFICIENCY_SCORE_CEILING = 5.0 # tok/s per W → score 10
# Rolling window for recent samples
_HISTORY_MAXLEN = 60
@dataclass
class InferenceSample:
"""A single inference event captured by record_inference()."""
timestamp: str
model: str
tokens_per_second: float
estimated_watts: float
efficiency: float # tokens/s per watt
efficiency_score: float # 0–10
@dataclass
class EnergyReport:
"""Snapshot of current energy budget state."""
timestamp: str
low_power_mode: bool
current_watts: float
strategy: str # "battery", "cpu_proxy", "heuristic", "unavailable"
efficiency_score: float # 0–10; -1 if no inference samples yet
recent_samples: list[InferenceSample]
recommendation: str
details: dict[str, Any] = field(default_factory=dict)
def to_dict(self) -> dict[str, Any]:
return {
"timestamp": self.timestamp,
"low_power_mode": self.low_power_mode,
"current_watts": round(self.current_watts, 2),
"strategy": self.strategy,
"efficiency_score": round(self.efficiency_score, 2),
"recent_samples": [
{
"timestamp": s.timestamp,
"model": s.model,
"tokens_per_second": round(s.tokens_per_second, 1),
"estimated_watts": round(s.estimated_watts, 2),
"efficiency": round(s.efficiency, 3),
"efficiency_score": round(s.efficiency_score, 2),
}
for s in self.recent_samples
],
"recommendation": self.recommendation,
"details": self.details,
}
class EnergyBudgetMonitor:
"""Estimates power consumption and tracks LLM inference efficiency.
All blocking I/O (subprocess calls) is wrapped in asyncio.to_thread()
so the event loop is never blocked. Power readings are cached for
_POWER_CACHE_TTL seconds.
Usage::
# Record an inference event
energy_monitor.record_inference("qwen3:8b", tokens_per_second=42.0)
# Get the current report
report = await energy_monitor.get_report()
# Toggle low power mode
energy_monitor.set_low_power_mode(True)
"""
_POWER_CACHE_TTL = 10.0 # seconds between fresh power readings
def __init__(self) -> None:
self._low_power_mode: bool = False
self._samples: deque[InferenceSample] = deque(maxlen=_HISTORY_MAXLEN)
self._cached_watts: float = 0.0
self._cached_strategy: str = "unavailable"
self._cache_ts: float = float("-inf") # forces a fresh read on the first get_report()
# ── Public API ────────────────────────────────────────────────────────────
@property
def low_power_mode(self) -> bool:
return self._low_power_mode
def set_low_power_mode(self, enabled: bool) -> None:
"""Enable or disable low power mode."""
self._low_power_mode = enabled
state = "enabled" if enabled else "disabled"
logger.info("Energy budget: low power mode %s", state)
def record_inference(self, model: str, tokens_per_second: float) -> InferenceSample:
"""Record an inference event for efficiency tracking.
Call this after each LLM inference completes with the model name and
measured throughput. The most recent cached power reading is used to
compute the efficiency score, falling back to the model-size heuristic
when no reading is cached.
Args:
model: Ollama model name (e.g. "qwen3:8b").
tokens_per_second: Measured decode throughput.
Returns:
The recorded InferenceSample.
"""
watts = self._cached_watts if self._cached_watts > 0 else self._estimate_watts_sync(model)
efficiency = tokens_per_second / max(watts, 0.1)
score = min(10.0, (efficiency / _EFFICIENCY_SCORE_CEILING) * 10.0)
sample = InferenceSample(
timestamp=datetime.now(UTC).isoformat(),
model=model,
tokens_per_second=tokens_per_second,
estimated_watts=watts,
efficiency=efficiency,
efficiency_score=score,
)
self._samples.append(sample)
# Auto-engage low power mode above the threshold. The energy_budget_enabled
# flag is checked by callers (e.g. the /energy/record route), not here.
threshold = getattr(settings, "energy_budget_watts_threshold", 15.0)
if watts > threshold and not self._low_power_mode:
logger.info(
"Energy budget: %.1fW exceeds threshold %.1fW — auto-engaging low power mode",
watts,
threshold,
)
self.set_low_power_mode(True)
return sample
async def get_report(self) -> EnergyReport:
"""Return the current energy budget report.
Refreshes the power estimate if the cache is stale.
"""
await self._refresh_power_cache()
score = self._compute_mean_efficiency_score()
recommendation = self._build_recommendation(score)
return EnergyReport(
timestamp=datetime.now(UTC).isoformat(),
low_power_mode=self._low_power_mode,
current_watts=self._cached_watts,
strategy=self._cached_strategy,
efficiency_score=score,
recent_samples=list(self._samples)[-10:],
recommendation=recommendation,
details={"sample_count": len(self._samples)},
)
# ── Power estimation ──────────────────────────────────────────────────────
async def _refresh_power_cache(self) -> None:
"""Refresh the cached power reading if stale."""
now = time.monotonic()
if now - self._cache_ts < self._POWER_CACHE_TTL:
return
try:
watts, strategy = await asyncio.to_thread(self._read_power)
except Exception as exc:
logger.debug("Energy: power read failed: %s", exc)
watts, strategy = 0.0, "unavailable"
self._cached_watts = watts
self._cached_strategy = strategy
self._cache_ts = now
def _read_power(self) -> tuple[float, str]:
"""Synchronous power reading — tries strategies in priority order.
Returns:
Tuple of (watts, strategy_name).
"""
# Strategy 1: battery discharge via ioreg (on-battery Macs)
try:
watts = self._read_battery_watts()
if watts > 0:
return watts, "battery"
except Exception:
pass
# Strategy 2: CPU utilisation proxy via top
try:
cpu_pct = self._read_cpu_pct()
if cpu_pct >= 0:
# M3 Max TDP ≈ 40W; scale linearly
watts = (cpu_pct / 100.0) * 40.0
return watts, "cpu_proxy"
except Exception:
pass
# Strategy 3 (model-size heuristic) needs a model name, so it is applied per
# sample in record_inference(); no live reading is available here.
return 0.0, "unavailable"
def _estimate_watts_sync(self, model: str) -> float:
"""Estimate watts from model size when no live reading is available."""
size_gb = self._model_size_gb(model)
return size_gb * _WATTS_PER_GB_HEURISTIC
def _read_battery_watts(self) -> float:
"""Read instantaneous battery discharge via ioreg.
Returns watts if on battery, 0.0 if plugged in or unavailable.
Requires macOS; no sudo needed.
"""
result = subprocess.run(
["ioreg", "-r", "-c", "AppleSmartBattery", "-d", "1"],
capture_output=True,
text=True,
timeout=3,
)
amperage_ma = 0
voltage_mv = 0.0
on_external_power = True # assume AC power unless we see ExternalConnected = No
for line in result.stdout.splitlines():
stripped = line.strip()
if '"InstantAmperage"' in stripped:
try:
raw = int(stripped.split("=")[-1].strip())
# A discharging (negative) value may be printed as its unsigned
# 64-bit two's complement; reinterpret it as signed.
amperage_ma = raw - (1 << 64) if raw >= (1 << 63) else raw
except ValueError:
pass
elif '"Voltage"' in stripped:
try:
voltage_mv = float(stripped.split("=")[-1].strip())
except ValueError:
pass
elif '"ExternalConnected"' in stripped:
on_external_power = "Yes" in stripped
if on_external_power or voltage_mv == 0 or amperage_ma == 0:
return 0.0
# ioreg reports amperage in mA and voltage in mV; mA × mV = µW
return (abs(amperage_ma) * voltage_mv) / 1_000_000
def _read_cpu_pct(self) -> float:
"""Read CPU utilisation from macOS top.
Returns aggregate CPU% (0–100), or -1.0 on failure.
"""
result = subprocess.run(
["top", "-l", "1", "-n", "0", "-stats", "cpu"],
capture_output=True,
text=True,
timeout=5,
)
for line in result.stdout.splitlines():
if "CPU usage:" in line:
# "CPU usage: 12.5% user, 8.3% sys, 79.1% idle"
parts = line.split()
try:
user = float(parts[2].rstrip("%"))
sys_ = float(parts[4].rstrip("%"))
return user + sys_
except (IndexError, ValueError):
pass
return -1.0
# ── Helpers ───────────────────────────────────────────────────────────────
@staticmethod
def _model_size_gb(model: str) -> float:
"""Look up approximate model size in GB by name substring."""
lower = model.lower()
# Exact match first
if lower in _MODEL_SIZE_GB:
return _MODEL_SIZE_GB[lower]
# Substring match
for key, size in _MODEL_SIZE_GB.items():
if key in lower:
return size
return _DEFAULT_MODEL_SIZE_GB
def _compute_mean_efficiency_score(self) -> float:
"""Mean efficiency score over recent samples, or -1 if none."""
if not self._samples:
return -1.0
recent = list(self._samples)[-10:]
return sum(s.efficiency_score for s in recent) / len(recent)
def _build_recommendation(self, score: float) -> str:
"""Generate a human-readable recommendation from the efficiency score."""
threshold = getattr(settings, "energy_budget_watts_threshold", 15.0)
low_power_model = getattr(settings, "energy_low_power_model", "qwen3:1b")
if score < 0:
return "No inference data yet — run some tasks to populate efficiency metrics."
if self._low_power_mode:
return (
f"Low power mode active — routing to {low_power_model}. "
"Disable when power draw normalises."
)
if score < 3.0:
return (
f"Low efficiency (score {score:.1f}/10). "
"Consider enabling low power mode to favour smaller models "
f"(threshold: {threshold}W)."
)
if score < 6.0:
return f"Moderate efficiency (score {score:.1f}/10). System operating normally."
return f"Good efficiency (score {score:.1f}/10). No action needed."
# Module-level singleton
energy_monitor = EnergyBudgetMonitor()
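The efficiency score is a linear rescale of tokens per second per watt. A value of 5 tok/s per W maps to a score of 10, and the score is capped there. The report then averages the newest ten samples. The standalone sketch below reproduces that arithmetic. The function names are hypothetical; the constants mirror `_EFFICIENCY_SCORE_CEILING`, `_HISTORY_MAXLEN`, and the 0.1 W divisor floor in `record_inference()`.

```python
from collections import deque

EFFICIENCY_SCORE_CEILING = 5.0  # tok/s per W that maps to a score of 10
MIN_WATTS = 0.1  # floor on the divisor, mirroring max(watts, 0.1)
HISTORY_MAXLEN = 60  # mirrors _HISTORY_MAXLEN
REPORT_WINDOW = 10  # get_report() averages the last 10 samples


def efficiency_score(tokens_per_second: float, watts: float) -> float:
    """Linear 0-10 score: tok/s per W, scaled by the ceiling and capped at 10."""
    efficiency = tokens_per_second / max(watts, MIN_WATTS)
    return min(10.0, (efficiency / EFFICIENCY_SCORE_CEILING) * 10.0)


def mean_recent_score(scores: deque[float], window: int = REPORT_WINDOW) -> float:
    """Mean of the newest `window` scores, or -1.0 when there are none."""
    if not scores:
        return -1.0
    recent = list(scores)[-window:]
    return sum(recent) / len(recent)


history: deque[float] = deque(maxlen=HISTORY_MAXLEN)
history.append(efficiency_score(40.0, 10.0))  # 4 tok/s per W -> 8.0
history.append(efficiency_score(25.0, 10.0))  # 2.5 tok/s per W -> 5.0
print(mean_recent_score(history))  # 6.5
```

Because the scale is linear and capped, every configuration at or above 5 tok/s per W reports 10/10. The score therefore separates models only below that ceiling.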


@@ -0,0 +1,297 @@
"""Unit tests for the Energy Budget Monitor.
Tests power estimation strategies, inference recording, efficiency scoring,
and low power mode logic — all without real subprocesses.
Refs: #1009
"""
from unittest.mock import MagicMock, patch
import pytest
from infrastructure.energy.monitor import (
_DEFAULT_MODEL_SIZE_GB,
EnergyBudgetMonitor,
InferenceSample,
)
@pytest.fixture()
def monitor():
return EnergyBudgetMonitor()
# ── Model size lookup ─────────────────────────────────────────────────────────
def test_model_size_exact_match(monitor):
assert monitor._model_size_gb("qwen3:8b") == 5.5
def test_model_size_substring_match(monitor):
assert monitor._model_size_gb("some-qwen3:14b-custom") == 9.0
def test_model_size_unknown_returns_default(monitor):
assert monitor._model_size_gb("unknownmodel:99b") == _DEFAULT_MODEL_SIZE_GB
# ── Battery power reading ─────────────────────────────────────────────────────
def test_read_battery_watts_on_battery(monitor):
ioreg_output = (
"{\n"
' "InstantAmperage" = 2500\n'
' "Voltage" = 12000\n'
' "ExternalConnected" = No\n'
"}"
)
mock_result = MagicMock()
mock_result.stdout = ioreg_output
with patch("subprocess.run", return_value=mock_result):
watts = monitor._read_battery_watts()
# 2500 mA * 12000 mV / 1_000_000 = 30 W
assert watts == pytest.approx(30.0, abs=0.01)
def test_read_battery_watts_plugged_in_returns_zero(monitor):
ioreg_output = (
"{\n"
' "InstantAmperage" = 1000\n'
' "Voltage" = 12000\n'
' "ExternalConnected" = Yes\n'
"}"
)
mock_result = MagicMock()
mock_result.stdout = ioreg_output
with patch("subprocess.run", return_value=mock_result):
watts = monitor._read_battery_watts()
assert watts == 0.0
def test_read_battery_watts_subprocess_failure_raises(monitor):
with patch("subprocess.run", side_effect=OSError("no ioreg")):
with pytest.raises(OSError):
monitor._read_battery_watts()
# ── CPU proxy reading ─────────────────────────────────────────────────────────
def test_read_cpu_pct_parses_top(monitor):
top_output = (
"Processes: 450 total\n"
"CPU usage: 15.2% user, 8.8% sys, 76.0% idle\n"
)
mock_result = MagicMock()
mock_result.stdout = top_output
with patch("subprocess.run", return_value=mock_result):
pct = monitor._read_cpu_pct()
assert pct == pytest.approx(24.0, abs=0.1)
def test_read_cpu_pct_no_match_returns_negative(monitor):
mock_result = MagicMock()
mock_result.stdout = "No CPU line here\n"
with patch("subprocess.run", return_value=mock_result):
pct = monitor._read_cpu_pct()
assert pct == -1.0
# ── Power strategy selection ──────────────────────────────────────────────────
def test_read_power_uses_battery_first(monitor):
with patch.object(monitor, "_read_battery_watts", return_value=25.0):
watts, strategy = monitor._read_power()
assert watts == 25.0
assert strategy == "battery"
def test_read_power_falls_back_to_cpu_proxy(monitor):
with (
patch.object(monitor, "_read_battery_watts", return_value=0.0),
patch.object(monitor, "_read_cpu_pct", return_value=50.0),
):
watts, strategy = monitor._read_power()
assert strategy == "cpu_proxy"
assert watts == pytest.approx(20.0, abs=0.1) # 50% of 40W TDP
def test_read_power_unavailable_when_both_fail(monitor):
with (
patch.object(monitor, "_read_battery_watts", side_effect=OSError),
patch.object(monitor, "_read_cpu_pct", return_value=-1.0),
):
watts, strategy = monitor._read_power()
assert strategy == "unavailable"
assert watts == 0.0
# ── Inference recording ───────────────────────────────────────────────────────
def test_record_inference_produces_sample(monitor):
monitor._cached_watts = 10.0
monitor._cache_ts = 9999999999.0 # far future — cache won't expire
sample = monitor.record_inference("qwen3:8b", tokens_per_second=40.0)
assert isinstance(sample, InferenceSample)
assert sample.model == "qwen3:8b"
assert sample.tokens_per_second == 40.0
assert sample.estimated_watts == pytest.approx(10.0)
# efficiency = 40 / 10 = 4.0 tok/s per W
assert sample.efficiency == pytest.approx(4.0)
# score = min(10, (4.0 / 5.0) * 10) = 8.0
assert sample.efficiency_score == pytest.approx(8.0)
def test_record_inference_stores_in_history(monitor):
monitor._cached_watts = 5.0
monitor._cache_ts = 9999999999.0
monitor.record_inference("qwen3:8b", 30.0)
monitor.record_inference("qwen3:14b", 20.0)
assert len(monitor._samples) == 2
def test_record_inference_auto_activates_low_power(monitor):
monitor._cached_watts = 20.0 # above default 15W threshold
monitor._cache_ts = 9999999999.0
assert not monitor.low_power_mode
monitor.record_inference("qwen3:30b", 8.0)
assert monitor.low_power_mode
def test_record_inference_no_auto_low_power_below_threshold(monitor):
monitor._cached_watts = 10.0 # below default 15W threshold
monitor._cache_ts = 9999999999.0
monitor.record_inference("qwen3:8b", 40.0)
assert not monitor.low_power_mode
# ── Efficiency score ──────────────────────────────────────────────────────────
def test_efficiency_score_caps_at_10(monitor):
monitor._cached_watts = 1.0
monitor._cache_ts = 9999999999.0
sample = monitor.record_inference("qwen3:1b", tokens_per_second=1000.0)
assert sample.efficiency_score == pytest.approx(10.0)
def test_efficiency_score_no_samples_returns_negative_one(monitor):
assert monitor._compute_mean_efficiency_score() == -1.0
def test_mean_efficiency_score_averages_last_10(monitor):
monitor._cached_watts = 10.0
monitor._cache_ts = 9999999999.0
for _ in range(15):
monitor.record_inference("qwen3:8b", tokens_per_second=25.0) # efficiency=2.5 → score=5.0
score = monitor._compute_mean_efficiency_score()
assert score == pytest.approx(5.0, abs=0.01)
# ── Low power mode ────────────────────────────────────────────────────────────
def test_set_low_power_mode_toggle(monitor):
assert not monitor.low_power_mode
monitor.set_low_power_mode(True)
assert monitor.low_power_mode
monitor.set_low_power_mode(False)
assert not monitor.low_power_mode
# ── get_report ────────────────────────────────────────────────────────────────
@pytest.mark.asyncio
async def test_get_report_structure(monitor):
with patch.object(monitor, "_read_power", return_value=(8.0, "battery")):
report = await monitor.get_report()
assert report.timestamp
assert isinstance(report.low_power_mode, bool)
assert isinstance(report.current_watts, float)
assert report.strategy in ("battery", "cpu_proxy", "heuristic", "unavailable")
assert isinstance(report.recommendation, str)
@pytest.mark.asyncio
async def test_get_report_to_dict(monitor):
with patch.object(monitor, "_read_power", return_value=(5.0, "cpu_proxy")):
report = await monitor.get_report()
data = report.to_dict()
assert "timestamp" in data
assert "low_power_mode" in data
assert "current_watts" in data
assert "strategy" in data
assert "efficiency_score" in data
assert "recent_samples" in data
assert "recommendation" in data
@pytest.mark.asyncio
async def test_get_report_caches_power_reading(monitor):
call_count = 0
def counting_read_power():
nonlocal call_count
call_count += 1
return (10.0, "battery")
with patch.object(monitor, "_read_power", side_effect=counting_read_power):
await monitor.get_report()
await monitor.get_report()
# Cache TTL is 10s — should only call once
assert call_count == 1
# ── Recommendation text ───────────────────────────────────────────────────────
def test_recommendation_no_data(monitor):
rec = monitor._build_recommendation(-1.0)
assert "No inference data" in rec
def test_recommendation_low_power_mode(monitor):
monitor.set_low_power_mode(True)
rec = monitor._build_recommendation(2.0)
assert "Low power mode active" in rec
def test_recommendation_low_efficiency(monitor):
rec = monitor._build_recommendation(1.5)
assert "Low efficiency" in rec
def test_recommendation_good_efficiency(monitor):
rec = monitor._build_recommendation(8.0)
assert "Good efficiency" in rec
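The battery tests above feed a positive `InstantAmperage`. On some macOS systems, `ioreg` is reported to print a negative (discharging) amperage as its unsigned 64-bit two's-complement value, e.g. 18446744073709549616 for -2000 mA. Treat this as an assumption to verify on the target hardware. The standalone sketch below shows a parser that normalises the sign before computing watts, and the kind of inputs a test for it could use. `parse_battery_watts` and `to_signed_64` are hypothetical helpers, not part of the monitor module.

```python
def to_signed_64(value: int) -> int:
    """Reinterpret an unsigned 64-bit integer as signed two's complement."""
    return value - (1 << 64) if value >= (1 << 63) else value


def parse_battery_watts(ioreg_stdout: str) -> float:
    """Return discharge power in watts, or 0.0 on external power or missing data."""
    amperage_ma = 0
    voltage_mv = 0
    on_external_power = True  # assume AC power until ExternalConnected = No is seen
    for line in ioreg_stdout.splitlines():
        key, sep, value = line.strip().partition("=")
        if not sep:
            continue  # braces and other lines without a key = value pair
        key, value = key.strip(), value.strip()
        try:
            # Exact key matches avoid nested dicts that merely mention "Voltage".
            if key == '"InstantAmperage"':
                amperage_ma = to_signed_64(int(value))
            elif key == '"Voltage"':
                voltage_mv = int(value)
            elif key == '"ExternalConnected"':
                on_external_power = value == "Yes"
        except ValueError:
            continue
    if on_external_power or voltage_mv == 0 or amperage_ma == 0:
        return 0.0
    # mA × mV = µW; divide by 1e6 for watts
    return abs(amperage_ma) * voltage_mv / 1_000_000


sample = (
    "{\n"
    '  "InstantAmperage" = 18446744073709549616\n'
    '  "Voltage" = 12000\n'
    '  "ExternalConnected" = No\n'
    "}"
)
print(parse_battery_watts(sample))  # 2000 mA × 12000 mV -> 24.0
```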