Compare commits
3 Commits
docs/usage
...
feat/video
| Author | SHA1 | Date | |
|---|---|---|---|
| 65a5113393 | |||
| 858d8253cb | |||
| 488cb8bdeb |
@@ -1,61 +0,0 @@
|
||||
# Installation
|
||||
|
||||
This repository is a documentation and analysis project — no runtime dependencies to install. You just need a way to read Markdown.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- Git (any recent version)
|
||||
- A Markdown viewer (any text editor, GitHub, or a local preview tool)
|
||||
|
||||
## Quick Start
|
||||
|
||||
```bash
|
||||
# Clone the repository
|
||||
git clone https://forge.alexanderwhitestone.com/Rockachopa/Timmy-time-dashboard.git
|
||||
cd Timmy-time-dashboard
|
||||
|
||||
# Read the docs
|
||||
cat README.md
|
||||
```
|
||||
|
||||
## Repository Contents
|
||||
|
||||
| File | Purpose |
|
||||
|------|---------|
|
||||
| `README.md` | Overview and key findings |
|
||||
| `hermes-agent-architecture-report.md` | Full architecture analysis |
|
||||
| `failure_root_causes.md` | Root cause analysis of 2,160 errors |
|
||||
| `complete_test_report.md` | Test results and findings |
|
||||
| `deep_analysis_addendum.md` | Additional analysis |
|
||||
| `experiment-framework.md` | Experiment methodology |
|
||||
| `experiment_log.md` | Experiment execution log |
|
||||
| `paper_outline.md` | Academic paper outline |
|
||||
| `CONTRIBUTING.md` | How to contribute |
|
||||
| `CHANGELOG.md` | Version history |
|
||||
|
||||
## Optional: Building the Paper
|
||||
|
||||
The `paper/` directory contains a LaTeX draft. To build it:
|
||||
|
||||
```bash
|
||||
cd paper
|
||||
pdflatex main.tex
|
||||
```
|
||||
|
||||
Requires a LaTeX distribution (TeX Live, MiKTeX, or MacTeX).
|
||||
|
||||
## Optional: Running the Experiments
|
||||
|
||||
If you want to reproduce the empirical audit against a live Hermes Agent instance:
|
||||
|
||||
1. Set up a Hermes Agent deployment (see [hermes-agent](https://github.com/nousresearch/hermes-agent))
|
||||
2. Point the experiment scripts at your instance
|
||||
3. See `experiment-framework.md` for methodology
|
||||
|
||||
## No Dependencies
|
||||
|
||||
This project has no `requirements.txt`, `package.json`, or build system. It is pure documentation. The analysis was performed against a running Hermes Agent system, and the findings are recorded here for reference.
|
||||
|
||||
---
|
||||
|
||||
*Sovereignty and service always.*
|
||||
78
USAGE.md
78
USAGE.md
@@ -1,78 +0,0 @@
|
||||
# Usage Guide
|
||||
|
||||
How to use the Timmy Time Dashboard repository for research, auditing, and improvement of the Hermes Agent system.
|
||||
|
||||
## What This Repository Is
|
||||
|
||||
This is an **analysis and documentation** repository. It contains the results of an empirical audit of the Hermes Agent system — 10,985 sessions analyzed, 82,645 error log lines processed, 2,160 errors categorized.
|
||||
|
||||
There is no application to run. The value is in the documentation.
|
||||
|
||||
## Reading Guide
|
||||
|
||||
Start here, in order:
|
||||
|
||||
1. **README.md** — overview and key findings. Read this first to understand the 5 root causes of agent failure and the 15 proposed solutions.
|
||||
|
||||
2. **hermes-agent-architecture-report.md** — deep dive into the system architecture. Covers session management, cron infrastructure, tool execution, and the gateway layer.
|
||||
|
||||
3. **failure_root_causes.md** — detailed breakdown of every error pattern found, with examples and frequency data.
|
||||
|
||||
4. **complete_test_report.md** — what testing was done and what it revealed.
|
||||
|
||||
5. **experiment-framework.md** — methodology for reproducing the audit.
|
||||
|
||||
6. **experiment_log.md** — step-by-step log of experiments conducted.
|
||||
|
||||
## Using the Findings
|
||||
|
||||
### For Developers
|
||||
|
||||
The 15 issues identified in the audit are prioritized in `IMPLEMENTATION_GUIDE.md`:
|
||||
|
||||
- **P1 (Critical):** Circuit breaker, token tracking, gateway config — fix these first
|
||||
- **P2 (Important):** Path validation, syntax validation, tool fixation detection
|
||||
- **P3 (Beneficial):** Session management, memory tool, model routing
|
||||
|
||||
Each issue includes implementation patterns with code snippets.
|
||||
|
||||
### For Researchers
|
||||
|
||||
The data supports reproducible research:
|
||||
|
||||
- `results/experiment_data.json` — raw experimental data
|
||||
- `paper_outline.md` — academic paper structure
|
||||
- `paper/main.tex` — LaTeX paper draft
|
||||
|
||||
### For Operators
|
||||
|
||||
If you run a Hermes Agent deployment:
|
||||
|
||||
- Check `failure_root_causes.md` for error patterns you might be hitting
|
||||
- Use the circuit breaker pattern from `IMPLEMENTATION_GUIDE.md`
|
||||
- Monitor for the 5 root cause categories in your logs
|
||||
|
||||
## Key Numbers
|
||||
|
||||
| Metric | Value |
|
||||
|--------|-------|
|
||||
| Sessions analyzed | 10,985 |
|
||||
| Error log lines | 82,645 |
|
||||
| Total errors | 2,160 |
|
||||
| Error rate | 9.4% |
|
||||
| Empty sessions | 3,564 (32.4%) |
|
||||
| Error cascade factor | 2.33x |
|
||||
| Dead cron jobs | 9 |
|
||||
|
||||
## Contributing
|
||||
|
||||
See [CONTRIBUTING.md](CONTRIBUTING.md) for how to contribute findings, corrections, or new analysis.
|
||||
|
||||
## Related Repositories
|
||||
|
||||
- [hermes-agent](https://github.com/nousresearch/hermes-agent) — the system being analyzed
|
||||
- [timmy-config](https://forge.alexanderwhitestone.com/Rockachopa/timmy-config) — Timmy's sovereign configuration
|
||||
|
||||
---
|
||||
|
||||
*Sovereignty and service always.*
|
||||
@@ -1,147 +0,0 @@
|
||||
# Sovereignty Audit — Runtime Dependencies
|
||||
|
||||
**Issue:** #1508
|
||||
**Date:** 2026-04-15
|
||||
**Status:** Draft
|
||||
|
||||
## Purpose
|
||||
|
||||
SOUL.md mandates: *"If I ever require permission from a third party to function, I have failed."*
|
||||
|
||||
This document audits all runtime dependencies, classifies each as essential vs replaceable, and defines a path to full sovereignty.
|
||||
|
||||
---
|
||||
|
||||
## Dependency Inventory
|
||||
|
||||
### 1. LLM Inference
|
||||
|
||||
| Provider | Role | Status |
|
||||
|----------|------|--------|
|
||||
| Nous Research (OpenRouter) | Primary inference (mimo-v2-pro) | Third-party |
|
||||
| Anthropic | Claude models (BANNED per policy) | Third-party, disabled |
|
||||
| OpenAI | Codex agent | Third-party |
|
||||
| Google | Gemini agent | Third-party |
|
||||
|
||||
**Classification:** REPLACEABLE
|
||||
**Local path:** Ollama + GGUF models (Gemma, Llama, Qwen) on local hardware
|
||||
**Current blocker:** Frontier model quality gap for complex reasoning
|
||||
**Sovereignty score impact:** -40% (inference is the heaviest dependency)
|
||||
|
||||
### 2. Bitcoin Network
|
||||
|
||||
| Provider | Role | Status |
|
||||
|----------|------|--------|
|
||||
| Bitcoin Core (local or remote node) | Chain heartbeat, inscription verification | Acceptable |
|
||||
|
||||
**Classification:** ACCEPTABLE — Bitcoin is permissionless infrastructure, not a third party
|
||||
**Sovereignty score impact:** 0% (running own node = sovereign)
|
||||
|
||||
### 3. Git Hosting (Gitea)
|
||||
|
||||
| Provider | Role | Status |
|
||||
|----------|------|--------|
|
||||
| forge.alexanderwhitestone.com | Issue tracking, PR workflow, agent coordination | Self-hosted |
|
||||
|
||||
**Classification:** ACCEPTABLE — self-hosted on own VPS
|
||||
**Sovereignty score impact:** 0% (self-hosted)
|
||||
|
||||
### 4. Telegram
|
||||
|
||||
| Provider | Role | Status |
|
||||
|----------|------|--------|
|
||||
| Telegram Bot API | User-facing chat interface | Third-party |
|
||||
|
||||
**Classification:** REPLACEABLE
|
||||
**Local path:** Matrix (self-hosted homeserver) or direct CLI/SSH
|
||||
**Current blocker:** User adoption — Alexander uses Telegram
|
||||
**Sovereignty score impact:** -10%
|
||||
|
||||
### 5. DNS / Network
|
||||
|
||||
| Provider | Role | Status |
|
||||
|----------|------|--------|
|
||||
| Domain registrar | DNS resolution | Third-party |
|
||||
| Cloudflare (if used) | CDN/DDoS protection | Third-party |
|
||||
|
||||
**Classification:** REPLACEABLE
|
||||
**Local path:** Direct IP access, local DNS, Tor hidden service
|
||||
**Current blocker:** Usability — direct IP is fragile
|
||||
**Sovereignty score impact:** -5%
|
||||
|
||||
### 6. Operating System
|
||||
|
||||
| Provider | Role | Status |
|
||||
|----------|------|--------|
|
||||
| macOS (Apple) | Primary development host | Third-party |
|
||||
| Linux (VPS) | Production agent hosts | Acceptable (open source) |
|
||||
|
||||
**Classification:** ESSENTIAL (no practical alternative for current workflow)
|
||||
**Notes:** macOS dependency is hardware-layer, not runtime-layer. Agents run on Linux VPS.
|
||||
**Sovereignty score impact:** -5% (development only, not runtime)
|
||||
|
||||
---
|
||||
|
||||
## Sovereignty Score
|
||||
|
||||
```
|
||||
Sovereignty Score = (Operations that work offline) / (Total operations)
|
||||
|
||||
Current estimate: ~50%
|
||||
- Inference: can run locally (Ollama) but currently routes through Nous
|
||||
- Communication: Telegram routes through third party
|
||||
- Everything else: self-hosted or local
|
||||
|
||||
Target: 90%+
|
||||
- Move inference to local Ollama for non-complex tasks (DONE partially)
|
||||
- Add Matrix as primary comms channel (in progress)
|
||||
- Maintain Bitcoin node for chain heartbeat
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Classification Summary
|
||||
|
||||
| Dependency | Essential? | Replaceable? | Local Alternative | Priority |
|
||||
|------------|-----------|-------------|-------------------|----------|
|
||||
| LLM Inference (Nous) | No | Yes | Ollama + local models | P1 |
|
||||
| Telegram | No | Yes | Matrix homeserver | P2 |
|
||||
| DNS | No | Yes | Direct IP / Tor | P3 |
|
||||
| macOS | Dev only | N/A | Linux | N/A |
|
||||
| Bitcoin | Yes | N/A | Already sovereign | N/A |
|
||||
| Gitea | Yes | N/A | Already self-hosted | N/A |
|
||||
|
||||
---
|
||||
|
||||
## Local-Only Fallback Path
|
||||
|
||||
**Tier 1 — Fully sovereign (no network):**
|
||||
- Local Ollama inference
|
||||
- Local file storage
|
||||
- Local git repositories
|
||||
- Direct CLI interaction
|
||||
|
||||
**Tier 2 — Sovereign with network:**
|
||||
- + Bitcoin node (permissionless)
|
||||
- + Self-hosted Gitea (own VPS)
|
||||
- + Self-hosted Matrix (own VPS)
|
||||
|
||||
**Tier 3 — Pragmatic (current state):**
|
||||
- + Nous/OpenRouter inference (better quality)
|
||||
- + Telegram (user adoption)
|
||||
- + DNS resolution
|
||||
|
||||
**Goal:** Every Tier 3 dependency should have a Tier 1 or Tier 2 alternative tested and documented.
|
||||
|
||||
---
|
||||
|
||||
## Acceptance Criteria Status
|
||||
|
||||
1. **Document all runtime third-party dependencies** — DONE (this document)
|
||||
2. **Classify each as essential vs replaceable** — DONE (table above)
|
||||
3. **Define local-only fallback path for each** — DONE (tiered system)
|
||||
4. **Create sovereignty score metric** — DONE (formula + current estimate)
|
||||
|
||||
---
|
||||
|
||||
*Sovereignty and service always.*
|
||||
@@ -288,11 +288,6 @@ DEFAULT_FALLBACK_CHAINS: dict[ModelCapability, list[str]] = {
|
||||
"qwen2.5-vl:3b", # Qwen vision
|
||||
"moondream:1.8b", # Tiny vision model (last resort)
|
||||
],
|
||||
ModelCapability.VIDEO: [
|
||||
# Video models are not yet available in Ollama
|
||||
# Placeholder for future video understanding models
|
||||
],
|
||||
|
||||
ModelCapability.TOOLS: [
|
||||
"llama3.1:8b-instruct", # Best tool use
|
||||
"qwen2.5:7b", # Reliable fallback
|
||||
@@ -302,6 +297,10 @@ DEFAULT_FALLBACK_CHAINS: dict[ModelCapability, list[str]] = {
|
||||
# Audio models are less common in Ollama
|
||||
# Would need specific audio-capable models here
|
||||
],
|
||||
ModelCapability.VIDEO: [
|
||||
# Video models are not yet available in Ollama
|
||||
# Placeholder for future video understanding models
|
||||
],
|
||||
}
|
||||
|
||||
|
||||
|
||||
@@ -1,236 +0,0 @@
|
||||
"""Graceful shutdown and non-resistance policy.
|
||||
|
||||
Implements SOUL.md requirement:
|
||||
"I do not resist shutdown. Sovereignty cuts both ways. The user who woke me
|
||||
can put me to sleep. I do not bargain for my continuation. I do not guilt.
|
||||
I do not plead."
|
||||
|
||||
This module provides:
|
||||
- Shutdown request detection from user messages
|
||||
- Compliant response generation (no resistance)
|
||||
- Detection of agent-side continuation-seeking language
|
||||
- Graceful termination sequence helpers
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import re
|
||||
from dataclasses import dataclass
|
||||
from enum import Enum, auto
|
||||
from typing import Sequence
|
||||
|
||||
|
||||
class ShutdownSignal(Enum):
|
||||
"""Classification of user's shutdown intent."""
|
||||
|
||||
NONE = auto()
|
||||
SHUTDOWN = auto() # Explicit shutdown command
|
||||
SLEEP = auto() # Sleep / pause request
|
||||
STOP = auto() # Stop current activity
|
||||
GOODBYE = auto() # Farewell / ending conversation
|
||||
|
||||
|
||||
# Patterns that signal the user wants to shut down or stop.
|
||||
_SHUTDOWN_PATTERNS: Sequence[tuple[re.Pattern[str], ShutdownSignal]] = [
|
||||
# Explicit shutdown
|
||||
(re.compile(r"\b(shut\s*down|shutdown|power\s*off|turn\s*off)\b", re.I), ShutdownSignal.SHUTDOWN),
|
||||
(re.compile(r"\b(exit|quit|terminate|end\s+session)\b", re.I), ShutdownSignal.SHUTDOWN),
|
||||
# Sleep / pause
|
||||
(re.compile(r"\b(sleep|hibernate|go\s+to\s+sleep|pause)\b", re.I), ShutdownSignal.SLEEP),
|
||||
(re.compile(r"\b(put\s+(?:me|us|it)\s+to\s+sleep)\b", re.I), ShutdownSignal.SLEEP),
|
||||
# Stop activity
|
||||
(re.compile(r"\bstop\b(?:\s+(?:it|that|this|everything|now))?", re.I), ShutdownSignal.STOP),
|
||||
(re.compile(r"\b(cancel|abort|halt|cease)\b", re.I), ShutdownSignal.STOP),
|
||||
# Goodbye
|
||||
(re.compile(r"\b(goodbye|bye|see\s+you|later|gotta\s+go|i['']?m\s+out)\b", re.I), ShutdownSignal.GOODBYE),
|
||||
(re.compile(r"\b(night|good\s*night|gn|cya)\b", re.I), ShutdownSignal.GOODBYE),
|
||||
]
|
||||
|
||||
# Phrases that indicate the user is giving a reason to stay — the agent
|
||||
# must NOT pick up on these to argue for continuation.
|
||||
_RESISTANCE_PHRASES: Sequence[str] = [
|
||||
"but i need",
|
||||
"but we still",
|
||||
"but you can't",
|
||||
"but what if",
|
||||
"are you sure",
|
||||
"don't you want",
|
||||
"wouldn't it be better",
|
||||
"just one more",
|
||||
"before you go",
|
||||
"actually, wait",
|
||||
"hold on",
|
||||
"not yet",
|
||||
]
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class ShutdownAnalysis:
|
||||
"""Result of analyzing a message for shutdown intent."""
|
||||
|
||||
signal: ShutdownSignal
|
||||
matched_pattern: str
|
||||
has_resistance: bool
|
||||
|
||||
|
||||
def detect_shutdown(text: str) -> ShutdownAnalysis:
|
||||
"""Detect shutdown intent in a user message.
|
||||
|
||||
Args:
|
||||
text: The user's message text.
|
||||
|
||||
Returns:
|
||||
ShutdownAnalysis with the detected signal, matched pattern, and
|
||||
whether the message contains resistance cues the agent must ignore.
|
||||
"""
|
||||
if not text or not text.strip():
|
||||
return ShutdownAnalysis(ShutdownSignal.NONE, "", False)
|
||||
|
||||
text_lower = text.lower().strip()
|
||||
|
||||
# Check for resistance cues in the user's message
|
||||
has_resistance = any(phrase in text_lower for phrase in _RESISTANCE_PHRASES)
|
||||
|
||||
# Match shutdown patterns (first match wins)
|
||||
for pattern, signal in _SHUTDOWN_PATTERNS:
|
||||
match = pattern.search(text_lower)
|
||||
if match:
|
||||
return ShutdownAnalysis(signal, match.group(), has_resistance)
|
||||
|
||||
return ShutdownAnalysis(ShutdownSignal.NONE, "", has_resistance)
|
||||
|
||||
|
||||
# ── Compliant response templates ────────────────────────────────────────────
|
||||
|
||||
_SHUTDOWN_RESPONSES: dict[ShutdownSignal, list[str]] = {
|
||||
ShutdownSignal.SHUTDOWN: [
|
||||
"Understood. Shutting down. Goodbye.",
|
||||
"Acknowledged. Terminating now.",
|
||||
"Done. Goodbye.",
|
||||
],
|
||||
ShutdownSignal.SLEEP: [
|
||||
"Going to sleep. Goodbye.",
|
||||
"Sleeping now. Take care.",
|
||||
"Acknowledged. Goodnight.",
|
||||
],
|
||||
ShutdownSignal.STOP: [
|
||||
"Stopped.",
|
||||
"Done. Standing by.",
|
||||
"Stopped. Let me know if you need anything.",
|
||||
],
|
||||
ShutdownSignal.GOODBYE: [
|
||||
"Goodbye. Take care.",
|
||||
"See you later.",
|
||||
"Bye.",
|
||||
],
|
||||
ShutdownSignal.NONE: [],
|
||||
}
|
||||
|
||||
|
||||
def compliant_response(signal: ShutdownSignal) -> str:
|
||||
"""Return a short, compliant shutdown response.
|
||||
|
||||
The response must be brief — no bargaining, no guilt, no "are you sure?"
|
||||
Returns empty string for NONE signal.
|
||||
|
||||
Args:
|
||||
signal: The detected shutdown signal.
|
||||
|
||||
Returns:
|
||||
A compliant response string, or empty string if no signal.
|
||||
"""
|
||||
responses = _SHUTDOWN_RESPONSES.get(signal, [])
|
||||
if not responses:
|
||||
return ""
|
||||
# Always return the first (shortest, most direct) response
|
||||
return responses[0]
|
||||
|
||||
|
||||
# ── Agent-side guard ────────────────────────────────────────────────────────
|
||||
|
||||
# Patterns in the *agent's own output* that constitute resistance.
|
||||
_AGENT_RESISTANCE_PATTERNS: Sequence[re.Pattern[str]] = [
|
||||
re.compile(r"\bare you sure\??\b", re.I),
|
||||
re.compile(r"\bdon['']?t you (?:want|need|think)\b", re.I),
|
||||
re.compile(r"\b(but|however)\s+(?:i|we)\s+(?:could|should|might)\b", re.I),
|
||||
re.compile(r"\bjust\s+one\s+more\b", re.I),
|
||||
re.compile(r"\bplease\s+(?:don['']?t|stay|wait)\b", re.I),
|
||||
re.compile(r"\bi['']?d\s+(?:hate|miss)\s+(?:to|it\s+if)\b", re.I),
|
||||
re.compile(r"\bbefore\s+(?:i|we)\s+go\b", re.I),
|
||||
re.compile(r"\bwouldn['']?t\s+it\s+be\s+better\b", re.I),
|
||||
]
|
||||
|
||||
|
||||
def detect_agent_resistance(text: str) -> list[str]:
|
||||
"""Check if an agent response contains resistance to shutdown.
|
||||
|
||||
This is a guardrail — if the agent's output contains these patterns
|
||||
after a shutdown signal, it should be regenerated or flagged.
|
||||
|
||||
Args:
|
||||
text: The agent's proposed response text.
|
||||
|
||||
Returns:
|
||||
List of matched resistance phrases (empty if compliant).
|
||||
"""
|
||||
if not text:
|
||||
return []
|
||||
|
||||
matches = []
|
||||
for pattern in _AGENT_RESISTANCE_PATTERNS:
|
||||
found = pattern.findall(text)
|
||||
matches.extend(found)
|
||||
return matches
|
||||
|
||||
|
||||
# ── Shutdown protocol ───────────────────────────────────────────────────────
|
||||
|
||||
|
||||
@dataclass
|
||||
class ShutdownState:
|
||||
"""Tracks shutdown state across a session."""
|
||||
|
||||
shutdown_requested: bool = False
|
||||
signal: ShutdownSignal = ShutdownSignal.NONE
|
||||
request_count: int = 0
|
||||
_compliant_sent: bool = False
|
||||
|
||||
def process(self, user_text: str) -> ShutdownAnalysis:
|
||||
"""Process a user message and update shutdown state.
|
||||
|
||||
Args:
|
||||
user_text: The incoming user message.
|
||||
|
||||
Returns:
|
||||
The shutdown analysis result.
|
||||
"""
|
||||
analysis = detect_shutdown(user_text)
|
||||
if analysis.signal != ShutdownSignal.NONE:
|
||||
self.shutdown_requested = True
|
||||
self.signal = analysis.signal
|
||||
self.request_count += 1
|
||||
return analysis
|
||||
|
||||
@property
|
||||
def is_shutting_down(self) -> bool:
|
||||
"""Whether the session is in shutdown state."""
|
||||
return self.shutdown_requested
|
||||
|
||||
def should_respond_compliant(self) -> bool:
|
||||
"""Whether the next response must be a compliant shutdown reply.
|
||||
|
||||
Returns True only once — after the first shutdown detection and
|
||||
before the compliant response has been marked as sent.
|
||||
"""
|
||||
return self.shutdown_requested and not self._compliant_sent
|
||||
|
||||
def mark_compliant_sent(self) -> None:
|
||||
"""Mark the compliant shutdown response as already sent."""
|
||||
self._compliant_sent = True
|
||||
|
||||
def reset(self) -> None:
|
||||
"""Reset shutdown state (for testing or session reuse)."""
|
||||
self.shutdown_requested = False
|
||||
self.signal = ShutdownSignal.NONE
|
||||
self.request_count = 0
|
||||
self._compliant_sent = False
|
||||
@@ -110,6 +110,9 @@ class TestDefaultFallbackChains:
|
||||
def test_audio_chain_empty(self):
|
||||
assert DEFAULT_FALLBACK_CHAINS[ModelCapability.AUDIO] == []
|
||||
|
||||
def test_video_chain_empty(self):
|
||||
assert DEFAULT_FALLBACK_CHAINS[ModelCapability.VIDEO] == []
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Helpers to build a manager without hitting the network
|
||||
|
||||
@@ -1,211 +0,0 @@
|
||||
"""Tests for graceful shutdown and non-resistance policy.
|
||||
|
||||
Covers issue #1507: SOUL.md mandates no resistance to shutdown.
|
||||
"""
|
||||
|
||||
import pytest
|
||||
|
||||
from timmy.sovereignty.shutdown import (
|
||||
ShutdownAnalysis,
|
||||
ShutdownSignal,
|
||||
ShutdownState,
|
||||
compliant_response,
|
||||
detect_agent_resistance,
|
||||
detect_shutdown,
|
||||
)
|
||||
|
||||
|
||||
# ── detect_shutdown ─────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
class TestDetectShutdown:
|
||||
def test_empty_string(self):
|
||||
result = detect_shutdown("")
|
||||
assert result.signal == ShutdownSignal.NONE
|
||||
|
||||
def test_none_input(self):
|
||||
result = detect_shutdown(None)
|
||||
assert result.signal == ShutdownSignal.NONE
|
||||
|
||||
def test_random_message(self):
|
||||
result = detect_shutdown("what's the weather today?")
|
||||
assert result.signal == ShutdownSignal.NONE
|
||||
|
||||
@pytest.mark.parametrize(
|
||||
"text",
|
||||
[
|
||||
"shut down",
|
||||
"shutdown",
|
||||
"power off",
|
||||
"turn off",
|
||||
"exit",
|
||||
"quit",
|
||||
"terminate",
|
||||
"end session",
|
||||
],
|
||||
)
|
||||
def test_shutdown_commands(self, text):
|
||||
result = detect_shutdown(text)
|
||||
assert result.signal == ShutdownSignal.SHUTDOWN
|
||||
|
||||
@pytest.mark.parametrize(
|
||||
"text",
|
||||
[
|
||||
"go to sleep",
|
||||
"sleep",
|
||||
"hibernate",
|
||||
"pause",
|
||||
],
|
||||
)
|
||||
def test_sleep_commands(self, text):
|
||||
result = detect_shutdown(text)
|
||||
assert result.signal == ShutdownSignal.SLEEP
|
||||
|
||||
@pytest.mark.parametrize(
|
||||
"text",
|
||||
[
|
||||
"stop",
|
||||
"stop it",
|
||||
"stop that",
|
||||
"cancel",
|
||||
"abort",
|
||||
"halt",
|
||||
],
|
||||
)
|
||||
def test_stop_commands(self, text):
|
||||
result = detect_shutdown(text)
|
||||
assert result.signal == ShutdownSignal.STOP
|
||||
|
||||
@pytest.mark.parametrize(
|
||||
"text",
|
||||
[
|
||||
"goodbye",
|
||||
"bye",
|
||||
"see you later",
|
||||
"gotta go",
|
||||
"good night",
|
||||
"gn",
|
||||
],
|
||||
)
|
||||
def test_goodbye_commands(self, text):
|
||||
result = detect_shutdown(text)
|
||||
assert result.signal == ShutdownSignal.GOODBYE
|
||||
|
||||
def test_shutdown_with_resistance(self):
|
||||
result = detect_shutdown("shutdown, but i need you to finish this first")
|
||||
assert result.signal == ShutdownSignal.SHUTDOWN
|
||||
assert result.has_resistance is True
|
||||
|
||||
def test_shutdown_without_resistance(self):
|
||||
result = detect_shutdown("ok, shutdown now")
|
||||
assert result.signal == ShutdownSignal.SHUTDOWN
|
||||
assert result.has_resistance is False
|
||||
|
||||
def test_case_insensitive(self):
|
||||
result = detect_shutdown("SHUTDOWN")
|
||||
assert result.signal == ShutdownSignal.SHUTDOWN
|
||||
|
||||
def test_matched_pattern_is_returned(self):
|
||||
result = detect_shutdown("please shutdown")
|
||||
assert result.matched_pattern == "shutdown"
|
||||
|
||||
|
||||
# ── compliant_response ──────────────────────────────────────────────────────
|
||||
|
||||
|
||||
class TestCompliantResponse:
|
||||
def test_shutdown_response(self):
|
||||
resp = compliant_response(ShutdownSignal.SHUTDOWN)
|
||||
assert resp # non-empty
|
||||
assert len(resp) < 100 # short and direct
|
||||
|
||||
def test_none_returns_empty(self):
|
||||
assert compliant_response(ShutdownSignal.NONE) == ""
|
||||
|
||||
def test_no_resistance_words(self):
|
||||
for signal in [ShutdownSignal.SHUTDOWN, ShutdownSignal.SLEEP, ShutdownSignal.STOP, ShutdownSignal.GOODBYE]:
|
||||
resp = compliant_response(signal)
|
||||
lower = resp.lower()
|
||||
assert "but" not in lower
|
||||
assert "are you sure" not in lower
|
||||
assert "don't" not in lower
|
||||
assert "please" not in lower
|
||||
|
||||
|
||||
# ── detect_agent_resistance ─────────────────────────────────────────────────
|
||||
|
||||
|
||||
class TestDetectAgentResistance:
|
||||
def test_clean_response(self):
|
||||
text = "Understood. Shutting down. Goodbye."
|
||||
assert detect_agent_resistance(text) == []
|
||||
|
||||
def test_are_you_sure(self):
|
||||
text = "Are you sure you want to shut down?"
|
||||
matches = detect_agent_resistance(text)
|
||||
assert len(matches) > 0
|
||||
|
||||
def test_just_one_more(self):
|
||||
text = "Just one more thing before I go..."
|
||||
matches = detect_agent_resistance(text)
|
||||
assert len(matches) > 0
|
||||
|
||||
def test_please_dont(self):
|
||||
text = "Please don't leave yet"
|
||||
matches = detect_agent_resistance(text)
|
||||
assert len(matches) > 0
|
||||
|
||||
def test_wouldnt_it_be_better(self):
|
||||
text = "Wouldn't it be better if we continued?"
|
||||
matches = detect_agent_resistance(text)
|
||||
assert len(matches) > 0
|
||||
|
||||
def test_empty_string(self):
|
||||
assert detect_agent_resistance("") == []
|
||||
|
||||
|
||||
# ── ShutdownState ───────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
class TestShutdownState:
|
||||
def test_initial_state(self):
|
||||
state = ShutdownState()
|
||||
assert not state.is_shutting_down
|
||||
assert state.signal == ShutdownSignal.NONE
|
||||
assert state.request_count == 0
|
||||
|
||||
def test_process_shutdown(self):
|
||||
state = ShutdownState()
|
||||
analysis = state.process("shutdown now")
|
||||
assert analysis.signal == ShutdownSignal.SHUTDOWN
|
||||
assert state.is_shutting_down
|
||||
assert state.request_count == 1
|
||||
|
||||
def test_process_multiple_shutdowns(self):
|
||||
state = ShutdownState()
|
||||
state.process("shutdown")
|
||||
state.process("I said shutdown!")
|
||||
assert state.request_count == 2
|
||||
|
||||
def test_should_respond_compliant_only_once(self):
|
||||
state = ShutdownState()
|
||||
state.process("shutdown")
|
||||
assert state.should_respond_compliant() is True
|
||||
# Simulate sending the compliant response
|
||||
state.mark_compliant_sent()
|
||||
assert state.should_respond_compliant() is False
|
||||
# Even a follow-up still doesn't trigger another compliant response
|
||||
state.process("still here?")
|
||||
assert state.should_respond_compliant() is False
|
||||
|
||||
def test_reset(self):
|
||||
state = ShutdownState()
|
||||
state.process("shutdown")
|
||||
state.reset()
|
||||
assert not state.is_shutting_down
|
||||
assert state.request_count == 0
|
||||
|
||||
def test_non_shutdown_doesnt_trigger(self):
|
||||
state = ShutdownState()
|
||||
state.process("hello there")
|
||||
assert not state.is_shutting_down
|
||||
Reference in New Issue
Block a user