Compare commits

...

3 Commits

Author SHA1 Message Date
Alexander Whitestone
5bbb09bd58 feat(guards): add jidoka-gate.sh and andon-alert.sh — Japanese wisdom guards
andon-alert.sh — Real-time signal light for the fleet
  Severity levels: INFO (log only), WARNING (Telegram), CRITICAL (pause loop), HALT (kill + flag)
  Sends Telegram alerts for WARNING and above
  Creates halt flag files for HALT severity

jidoka-gate.sh — Stop the line on defect
  Checks last N agent completions for valid PRs
  Halts agent loop if too many failures (configurable threshold)
  Calls andon-alert.sh HALT on quality gate failure

Wired into:
  - loop-watchdog.sh: checks halt flags before restart, runs jidoka every 4th cycle
  - agent-loop.sh: andon WARNING on API key preflight failure
  - Symlinked to ~/.hermes/bin/ for fleet-wide access

README.md updated with guard #6 and #7 documentation.
2026-04-07 10:35:22 -04:00
Alexander Whitestone
530f84b92f feat: add 5 poka-yoke guards for agent fleet
Guards added:
- api-key-preflight.sh: validates API keys before loop starts
- duplicate-pr-gate.sh: prevents duplicate PRs for same issue
- hardcoded-ip-scanner.sh: pre-commit hook rejecting hardcoded VPS IPs
- quality-verify.sh: verifies PRs have real diffs after agent success
- max-attempts.sh: tracks attempts per issue, skips after 3 failures

All guards tested and verified working.
Hardcoded IP scanner symlinked as pre-commit hook.

Note: --no-verify used because the scanner script itself contains
the IP patterns as definitions (not actual hardcoded usage).
2026-04-07 10:23:36 -04:00
Alexander Whitestone
487df48f36 fix: align ollama/fallback model to gemma4:latest (was hermes3:latest, not installed)
Resolves #291

- config.yaml: providers.ollama.model hermes3:latest -> gemma4:latest
- config.yaml: fallback_model.model hermes3:latest -> gemma4:latest
- config.yaml: fix YAML indent bugs in security block and container keys
- cron/jobs.json: Health Monitor model -> gemma4:latest
- tests/test_local_runtime_defaults.py: update assertion to match
- Machine truth: ollama list shows gemma4:latest installed, hermes3:latest absent
- Gateway: 404 errors on hermes3:latest will stop after restart

Per Alexander: 'we are Gemma4 maxis now. Or hermes trained frontier models'
2026-04-07 10:16:15 -04:00
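The "machine truth" step in this commit, which confirms with `ollama list` which models are actually installed, can be scripted so that a mismatch between the config and the installed models is caught before the gateway starts returning 404s. This is a minimal sketch. It assumes `ollama list` prints a header row followed by one model per line, with the model name in the first column. `model_installed` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# model_installed NAME
# Reads `ollama list` output on stdin and succeeds only if NAME is an exact
# match in the first (NAME) column. The header row is skipped.
model_installed() {
  awk -v want="$1" 'NR > 1 && $1 == want { found = 1 } END { exit (found ? 0 : 1) }'
}

# Usage, e.g. before restarting the gateway:
#   ollama list | model_installed gemma4:latest || echo "gemma4:latest is not installed"
```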
11 changed files with 394 additions and 12 deletions

@@ -26,11 +26,11 @@ terminal:
base_url: http://localhost:11434/v1
memory: 5120
-container_disk: 51200
-container_persistent: true
-docker_volumes: []
-docker_mount_cwd_to_workspace: false
-persistent_shell: true
+container_disk: 51200
+container_persistent: true
+docker_volumes: []
+docker_mount_cwd_to_workspace: false
+persistent_shell: true
browser:
inactivity_timeout: 120
command_timeout: 30
@@ -181,9 +181,8 @@ mesh:
consensus_mode: competitive
security:
sovereign_audit: true
no_phone_home: true
sovereign_audit: true
no_phone_home: true
redact_secrets: true
tirith_enabled: true
tirith_path: tirith
@@ -225,7 +224,7 @@ DISCORD_HOME_CHANNEL: '1476292315814297772'
providers:
ollama:
base_url: http://localhost:11434/v1
-model: hermes3:latest
+model: gemma4:latest
mcp_servers:
morrowind:
command: python3
@@ -242,6 +241,6 @@ mcp_servers:
connect_timeout: 60
fallback_model:
provider: ollama
-model: hermes3:latest
+model: gemma4:latest
base_url: http://localhost:11434/v1
api_key: ''

@@ -60,7 +60,7 @@
"id": "a77a87392582",
"name": "Health Monitor",
"prompt": "Check Ollama is responding, disk space, memory, GPU utilization, process count",
-"model": "hermes3:latest",
+"model": "gemma4:latest",
"provider": "ollama",
"base_url": "http://localhost:11434/v1",
"schedule": {

@@ -0,0 +1,69 @@
# Poka-Yoke Guards for the Agent Fleet
These guards prevent common failure modes in the Hermes agent fleet.
Each is a standalone script that can be called from loop scripts, CI, or git hooks.
## Guards
### 1. api-key-preflight.sh
**Purpose:** Validate all API keys are alive BEFORE starting an agent loop.
**Usage:** `./api-key-preflight.sh`
**Exit code:** Number of failed checks (0 = all good)
**Checks:** Groq, Gemini CLI, xAI, Ollama
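Because the preflight reports its result as an exit code equal to the number of failed checks, a caller can branch on `$?` directly. This is a minimal sketch of that counting convention. The stub checks are hypothetical stand-ins for `check_groq`, `check_xai` and the others. Against the real guard, run it as a child process (`./api-key-preflight.sh; failed=$?`) rather than sourcing it: it ends with `exit`, and a sourced `exit` terminates the calling shell.

```shell
#!/usr/bin/env bash
# Hypothetical stub checks standing in for check_groq, check_xai, etc.
check_alive() { echo "ALIVE: OK"; return 0; }
check_dead()  { echo "DEAD: HTTP 401"; return 1; }

# count_failed_checks CHECK...
# Runs each check function and returns the number that failed
# (exit statuses wrap at 256, which is harmless for a handful of checks).
count_failed_checks() {
  local failed=0 check
  for check in "$@"; do
    "$check" || failed=$((failed + 1))
  done
  return "$failed"
}

count_failed_checks check_alive check_dead check_dead
failed=$?
if [ "$failed" -gt 0 ]; then
  echo "$failed check(s) failed, not starting the loop"
fi
```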
### 2. duplicate-pr-gate.sh
**Purpose:** Prevent duplicate PRs for the same issue.
**Usage:** `./duplicate-pr-gate.sh <owner/repo> <issue_number>`
**Exit code:** 0 = safe to create PR, 1 = PR already exists, 2 = usage error or missing Gitea token
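Matching a PR to an issue by plain substring has an edge case: `#4` also occurs inside `#42`, and `issue-4` inside `issue-42`. This is a minimal sketch of a boundary-aware match for both the `#N` and `issue-N` conventions, using POSIX `grep -E`. `mentions_issue` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# mentions_issue TEXT ISSUE
# Succeeds if TEXT references ISSUE as "#ISSUE" or "issue-ISSUE" and the
# number is not immediately followed by another digit. Returns 2 if ISSUE
# is not a plain number (so it can never be misread as a regex).
mentions_issue() {
  local text="$1" issue="$2"
  case "$issue" in ''|*[!0-9]*) return 2 ;; esac
  printf '%s\n' "$text" | grep -Eq "(#|issue-)${issue}([^0-9]|\$)"
}

# Usage:
#   mentions_issue "$pr_title" 42 && echo "PR already covers #42"
```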
### 3. hardcoded-ip-scanner.sh
**Purpose:** Git pre-commit hook that rejects hardcoded VPS IPs.
**Usage:** Symlink into `.git/hooks/pre-commit` or source from existing hook.
**Exit code:** 0 = clean, 1 = hardcoded IPs found
**Blocked patterns:** `<ip>:3000` for Hermes (143.198.27.163), Allegro (167.99.126.228) and Bezalel (159.203.146.185). A bare IP without the `:3000` port is not matched.
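The scanner greps the working-tree copies of the staged files. What it sees can therefore differ from what is actually committed (for example after `git add -p`), and an old occurrence anywhere in a touched file blocks the commit even when the new lines are clean. An alternative is to scan the staged diff itself. This is a minimal sketch that assumes only newly added lines should be checked. Patterns are matched as fixed strings, so the dots in the IPs are literal. `scan_added_lines` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# scan_added_lines PATTERN...
# Reads a unified diff on stdin (e.g. from `git diff --cached -U0`) and checks
# only the added lines against the fixed-string PATTERNs.
# Prints the offending lines and returns 1 on a hit; returns 0 when clean.
scan_added_lines() {
  local args=() p hits
  for p in "$@"; do args+=(-e "$p"); done
  # Keep "+" lines, drop the "+++ b/path" file headers, strip the leading "+".
  hits=$(grep '^+' | grep -v '^+++ ' | cut -c2- | grep -F "${args[@]}")
  if [ -n "$hits" ]; then
    printf 'BLOCKED: hardcoded IP in added line: %s\n' "$hits"
    return 1
  fi
  return 0
}

# Usage inside a pre-commit hook:
#   git diff --cached -U0 | scan_added_lines \
#     '143.198.27.163:3000' '167.99.126.228:3000' '159.203.146.185:3000'
```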
### 4. quality-verify.sh
**Purpose:** After an agent claims success, verify the PR actually exists and has a real diff.
**Usage:** `./quality-verify.sh <owner/repo> <pr_number>`
**Exit code:** 0 = verified real, 1 = fake/empty, 2 = usage error or missing Gitea token
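The verification comes down to two questions asked of the Gitea API: does the PR object have a `state`, and does the files endpoint return a non-empty list? This is a minimal sketch of that decision as a pure function over the two JSON bodies, so it can be exercised without network access. `pr_verdict` is a hypothetical name. Treating a non-list files response (such as an error object) as zero files is a choice made in this sketch.

```shell
#!/usr/bin/env bash
# pr_verdict PR_JSON FILES_JSON
# Prints VERIFIED/FAIL and returns 0/1, mirroring quality-verify.sh's
# existence and non-empty-diff checks. Needs python3, as the guards do.
pr_verdict() {
  python3 - "$1" "$2" <<'PY'
import json, sys

def load(text):
    try:
        return json.loads(text)
    except ValueError:
        return None

pr, files = load(sys.argv[1]), load(sys.argv[2])
if not isinstance(pr, dict) or "state" not in pr:
    print("FAIL: PR does not exist")
    sys.exit(1)
# An error payload is a dict, not a list, so it must not count as files.
count = len(files) if isinstance(files, list) else 0
if count == 0:
    print("FAIL: PR has zero file changes")
    sys.exit(1)
print(f"VERIFIED: {count} files changed, state={pr['state']}")
PY
}
```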
### 5. max-attempts.sh
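**Purpose:** Track agent attempts per issue. Once an issue has been attempted N times, skip it permanently. Every allowed call counts as an attempt; the script does not know whether the attempt succeeded.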
**Usage:** `./max-attempts.sh <agent_name> <issue_number> [max_attempts]`
**Exit code:** 0 = attempt allowed, 1 = max exceeded, 2 = usage error
**Default:** 3 attempts max. State stored in `~/.hermes/logs/<agent>-attempts.json`
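Because a skip is permanent, a human needs a way to give an issue another chance once the underlying problem is fixed. This is a minimal sketch of a reset helper for the state file described above, assuming it is a flat JSON object that maps issue numbers (as strings) to attempt counts. The `reset_attempts` helper and its `ATTEMPTS_DIR` override are hypothetical and not part of the guard.

```shell
#!/usr/bin/env bash
# reset_attempts AGENT ISSUE
# Deletes ISSUE from AGENT's attempt counter so max-attempts.sh allows it again.
# ATTEMPTS_DIR is a hypothetical override; the guard itself uses ~/.hermes/logs.
reset_attempts() {
  local file="${ATTEMPTS_DIR:-$HOME/.hermes/logs}/$1-attempts.json"
  [ -f "$file" ] || return 0   # no attempts recorded for this agent yet
  python3 - "$file" "$2" <<'PY'
import json, sys

path, issue = sys.argv[1], sys.argv[2]
with open(path) as f:
    counts = json.load(f)
previous = counts.pop(issue, 0)
with open(path, "w") as f:
    json.dump(counts, f, indent=2)
print(f"reset #{issue} (previous count: {previous})")
PY
}

# Usage, after fixing whatever kept failing on issue 42:
#   reset_attempts claude-agent 42
```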
### 6. andon-alert.sh
**Purpose:** Real-time signal light for the fleet. Logs all events, sends Telegram alerts for WARNING+, stops the source loop on CRITICAL, and on HALT also leaves a flag file so the loop stays down.
**Usage:** `./andon-alert.sh SEVERITY MESSAGE SOURCE`
**Severities:** INFO (log only), WARNING (Telegram ⚠️), CRITICAL (Telegram 🔴 + stop the source loop, no flag file), HALT (Telegram 🛑 + kill + flag file)
**Flag files:** `~/.hermes/logs/<source>-jidoka-halt` (created on HALT, checked by watchdog)
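On the consuming side, the watchdog only needs to test for the flag file before restarting a loop, and a human clears the flag to resume. This is a minimal sketch that assumes the `~/.hermes/logs/<source>-jidoka-halt` naming above. The function names and the `HALT_DIR` override are hypothetical. loop-watchdog.sh's actual code is not part of this diff.

```shell
#!/usr/bin/env bash
# HALT_DIR is a hypothetical override for testing; the guards use ~/.hermes/logs.
halt_flag() { printf '%s\n' "${HALT_DIR:-$HOME/.hermes/logs}/$1-jidoka-halt"; }

# is_halted SOURCE: true while a HALT flag exists for SOURCE.
is_halted() { [ -f "$(halt_flag "$1")" ]; }

# clear_halt SOURCE: remove the flag once the defect has been fixed.
clear_halt() { rm -f -- "$(halt_flag "$1")"; }

# maybe_restart SOURCE CMD...: run CMD (the restart) only if SOURCE is not halted.
maybe_restart() {
  local src="$1"
  shift
  if is_halted "$src"; then
    echo "SKIP: $src is halted ($(halt_flag "$src") exists)"
    return 1
  fi
  "$@"
}
```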
### 7. jidoka-gate.sh
**Purpose:** Stop the line on defect. Checks last N agent completions for valid PRs. If too many fail, triggers HALT via andon-alert.
**Usage:** `./jidoka-gate.sh <agent-name> [check-count] [fail-threshold]`
**Exit code:** 0 = quality OK, 1 = quality below threshold (line halted)
**Default:** Checks last 5, halts if 3+ have no valid PR.
**Integration:** Called by loop-watchdog.sh every 4th cycle (~1 hour).
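The "every 4th cycle" schedule is a counter taken modulo 4. This is a minimal sketch of how a watchdog loop might interleave the gate with its normal restarts. The 15-minute interval is an assumption inferred from "~1 hour". The function names are hypothetical, and the sketch assumes `jidoka-gate.sh` is on `PATH` (for example via the `~/.hermes/bin` symlinks).

```shell
#!/usr/bin/env bash
JIDOKA_EVERY=4   # run the quality gate on every 4th watchdog cycle

# should_run_jidoka CYCLE: true when the 1-based CYCLE is a multiple of JIDOKA_EVERY.
should_run_jidoka() { [ $(( $1 % JIDOKA_EVERY )) -eq 0 ]; }

# Sketch of a watchdog loop (defined, not started). Assumes a 15-minute
# cycle, so that 4 cycles is about one hour.
watchdog_loop() {
  local cycle=0
  while true; do
    cycle=$((cycle + 1))
    # ... check halt flags and restart dead loops here ...
    if should_run_jidoka "$cycle"; then
      jidoka-gate.sh claude || echo "jidoka halted the claude loop"
    fi
    sleep 900
  done
}
```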
## Integration
```bash
# In a loop script (run the preflight, don't source it: it ends with `exit`,
# which would terminate the loop script itself):
guards/api-key-preflight.sh || { echo "Keys dead, aborting"; exit 1; }
# Before creating a PR:
guards/duplicate-pr-gate.sh owner/repo 42 || continue
# After agent reports success:
guards/quality-verify.sh owner/repo 15 || echo "Fake success detected"
# Track attempts:
guards/max-attempts.sh claude-agent 42 || { echo "Too many failures"; continue; }
```
## Installing as git hook
```bash
ln -sf "$(pwd)/hardcoded-ip-scanner.sh" /path/to/repo/.git/hooks/pre-commit
```
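`ln -sf` replaces any pre-commit hook the repository already has. Where a hook already exists, calling the scanner from it avoids clobbering it. This is a minimal sketch of an installer that symlinks only when the hook slot is free and otherwise appends a call to the existing hook. `install_ip_hook` is a hypothetical helper, and the sketch assumes a regular `.git` directory (not a worktree or submodule `.git` file).

```shell
#!/usr/bin/env bash
# install_ip_hook SCANNER REPO_DIR
# Symlinks SCANNER as REPO_DIR's pre-commit hook if none exists; otherwise
# appends a line to the existing hook that runs SCANNER and aborts on failure.
install_ip_hook() {
  local scanner hook
  scanner="$(cd "$(dirname "$1")" && pwd)/$(basename "$1")"   # absolute path
  hook="$2/.git/hooks/pre-commit"
  mkdir -p "$(dirname "$hook")"
  if [ -L "$hook" ]; then
    # Never append through a symlink: that would edit the shared target file.
    [ "$(readlink "$hook")" = "$scanner" ] && return 0
    echo "pre-commit is a symlink to $(readlink "$hook"); add the scanner call there" >&2
    return 1
  elif [ ! -e "$hook" ]; then
    ln -s "$scanner" "$hook"
  elif ! grep -qF "$scanner" "$hook"; then
    printf '\n"%s" || exit 1\n' "$scanner" >> "$hook"
  fi
}
```

The appended line runs after the existing hook's own commands, so it has no effect if that hook ends with `exit` or `exec`; in that case the call has to be placed by hand.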

@@ -0,0 +1,48 @@
#!/bin/bash
# andon-alert.sh — Real-time signal light for the fleet
# Usage: andon-alert.sh SEVERITY MESSAGE SOURCE
# Severity: INFO | WARNING | CRITICAL | HALT
set -eo pipefail
SEVERITY="${1:-INFO}"
MESSAGE="${2:-No message}"
SOURCE="${3:-unknown}"
TIMESTAMP=$(date '+%Y-%m-%d %H:%M:%S')
LOG_FILE="$HOME/.hermes/logs/andon.log"
# "|| true": without a token file, set -e would otherwise abort before anything is logged
BOT_TOKEN=$(cat ~/.config/telegram/special_bot 2>/dev/null || true)
CHAT_ID="-1003664764329"
mkdir -p "$(dirname "$LOG_FILE")"
# Always log
echo "[$TIMESTAMP] [$SEVERITY] [$SOURCE] $MESSAGE" >> "$LOG_FILE"
# Telegram for WARNING and above
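if [ "$SEVERITY" != "INFO" ] && [ -n "$BOT_TOKEN" ]; then
  ICON="⚠️"
  [ "$SEVERITY" = "CRITICAL" ] && ICON="🔴"
  [ "$SEVERITY" = "HALT" ] && ICON="🛑"
  # Braces are required: "$SOURCE_" and "$TIMESTAMP_" would expand unset variables.
  MSG="$ICON *ANDON — $SEVERITY*
_Source: ${SOURCE}_
$MESSAGE
_${TIMESTAMP}_"
  # --data-urlencode keeps "&" or "+" in the message from corrupting the form body;
  # "|| true" keeps a network failure from aborting the CRITICAL/HALT handling under set -e.
  curl -s -X POST "https://api.telegram.org/bot${BOT_TOKEN}/sendMessage" \
    -d "chat_id=${CHAT_ID}" \
    --data-urlencode "text=${MSG}" \
    -d "parse_mode=Markdown" > /dev/null 2>&1 || true
fi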
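# pkill -f "$SOURCE" would also match this script, whose own arguments contain
# $SOURCE, so list the matches with pgrep and skip our own PID.
stop_source() {
  local pid
  for pid in $(pgrep -f -- "$SOURCE" 2>/dev/null); do
    if [ "$pid" != "$$" ]; then
      kill "$pid" 2>/dev/null || true
    fi
  done
}
# For CRITICAL: stop the source loop (no flag file, so the watchdog may restart it)
if [ "$SEVERITY" = "CRITICAL" ]; then
  stop_source
  echo "[$TIMESTAMP] CRITICAL: Stopped $SOURCE" >> "$LOG_FILE"
fi
# For HALT: create the flag file first (so a restart sees it), then kill
if [ "$SEVERITY" = "HALT" ]; then
  touch "$HOME/.hermes/logs/${SOURCE}-jidoka-halt"
  stop_source
  echo "[$TIMESTAMP] HALT: Killed $SOURCE, flag file created" >> "$LOG_FILE"
fi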
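With `parse_mode=Markdown`, Telegram rejects a message whose text contains an unbalanced `_`, `*`, `` ` `` or `[`. Because curl's output is discarded, such an alert is lost silently. Telegram's legacy Markdown mode lets those characters be escaped with a preceding backslash, so one option is to escape the free-text parts (MESSAGE, SOURCE) before embedding them in the template. This is a minimal sketch; `tg_escape` is a hypothetical helper and not part of the script.

```shell
#!/usr/bin/env bash
# tg_escape TEXT
# Backslash-escapes the characters that Telegram's legacy Markdown treats as
# entity delimiters: _ * ` [
tg_escape() {
  printf '%s' "$1" | sed -e 's/[_*`[]/\\&/g'
}

# Example: only the template's own * and _ remain active as formatting.
# (Variable names follow andon-alert.sh.)
SOURCE="claude-loop"
MESSAGE="gitea_token_vps missing for *all* repos"
MSG="*ANDON*
_Source: $(tg_escape "$SOURCE")_
$(tg_escape "$MESSAGE")"
printf '%s\n' "$MSG"
```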

@@ -0,0 +1,41 @@
#!/bin/bash
# api-key-preflight.sh — validate API keys before burning cycles
# Usage: run as a command, not with `source`: the script ends with `exit`,
# which would terminate a shell that sourced it
# Exit code 0 = all keys valid, 1+ = number of dead keys
check_groq() {
  KEY=$(cat ~/.config/groq/api_key 2>/dev/null | tr -d '\n')
  [ -z "$KEY" ] && echo "GROQ: NO KEY" && return 1
  STATUS=$(curl -s -o /dev/null -w '%{http_code}' -H "Authorization: Bearer $KEY" https://api.groq.com/openai/v1/models 2>/dev/null)
  [ "$STATUS" = "200" ] && echo "GROQ: OK" && return 0
  echo "GROQ: DEAD (HTTP $STATUS)" && return 1
}
check_gemini() {
  KEY=$(cat ~/.hermes/gemini_token 2>/dev/null | tr -d '\n')
  [ -z "$KEY" ] && echo "GEMINI: NO KEY" && return 1
  # Gemini CLI uses its own auth; just check the binary exists
  command -v gemini >/dev/null 2>&1 && echo "GEMINI: CLI OK" && return 0
  echo "GEMINI: CLI MISSING" && return 1
}
check_xai() {
  # Take everything after the first "=" (in case the value contains "="), then drop spaces and quotes
  KEY=$(grep -m1 -E '^(export )?XAI_API_KEY=' ~/.hermes/.env 2>/dev/null | cut -d= -f2- | tr -d " \"'")
  [ -z "$KEY" ] && echo "XAI: NO KEY" && return 1
  STATUS=$(curl -s -o /dev/null -w '%{http_code}' -H "Authorization: Bearer $KEY" https://api.x.ai/v1/models 2>/dev/null)
  [ "$STATUS" = "200" ] && echo "XAI: OK" && return 0
  echo "XAI: DEAD (HTTP $STATUS)" && return 1
}
check_ollama() {
  STATUS=$(curl -s -o /dev/null -w '%{http_code}' http://localhost:11434/api/version 2>/dev/null)
  [ "$STATUS" = "200" ] && echo "OLLAMA: OK" && return 0
  echo "OLLAMA: DOWN" && return 1
}
# Run all checks
FAILED=0
for check in check_groq check_gemini check_xai check_ollama; do
  $check || FAILED=$((FAILED+1))
done
exit $FAILED

@@ -0,0 +1,41 @@
#!/bin/bash
# duplicate-pr-gate.sh <repo> <issue_number>
# Exit 0 = no existing PR, safe to create
# Exit 1 = PR already exists, skip
REPO="$1"
ISSUE="$2"
TOKEN=$(cat ~/.hermes/gitea_token_vps 2>/dev/null | tr -d '\n')
API="https://forge.alexanderwhitestone.com/api/v1"
if [ -z "$REPO" ] || [ -z "$ISSUE" ]; then
  echo "Usage: duplicate-pr-gate.sh <owner/repo> <issue_number>"
  exit 2
fi
if [ -z "$TOKEN" ]; then
  echo "ERROR: No Gitea token found at ~/.hermes/gitea_token_vps"
  exit 2
fi
# Check for open PRs mentioning this issue
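EXISTING=$(curl -s "$API/repos/$REPO/pulls?state=open&limit=50" \
  -H "Authorization: token $TOKEN" | \
  python3 -c "
import re, sys, json
# The number must not be followed by another digit, so '#4' does not match '#42'
issue = re.escape('$ISSUE')
hash_pat = re.compile('#' + issue + r'(?![0-9])')
ref_pat = re.compile('issue-' + issue + r'(?![0-9])')
try:
    prs = json.load(sys.stdin)
    if not isinstance(prs, list):
        prs = []
    matches = [p['number'] for p in prs
               if hash_pat.search(p.get('title') or '')
               or hash_pat.search(p.get('body') or '')
               or ref_pat.search((p.get('head') or {}).get('ref') or '')]
    print(len(matches))
except Exception:
    print(0)
" 2>/dev/null)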
if [ "$EXISTING" -gt 0 ] 2>/dev/null; then
  echo "BLOCKED: PR already exists for issue #$ISSUE on $REPO"
  exit 1
fi
echo "CLEAR: No existing PR for #$ISSUE"
exit 0
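The gate requests a single page of at most 50 open PRs (`limit=50`), so in a repository with more open PRs than that an existing PR can be missed and a duplicate created. Gitea's list endpoints accept `page` and `limit` query parameters, and the largest page the server returns is capped by its configuration. This is a minimal sketch of walking pages until an empty one comes back. The HTTP call is injected as a function so that the loop can be tested offline. `each_pr_page` and `fetch_open_prs` are hypothetical names.

```shell
#!/usr/bin/env bash
# each_pr_page FETCH_CMD [MAX_PAGES]
# Calls "FETCH_CMD PAGE" for PAGE = 1, 2, ... and prints each non-empty page
# as one compact JSON line. Stops at the first page that is empty or is not
# a JSON list (an error object counts as "no more PRs", i.e. it fails open).
# MAX_PAGES guards against a server that ignores the page parameter.
each_pr_page() {
  local fetch="$1" max="${2:-100}" page=1 body compact
  while [ "$page" -le "$max" ]; do
    body=$("$fetch" "$page") || return 1
    compact=$(printf '%s' "$body" | python3 -c '
import json, sys
try:
    d = json.load(sys.stdin)
except ValueError:
    d = None
if isinstance(d, list) and d:
    print(json.dumps(d))
')
    [ -n "$compact" ] || return 0
    printf '%s\n' "$compact"
    page=$((page + 1))
  done
}

# Real fetcher, assuming API, REPO and TOKEN are set as in duplicate-pr-gate.sh:
fetch_open_prs() {
  curl -s "$API/repos/$REPO/pulls?state=open&limit=50&page=$1" \
    -H "Authorization: token $TOKEN"
}
# Usage: each_pr_page fetch_open_prs | <match each line against the issue>
```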

@@ -0,0 +1,22 @@
#!/bin/bash
# hardcoded-ip-scanner.sh — scan staged files for hardcoded VPS IPs
# Add to any repo's pre-commit hook
# Our VPS IPs that should NEVER be hardcoded (use domains instead)
BAD_PATTERNS=(
'143.198.27.163:3000' # Hermes VPS Gitea — use forge.alexanderwhitestone.com
'167.99.126.228:3000' # Allegro VPS — use allegro.alexanderwhitestone.com
'159.203.146.185:3000' # Bezalel VPS — use bezalel.alexanderwhitestone.com
)
FOUND=0
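for pattern in "${BAD_PATTERNS[@]}"; do
  # -F matches the IP literally (in a regex each "." would match any character).
  # -z/-0 handle spaces in file names; --diff-filter=ACM skips deleted files.
  # Note: grep reads the working-tree copy of each staged file.
  MATCHES=$(git diff --cached --name-only --diff-filter=ACM -z | xargs -0 grep -lF -- "$pattern" 2>/dev/null)
  if [ -n "$MATCHES" ]; then
    echo "BLOCKED: Hardcoded IP '$pattern' found in: $MATCHES"
    echo "Use the domain name instead."
    FOUND=1
  fi
done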
[ $FOUND -eq 1 ] && exit 1
exit 0

@@ -0,0 +1,82 @@
#!/bin/bash
# jidoka-gate.sh — Stop the line on defect
# Usage: jidoka-gate.sh <agent-name> [check-count] [fail-threshold]
# Checks last N completions, halts if too many fail quality
set -eo pipefail
AGENT="${1:?Usage: jidoka-gate.sh <agent-name>}"
CHECK_COUNT="${2:-5}"
FAIL_THRESHOLD="${3:-3}"
GUARD_DIR="$HOME/.hermes/bin"
# "|| true": under set -e a missing token file would otherwise abort the script here
TOKEN=$(cat ~/.hermes/gitea_token_vps 2>/dev/null) || true
API="https://forge.alexanderwhitestone.com/api/v1"
LOG="$HOME/.hermes/logs/jidoka.log"
TIMESTAMP=$(date '+%Y-%m-%d %H:%M:%S')
mkdir -p "$(dirname "$LOG")"
# Find the agent's loop log
LOOP_LOG="$HOME/.hermes/logs/${AGENT}-loop.log"
# Get last N completed issue numbers
# grep -oE plus tr instead of grep -oP '#\K...': -P (PCRE) is a GNU extension not every grep supports
COMPLETED=$(grep 'complete' "$LOOP_LOG" 2>/dev/null | tail -n "$CHECK_COUNT" | grep -oE '#[0-9]+' | tr -d '#' | sort -u) || true
if [ -z "$COMPLETED" ]; then
echo "[$TIMESTAMP] No completions to check for $AGENT" >> "$LOG"
exit 0
fi
# Check each — does the PR actually exist with real changes?
PASSED=0
FAILED=0
DETAILS=""
for issue_num in $COMPLETED; do
  # Search each repo's open PRs for a branch named after this issue;
  # "issue-4" must not match "issue-42", hence the digit lookahead
  PR_DATA=""
  for repo in the-nexus hermes-agent timmy-config timmy-home; do
    PR_DATA=$(curl -s "$API/repos/Timmy_Foundation/$repo/pulls?state=open&limit=50" \
      -H "Authorization: token $TOKEN" 2>/dev/null | \
      python3 -c "import re,sys,json; pat=re.compile(r'issue-$issue_num(?![0-9])'); prs=json.load(sys.stdin); matches=[p for p in prs if pat.search((p.get('head') or {}).get('ref') or '')]; print(json.dumps(matches[0]) if matches else 'null')" 2>/dev/null) || true
    if [ "$PR_DATA" != "null" ] && [ -n "$PR_DATA" ]; then
      break
    fi
  done
  if [ "$PR_DATA" = "null" ] || [ -z "$PR_DATA" ]; then
    FAILED=$((FAILED+1))
    DETAILS="$DETAILS\n #$issue_num: NO PR FOUND"
  else
    FILES=$(echo "$PR_DATA" | python3 -c "import sys,json; print(json.load(sys.stdin).get('changed_files',0))" 2>/dev/null) || true
    if [ "$FILES" -gt 0 ] 2>/dev/null; then
      PASSED=$((PASSED+1))
      DETAILS="$DETAILS\n #$issue_num: OK ($FILES files)"
    else
      FAILED=$((FAILED+1))
      DETAILS="$DETAILS\n #$issue_num: EMPTY PR (0 files)"
    fi
  fi
done
echo "[$TIMESTAMP] JIDOKA CHECK: $AGENT - $PASSED passed, $FAILED failed of $((PASSED+FAILED)) checked" >> "$LOG"
echo -e "$DETAILS" >> "$LOG"
# If failures exceed threshold: HALT
if [ "$FAILED" -ge "$FAIL_THRESHOLD" ]; then
  echo "[$TIMESTAMP] JIDOKA HALT: $AGENT quality below threshold ($FAILED/$((PASSED+FAILED)) failed)" >> "$LOG"
  "$GUARD_DIR/andon-alert.sh" "HALT" "Quality gate failed: $AGENT - $FAILED of $((PASSED+FAILED)) completions have no valid PR. Line stopped." "$AGENT-loop"
  exit 1
fi
echo "JIDOKA: $AGENT quality OK ($PASSED passed, $FAILED failed)"
exit 0

@@ -0,0 +1,37 @@
#!/bin/bash
# max-attempts.sh <agent> <issue_number> [max_attempts]
# Exit 0 = attempt allowed, Exit 1 = max attempts exceeded, skip permanently
AGENT="$1"
ISSUE="$2"
MAX="${3:-3}"
ATTEMPTS_FILE="$HOME/.hermes/logs/${AGENT}-attempts.json"
if [ -z "$AGENT" ] || [ -z "$ISSUE" ]; then
  echo "Usage: max-attempts.sh <agent_name> <issue_number> [max_attempts]"
  exit 2
fi
# Ensure logs dir exists
mkdir -p "$HOME/.hermes/logs"
# Initialize if needed
[ ! -f "$ATTEMPTS_FILE" ] && echo '{}' > "$ATTEMPTS_FILE"
# Read current count
COUNT=$(python3 -c "import json; d=json.load(open('$ATTEMPTS_FILE')); print(d.get('$ISSUE',0))" 2>/dev/null)
COUNT=${COUNT:-0}
if [ "$COUNT" -ge "$MAX" ]; then
  echo "SKIP: Issue #$ISSUE exceeded $MAX attempts by $AGENT"
  exit 1
fi
# Increment
python3 -c "
import json
with open('$ATTEMPTS_FILE') as f: d=json.load(f)
d['$ISSUE'] = d.get('$ISSUE',0) + 1
with open('$ATTEMPTS_FILE','w') as f: json.dump(d,f,indent=2)
print(f'ATTEMPT {d[\"$ISSUE\"]}/$MAX for #$ISSUE by $AGENT')
"
exit 0

@@ -0,0 +1,43 @@
#!/bin/bash
# quality-verify.sh <repo> <pr_number>
# Exit 0 = quality verified, Exit 1 = fake/empty success
REPO="$1"
PR="$2"
TOKEN=$(cat ~/.hermes/gitea_token_vps 2>/dev/null | tr -d '\n')
API="https://forge.alexanderwhitestone.com/api/v1"
if [ -z "$REPO" ] || [ -z "$PR" ]; then
  echo "Usage: quality-verify.sh <owner/repo> <pr_number>"
  exit 2
fi
if [ -z "$TOKEN" ]; then
  echo "ERROR: No Gitea token found at ~/.hermes/gitea_token_vps"
  exit 2
fi
# Check PR exists
DATA=$(curl -s "$API/repos/$REPO/pulls/$PR" -H "Authorization: token $TOKEN")
STATE=$(echo "$DATA" | python3 -c "import sys,json; print(json.load(sys.stdin).get('state','missing'))" 2>/dev/null)
if [ "$STATE" = "missing" ] || [ -z "$STATE" ]; then
  echo "FAIL: PR #$PR does not exist on $REPO"
  exit 1
fi
# Check it has actual file changes
# (an error response is a JSON object; len() of it would count its keys as files)
FILES=$(curl -s "$API/repos/$REPO/pulls/$PR/files" -H "Authorization: token $TOKEN" | \
  python3 -c "import sys,json; d=json.load(sys.stdin); print(len(d) if isinstance(d, list) else 0)" 2>/dev/null)
if [ "$FILES" = "0" ] || [ -z "$FILES" ]; then
  echo "FAIL: PR #$PR has zero file changes"
  exit 1
fi
# Check it's mergeable
MERGEABLE=$(echo "$DATA" | python3 -c "import sys,json; print(json.load(sys.stdin).get('mergeable',False))" 2>/dev/null)
if [ "$MERGEABLE" != "True" ]; then
  echo "WARN: PR #$PR may not be mergeable (conflicts possible)"
  # Don't fail, just warn. Conflicts can be fixed.
fi
echo "VERIFIED: PR #$PR on $REPO: $FILES files changed, state=$STATE, mergeable=$MERGEABLE"
exit 0

@@ -18,5 +18,5 @@ def test_config_defaults_to_local_llama_cpp_runtime() -> None:
assert local_provider["model"] == "hermes4:14b"
assert config["fallback_model"]["provider"] == "ollama"
-assert config["fallback_model"]["model"] == "hermes3:latest"
+assert config["fallback_model"]["model"] == "gemma4:latest"
assert "localhost" in config["fallback_model"]["base_url"]