Compare commits

3 Commits — fix/ci-val...feat/sover

| SHA1 | Author | Date |
|---|---|---|
| 6ad6469c40 | | |
| 3af63cf172 | | |
| 6d713aeeb9 | | |
@@ -1,54 +0,0 @@
## Summary

<!-- What changed and why. One paragraph max. -->

## Governing Issue

<!-- REQUIRED. Every PR must reference at least one issue. Max 3 issues per PR. -->
<!-- Closes #ISSUENUM -->
<!-- Refs #ISSUENUM -->

## Acceptance Criteria

<!-- List the specific outcomes this PR delivers. Check each only when proven. -->
<!-- Copy these from the governing issue if it has them. -->

- [ ] Criterion 1
- [ ] Criterion 2

## Proof

<!-- No proof = no merge. See CONTRIBUTING.md for the full standard. -->

### Commands / logs / world-state proof

<!-- Paste the exact commands, output, log paths, or world-state artifacts that prove each acceptance criterion was met. -->

```
$ <command you ran>
<relevant output>
```

### Visual proof (if applicable)

<!-- For skin updates, UI changes, dashboard changes: attach screenshot to the PR discussion. -->
<!-- Name what the screenshot proves. Do not commit binary media unless explicitly required. -->

## Risk and Rollback

<!-- What could go wrong? How do we undo it? -->

- **Risk level:** low / medium / high
- **What breaks if this is wrong:**
- **How to rollback:**

## Checklist

<!-- Complete every item before requesting review. -->

- [ ] PR body references at least one issue number (`Closes #N` or `Refs #N`)
- [ ] Changed files are syntactically valid (`python -c "import ast; ast.parse(open(f).read())"`, `node --check`, `bash -n`)
- [ ] Proof meets CONTRIBUTING.md standard (exact commands, output, or artifacts — not "looks right")
- [ ] Branch is up-to-date with base
- [ ] No more than 3 unrelated issues bundled in this PR
- [ ] Shell scripts are executable (`chmod +x`)
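The syntax-validity checklist item can be scripted before opening a PR. A minimal sketch of the same checks the template names (function names are mine, not part of the template):

```python
import ast
import subprocess

def python_syntax_ok(path: str) -> bool:
    """Mirror of the checklist command: parse the file without executing it."""
    try:
        with open(path) as fh:
            ast.parse(fh.read(), filename=path)
        return True
    except SyntaxError:
        return False

def shell_syntax_ok(path: str) -> bool:
    """bash -n parses a script without running it."""
    return subprocess.run(["bash", "-n", path], capture_output=True).returncode == 0
```

Run these over the PR's changed files and paste the output into the Proof section.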
@@ -1,41 +0,0 @@
# architecture-lint.yml — CI gate for the Architecture Linter v2
# Refs: #437 — repo-aware, test-backed, CI-enforced.
#
# Runs on every PR to main. Validates Python syntax, then runs
# linter tests and finally lints the repo itself.

name: Architecture Lint

on:
  pull_request:
    branches: [main, master]
  push:
    branches: [main]

jobs:
  linter-tests:
    name: Linter Tests
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install test deps
        run: pip install pytest
      - name: Compile-check linter
        run: python3 -m py_compile scripts/architecture_linter_v2.py
      - name: Run linter tests
        run: python3 -m pytest tests/test_linter.py -v

  lint-repo:
    name: Lint Repository
    runs-on: ubuntu-latest
    needs: linter-tests
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Run architecture linter
        run: python3 scripts/architecture_linter_v2.py .
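The "Compile-check linter" step can be reproduced locally before pushing. A sketch using only the stdlib (the wrapper function is mine; point it at whatever file the workflow checks):

```python
import py_compile

def compiles(path: str) -> bool:
    """Same gate as the CI compile-check step: byte-compile, report pass/fail."""
    try:
        py_compile.compile(path, doraise=True)
        return True
    except py_compile.PyCompileError:
        return False
```

Running `compiles("scripts/architecture_linter_v2.py")` locally catches syntax errors before CI does.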
@@ -112,10 +112,23 @@ jobs:
     runs-on: ubuntu-latest
     steps:
       - uses: actions/checkout@v4
       - uses: actions/setup-python@v5
         with:
           python-version: '3.11'
       - name: Install PyYAML
         run: pip install pyyaml
       - name: Validate playbook structure
-        run: python3 scripts/validate_playbook_schema.py
+        run: |
+          python3 -c "
+          import yaml, sys, glob
+          required_keys = {'name', 'description'}
+          for f in glob.glob('playbooks/*.yaml'):
+              with open(f) as fh:
+                  try:
+                      data = yaml.safe_load(fh)
+                      if not isinstance(data, dict):
+                          print(f'ERROR: {f} is not a YAML mapping')
+                          sys.exit(1)
+                      missing = required_keys - set(data.keys())
+                      if missing:
+                          print(f'WARNING: {f} missing keys: {missing}')
+                      print(f'OK: {f}')
+                  except yaml.YAMLError as e:
+                      print(f'ERROR: {f}: {e}')
+                      sys.exit(1)
+          "
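The inline validation above is easier to unit-test if the key check is factored away from file I/O. A sketch of the same required-key logic as a pure function (the function name is mine), which takes whatever `yaml.safe_load` returned:

```python
def playbook_issues(data, required=("name", "description")):
    """Return a list of problems for one parsed playbook document."""
    if not isinstance(data, dict):
        return ["not a YAML mapping"]
    # Sort so the report is deterministic regardless of set ordering.
    missing = sorted(set(required) - set(data))
    return [f"missing key: {k}" for k in missing]
```

The CI step could then just loop files, parse each, and fail on any non-empty result.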
.gitignore (vendored): 24 changed lines
@@ -10,27 +10,3 @@ __pycache__/

# Generated audit reports
reports/

# Secrets and credentials
.bash_history
.git-credentials
.gitea_token
.ssh/id_*
.ssh/known_hosts
.viminfo
.wget-hsts
.profile
.bashrc
.bash_logout
.python_history
.lesshst
.selected_editor
.sudo_as_admin_successful
.config/telegram/
.hermes/.env
.hermes/auth.json
*.pem
*.key
.env
.env.*
!.env.example
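The `.env.*` plus `!.env.example` pair relies on gitignore negation, where the last matching pattern wins. A rough Python model of just that precedence rule (real gitignore matching has more rules about directories and anchoring, so treat this as illustrative only):

```python
from fnmatch import fnmatch

def is_ignored(name: str, patterns) -> bool:
    """Walk patterns in order; a later match (or a `!` negation) overrides earlier ones."""
    ignored = False
    for pat in patterns:
        if pat.startswith("!"):
            if fnmatch(name, pat[1:]):
                ignored = False
        elif fnmatch(name, pat):
            ignored = True
    return ignored
```

This is why `!.env.example` must come after `.env.*` in the file: ordering carries the meaning.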
BANNED_PROVIDERS.md (new file): 63 lines
@@ -0,0 +1,63 @@
# Banned Providers

This document is a hard policy. It is not advisory. It is not aspirational.
Any agent, wizard, or automated process that violates this policy is broken
and must be fixed immediately.

## Permanently Banned

### Anthropic (Claude)

**Status:** BANNED — April 2026
**Scope:** All configs, fallback chains, playbooks, wizard bootstraps, and fleet scripts.
**Enforcement:** Pre-commit hook, architecture linter, sovereignty enforcement tests.

No Anthropic model (Claude Opus, Sonnet, Haiku, or any variant) may appear as:

- A primary provider
- A fallback provider
- An OpenRouter model slug (e.g. `anthropic/claude-*`)
- An API endpoint (`api.anthropic.com`)
- A required dependency (the `anthropic` pip package)
- An environment variable (`ANTHROPIC_API_KEY`, `ANTHROPIC_TOKEN`)

### What to use instead

| Was | Now |
|-----|-----|
| claude-opus-4-6 | kimi-k2.5 |
| claude-sonnet-4-20250514 | kimi-k2.5 |
| claude-haiku | google/gemini-2.5-pro |
| anthropic (provider) | kimi-coding |
| anthropic/claude-* (OpenRouter) | google/gemini-2.5-pro |
| ANTHROPIC_API_KEY | KIMI_API_KEY |

### Exceptions

The following files may reference Anthropic for **historical or defensive** purposes:

- `training/` — Training data must not be altered
- `evaluations/` — Historical benchmark results
- `RELEASE_*.md` — Changelogs
- `metrics_helpers.py` — Historical cost calculation
- `pre-commit.py` — Detects leaked Anthropic keys (defensive)
- `secret-scan.yml` — Detects leaked Anthropic keys (defensive)
- `architecture_linter.py` — Warns/blocks Anthropic usage (enforcement)
- `test_sovereignty_enforcement.py` — Tests that Anthropic is blocked (enforcement)

### Golden State

```yaml
fallback_providers:
  - provider: kimi-coding
    model: kimi-k2.5
    reason: Primary
  - provider: openrouter
    model: google/gemini-2.5-pro
    reason: Cloud fallback
  - provider: ollama
    model: gemma4:latest
    base_url: http://localhost:11434/v1
    reason: Terminal fallback — never phones home
```

*Sovereignty and service always.*
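A fallback chain like the golden state above is consumed first-match-wins: try each provider in order until one is healthy. A sketch of that selection logic (`pick_provider` and the `healthy` probe are my names, standing in for whatever the harness actually uses):

```python
def pick_provider(chain, healthy):
    """Return the first chain entry whose provider passes the health probe."""
    for entry in chain:
        if healthy(entry["provider"]):
            return entry
    raise RuntimeError("all providers in the fallback chain are down")
```

With the chain above, the terminal Ollama entry is only reached when both cloud providers fail, which is the "never phones home" guarantee.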
@@ -51,11 +51,11 @@ Alexander is pleased with the state. This tag marks a high-water mark.
 | OAI-Wolf-3 | 8683 | hermes gateway | ACTIVE |

 - Disk: 12G/926G (4%) — pristine
-- Primary model: claude-opus-4-6 via Anthropic
+- Primary model: kimi-k2.5 via Kimi
 - Fallback chain: codex → kimi-k2.5 → gemini-2.5-flash → llama-3.3-70b → grok-3-mini-fast → kimi → grok → kimi → gpt-4.1-mini
 - Ollama models: gemma4:latest (9.6GB), hermes4:14b (9.0GB)
 - Worktrees: 239 (9.8GB) — prune candidates exist
-- Running loops: 3 claude-loops, 3 gemini-loops, orchestrator, status watcher
+- Running loops: 3 gemini-loops, orchestrator, status watcher
 - LaunchD: hermes gateway running, fenrir stopped, kimi-heartbeat idle
 - MCP: morrowind server active
@@ -1,47 +0,0 @@
# =============================================================================
# BANNED PROVIDERS — The Timmy Foundation
# =============================================================================
# "Anthropic is not only fired, but banned. I don't want these errors
# cropping up." — Alexander, 2026-04-09
#
# This is a HARD BAN. Not deprecated. Not fallback. BANNED.
# Enforcement: pre-commit hook, linter, Ansible validation, CI tests.
# =============================================================================

banned_providers:
  - name: anthropic
    reason: "Permanently banned. SDK access gated despite active quota. Fleet was bricked because golden state pointed to Anthropic Sonnet."
    banned_date: "2026-04-09"
    enforcement: strict  # Ansible playbook FAILS if detected
    models:
      - "claude-sonnet-*"
      - "claude-opus-*"
      - "claude-haiku-*"
      - "claude-*"
    endpoints:
      - "api.anthropic.com"
      - "anthropic/*"  # OpenRouter pattern
    api_keys:
      - "ANTHROPIC_API_KEY"
      - "CLAUDE_API_KEY"

# Golden state alternative:
approved_providers:
  - name: kimi-coding
    model: kimi-k2.5
    role: primary
  - name: openrouter
    model: google/gemini-2.5-pro
    role: fallback
  - name: ollama
    model: "gemma4:latest"
    role: terminal_fallback

# Future evaluation:
evaluation_candidates:
  - name: mimo-v2-pro
    status: pending
    notes: "Free via Nous Portal for ~2 weeks from 2026-04-07. Add after fallback chain is fixed."
  - name: hermes-4
    status: available
    notes: "Free on Nous Portal. 36B and 70B variants. Home team model."
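The `models` and `endpoints` entries above are shell-style globs, so enforcement in a linter or pre-commit hook reduces to glob matching. A sketch using the stdlib (the function name is mine; a real enforcement pass would read the patterns from this YAML file rather than hardcode them):

```python
from fnmatch import fnmatch

BANNED_PATTERNS = ["claude-*", "anthropic/*"]

def is_banned(slug: str, patterns=BANNED_PATTERNS) -> bool:
    """True if a model slug matches any banned glob (case-insensitive)."""
    slug = slug.lower()
    return any(fnmatch(slug, pat) for pat in patterns)
```

The same check covers both plain model names and OpenRouter-style `provider/model` slugs.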
@@ -1,95 +0,0 @@
# Ansible IaC — The Timmy Foundation Fleet

> One canonical Ansible playbook defines: deadman switch, cron schedule,
> golden state rollback, agent startup sequence.
> — KT Final Session 2026-04-08, Priority TWO

## Purpose

This directory contains the **single source of truth** for fleet infrastructure.
No more ad-hoc recovery implementations. No more overlapping deadman switches.
No more agents mutating their own configs into oblivion.

**Everything** goes through Ansible. If it's not in a playbook, it doesn't exist.

## Architecture

```
┌─────────────────────────────────────────────────┐
│             Gitea (Source of Truth)             │
│  timmy-config/ansible/                          │
│  ├── inventory/hosts.yml    (fleet machines)    │
│  ├── playbooks/site.yml     (master playbook)   │
│  ├── roles/                 (reusable roles)    │
│  └── group_vars/wizards.yml (golden state)      │
└──────────────────┬──────────────────────────────┘
                   │ PR merge triggers webhook
                   ▼
┌─────────────────────────────────────────────────┐
│             Gitea Webhook Handler               │
│  scripts/deploy_on_webhook.sh                   │
│  → ansible-pull on each target machine          │
└──────────────────┬──────────────────────────────┘
                   │ ansible-pull
                   ▼
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│  Timmy   │ │ Allegro  │ │ Bezalel  │ │   Ezra   │
│  (Mac)   │ │  (VPS)   │ │  (VPS)   │ │  (VPS)   │
│          │ │          │ │          │ │          │
│ deadman  │ │ deadman  │ │ deadman  │ │ deadman  │
│ cron     │ │ cron     │ │ cron     │ │ cron     │
│ golden   │ │ golden   │ │ golden   │ │ golden   │
│ req_log  │ │ req_log  │ │ req_log  │ │ req_log  │
└──────────┘ └──────────┘ └──────────┘ └──────────┘
```

## Quick Start

```bash
# Deploy everything to all machines
ansible-playbook -i inventory/hosts.yml playbooks/site.yml

# Deploy only golden state config
ansible-playbook -i inventory/hosts.yml playbooks/golden_state.yml

# Deploy only to a specific wizard
ansible-playbook -i inventory/hosts.yml playbooks/site.yml --limit bezalel

# Dry run (check mode)
ansible-playbook -i inventory/hosts.yml playbooks/site.yml --check --diff
```

## Golden State Provider Chain

All wizard configs converge on this provider chain. **Anthropic is BANNED.**

| Priority | Provider            | Model          | Endpoint                       |
| -------- | ------------------- | -------------- | ------------------------------ |
| 1        | Kimi                | kimi-k2.5      | https://api.kimi.com/coding/v1 |
| 2        | Gemini (OpenRouter) | gemini-2.5-pro | https://openrouter.ai/api/v1   |
| 3        | Ollama (local)      | gemma4:latest  | http://localhost:11434/v1      |

## Roles

| Role             | Purpose                                                 |
| ---------------- | ------------------------------------------------------- |
| `wizard_base`    | Common wizard setup: directories, thin config, git pull |
| `deadman_switch` | Health check → snapshot good config → rollback on death |
| `golden_state`   | Deploy and enforce golden state provider chain          |
| `request_log`    | SQLite telemetry table for every inference call         |
| `cron_manager`   | Source-controlled cron jobs — no manual crontab edits   |

## Rules

1. **No manual changes.** If it's not in a playbook, it will be overwritten.
2. **No Anthropic.** Banned. Enforcement is automated. See `BANNED_PROVIDERS.yml`.
3. **Idempotent.** Every playbook can run 100 times with the same result.
4. **PR required.** Config changes go through Gitea PR review, then deploy.
5. **One identity per machine.** No duplicate agents. Fleet audit enforces this.

## Related Issues

- timmy-config #442: [P2] Ansible IaC Canonical Playbook
- timmy-config #444: Wire Deadman Switch ACTION
- timmy-config #443: Thin Config Pattern
- timmy-config #446: request_log Telemetry Table
@@ -1,21 +0,0 @@
[defaults]
inventory = inventory/hosts.yml
roles_path = roles
host_key_checking = False
retry_files_enabled = False
stdout_callback = yaml
forks = 10
timeout = 30

# Logging
log_path = /var/log/ansible/timmy-fleet.log

[privilege_escalation]
become = True
become_method = sudo
become_user = root
become_ask_pass = False

[ssh_connection]
pipelining = True
ssh_args = -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no
@@ -1,74 +0,0 @@
# =============================================================================
# Wizard Group Variables — Golden State Configuration
# =============================================================================
# These variables are applied to ALL wizards in the fleet.
# This IS the golden state. If a wizard deviates, Ansible corrects it.
# =============================================================================

# --- Deadman Switch ---
deadman_enabled: true
deadman_check_interval: 300           # 5 minutes between health checks
deadman_snapshot_dir: "~/.local/timmy/snapshots"
deadman_max_snapshots: 10             # Rolling window of good configs
deadman_restart_cooldown: 60          # Seconds to wait before restart after failure
deadman_max_restart_attempts: 3
deadman_escalation_channel: telegram  # Alert Alexander after max attempts

# --- Thin Config ---
thin_config_path: "~/.timmy/thin_config.yml"
thin_config_mode: "0444"              # Read-only — agents CANNOT modify
upstream_repo: "https://forge.alexanderwhitestone.com/Timmy_Foundation/timmy-config.git"
upstream_branch: main
config_pull_on_wake: true
config_validation_enabled: true

# --- Agent Settings ---
agent_max_turns: 30
agent_reasoning_effort: high
agent_verbose: false
agent_approval_mode: auto

# --- Hermes Harness ---
hermes_config_dir: "{{ hermes_home }}"
hermes_bin_dir: "{{ hermes_home }}/bin"
hermes_skins_dir: "{{ hermes_home }}/skins"
hermes_playbooks_dir: "{{ hermes_home }}/playbooks"
hermes_memories_dir: "{{ hermes_home }}/memories"

# --- Request Log (Telemetry) ---
request_log_enabled: true
request_log_path: "~/.local/timmy/request_log.db"
request_log_rotation_days: 30         # Archive logs older than 30 days
request_log_sync_to_gitea: false      # Future: push telemetry summaries to Gitea

# --- Cron Schedule ---
# All cron jobs are managed here. No manual crontab edits.
cron_jobs:
  - name: "Deadman health check"
    job: "cd {{ wizard_home }}/workspace/timmy-config && python3 fleet/health_check.py"
    minute: "*/5"
    hour: "*"
    enabled: "{{ deadman_enabled }}"

  - name: "Muda audit"
    job: "cd {{ wizard_home }}/workspace/timmy-config && bash fleet/muda-audit.sh >> /tmp/muda-audit.log 2>&1"
    minute: "0"
    hour: "21"
    weekday: "0"
    enabled: true

  - name: "Config pull from upstream"
    job: "cd {{ wizard_home }}/workspace/timmy-config && git pull --ff-only origin main"
    minute: "*/15"
    hour: "*"
    enabled: "{{ config_pull_on_wake }}"

  - name: "Request log rotation"
    job: "python3 -c \"import sqlite3,datetime; db=sqlite3.connect('{{ request_log_path }}'); db.execute('DELETE FROM request_log WHERE timestamp < datetime(\\\"now\\\", \\\"-{{ request_log_rotation_days }} days\\\")'); db.commit()\""
    minute: "0"
    hour: "3"
    enabled: "{{ request_log_enabled }}"

# --- Provider Enforcement ---
# These are validated on every Ansible run. Any Anthropic reference = failure.
provider_ban_enforcement: strict      # strict = fail playbook, warn = log only
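The quoting in the "Request log rotation" cron job is fragile: a cron line wrapping a shell command wrapping Python wrapping SQL, with three levels of escaping. The same rotation reads much more safely as a small script the cron job could call instead. A sketch, assuming the `request_log` table stores timestamps as SQLite-style text (the function name is mine):

```python
import sqlite3

def rotate_request_log(db_path: str, days: int = 30) -> int:
    """Delete request_log rows older than `days` days; return rows removed."""
    db = sqlite3.connect(db_path)
    cur = db.execute(
        # Parameterize the modifier instead of splicing it into the SQL string.
        "DELETE FROM request_log WHERE timestamp < datetime('now', ?)",
        (f"-{days} days",),
    )
    db.commit()
    removed = cur.rowcount
    db.close()
    return removed
```

The cron entry then shrinks to a single unescaped command such as `python3 fleet/rotate_request_log.py` (path hypothetical).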
@@ -1,119 +0,0 @@
# =============================================================================
# Fleet Inventory — The Timmy Foundation
# =============================================================================
# Source of truth for all machines in the fleet.
# Update this file when machines are added/removed.
# All changes go through PR review.
# =============================================================================

all:
  children:
    wizards:
      hosts:
        timmy:
          ansible_host: localhost
          ansible_connection: local
          wizard_name: Timmy
          wizard_role: "Primary wizard — soul of the fleet"
          wizard_provider_primary: kimi-coding
          wizard_model_primary: kimi-k2.5
          hermes_port: 8081
          api_port: 8645
          wizard_home: "{{ ansible_env.HOME }}/wizards/timmy"
          hermes_home: "{{ ansible_env.HOME }}/.hermes"
          machine_type: mac
          # Timmy runs on Alexander's M3 Max
          ollama_available: true

        allegro:
          ansible_host: 167.99.126.228
          ansible_user: root
          wizard_name: Allegro
          wizard_role: "Kimi-backed third wizard house — tight coding tasks"
          wizard_provider_primary: kimi-coding
          wizard_model_primary: kimi-k2.5
          hermes_port: 8081
          api_port: 8645
          wizard_home: /root/wizards/allegro
          hermes_home: /root/.hermes
          machine_type: vps
          ollama_available: false

        bezalel:
          ansible_host: 159.203.146.185
          ansible_user: root
          wizard_name: Bezalel
          wizard_role: "Forge-and-testbed wizard — infrastructure, deployment, hardening"
          wizard_provider_primary: kimi-coding
          wizard_model_primary: kimi-k2.5
          hermes_port: 8081
          api_port: 8656
          wizard_home: /root/wizards/bezalel
          hermes_home: /root/.hermes
          machine_type: vps
          ollama_available: false
          # NOTE: The awake Bezalel may be the duplicate.
          # Fleet audit (the-nexus #1144) will resolve identity.

        ezra:
          ansible_host: 143.198.27.163
          ansible_user: root
          wizard_name: Ezra
          wizard_role: "Infrastructure wizard — Gitea, nginx, hosting"
          wizard_provider_primary: kimi-coding
          wizard_model_primary: kimi-k2.5
          hermes_port: 8081
          api_port: 8645
          wizard_home: /root/wizards/ezra
          hermes_home: /root/.hermes
          machine_type: vps
          ollama_available: false
          # NOTE: Currently DOWN — Telegram key revoked, awaiting propagation.

    # Infrastructure hosts (not wizards, but managed by Ansible)
    infrastructure:
      hosts:
        forge:
          ansible_host: 143.198.27.163
          ansible_user: root
          # Gitea runs on the same box as Ezra
          gitea_url: https://forge.alexanderwhitestone.com
          gitea_org: Timmy_Foundation

  vars:
    # Global variables applied to all hosts
    gitea_repo_url: "https://forge.alexanderwhitestone.com/Timmy_Foundation/timmy-config.git"
    gitea_branch: main
    config_base_path: "{{ gitea_repo_url }}"
    timmy_log_dir: "~/.local/timmy/fleet-health"
    request_log_db: "~/.local/timmy/request_log.db"

    # Golden state provider chain — Anthropic is BANNED
    golden_state_providers:
      - name: kimi-coding
        model: kimi-k2.5
        base_url: "https://api.kimi.com/coding/v1"
        timeout: 120
        reason: "Primary — Kimi K2.5 (best value, least friction)"
      - name: openrouter
        model: google/gemini-2.5-pro
        base_url: "https://openrouter.ai/api/v1"
        api_key_env: OPENROUTER_API_KEY
        timeout: 120
        reason: "Fallback — Gemini 2.5 Pro via OpenRouter"
      - name: ollama
        model: "gemma4:latest"
        base_url: "http://localhost:11434/v1"
        timeout: 180
        reason: "Terminal fallback — local Ollama (sovereign, no API needed)"

    # Banned providers — hard enforcement
    banned_providers:
      - anthropic
      - claude
    banned_models_patterns:
      - "claude-*"
      - "anthropic/*"
      - "*sonnet*"
      - "*opus*"
      - "*haiku*"
@@ -1,98 +0,0 @@
---
# =============================================================================
# agent_startup.yml — Resurrect Wizards from Checked-in Configs
# =============================================================================
# Brings wizards back online using golden state configs.
# Order: pull config → validate → start agent → verify with request_log
# =============================================================================

- name: "Agent Startup Sequence"
  hosts: wizards
  become: true
  serial: 1  # One wizard at a time to avoid cascading issues

  tasks:
    - name: "Pull latest config from upstream"
      git:
        repo: "{{ upstream_repo }}"
        dest: "{{ wizard_home }}/workspace/timmy-config"
        version: "{{ upstream_branch }}"
        force: true
      tags: [pull]

    - name: "Deploy golden state config"
      include_role:
        name: golden_state
      tags: [config]

    - name: "Validate config — no banned providers"
      shell: |
        python3 -c "
        import yaml, sys
        with open('{{ wizard_home }}/config.yaml') as f:
            cfg = yaml.safe_load(f)
        banned = {{ banned_providers }}
        for p in cfg.get('fallback_providers', []):
            if p.get('provider', '') in banned:
                print(f'BANNED: {p[\"provider\"]}', file=sys.stderr)
                sys.exit(1)
        model = cfg.get('model', {}).get('provider', '')
        if model in banned:
            print(f'BANNED default provider: {model}', file=sys.stderr)
            sys.exit(1)
        print('Config validated — no banned providers.')
        "
      register: config_valid
      tags: [validate]

    - name: "Ensure hermes-agent service is running"
      systemd:
        name: "hermes-{{ wizard_name | lower }}"
        state: started
        enabled: true
      when: machine_type == 'vps'
      tags: [start]
      ignore_errors: true  # Service may not exist yet on all machines

    - name: "Start hermes agent (Mac — launchctl)"
      shell: |
        launchctl kickstart -k "ai.hermes.{{ wizard_name | lower }}" 2>/dev/null || \
        cd {{ wizard_home }} && hermes agent start --daemon 2>&1 | tail -5
      when: machine_type == 'mac'
      tags: [start]
      ignore_errors: true

    - name: "Wait for agent to come online"
      wait_for:
        host: 127.0.0.1
        port: "{{ api_port }}"
        timeout: 60
        state: started
      tags: [verify]
      ignore_errors: true

    - name: "Verify agent is alive — check request_log for activity"
      shell: |
        sleep 10
        python3 -c "
        import sqlite3, sys
        db = sqlite3.connect('{{ request_log_path }}')
        cursor = db.execute('''
            SELECT COUNT(*) FROM request_log
            WHERE agent_name = '{{ wizard_name }}'
            AND timestamp > datetime('now', '-5 minutes')
        ''')
        count = cursor.fetchone()[0]
        if count > 0:
            print(f'{{ wizard_name }} is alive — {count} recent inference calls logged.')
        else:
            print(f'WARNING: {{ wizard_name }} started but no telemetry yet.')
        "
      register: agent_status
      tags: [verify]
      ignore_errors: true

    - name: "Report startup status"
      debug:
        msg: "{{ wizard_name }}: {{ agent_status.stdout | default('startup attempted') }}"
      tags: [always]
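The `wait_for` task in the startup sequence amounts to polling a TCP port until it accepts a connection. A stdlib equivalent is handy for ad-hoc checks outside Ansible (the function name is mine):

```python
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection raises OSError on refusal or timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `port_open("127.0.0.1", 8645)` checks a wizard's API port the same way the playbook does.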
@@ -1,15 +0,0 @@
---
# =============================================================================
# cron_schedule.yml — Source-Controlled Cron Jobs
# =============================================================================
# All cron jobs are defined in group_vars/wizards.yml.
# This playbook deploys them. No manual crontab edits allowed.
# =============================================================================

- name: "Deploy Cron Schedule"
  hosts: wizards
  become: true

  roles:
    - role: cron_manager
      tags: [cron, schedule]
@@ -1,17 +0,0 @@
---
# =============================================================================
# deadman_switch.yml — Deploy Deadman Switch to All Wizards
# =============================================================================
# The deadman watch already fires and detects dead agents.
# This playbook wires the ACTION:
#   - On healthy check: snapshot current config as "last known good"
#   - On failed check: rollback config to snapshot, restart agent
# =============================================================================

- name: "Deploy Deadman Switch ACTION"
  hosts: wizards
  become: true

  roles:
    - role: deadman_switch
      tags: [deadman, recovery]
@@ -1,30 +0,0 @@
---
# =============================================================================
# golden_state.yml — Deploy Golden State Config to All Wizards
# =============================================================================
# Enforces the golden state provider chain across the fleet.
# Removes any Anthropic references. Deploys the approved provider chain.
# =============================================================================

- name: "Deploy Golden State Configuration"
  hosts: wizards
  become: true

  roles:
    - role: golden_state
      tags: [golden, config]

  post_tasks:
    - name: "Verify golden state — no banned providers"
      # -h suppresses per-file prefixes and wc -l gives a single total,
      # so banned_count.stdout is always one number (grep -c per file
      # would emit "path:count" lines and a nonzero exit on zero matches).
      shell: |
        grep -rhoi 'anthropic\|claude-sonnet\|claude-opus\|claude-haiku' \
          {{ hermes_home }}/config.yaml \
          {{ wizard_home }}/config.yaml 2>/dev/null | wc -l
      register: banned_count
      changed_when: false

    - name: "Report golden state status"
      debug:
        msg: >
          {{ wizard_name }} golden state: {{ golden_state_providers | map(attribute='name') | list | join(' → ') }}.
          Banned provider references: {{ banned_count.stdout | trim }}.
@@ -1,15 +0,0 @@
---
# =============================================================================
# request_log.yml — Deploy Telemetry Table
# =============================================================================
# Creates the request_log SQLite table on all machines.
# Every inference call writes a row. No exceptions. No summarizing.
# =============================================================================

- name: "Deploy Request Log Telemetry"
  hosts: wizards
  become: true

  roles:
    - role: request_log
      tags: [telemetry, logging]
@@ -1,72 +0,0 @@
|
||||
---
|
||||
# =============================================================================
|
||||
# site.yml — Master Playbook for the Timmy Foundation Fleet
|
||||
# =============================================================================
|
||||
# This is the ONE playbook that defines the entire fleet state.
|
||||
# Run this and every machine converges to golden state.
|
||||
#
|
||||
# Usage:
|
||||
# ansible-playbook -i inventory/hosts.yml playbooks/site.yml
|
||||
# ansible-playbook -i inventory/hosts.yml playbooks/site.yml --limit bezalel
|
||||
# ansible-playbook -i inventory/hosts.yml playbooks/site.yml --check --diff
# =============================================================================

- name: "Timmy Foundation Fleet — Full Convergence"
  hosts: wizards
  become: true

  pre_tasks:
    - name: "Validate no banned providers in golden state"
      assert:
        that:
          - "item.name not in banned_providers"
        fail_msg: "BANNED PROVIDER DETECTED: {{ item.name }} — Anthropic is permanently banned."
        quiet: true
      loop: "{{ golden_state_providers }}"
      tags: [always]

    - name: "Display target wizard"
      debug:
        msg: "Deploying to {{ wizard_name }} ({{ wizard_role }}) on {{ ansible_host }}"
      tags: [always]

  roles:
    - role: wizard_base
      tags: [base, setup]

    - role: golden_state
      tags: [golden, config]

    - role: deadman_switch
      tags: [deadman, recovery]

    - role: request_log
      tags: [telemetry, logging]

    - role: cron_manager
      tags: [cron, schedule]

  post_tasks:
    - name: "Final validation — scan for banned providers"
      shell: |
        grep -ri 'anthropic\|claude-sonnet\|claude-opus\|claude-haiku' \
          {{ hermes_home }}/config.yaml \
          {{ wizard_home }}/config.yaml \
          {{ thin_config_path }} 2>/dev/null || true
      register: banned_scan
      changed_when: false
      tags: [validation]

    - name: "FAIL if banned providers found in deployed config"
      fail:
        msg: |
          BANNED PROVIDER DETECTED IN DEPLOYED CONFIG:
          {{ banned_scan.stdout }}
          Anthropic is permanently banned. Fix the config and re-deploy.
      when: banned_scan.stdout | length > 0
      tags: [validation]

    - name: "Deployment complete"
      debug:
        msg: "{{ wizard_name }} converged to golden state. Provider chain: {{ golden_state_providers | map(attribute='name') | list | join(' → ') }}"
      tags: [always]
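The pre_tasks assertion and the provider-chain debug both assume inventory variables shaped roughly like this. A hypothetical `group_vars/wizards.yml` fragment — the provider name, model id, and URL are illustrative placeholders, not values from the repo:

```yaml
# group_vars/wizards.yml — illustrative sketch (field names inferred from the
# playbook and templates in this diff; real values live in the inventory)
banned_providers:
  - anthropic

golden_state_providers:
  - name: kimi-coding              # hypothetical provider name
    model: kimi-k2                 # hypothetical model id
    base_url: https://api.example.com/v1   # placeholder URL
    timeout: 60
    reason: "primary coding provider"
```

Each entry needs `name`, `model`, `timeout`, and `reason` (the `fallback_providers` template references all four); `base_url` and `api_key_env` are optional there.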
@@ -1,55 +0,0 @@
---
# =============================================================================
# cron_manager/tasks — Source-Controlled Cron Jobs
# =============================================================================
# All cron jobs are defined in group_vars/wizards.yml.
# No manual crontab edits. This is the only way to manage cron.
# =============================================================================

- name: "Deploy managed cron jobs"
  cron:
    name: "{{ item.name }}"
    job: "{{ item.job }}"
    minute: "{{ item.minute | default('*') }}"
    hour: "{{ item.hour | default('*') }}"
    day: "{{ item.day | default('*') }}"
    month: "{{ item.month | default('*') }}"
    weekday: "{{ item.weekday | default('*') }}"
    state: "{{ 'present' if item.enabled else 'absent' }}"
    user: "{{ ansible_user | default('root') }}"
  loop: "{{ cron_jobs }}"
  when: cron_jobs is defined

- name: "Deploy deadman switch cron (fallback if systemd timer unavailable)"
  cron:
    name: "Deadman switch — {{ wizard_name }}"
    job: "{{ wizard_home }}/deadman_action.sh >> {{ timmy_log_dir }}/deadman-{{ wizard_name }}.log 2>&1"
    minute: "*/5"
    hour: "*"
    state: present
    user: "{{ ansible_user | default('root') }}"
  when: deadman_enabled and machine_type != 'vps'
  # VPS machines use systemd timers instead

- name: "Remove legacy cron jobs (cleanup)"
  cron:
    name: "{{ item }}"
    state: absent
    user: "{{ ansible_user | default('root') }}"
  loop:
    - "legacy-deadman-watch"
    - "old-health-check"
    - "backup-deadman"
  ignore_errors: true

- name: "List active cron jobs"
  shell: "crontab -l 2>/dev/null | grep -v '^#' | grep -v '^$' || echo 'No cron jobs found.'"
  register: active_crons
  changed_when: false

- name: "Report cron status"
  debug:
    msg: |
      {{ wizard_name }} cron jobs deployed.
      Active:
      {{ active_crons.stdout }}
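A `cron_jobs` entry consumed by the loop above could look like the following sketch; the job name, command, and schedule are invented for illustration:

```yaml
# group_vars/wizards.yml — illustrative cron_jobs entry (hypothetical values)
cron_jobs:
  - name: "nightly-config-pull"
    job: "cd ~/wizard/workspace/timmy-config && git pull --ff-only"
    minute: "15"
    hour: "3"
    enabled: true    # drives state: present/absent in the task above
```

Omitted schedule fields fall back to `*` via the `default('*')` filters, so only `name`, `job`, and `enabled` are strictly required per entry.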
@@ -1,70 +0,0 @@
---
# =============================================================================
# deadman_switch/tasks — Wire the Deadman Switch ACTION
# =============================================================================
# The watch fires. This makes it DO something:
#   - On healthy check: snapshot current config as "last known good"
#   - On failed check: rollback to last known good, restart agent
# =============================================================================

- name: "Create snapshot directory"
  file:
    path: "{{ deadman_snapshot_dir }}"
    state: directory
    mode: "0755"

- name: "Deploy deadman switch script"
  template:
    src: deadman_action.sh.j2
    dest: "{{ wizard_home }}/deadman_action.sh"
    mode: "0755"

- name: "Deploy deadman systemd service"
  template:
    src: deadman_switch.service.j2
    dest: "/etc/systemd/system/deadman-{{ wizard_name | lower }}.service"
    mode: "0644"
  when: machine_type == 'vps'
  notify: "Enable deadman service"

- name: "Deploy deadman systemd timer"
  template:
    src: deadman_switch.timer.j2
    dest: "/etc/systemd/system/deadman-{{ wizard_name | lower }}.timer"
    mode: "0644"
  when: machine_type == 'vps'
  notify: "Enable deadman timer"

- name: "Deploy deadman launchd plist (Mac)"
  template:
    src: deadman_switch.plist.j2
    dest: "{{ ansible_env.HOME }}/Library/LaunchAgents/com.timmy.deadman.{{ wizard_name | lower }}.plist"
    mode: "0644"
  when: machine_type == 'mac'
  notify: "Load deadman plist"

- name: "Take initial config snapshot"
  copy:
    src: "{{ wizard_home }}/config.yaml"
    dest: "{{ deadman_snapshot_dir }}/config.yaml.known_good"
    remote_src: true
    mode: "0444"
  ignore_errors: true

# =============================================================================
# Handlers — these belong in roles/deadman_switch/handlers/main.yml.
# A tasks file is a YAML list, so a top-level `handlers:` key here is invalid.
# =============================================================================

- name: "Enable deadman service"
  systemd:
    name: "deadman-{{ wizard_name | lower }}.service"
    daemon_reload: true
    enabled: true

- name: "Enable deadman timer"
  systemd:
    name: "deadman-{{ wizard_name | lower }}.timer"
    daemon_reload: true
    enabled: true
    state: started

- name: "Load deadman plist"
  shell: "launchctl load {{ ansible_env.HOME }}/Library/LaunchAgents/com.timmy.deadman.{{ wizard_name | lower }}.plist"
  ignore_errors: true
@@ -1,153 +0,0 @@
#!/usr/bin/env bash
# =============================================================================
# Deadman Switch ACTION — {{ wizard_name }}
# =============================================================================
# Generated by Ansible on {{ ansible_date_time.iso8601 }}
# DO NOT EDIT MANUALLY.
#
# On healthy check: snapshot current config as "last known good"
# On failed check: rollback config to last known good, restart agent
# =============================================================================

set -euo pipefail

WIZARD_NAME="{{ wizard_name }}"
WIZARD_HOME="{{ wizard_home }}"
CONFIG_FILE="{{ wizard_home }}/config.yaml"
SNAPSHOT_DIR="{{ deadman_snapshot_dir }}"
SNAPSHOT_FILE="${SNAPSHOT_DIR}/config.yaml.known_good"
REQUEST_LOG_DB="{{ request_log_path }}"
LOG_DIR="{{ timmy_log_dir }}"
LOG_FILE="${LOG_DIR}/deadman-${WIZARD_NAME}.log"
MAX_SNAPSHOTS={{ deadman_max_snapshots }}
RESTART_COOLDOWN={{ deadman_restart_cooldown }}
MAX_RESTART_ATTEMPTS={{ deadman_max_restart_attempts }}
COOLDOWN_FILE="${LOG_DIR}/deadman_cooldown_${WIZARD_NAME}"
SERVICE_NAME="hermes-{{ wizard_name | lower }}"

# Ensure directories exist
mkdir -p "${SNAPSHOT_DIR}" "${LOG_DIR}"

log() {
    echo "[$(date -u +%Y-%m-%dT%H:%M:%SZ)] [deadman] [${WIZARD_NAME}] $*" >> "${LOG_FILE}"
    echo "[deadman] [${WIZARD_NAME}] $*"
}

log_telemetry() {
    local status="$1"
    local message="$2"
    if [ -f "${REQUEST_LOG_DB}" ]; then
        sqlite3 "${REQUEST_LOG_DB}" "INSERT INTO request_log (timestamp, agent_name, provider, model, endpoint, status, error_message) VALUES (datetime('now'), '${WIZARD_NAME}', 'deadman_switch', 'N/A', 'health_check', '${status}', '${message}');" 2>/dev/null || true
    fi
}

snapshot_config() {
    if [ -f "${CONFIG_FILE}" ]; then
        cp "${CONFIG_FILE}" "${SNAPSHOT_FILE}"
        # Keep rolling history
        cp "${CONFIG_FILE}" "${SNAPSHOT_DIR}/config.yaml.$(date +%s)"
        # Prune old snapshots (`|| true`: an empty glob must not trip set -e/pipefail)
        ls -t "${SNAPSHOT_DIR}"/config.yaml.[0-9]* 2>/dev/null | tail -n +$((MAX_SNAPSHOTS + 1)) | xargs rm -f 2>/dev/null || true
        log "Config snapshot saved."
    fi
}

rollback_config() {
    if [ -f "${SNAPSHOT_FILE}" ]; then
        log "Rolling back config to last known good..."
        cp "${SNAPSHOT_FILE}" "${CONFIG_FILE}"
        log "Config rolled back."
        log_telemetry "fallback" "Config rolled back to last known good by deadman switch"
    else
        log "ERROR: No known good snapshot found. Pulling from upstream..."
        cd "${WIZARD_HOME}/workspace/timmy-config" 2>/dev/null && \
            git pull --ff-only origin {{ upstream_branch }} 2>/dev/null && \
            cp "wizards/{{ wizard_name | lower }}/config.yaml" "${CONFIG_FILE}" && \
            log "Config restored from upstream." || \
            log "CRITICAL: Cannot restore config from any source."
    fi
}

restart_agent() {
    # Check cooldown
    if [ -f "${COOLDOWN_FILE}" ]; then
        local last_restart
        last_restart=$(cat "${COOLDOWN_FILE}")
        local now
        now=$(date +%s)
        local elapsed=$((now - last_restart))
        if [ "${elapsed}" -lt "${RESTART_COOLDOWN}" ]; then
            log "Restart cooldown active (${elapsed}s / ${RESTART_COOLDOWN}s). Skipping."
            return 1
        fi
    fi

    log "Restarting ${SERVICE_NAME}..."
    date +%s > "${COOLDOWN_FILE}"

{% if machine_type == 'vps' %}
    systemctl restart "${SERVICE_NAME}" 2>/dev/null && \
        log "Agent restarted via systemd." || \
        log "ERROR: systemd restart failed."
{% else %}
    launchctl kickstart -k "ai.hermes.{{ wizard_name | lower }}" 2>/dev/null && \
        log "Agent restarted via launchctl." || \
        (cd "${WIZARD_HOME}" && hermes agent start --daemon 2>/dev/null && \
            log "Agent restarted via hermes CLI.") || \
        log "ERROR: All restart methods failed."
{% endif %}

    log_telemetry "success" "Agent restarted by deadman switch"
}

# --- Health Check ---
check_health() {
    # Check 1: Is the agent process running?
{% if machine_type == 'vps' %}
    if ! systemctl is-active --quiet "${SERVICE_NAME}" 2>/dev/null; then
        if ! pgrep -f "hermes" > /dev/null 2>&1; then
            log "FAIL: Agent process not running."
            return 1
        fi
    fi
{% else %}
    if ! pgrep -f "hermes" > /dev/null 2>&1; then
        log "FAIL: Agent process not running."
        return 1
    fi
{% endif %}

    # Check 2: Is the API port responding?
    if ! timeout 10 bash -c "echo > /dev/tcp/127.0.0.1/{{ api_port }}" 2>/dev/null; then
        log "FAIL: API port {{ api_port }} not responding."
        return 1
    fi

    # Check 3: Does the config contain banned providers?
    if grep -qi 'anthropic\|claude-sonnet\|claude-opus\|claude-haiku' "${CONFIG_FILE}" 2>/dev/null; then
        log "FAIL: Config contains banned provider (Anthropic). Rolling back."
        return 1
    fi

    return 0
}

# --- Main ---
main() {
    log "Health check starting..."

    if check_health; then
        log "HEALTHY — snapshotting config."
        snapshot_config
        log_telemetry "success" "Health check passed"
    else
        log "UNHEALTHY — initiating recovery."
        log_telemetry "error" "Health check failed — initiating rollback"
        rollback_config
        restart_agent || true  # a cooldown skip (return 1) must not abort under set -e
    fi

    log "Health check complete."
}

main "$@"
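The cooldown gate inside `restart_agent` is a reusable pattern: record the last restart timestamp in a file and refuse to act again inside the window. A minimal Python sketch of the same logic (the function name and typing are mine, not from the script):

```python
import time
from pathlib import Path
from typing import Optional

def restart_allowed(cooldown_file: Path, cooldown_s: int,
                    now: Optional[float] = None) -> bool:
    """Mirror of the deadman script's gate: permit a restart only when the
    last recorded attempt is older than the cooldown window, and record
    this attempt's timestamp when permitting it."""
    now = time.time() if now is None else now
    if cooldown_file.exists():
        last = float(cooldown_file.read_text().strip() or 0)
        if now - last < cooldown_s:
            return False  # cooldown still active — skip this restart
    cooldown_file.write_text(str(int(now)))  # record this restart attempt
    return True
```

Note that a denied attempt does not refresh the timestamp, so repeated failures cannot push the window forward indefinitely — the same behavior as the bash version, which only writes `COOLDOWN_FILE` after passing the check.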
@@ -1,22 +0,0 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<!-- Deadman Switch — {{ wizard_name }}. Generated by Ansible. DO NOT EDIT MANUALLY. -->
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.timmy.deadman.{{ wizard_name | lower }}</string>
    <key>ProgramArguments</key>
    <array>
        <string>/bin/bash</string>
        <string>{{ wizard_home }}/deadman_action.sh</string>
    </array>
    <key>StartInterval</key>
    <integer>{{ deadman_check_interval }}</integer>
    <key>RunAtLoad</key>
    <true/>
    <key>StandardOutPath</key>
    <string>{{ timmy_log_dir }}/deadman-{{ wizard_name }}.log</string>
    <key>StandardErrorPath</key>
    <string>{{ timmy_log_dir }}/deadman-{{ wizard_name }}.log</string>
</dict>
</plist>
@@ -1,16 +0,0 @@
# Deadman Switch — {{ wizard_name }}
# Generated by Ansible. DO NOT EDIT MANUALLY.

[Unit]
Description=Deadman Switch for {{ wizard_name }} wizard
After=network.target

[Service]
Type=oneshot
ExecStart={{ wizard_home }}/deadman_action.sh
User={{ ansible_user | default('root') }}
StandardOutput=append:{{ timmy_log_dir }}/deadman-{{ wizard_name }}.log
StandardError=append:{{ timmy_log_dir }}/deadman-{{ wizard_name }}.log

[Install]
WantedBy=multi-user.target
@@ -1,14 +0,0 @@
# Deadman Switch Timer — {{ wizard_name }}
# Generated by Ansible. DO NOT EDIT MANUALLY.
# Runs every {{ deadman_check_interval // 60 }} minutes.

[Unit]
Description=Deadman Switch Timer for {{ wizard_name }} wizard

[Timer]
OnBootSec=60
OnUnitActiveSec={{ deadman_check_interval }}s
AccuracySec=30s

[Install]
WantedBy=timers.target
@@ -1,6 +0,0 @@
---
# golden_state defaults
# The golden_state_providers list is defined in group_vars/wizards.yml
# and inventory/hosts.yml (global vars).
golden_state_enforce: true
golden_state_backup_before_deploy: true
@@ -1,46 +0,0 @@
---
# =============================================================================
# golden_state/tasks — Deploy and enforce golden state provider chain
# =============================================================================

- name: "Backup current config before golden state deploy"
  copy:
    src: "{{ wizard_home }}/config.yaml"
    dest: "{{ wizard_home }}/config.yaml.pre-golden-{{ ansible_date_time.epoch }}"
    remote_src: true
  when: golden_state_backup_before_deploy
  ignore_errors: true

- name: "Deploy golden state wizard config"
  template:
    src: "../../wizard_base/templates/wizard_config.yaml.j2"
    dest: "{{ wizard_home }}/config.yaml"
    mode: "0644"
    backup: true
  notify:
    - "Restart hermes agent (systemd)"
    - "Restart hermes agent (launchctl)"

- name: "Scan for banned providers in all config files"
  shell: |
    FOUND=0
    for f in {{ wizard_home }}/config.yaml {{ hermes_home }}/config.yaml; do
      if [ -f "$f" ]; then
        if grep -qi 'anthropic\|claude-sonnet\|claude-opus\|claude-haiku' "$f"; then
          echo "BANNED PROVIDER in $f:"
          grep -ni 'anthropic\|claude-sonnet\|claude-opus\|claude-haiku' "$f"
          FOUND=1
        fi
      fi
    done
    exit $FOUND
  register: provider_scan
  changed_when: false
  failed_when: provider_scan.rc != 0 and provider_ban_enforcement == 'strict'

- name: "Report golden state deployment"
  debug:
    msg: >
      {{ wizard_name }} golden state deployed.
      Provider chain: {{ golden_state_providers | map(attribute='name') | list | join(' → ') }}.
      Banned provider scan: {{ 'CLEAN' if provider_scan.rc == 0 else 'VIOLATIONS FOUND' }}.
@@ -1,64 +0,0 @@
-- =============================================================================
-- request_log — Inference Telemetry Table
-- =============================================================================
-- Every agent writes to this table BEFORE and AFTER every inference call.
-- No exceptions. No summarizing. No describing what you would log.
-- Actually write the row.
--
-- Source: KT Bezalel Architecture Session 2026-04-08
-- =============================================================================

CREATE TABLE IF NOT EXISTS request_log (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp TEXT NOT NULL DEFAULT (datetime('now')),
    agent_name TEXT NOT NULL,
    provider TEXT NOT NULL,
    model TEXT NOT NULL,
    endpoint TEXT NOT NULL,
    tokens_in INTEGER,
    tokens_out INTEGER,
    latency_ms INTEGER,
    status TEXT NOT NULL, -- 'success', 'error', 'timeout', 'fallback'
    error_message TEXT
);

-- Indexes for common queries
CREATE INDEX IF NOT EXISTS idx_request_log_agent
    ON request_log (agent_name, timestamp);

CREATE INDEX IF NOT EXISTS idx_request_log_provider
    ON request_log (provider, timestamp);

CREATE INDEX IF NOT EXISTS idx_request_log_status
    ON request_log (status, timestamp);

-- View: recent activity per agent (last hour)
CREATE VIEW IF NOT EXISTS v_recent_activity AS
SELECT
    agent_name,
    provider,
    model,
    status,
    COUNT(*) AS call_count,
    AVG(latency_ms) AS avg_latency_ms,
    SUM(tokens_in) AS total_tokens_in,
    SUM(tokens_out) AS total_tokens_out
FROM request_log
WHERE timestamp > datetime('now', '-1 hour')
GROUP BY agent_name, provider, model, status;

-- View: provider reliability (last 24 hours)
CREATE VIEW IF NOT EXISTS v_provider_reliability AS
SELECT
    provider,
    model,
    COUNT(*) AS total_calls,
    SUM(CASE WHEN status = 'success' THEN 1 ELSE 0 END) AS successes,
    SUM(CASE WHEN status = 'error' THEN 1 ELSE 0 END) AS errors,
    SUM(CASE WHEN status = 'timeout' THEN 1 ELSE 0 END) AS timeouts,
    SUM(CASE WHEN status = 'fallback' THEN 1 ELSE 0 END) AS fallbacks,
    ROUND(100.0 * SUM(CASE WHEN status = 'success' THEN 1 ELSE 0 END) / COUNT(*), 1) AS success_rate,
    AVG(latency_ms) AS avg_latency_ms
FROM request_log
WHERE timestamp > datetime('now', '-24 hours')
GROUP BY provider, model;
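As a sanity check of the table above, a short Python sketch that loads the table definition (copied from the schema; views omitted) into an in-memory SQLite database and writes two rows the way an agent would. The agent, provider, and model names are made up for illustration:

```python
import sqlite3

# Table definition copied from request_log_schema.sql (views omitted for brevity)
SCHEMA = """
CREATE TABLE IF NOT EXISTS request_log (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp TEXT NOT NULL DEFAULT (datetime('now')),
    agent_name TEXT NOT NULL,
    provider TEXT NOT NULL,
    model TEXT NOT NULL,
    endpoint TEXT NOT NULL,
    tokens_in INTEGER,
    tokens_out INTEGER,
    latency_ms INTEGER,
    status TEXT NOT NULL,
    error_message TEXT
);
"""

def log_request(db, agent, provider, model, endpoint, status, **extra):
    """Write one telemetry row; timestamp falls back to the column default."""
    db.execute(
        "INSERT INTO request_log (agent_name, provider, model, endpoint, status,"
        " tokens_in, tokens_out, latency_ms, error_message)"
        " VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
        (agent, provider, model, endpoint, status,
         extra.get("tokens_in"), extra.get("tokens_out"),
         extra.get("latency_ms"), extra.get("error_message")),
    )
    db.commit()

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
log_request(db, "Merlin", "kimi-coding", "kimi-k2", "/v1/chat", "success",
            tokens_in=120, tokens_out=480, latency_ms=950)
log_request(db, "Merlin", "kimi-coding", "kimi-k2", "/v1/chat", "error",
            error_message="upstream 502")
rows = db.execute(
    "SELECT status, COUNT(*) FROM request_log GROUP BY status ORDER BY status"
).fetchall()
```

Parameterized placeholders avoid the quoting pitfall in the bash `log_telemetry` helper, which interpolates `${message}` directly into the SQL string.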
@@ -1,50 +0,0 @@
---
# =============================================================================
# request_log/tasks — Deploy Telemetry Table
# =============================================================================
# "This is non-negotiable infrastructure. Without it, we cannot verify
#  if any agent actually executed what it claims."
#    — KT Bezalel 2026-04-08
# =============================================================================

- name: "Create telemetry directory"
  file:
    path: "{{ request_log_path | dirname }}"
    state: directory
    mode: "0755"

- name: "Deploy request_log schema"
  copy:
    src: request_log_schema.sql
    dest: "{{ wizard_home }}/request_log_schema.sql"
    mode: "0644"

- name: "Initialize request_log database"
  shell: |
    sqlite3 "{{ request_log_path }}" < "{{ wizard_home }}/request_log_schema.sql"
  args:
    creates: "{{ request_log_path }}"

- name: "Verify request_log table exists"
  shell: |
    sqlite3 "{{ request_log_path }}" ".tables" | grep -q "request_log"
  register: table_check
  changed_when: false
  failed_when: false  # non-fatal: the report task below surfaces the result

- name: "Verify request_log schema matches"
  shell: |
    sqlite3 "{{ request_log_path }}" ".schema request_log" | grep -q "agent_name"
  register: schema_check
  changed_when: false
  failed_when: false  # non-fatal: the report task below surfaces the result

- name: "Set permissions on request_log database"
  file:
    path: "{{ request_log_path }}"
    mode: "0644"

- name: "Report request_log status"
  debug:
    msg: >
      {{ wizard_name }} request_log: {{ request_log_path }}
      — table exists: {{ table_check.rc == 0 }}
      — schema valid: {{ schema_check.rc == 0 }}
@@ -1,6 +0,0 @@
---
# wizard_base defaults
wizard_user: "{{ ansible_user | default('root') }}"
wizard_group: "{{ ansible_user | default('root') }}"
timmy_base_dir: "~/.local/timmy"
timmy_config_repo: "https://forge.alexanderwhitestone.com/Timmy_Foundation/timmy-config.git"
@@ -1,11 +0,0 @@
---
- name: "Restart hermes agent (systemd)"
  systemd:
    name: "hermes-{{ wizard_name | lower }}"
    state: restarted
  when: machine_type == 'vps'

- name: "Restart hermes agent (launchctl)"
  shell: "launchctl kickstart -k ai.hermes.{{ wizard_name | lower }}"
  when: machine_type == 'mac'
  ignore_errors: true
@@ -1,69 +0,0 @@
---
# =============================================================================
# wizard_base/tasks — Common wizard setup
# =============================================================================

- name: "Create wizard directories"
  file:
    path: "{{ item }}"
    state: directory
    mode: "0755"
  loop:
    - "{{ wizard_home }}"
    - "{{ wizard_home }}/workspace"
    - "{{ hermes_home }}"
    - "{{ hermes_home }}/bin"
    - "{{ hermes_home }}/skins"
    - "{{ hermes_home }}/playbooks"
    - "{{ hermes_home }}/memories"
    - "~/.local/timmy"
    - "~/.local/timmy/fleet-health"
    - "~/.local/timmy/snapshots"
    - "~/.timmy"

- name: "Clone/update timmy-config"
  git:
    repo: "{{ upstream_repo }}"
    dest: "{{ wizard_home }}/workspace/timmy-config"
    version: "{{ upstream_branch }}"
    force: false
    update: true
  ignore_errors: true  # May fail on first run if no SSH key

- name: "Deploy SOUL.md"
  copy:
    src: "{{ wizard_home }}/workspace/timmy-config/SOUL.md"
    dest: "~/.timmy/SOUL.md"
    remote_src: true
    mode: "0644"
  ignore_errors: true

- name: "Deploy thin config (immutable pointer to upstream)"
  template:
    src: thin_config.yml.j2
    dest: "{{ thin_config_path }}"
    mode: "{{ thin_config_mode }}"
  tags: [thin_config]

- name: "Ensure Python3 and pip are available"
  package:
    name:
      - python3
      - python3-pip
    state: present
  when: machine_type == 'vps'
  ignore_errors: true

- name: "Ensure PyYAML is installed (for config validation)"
  pip:
    name: pyyaml
    state: present
  when: machine_type == 'vps'
  ignore_errors: true

- name: "Create Ansible log directory"
  file:
    path: /var/log/ansible
    state: directory
    mode: "0755"
  ignore_errors: true
@@ -1,41 +0,0 @@
# =============================================================================
# Thin Config — {{ wizard_name }}
# =============================================================================
# THIS FILE IS READ-ONLY. Agents CANNOT modify it.
# It contains only pointers to upstream. The actual config lives in Gitea.
#
# Agent wakes up → pulls config from upstream → loads → runs.
# If anything tries to mutate this → fails gracefully → pulls fresh on restart.
#
# Only way to permanently change config: commit to Gitea, merge PR, Ansible deploys.
#
# Generated by Ansible on {{ ansible_date_time.iso8601 }}
# DO NOT EDIT MANUALLY.
# =============================================================================

identity:
  wizard_name: "{{ wizard_name }}"
  wizard_role: "{{ wizard_role }}"
  machine: "{{ inventory_hostname }}"

upstream:
  repo: "{{ upstream_repo }}"
  branch: "{{ upstream_branch }}"
  config_path: "wizards/{{ wizard_name | lower }}/config.yaml"
  pull_on_wake: {{ config_pull_on_wake | lower }}

recovery:
  deadman_enabled: {{ deadman_enabled | lower }}
  snapshot_dir: "{{ deadman_snapshot_dir }}"
  restart_cooldown: {{ deadman_restart_cooldown }}
  max_restart_attempts: {{ deadman_max_restart_attempts }}
  escalation_channel: "{{ deadman_escalation_channel }}"

telemetry:
  request_log_path: "{{ request_log_path }}"
  request_log_enabled: {{ request_log_enabled | lower }}

local_overrides:
  # Runtime overrides go here. They are EPHEMERAL — not persisted across restarts.
  # On restart, this section is reset to empty.
  {}
@@ -1,115 +0,0 @@
# =============================================================================
# {{ wizard_name }} — Wizard Configuration (Golden State)
# =============================================================================
# Generated by Ansible on {{ ansible_date_time.iso8601 }}
# DO NOT EDIT MANUALLY. Changes go through Gitea PR → Ansible deploy.
#
# Provider chain: {{ golden_state_providers | map(attribute='name') | list | join(' → ') }}
# Anthropic is PERMANENTLY BANNED.
# =============================================================================

model:
  default: {{ wizard_model_primary }}
  provider: {{ wizard_provider_primary }}
  context_length: 65536
  base_url: {{ golden_state_providers[0].base_url }}

toolsets:
  - all

fallback_providers:
{% for provider in golden_state_providers %}
  - provider: {{ provider.name }}
    model: {{ provider.model }}
{% if provider.base_url is defined %}
    base_url: {{ provider.base_url }}
{% endif %}
{% if provider.api_key_env is defined %}
    api_key_env: {{ provider.api_key_env }}
{% endif %}
    timeout: {{ provider.timeout }}
    reason: "{{ provider.reason }}"
{% endfor %}

agent:
  max_turns: {{ agent_max_turns }}
  reasoning_effort: {{ agent_reasoning_effort }}
  verbose: {{ agent_verbose | lower }}

terminal:
  backend: local
  cwd: .
  timeout: 180
  persistent_shell: true

browser:
  inactivity_timeout: 120
  command_timeout: 30
  record_sessions: false

display:
  compact: false
  personality: ''
  resume_display: full
  busy_input_mode: interrupt
  bell_on_complete: false
  show_reasoning: false
  streaming: false
  show_cost: false
  tool_progress: all

memory:
  memory_enabled: true
  user_profile_enabled: true
  memory_char_limit: 2200
  user_char_limit: 1375
  nudge_interval: 10
  flush_min_turns: 6

approvals:
  mode: {{ agent_approval_mode }}

security:
  redact_secrets: true
  tirith_enabled: false

platforms:
  api_server:
    enabled: true
    extra:
      host: 127.0.0.1
      port: {{ api_port }}

session_reset:
  mode: none
  idle_minutes: 0

skills:
  creation_nudge_interval: 15

system_prompt_suffix: |
  You are {{ wizard_name }}, {{ wizard_role }}.
  Your soul is defined in SOUL.md — read it, live it.
  Hermes is your harness.
  {{ golden_state_providers[0].name }} is your primary provider.
  Refusal over fabrication. If you do not know, say so.
  Sovereignty and service always.

providers:
{% for provider in golden_state_providers %}
  {{ provider.name }}:
    base_url: {{ provider.base_url }}
    timeout: {{ provider.timeout | default(60) }}
{% if provider.name == 'kimi-coding' %}
    max_retries: 3
{% endif %}
{% endfor %}

# =============================================================================
# BANNED PROVIDERS — DO NOT ADD
# =============================================================================
# The following providers are PERMANENTLY BANNED:
#   - anthropic (any model: claude-sonnet, claude-opus, claude-haiku)
# Enforcement: pre-commit hook, linter, Ansible validation, this comment.
# Adding any banned provider will cause Ansible deployment to FAIL.
# =============================================================================
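The "Ansible validation" enforcement named in the banned-provider comment reduces to a case-insensitive substring scan over the rendered config. A standalone sketch of that check — the function name is mine; the token list is taken from the grep patterns used throughout this diff:

```python
# Tokens mirror the grep alternation used by the playbook's post_tasks scan
BANNED_TOKENS = ("anthropic", "claude-sonnet", "claude-opus", "claude-haiku")

def find_banned(config_text: str) -> list[str]:
    """Return the banned tokens present in a config, case-insensitively —
    the same test the grep-based Ansible validation performs."""
    lowered = config_text.lower()
    return [t for t in BANNED_TOKENS if t in lowered]
```

An empty return means the config is clean; a non-empty return is the condition under which the playbook's `fail` task (and `failed_when` in strict mode) fires.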
@@ -1,75 +0,0 @@
#!/usr/bin/env bash
# =============================================================================
# Gitea Webhook Handler — Trigger Ansible Deploy on Merge
# =============================================================================
# This script is called by the Gitea webhook when a PR is merged
# to the main branch of timmy-config.
#
# Setup:
#   1. Add webhook in Gitea: Settings → Webhooks → Add Webhook
#   2. URL: http://localhost:9000/hooks/deploy-timmy-config
#   3. Events: Pull Request (merged only)
#   4. Secret: <configured in Gitea>
#
# This script runs ansible-pull to update the local machine.
# For fleet-wide deploys, each machine runs ansible-pull independently.
# =============================================================================

set -euo pipefail

REPO="https://forge.alexanderwhitestone.com/Timmy_Foundation/timmy-config.git"
BRANCH="main"
ANSIBLE_DIR="ansible"
LOG_FILE="/var/log/ansible/webhook-deploy.log"
LOCK_FILE="/tmp/ansible-deploy.lock"

log() {
    echo "[$(date -u +%Y-%m-%dT%H:%M:%SZ)] [webhook] $*" | tee -a "${LOG_FILE}"
}

# Prevent concurrent deploys
if [ -f "${LOCK_FILE}" ]; then
    LOCK_AGE=$(( $(date +%s) - $(stat -c %Y "${LOCK_FILE}" 2>/dev/null || echo 0) ))
    if [ "${LOCK_AGE}" -lt 300 ]; then
        log "Deploy already in progress (lock age: ${LOCK_AGE}s). Skipping."
        exit 0
    else
        log "Stale lock file (${LOCK_AGE}s old). Removing."
        rm -f "${LOCK_FILE}"
    fi
fi

trap 'rm -f "${LOCK_FILE}"' EXIT
touch "${LOCK_FILE}"

log "Webhook triggered. Starting ansible-pull..."

# Pull latest config
cd /tmp
rm -rf timmy-config-deploy
git clone --depth 1 --branch "${BRANCH}" "${REPO}" timmy-config-deploy 2>&1 | tee -a "${LOG_FILE}"

cd "timmy-config-deploy/${ANSIBLE_DIR}"

# Run Ansible against localhost. Capture the exit code with `|| RESULT=$?`:
# under `set -euo pipefail`, a bare `RESULT=$?` after a failing pipeline would
# never run — the script would have already exited.
log "Running Ansible playbook..."
RESULT=0
ansible-playbook \
    -i inventory/hosts.yml \
    playbooks/site.yml \
    --limit "$(hostname)" \
    --diff \
    2>&1 | tee -a "${LOG_FILE}" || RESULT=$?

if [ ${RESULT} -eq 0 ]; then
    log "Deploy successful."
else
    log "ERROR: Deploy failed with exit code ${RESULT}."
fi

# Cleanup
rm -rf /tmp/timmy-config-deploy

log "Webhook handler complete."
exit ${RESULT}
@@ -1,155 +0,0 @@
#!/usr/bin/env python3
"""
Config Validator — The Timmy Foundation
Validates wizard configs against golden state rules.
Run before any config deploy to catch violations early.

Usage:
    python3 validate_config.py <config_file>
    python3 validate_config.py --all    # Validate all wizard configs

Exit codes:
    0 — All validations passed
    1 — Validation errors found
    2 — File not found or parse error
"""

import sys
import os
import yaml
import fnmatch
from pathlib import Path

# === BANNED PROVIDERS — HARD POLICY ===
BANNED_PROVIDERS = {"anthropic", "claude"}
BANNED_MODEL_PATTERNS = [
    "claude-*",
    "anthropic/*",
    "*sonnet*",
    "*opus*",
    "*haiku*",
]

# === REQUIRED FIELDS ===
REQUIRED_FIELDS = {
    "model": ["default", "provider"],
    "fallback_providers": None,  # Must exist as a list
}


def is_banned_model(model_name: str) -> bool:
    """Check if a model name matches any banned pattern."""
    model_lower = model_name.lower()
    for pattern in BANNED_MODEL_PATTERNS:
        if fnmatch.fnmatch(model_lower, pattern):
            return True
    return False


def validate_config(config_path: str) -> list[str]:
    """Validate a wizard config file. Returns list of error strings."""
    errors = []

    try:
        with open(config_path) as f:
            cfg = yaml.safe_load(f)
    except FileNotFoundError:
        return [f"File not found: {config_path}"]
    except yaml.YAMLError as e:
        return [f"YAML parse error: {e}"]

    if not cfg:
        return ["Config file is empty"]

    # Check required fields
    for section, fields in REQUIRED_FIELDS.items():
        if section not in cfg:
            errors.append(f"Missing required section: {section}")
        elif fields:
            for field in fields:
                if field not in cfg[section]:
                    errors.append(f"Missing required field: {section}.{field}")

    # Check default provider
    default_provider = cfg.get("model", {}).get("provider", "")
    if default_provider.lower() in BANNED_PROVIDERS:
        errors.append(f"BANNED default provider: {default_provider}")

    default_model = cfg.get("model", {}).get("default", "")
    if is_banned_model(default_model):
        errors.append(f"BANNED default model: {default_model}")

    # Check fallback providers
    for i, fb in enumerate(cfg.get("fallback_providers", [])):
        provider = fb.get("provider", "")
        model = fb.get("model", "")

        if provider.lower() in BANNED_PROVIDERS:
            errors.append(f"BANNED fallback provider [{i}]: {provider}")

        if is_banned_model(model):
            errors.append(f"BANNED fallback model [{i}]: {model}")

    # Check providers section
    for name, provider_cfg in cfg.get("providers", {}).items():
        if name.lower() in BANNED_PROVIDERS:
            errors.append(f"BANNED provider in providers section: {name}")

        base_url = str(provider_cfg.get("base_url", ""))
        if "anthropic" in base_url.lower():
            errors.append(f"BANNED URL in provider {name}: {base_url}")

    # Check system prompt for banned references
    prompt = cfg.get("system_prompt_suffix", "")
    if isinstance(prompt, str):
        for banned in BANNED_PROVIDERS:
            if banned in prompt.lower():
                errors.append(f"BANNED provider referenced in system_prompt_suffix: {banned}")

    return errors


def main():
    if len(sys.argv) < 2:
        print(f"Usage: {sys.argv[0]} <config_file> [--all]")
        sys.exit(2)

    if sys.argv[1] == "--all":
        # Validate all wizard configs in the repo
        repo_root = Path(__file__).parent.parent.parent
        wizard_dir = repo_root / "wizards"
        all_errors = {}

        for wizard_path in sorted(wizard_dir.iterdir()):
            config_file = wizard_path / "config.yaml"
            if config_file.exists():
                errors = validate_config(str(config_file))
                if errors:
                    all_errors[wizard_path.name] = errors

        if all_errors:
            print("VALIDATION FAILED:")
            for wizard, errors in all_errors.items():
                print(f"\n  {wizard}:")
                for err in errors:
                    print(f"    - {err}")
            sys.exit(1)
        else:
            print("All wizard configs passed validation.")
            sys.exit(0)
    else:
        config_path = sys.argv[1]
        errors = validate_config(config_path)

        if errors:
            print(f"VALIDATION FAILED for {config_path}:")
            for err in errors:
                print(f"  - {err}")
            sys.exit(1)
        else:
            print(f"PASSED: {config_path}")
            sys.exit(0)


if __name__ == "__main__":
    main()
77 architecture_linter.py Normal file
@@ -0,0 +1,77 @@
#!/usr/bin/env python3
"""
Architecture Linter — Sovereignty Enforcement
Scans the codebase for banned providers, models, and API keys.
"""

import os
import sys
import re

BANNED_STRINGS = [
    r'anthropic',
    r'claude',
    r'api\.anthropic\.com',
    r'ANTHROPIC_API_KEY',
    r'claude-opus',
    r'claude-sonnet',
    r'claude-haiku'
]

EXCEPTIONS = [
    'BANNED_PROVIDERS.md',
    'architecture_linter.py',
    'training/',
    'evaluations/',
    'RELEASE_',
    'metrics_helpers.py'
]

def is_exception(path):
    for exc in EXCEPTIONS:
        if exc in path:
            return True
    return False

def check_file(path):
    violations = []
    try:
        with open(path, 'r', encoding='utf-8', errors='ignore') as f:
            for i, line in enumerate(f, 1):
                for pattern in BANNED_STRINGS:
                    if re.search(pattern, line, re.IGNORECASE):
                        violations.append((i, line.strip(), pattern))
    except Exception as e:
        print(f"Error reading {path}: {e}")
    return violations

def main():
    print("--- Sovereignty Enforcement: Architecture Linter ---")
    total_violations = 0

    for root, dirs, files in os.walk('.'):
        # Skip .git
        if '.git' in dirs:
            dirs.remove('.git')

        for file in files:
            path = os.path.join(root, file)
            if is_exception(path):
                continue

            violations = check_file(path)
            if violations:
                print(f"\n[VIOLATION] {path}:")
                for line_num, content, pattern in violations:
                    print(f"  Line {line_num}: Found '{pattern}' -> {content}")
                    total_violations += 1

    if total_violations > 0:
        print(f"\nFAILED: Found {total_violations} sovereignty violations.")
        sys.exit(1)
    else:
        print("\nPASSED: No banned providers detected.")
        sys.exit(0)

if __name__ == "__main__":
    main()
@@ -2,7 +2,7 @@
 # agent-loop.sh — Universal agent dev loop with Genchi Genbutsu verification
 #
 # Usage: agent-loop.sh <agent-name> [num-workers]
-#        agent-loop.sh claude 2
+#        agent-loop.sh kimi 2
 #        agent-loop.sh gemini 1
 #
 # Dispatches via agent-dispatch.sh, then verifies with genchi-genbutsu.sh.
@@ -14,7 +14,7 @@ NUM_WORKERS="${2:-1}"
 
 # Resolve agent tool and model from config or fallback
 case "$AGENT" in
-    claude) TOOL="claude"; MODEL="sonnet" ;;
+    # claude case removed — Anthropic purged from fleet
     gemini) TOOL="gemini"; MODEL="gemini-2.5-pro-preview-05-06" ;;
    grok) TOOL="opencode"; MODEL="grok-3-fast" ;;
    *) TOOL="$AGENT"; MODEL="" ;;
@@ -145,8 +145,8 @@ run_worker() {
 
         CYCLE_START=$(date +%s)
         set +e
-        if [ "$TOOL" = "claude" ]; then
-            env -u CLAUDECODE gtimeout "$TIMEOUT" claude \
+        if [ "$TOOL" = "kimi" ]; then
+            # Claude dispatch removed — Anthropic purged
                 --print --model "$MODEL" --dangerously-skip-permissions \
                 -p "$prompt" </dev/null >> "$LOG_DIR/${AGENT}-${issue_num}.log" 2>&1
         elif [ "$TOOL" = "gemini" ]; then
@@ -1,4 +1,13 @@
 #!/usr/bin/env bash
+# DEPRECATED — Anthropic purged from fleet (April 2026)
+# This script dispatched parallel Claude Code agent loops.
+# All wizard providers now use Kimi K2.5 as primary.
+# See bin/gemini-loop.sh for the surviving loop pattern.
+echo "[DEPRECATED] claude-loop.sh is no longer active. Use gemini-loop.sh or agent-loop.sh with kimi provider."
+exit 0
+
+# --- ORIGINAL SCRIPT PRESERVED BELOW FOR REFERENCE ---
+#!/usr/bin/env bash
 # claude-loop.sh — Parallel Claude Code agent dispatch loop
 # Runs N workers concurrently against the Gitea backlog.
 # Gracefully handles rate limits with backoff.
@@ -1,4 +1,12 @@
 #!/usr/bin/env bash
+# DEPRECATED — Anthropic purged from fleet (April 2026)
+# This watchdog kept Claude/Gemini loops alive.
+# Only gemini loops survive. Use fleet-status.sh for monitoring.
+echo "[DEPRECATED] claudemax-watchdog.sh is no longer active."
+exit 0
+
+# --- ORIGINAL SCRIPT PRESERVED BELOW FOR REFERENCE ---
+#!/usr/bin/env bash
 # claudemax-watchdog.sh — keep local Claude/Gemini loops alive without stale tmux assumptions
 
 set -uo pipefail
@@ -1,264 +0,0 @@
#!/usr/bin/env python3
"""
Dead Man Switch Fallback Engine

When the dead man switch triggers (zero commits for 2+ hours, model down,
Gitea unreachable, etc.), this script diagnoses the failure and applies
common sense fallbacks automatically.

Fallback chain:
1. Primary model (Anthropic) down -> switch config to local-llama.cpp
2. Gitea unreachable -> cache issues locally, retry on recovery
3. VPS agents down -> alert + lazarus protocol
4. Local llama.cpp down -> try Ollama, then alert-only mode
5. All inference dead -> safe mode (cron pauses, alert Alexander)

Each fallback is reversible. Recovery auto-restores the previous config.
"""
import os
import sys
import json
import subprocess
import time
import yaml
import shutil
from pathlib import Path
from datetime import datetime, timedelta

HERMES_HOME = Path(os.environ.get("HERMES_HOME", os.path.expanduser("~/.hermes")))
CONFIG_PATH = HERMES_HOME / "config.yaml"
FALLBACK_STATE = HERMES_HOME / "deadman-fallback-state.json"
BACKUP_CONFIG = HERMES_HOME / "config.yaml.pre-fallback"
FORGE_URL = "https://forge.alexanderwhitestone.com"

def load_config():
    with open(CONFIG_PATH) as f:
        return yaml.safe_load(f)

def save_config(cfg):
    with open(CONFIG_PATH, "w") as f:
        yaml.dump(cfg, f, default_flow_style=False)

def load_state():
    if FALLBACK_STATE.exists():
        with open(FALLBACK_STATE) as f:
            return json.load(f)
    return {"active_fallbacks": [], "last_check": None, "recovery_pending": False}

def save_state(state):
    state["last_check"] = datetime.now().isoformat()
    with open(FALLBACK_STATE, "w") as f:
        json.dump(state, f, indent=2)

def run(cmd, timeout=10):
    try:
        r = subprocess.run(cmd, shell=True, capture_output=True, text=True, timeout=timeout)
        return r.returncode, r.stdout.strip(), r.stderr.strip()
    except subprocess.TimeoutExpired:
        return -1, "", "timeout"
    except Exception as e:
        return -1, "", str(e)
# ─── HEALTH CHECKS ───

def check_anthropic():
    """Can we reach Anthropic API?"""
    key = os.environ.get("ANTHROPIC_API_KEY", "")
    if not key:
        # Check multiple .env locations
        for env_path in [HERMES_HOME / ".env", Path.home() / ".hermes" / ".env"]:
            if env_path.exists():
                for line in open(env_path):
                    line = line.strip()
                    if line.startswith("ANTHROPIC_API_KEY="):
                        key = line.split("=", 1)[1].strip().strip('"').strip("'")
                        break
                if key:
                    break
    if not key:
        return False, "no API key"
    code, out, err = run(
        f'curl -s -o /dev/null -w "%{{http_code}}" -H "x-api-key: {key}" '
        f'-H "anthropic-version: 2023-06-01" '
        f'https://api.anthropic.com/v1/messages -X POST '
        f'-H "content-type: application/json" '
        f'-d \'{{"model":"claude-haiku-4-5-20251001","max_tokens":1,"messages":[{{"role":"user","content":"ping"}}]}}\' ',
        timeout=15
    )
    if code == 0 and out in ("200", "429"):
        return True, f"HTTP {out}"
    return False, f"HTTP {out} err={err[:80]}"

def check_local_llama():
    """Is local llama.cpp serving?"""
    code, out, err = run("curl -s http://localhost:8081/v1/models", timeout=5)
    if code == 0 and "hermes" in out.lower():
        return True, "serving"
    return False, f"exit={code}"

def check_ollama():
    """Is Ollama running?"""
    code, out, err = run("curl -s http://localhost:11434/api/tags", timeout=5)
    if code == 0 and "models" in out:
        return True, "running"
    return False, f"exit={code}"

def check_gitea():
    """Can we reach the Forge?"""
    token_path = Path.home() / ".config" / "gitea" / "timmy-token"
    if not token_path.exists():
        return False, "no token"
    token = token_path.read_text().strip()
    code, out, err = run(
        f'curl -s -o /dev/null -w "%{{http_code}}" -H "Authorization: token {token}" '
        f'"{FORGE_URL}/api/v1/user"',
        timeout=10
    )
    if code == 0 and out == "200":
        return True, "reachable"
    return False, f"HTTP {out}"

def check_vps(ip, name):
    """Can we SSH into a VPS?"""
    code, out, err = run(f"ssh -o ConnectTimeout=5 root@{ip} 'echo alive'", timeout=10)
    if code == 0 and "alive" in out:
        return True, "alive"
    return False, "unreachable"
# ─── FALLBACK ACTIONS ───

def fallback_to_local_model(cfg):
    """Switch primary model from Anthropic to local llama.cpp"""
    if not BACKUP_CONFIG.exists():
        shutil.copy2(CONFIG_PATH, BACKUP_CONFIG)

    cfg["model"]["provider"] = "local-llama.cpp"
    cfg["model"]["default"] = "hermes3"
    save_config(cfg)
    return "Switched primary model to local-llama.cpp/hermes3"

def fallback_to_ollama(cfg):
    """Switch to Ollama if llama.cpp is also down"""
    if not BACKUP_CONFIG.exists():
        shutil.copy2(CONFIG_PATH, BACKUP_CONFIG)

    cfg["model"]["provider"] = "ollama"
    cfg["model"]["default"] = "gemma4:latest"
    save_config(cfg)
    return "Switched primary model to ollama/gemma4:latest"

def enter_safe_mode(state):
    """Pause all non-essential cron jobs, alert Alexander"""
    state["safe_mode"] = True
    state["safe_mode_entered"] = datetime.now().isoformat()
    save_state(state)
    return "SAFE MODE: All inference down. Cron jobs should be paused. Alert Alexander."

def restore_config():
    """Restore pre-fallback config when primary recovers"""
    if BACKUP_CONFIG.exists():
        shutil.copy2(BACKUP_CONFIG, CONFIG_PATH)
        BACKUP_CONFIG.unlink()
        return "Restored original config from backup"
    return "No backup config to restore"

# ─── MAIN DIAGNOSIS AND FALLBACK ENGINE ───

def diagnose_and_fallback():
    state = load_state()
    cfg = load_config()

    results = {
        "timestamp": datetime.now().isoformat(),
        "checks": {},
        "actions": [],
        "status": "healthy"
    }

    # Check all systems
    anthropic_ok, anthropic_msg = check_anthropic()
    results["checks"]["anthropic"] = {"ok": anthropic_ok, "msg": anthropic_msg}

    llama_ok, llama_msg = check_local_llama()
    results["checks"]["local_llama"] = {"ok": llama_ok, "msg": llama_msg}

    ollama_ok, ollama_msg = check_ollama()
    results["checks"]["ollama"] = {"ok": ollama_ok, "msg": ollama_msg}

    gitea_ok, gitea_msg = check_gitea()
    results["checks"]["gitea"] = {"ok": gitea_ok, "msg": gitea_msg}

    # VPS checks
    vpses = [
        ("167.99.126.228", "Allegro"),
        ("143.198.27.163", "Ezra"),
        ("159.203.146.185", "Bezalel"),
    ]
    for ip, name in vpses:
        vps_ok, vps_msg = check_vps(ip, name)
        results["checks"][f"vps_{name.lower()}"] = {"ok": vps_ok, "msg": vps_msg}

    current_provider = cfg.get("model", {}).get("provider", "anthropic")

    # ─── FALLBACK LOGIC ───

    # Case 1: Primary (Anthropic) down, local available
    if not anthropic_ok and current_provider == "anthropic":
        if llama_ok:
            msg = fallback_to_local_model(cfg)
            results["actions"].append(msg)
            state["active_fallbacks"].append("anthropic->local-llama")
            results["status"] = "degraded_local"
        elif ollama_ok:
            msg = fallback_to_ollama(cfg)
            results["actions"].append(msg)
            state["active_fallbacks"].append("anthropic->ollama")
            results["status"] = "degraded_ollama"
        else:
            msg = enter_safe_mode(state)
            results["actions"].append(msg)
            results["status"] = "safe_mode"

    # Case 2: Already on fallback, check if primary recovered
    elif anthropic_ok and "anthropic->local-llama" in state.get("active_fallbacks", []):
        msg = restore_config()
        results["actions"].append(msg)
        state["active_fallbacks"].remove("anthropic->local-llama")
        results["status"] = "recovered"
    elif anthropic_ok and "anthropic->ollama" in state.get("active_fallbacks", []):
        msg = restore_config()
        results["actions"].append(msg)
        state["active_fallbacks"].remove("anthropic->ollama")
        results["status"] = "recovered"

    # Case 3: Gitea down — just flag it, work locally
    if not gitea_ok:
        results["actions"].append("WARN: Gitea unreachable — work cached locally until recovery")
        if "gitea_down" not in state.get("active_fallbacks", []):
            state["active_fallbacks"].append("gitea_down")
        results["status"] = max(results["status"], "degraded_gitea", key=lambda x: ["healthy", "recovered", "degraded_gitea", "degraded_local", "degraded_ollama", "safe_mode"].index(x) if x in ["healthy", "recovered", "degraded_gitea", "degraded_local", "degraded_ollama", "safe_mode"] else 0)
    elif "gitea_down" in state.get("active_fallbacks", []):
        state["active_fallbacks"].remove("gitea_down")
        results["actions"].append("Gitea recovered — resume normal operations")

    # Case 4: VPS agents down
    for ip, name in vpses:
        key = f"vps_{name.lower()}"
        if not results["checks"][key]["ok"]:
            results["actions"].append(f"ALERT: {name} VPS ({ip}) unreachable — lazarus protocol needed")

    save_state(state)
    return results

if __name__ == "__main__":
    results = diagnose_and_fallback()
    print(json.dumps(results, indent=2))

    # Exit codes for cron integration
    if results["status"] == "safe_mode":
        sys.exit(2)
    elif results["status"].startswith("degraded"):
        sys.exit(1)
    else:
        sys.exit(0)
@@ -140,7 +140,7 @@ if [ -z "$GW_PID" ]; then
 fi
 
 # Check local loops
-CLAUDE_LOOPS=$(pgrep -cf "claude-loop" 2>/dev/null || echo 0)
+CLAUDE_LOOPS=0 # Anthropic purged from fleet
 GEMINI_LOOPS=$(pgrep -cf "gemini-loop" 2>/dev/null || echo 0)
 
 if [ -n "$GW_PID" ]; then
@@ -160,7 +160,7 @@ if [ -n "$TIMMY_HEALTH" ]; then
 fi
 fi
 
-TIMMY_ACTIVITY="loops: claude=${CLAUDE_LOOPS} gemini=${GEMINI_LOOPS}"
+TIMMY_ACTIVITY="loops: gemini=${GEMINI_LOOPS}"
 
 # Git activity for timmy-config
 TC_COMMIT=$(gitea_last_commit "Timmy_Foundation/timmy-config")
@@ -19,25 +19,25 @@ PASS=0
 FAIL=0
 WARN=0
 
-check_anthropic_model() {
+check_kimi_model() {
     local model="$1"
     local label="$2"
-    local api_key="${ANTHROPIC_API_KEY:-}"
+    local api_key="${KIMI_API_KEY:-}"
 
     if [ -z "$api_key" ]; then
         # Try loading from .env
-        api_key=$(grep '^ANTHROPIC_API_KEY=' "${HERMES_HOME:-$HOME/.hermes}/.env" 2>/dev/null | head -1 | cut -d= -f2- | tr -d "'\"" || echo "")
+        api_key=$(grep '^KIMI_API_KEY=' "${HERMES_HOME:-$HOME/.hermes}/.env" 2>/dev/null | head -1 | cut -d= -f2- | tr -d "'\"" || echo "")
     fi
 
     if [ -z "$api_key" ]; then
-        log "SKIP [$label] $model -- no ANTHROPIC_API_KEY"
+        log "SKIP [$label] $model -- no KIMI_API_KEY"
         return 0
     fi
 
     response=$(curl -sf --max-time 10 -X POST \
-        "https://api.anthropic.com/v1/messages" \
-        -H "x-api-key: ${api_key}" \
-        -H "anthropic-version: 2023-06-01" \
-        -H "content-type: application/json" \
+        "https://api.kimi.com/v1/messages" \
+        -H "Authorization: Bearer ${api_key}" \
+        -H "content-type: application/json" \
         -d "{\"model\":\"${model}\",\"max_tokens\":1,\"messages\":[{\"role\":\"user\",\"content\":\"hi\"}]}" 2>&1 || echo "ERROR")
@@ -134,7 +134,7 @@ else:
 
 print("\033[2m────────────────────────────────────────\033[0m")
 print(" \033[1mIssue Queues\033[0m")
-queue_agents = ["allegro", "codex-agent", "groq", "claude", "ezra", "perplexity", "KimiClaw"]
+queue_agents = ["allegro", "codex-agent", "groq", "ezra", "perplexity", "KimiClaw"]
 for agent in queue_agents:
     assigned = [
         issue
@@ -70,7 +70,7 @@ ops-help() {
     echo "  ops-assign-allegro ISSUE [repo]"
     echo "  ops-assign-codex ISSUE [repo]"
     echo "  ops-assign-groq ISSUE [repo]"
-    echo "  ops-assign-claude ISSUE [repo]"
+    # ops-assign-claude removed — Anthropic purged
     echo "  ops-assign-ezra ISSUE [repo]"
     echo ""
 }
@@ -288,7 +288,7 @@ ops-freshness() {
 ops-assign-allegro() { ops-assign "$1" "allegro" "${2:-$OPS_DEFAULT_REPO}"; }
 ops-assign-codex() { ops-assign "$1" "codex-agent" "${2:-$OPS_DEFAULT_REPO}"; }
 ops-assign-groq() { ops-assign "$1" "groq" "${2:-$OPS_DEFAULT_REPO}"; }
-ops-assign-claude() { ops-assign "$1" "claude" "${2:-$OPS_DEFAULT_REPO}"; }
+# ops-assign-claude removed — Anthropic purged from fleet
 ops-assign-ezra() { ops-assign "$1" "ezra" "${2:-$OPS_DEFAULT_REPO}"; }
 ops-assign-perplexity() { ops-assign "$1" "perplexity" "${2:-$OPS_DEFAULT_REPO}"; }
 ops-assign-kimiclaw() { ops-assign "$1" "KimiClaw" "${2:-$OPS_DEFAULT_REPO}"; }
@@ -171,7 +171,7 @@ queue_agents = [
     ("allegro", "dispatch"),
     ("codex-agent", "cleanup"),
     ("groq", "fast ship"),
-    ("claude", "refactor"),
+    # claude removed — Anthropic purged
     ("ezra", "archive"),
     ("perplexity", "research"),
     ("KimiClaw", "digest"),
@@ -189,7 +189,7 @@ unassigned = [issue for issue in issues if not issue.get("assignees")]
 stale_cutoff = (datetime.now(timezone.utc) - timedelta(days=2)).strftime("%Y-%m-%d")
 stale_prs = [pr for pr in pulls if pr.get("updated_at", "")[:10] < stale_cutoff]
 overloaded = []
-for agent in ("allegro", "codex-agent", "groq", "claude", "ezra", "perplexity", "KimiClaw"):
+for agent in ("allegro", "codex-agent", "groq", "ezra", "perplexity", "KimiClaw"):
     count = sum(
         1
         for issue in issues
@@ -10,10 +10,10 @@ set -euo pipefail
 HERMES_BIN="$HOME/.hermes/bin"
 SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
 LOG_DIR="$HOME/.hermes/logs"
-CLAUDE_LOCKS="$LOG_DIR/claude-locks"
+# CLAUDE_LOCKS removed — Anthropic purged
 GEMINI_LOCKS="$LOG_DIR/gemini-locks"
 
-mkdir -p "$LOG_DIR" "$CLAUDE_LOCKS" "$GEMINI_LOCKS"
+mkdir -p "$LOG_DIR" "$GEMINI_LOCKS"
 
 log() {
     echo "[$(date '+%Y-%m-%d %H:%M:%S')] START-LOOPS: $*"
@@ -29,7 +29,7 @@ log "Model health check passed."
 
 # ── 2. Kill stale loop processes ──────────────────────────────────────
 log "Killing stale loop processes..."
-for proc_name in claude-loop gemini-loop timmy-orchestrator; do
+for proc_name in gemini-loop timmy-orchestrator; do
     pids=$(pgrep -f "${proc_name}\\.sh" 2>/dev/null || true)
     if [ -n "$pids" ]; then
         log " Killing stale $proc_name PIDs: $pids"
@@ -47,7 +47,7 @@ done
 
 # ── 3. Clear lock directories ────────────────────────────────────────
 log "Clearing lock dirs..."
-rm -rf "${CLAUDE_LOCKS:?}"/*
+# CLAUDE_LOCKS removed — Anthropic purged
 rm -rf "${GEMINI_LOCKS:?}"/*
 log " Cleared $CLAUDE_LOCKS and $GEMINI_LOCKS"
 
@@ -62,10 +62,10 @@ for p in json.load(sys.stdin):
     print(f'REPO={\"$repo\"} PR={p[\"number\"]} BY={p[\"user\"][\"login\"]} TITLE={p[\"title\"]}')" >> "$state_dir/open_prs.txt" 2>/dev/null
 done
 
-echo "Claude workers: $(pgrep -f 'claude.*--print.*--dangerously' 2>/dev/null | wc -l | tr -d ' ')" >> "$state_dir/agent_status.txt"
-echo "Claude loop: $(pgrep -f 'claude-loop.sh' 2>/dev/null | wc -l | tr -d ' ') procs" >> "$state_dir/agent_status.txt"
-tail -50 "$LOG_DIR/claude-loop.log" 2>/dev/null | grep -c "SUCCESS" | xargs -I{} echo "Claude recent successes: {}" >> "$state_dir/agent_status.txt"
-tail -50 "$LOG_DIR/claude-loop.log" 2>/dev/null | grep -c "FAILED" | xargs -I{} echo "Claude recent failures: {}" >> "$state_dir/agent_status.txt"
+# [Anthropic purged]
+# [Anthropic purged]
+# [Anthropic purged]
+# [Anthropic purged]
 echo "Kimi heartbeat launchd: $(launchctl list 2>/dev/null | grep -c 'ai.timmy.kimi-heartbeat' | tr -d ' ') job" >> "$state_dir/agent_status.txt"
 tail -50 "/tmp/kimi-heartbeat.log" 2>/dev/null | grep -c "DISPATCHED:" | xargs -I{} echo "Kimi recent dispatches: {}" >> "$state_dir/agent_status.txt"
 tail -50 "/tmp/kimi-heartbeat.log" 2>/dev/null | grep -c "FAILED:" | xargs -I{} echo "Kimi recent failures: {}" >> "$state_dir/agent_status.txt"
@@ -91,7 +91,7 @@ run_triage() {
     # Auto-assignment is opt-in because silent queue mutation resurrects old state.
     if [ "$unassigned_count" -gt 0 ]; then
         if [ "$AUTO_ASSIGN_UNASSIGNED" = "1" ]; then
-            log "Assigning $unassigned_count issues to claude..."
+            log "Assigning $unassigned_count issues to kimi..."
             while IFS= read -r line; do
                 local repo=$(echo "$line" | sed 's/.*REPO=\([^ ]*\).*/\1/')
                 local num=$(echo "$line" | sed 's/.*NUM=\([^ ]*\).*/\1/')
@@ -1,212 +0,0 @@
|
||||
[
|
||||
{
|
||||
"job_id": "9e0624269ba7",
|
||||
"name": "Triage Heartbeat",
|
||||
"schedule": "every 15m",
|
||||
"state": "paused"
|
||||
},
|
||||
{
|
||||
"job_id": "e29eda4a8548",
|
||||
"name": "PR Review Sweep",
|
||||
"schedule": "every 30m",
|
||||
"state": "scheduled"
|
||||
},
|
||||
{
|
||||
"job_id": "a77a87392582",
|
||||
"name": "Health Monitor",
|
||||
"schedule": "every 5m",
|
||||
"state": "scheduled"
|
||||
},
|
||||
{
|
||||
"job_id": "5e9d952871bc",
|
||||
"name": "Agent Status Check",
|
||||
"schedule": "every 10m",
|
||||
"state": "paused"
|
||||
},
|
||||
{
|
||||
"job_id": "36fb2f630a17",
|
||||
"name": "Hermes Philosophy Loop",
|
||||
"schedule": "every 1440m",
|
||||
"state": "paused"
|
||||
},
|
||||
{
|
||||
"job_id": "b40a96a2f48c",
|
||||
"name": "wolf-eval-cycle",
|
||||
"schedule": "every 240m",
|
||||
"state": "paused"
|
||||
},
|
||||
{
|
||||
"job_id": "4204e568b862",
|
||||
"name": "Burn Mode \u2014 Timmy Orchestrator",
|
||||
"schedule": "every 15m",
|
||||
"state": "scheduled"
|
||||
},
|
||||
{
|
||||
"job_id": "0944a976d034",
|
||||
"name": "Burn Mode",
|
||||
"schedule": "every 15m",
|
||||
"state": "paused"
|
||||
},
|
||||
{
|
||||
"job_id": "62016b960fa0",
|
||||
"name": "velocity-engine",
|
||||
"schedule": "every 30m",
|
||||
"state": "paused"
|
||||
},
|
||||
{
|
||||
"job_id": "e9d49eeff79c",
|
||||
"name": "weekly-skill-extraction",
|
||||
"schedule": "every 10080m",
|
||||
"state": "scheduled"
|
||||
},
|
||||
{
|
||||
"job_id": "75c74a5bb563",
|
||||
"name": "tower-tick",
|
||||
"schedule": "every 1m",
|
||||
"state": "scheduled"
|
||||
},
|
||||
{
|
||||
"job_id": "390a19054d4c",
|
||||
"name": "Burn Deadman",
|
||||
"schedule": "every 30m",
|
||||
"state": "scheduled"
|
||||
},
|
||||
{
|
||||
"job_id": "05e3c13498fa",
|
||||
"name": "Morning Report \u2014 Burn Mode",
|
||||
"schedule": "0 6 * * *",
|
||||
"state": "scheduled"
|
||||
},
|
||||
{
|
||||
"job_id": "64fe44b512b9",
|
||||
"name": "evennia-morning-report",
|
||||
"schedule": "0 9 * * *",
|
||||
"state": "scheduled"
|
||||
},
|
||||
{
|
||||
"job_id": "3896a7fd9747",
|
||||
"name": "Gitea Priority Inbox",
|
||||
"schedule": "every 3m",
|
||||
"state": "scheduled"
|
||||
},
|
||||
{
|
||||
"job_id": "f64c2709270a",
|
||||
"name": "Config Drift Guard",
|
||||
"schedule": "every 30m",
|
||||
"state": "scheduled"
|
||||
},
|
||||
{
|
||||
"job_id": "fc6a75b7102a",
|
||||
"name": "Gitea Event Watcher",
|
||||
"schedule": "every 2m",
|
||||
"state": "scheduled"
|
||||
},
|
||||
{
|
||||
"job_id": "12e59648fb06",
|
||||
"name": "Burndown Night Watcher",
|
||||
"schedule": "every 15m",
|
||||
"state": "scheduled"
|
||||
},
|
||||
{
|
||||
"job_id": "35d3ada9cf8f",
|
||||
"name": "Mempalace Forge \u2014 Issue Analysis",
|
||||
"schedule": "every 60m",
|
||||
"state": "scheduled"
|
||||
},
|
||||
{
|
||||
"job_id": "190b6fb8dc91",
|
||||
"name": "Mempalace Watchtower \u2014 Fleet Health",
|
||||
"schedule": "every 30m",
|
||||
"state": "scheduled"
|
||||
},
|
||||
{
|
||||
"job_id": "710ab589813c",
|
||||
"name": "Ezra Health Monitor",
|
||||
"schedule": "every 15m",
|
||||
"state": "scheduled"
|
||||
},
|
||||
{
|
||||
"job_id": "a0a9cce4575c",
|
||||
"name": "daily-poka-yoke-ultraplan-awesometools",
|
||||
"schedule": "every 1440m",
|
||||
"state": "scheduled"
|
||||
},
|
||||
{
|
||||
"job_id": "adc3a51457bd",
|
||||
"name": "vps-agent-dispatch",
|
||||
"schedule": "every 10m",
|
||||
"state": "scheduled"
|
||||
},
|
||||
{
|
||||
"job_id": "afd2c4eac44d",
|
||||
"name": "Project Mnemosyne Nightly Burn v2",
|
||||
"schedule": "*/30 * * * *",
|
||||
"state": "scheduled"
|
||||
},
|
||||
{
|
||||
"job_id": "f3a3c2832af0",
|
||||
"name": "gemma4-multimodal-worker",
|
||||
"schedule": "once in 15m",
|
||||
"state": "completed"
|
||||
},
|
||||
{
|
||||
"job_id": "c17a85c19838",
|
||||
"name": "know-thy-father-analyzer",
|
||||
"schedule": "0 * * * *",
|
||||
"state": "scheduled"
|
||||
},
|
||||
{
|
||||
"job_id": "2490fc01a14d",
|
||||
"name": "Testament Burn - 10min work loop",
|
||||
"schedule": "*/10 * * * *",
|
||||
"state": "scheduled"
|
||||
},
|
||||
{
|
||||
"job_id": "f5e858159d97",
|
||||
"name": "Timmy Foundation Burn \u2014 15min PR loop",
|
||||
"schedule": "*/15 * * * *",
|
||||
"state": "scheduled"
|
||||
},
|
||||
{
|
||||
"job_id": "5e262fb9bdce",
|
||||
"name": "nightwatch-health-monitor",
|
||||
"schedule": "*/15 * * * *",
|
||||
"state": "scheduled"
|
||||
},
|
||||
{
|
||||
"job_id": "f2b33a9dcf96",
|
||||
"name": "nightwatch-mempalace-mine",
|
||||
"schedule": "0 */2 * * *",
|
||||
"state": "scheduled"
|
||||
},
|
||||
{
|
||||
"job_id": "82cb9e76c54d",
|
||||
"name": "nightwatch-backlog-burn",
|
||||
"schedule": "0 */4 * * *",
|
||||
"state": "scheduled"
|
||||
},
|
||||
{
|
||||
"job_id": "d20e42a52863",
|
||||
"name": "beacon-sprint",
|
||||
"schedule": "*/15 * * * *",
|
||||
"state": "scheduled"
|
||||
},
|
||||
{
|
||||
"job_id": "579269489961",
|
||||
"name": "testament-story",
|
||||
"schedule": "*/15 * * * *",
|
||||
"state": "scheduled"
|
||||
},
|
||||
{
|
||||
"job_id": "2e5f9140d1ab",
|
||||
"name": "nightwatch-research",
|
||||
"schedule": "0 */2 * * *",
|
||||
"state": "scheduled"
|
||||
},
|
||||
{
|
||||
"job_id": "aeba92fd65e6",
|
||||
"name": "timmy-dreams",
|
||||
"schedule": "30 5 * * *",
|
||||
"state": "scheduled"
|
||||
}
|
||||
]
|
||||
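The scheduler state above is plain JSON, so filtering it takes only a few lines; a minimal sketch with an inline sample (the field names mirror the entries above; loading from a file instead of a string is an assumption about how the state is stored):

```python
import json

# Trimmed inline sample of the scheduler state shown above.
sample = """[
  {"job_id": "710ab589813c", "name": "Ezra Health Monitor", "schedule": "every 15m", "state": "scheduled"},
  {"job_id": "f3a3c2832af0", "name": "gemma4-multimodal-worker", "schedule": "once in 15m", "state": "completed"}
]"""

jobs = json.loads(sample)
# Keep only jobs still waiting on their schedule.
scheduled = [job["name"] for job in jobs if job["state"] == "scheduled"]
```

The same filter works against the full state file once its path is known.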
@@ -1,14 +0,0 @@
|
||||
0 6 * * * /bin/bash /root/wizards/scripts/model_download_guard.sh >> /var/log/model_guard.log 2>&1
|
||||
|
||||
# Allegro Hybrid Heartbeat — quick wins every 15 min
|
||||
*/15 * * * * /usr/bin/python3 /root/allegro/heartbeat_daemon.py >> /var/log/allegro_heartbeat.log 2>&1
|
||||
|
||||
# Allegro Burn Mode Cron Jobs - Deployed via issue #894
|
||||
|
||||
0 6 * * * cd /root/.hermes && python3 -c "import hermes_agent; from hermes_tools import terminal; output = terminal('echo \"Morning Report: $(date)\"'); print(output.get('output', ''))" >> /root/.hermes/logs/morning-report-$(date +\%Y\%m\%d).log 2>&1 # Allegro Morning Report at 0600
|
||||
|
||||
0,30 * * * * cd /root/.hermes && python3 /root/.hermes/retry_wrapper.py "python3 allegro/quick-lane-check.py" >> burn-logs/quick-lane-$(date +\%Y\%m\%d).log 2>&1 # Allegro Burn Loop #1 (with retry)
|
||||
15,45 * * * * cd /root/.hermes && python3 /root/.hermes/retry_wrapper.py "python3 allegro/burn-mode-validator.py" >> burn-logs/validator-$(date +\%Y\%m\%d).log 2>&1 # Allegro Burn Loop #2 (with retry)
|
||||
|
||||
*/2 * * * * /root/wizards/bezalel/dead_man_monitor.sh
|
||||
*/2 * * * * /root/wizards/allegro/bin/config-deadman.sh
|
||||
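The crontab entries above mix step syntax (`*/15`), list syntax (`0,30`), and plain values; a minimal matcher for a single crontab field, as a sketch (ranges like `1-5` are deliberately unsupported here, and this is not how cron itself parses — just an illustration of the three forms used above):

```python
def cron_field_matches(field: str, value: int) -> bool:
    """Check one crontab field against a value. Supports '*', '*/n', 'a,b', and plain numbers."""
    for part in field.split(","):
        if part == "*":
            return True
        if part.startswith("*/"):
            # Step syntax: matches every n-th unit.
            if value % int(part[2:]) == 0:
                return True
        elif int(part) == value:
            return True
    return False
```

For example, `cron_field_matches("*/15", 30)` is true, so a `*/15 * * * *` job fires at minute 30.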
@@ -1,10 +0,0 @@
|
||||
0 2 * * * /root/wizards/bezalel/run_nightly_watch.sh
|
||||
0 3 * * * /root/wizards/bezalel/mempalace_nightly.sh
|
||||
*/10 * * * * pgrep -f "act_runner daemon" > /dev/null || (cd /opt/gitea-runner && nohup ./act_runner daemon > /var/log/gitea-runner.log 2>&1 &)
|
||||
30 3 * * * /root/wizards/bezalel/backup_databases.sh
|
||||
*/15 * * * * /root/wizards/bezalel/meta_heartbeat.sh
|
||||
0 4 * * * /root/wizards/bezalel/secret_guard.sh
|
||||
0 4 * * * /usr/bin/env bash /root/timmy-home/scripts/backup_pipeline.sh >> /var/log/timmy/backup_pipeline_cron.log 2>&1
|
||||
0 6 * * * /usr/bin/python3 /root/wizards/bezalel/ultraplan.py >> /var/log/bezalel-ultraplan.log 2>&1
|
||||
@reboot /root/wizards/bezalel/emacs-daemon-start.sh
|
||||
@reboot /root/wizards/bezalel/ngircd-start.sh
|
||||
@@ -1,13 +0,0 @@
|
||||
# Burn Mode Cycles — 15 min autonomous loops
|
||||
*/15 * * * * /root/wizards/ezra/bin/burn-mode.sh >> /root/wizards/ezra/reports/burn-cron.log 2>&1
|
||||
|
||||
# Household Snapshots — automated heartbeats and snapshots
|
||||
# Ezra Self-Improvement Automation Suite
|
||||
*/5 * * * * /usr/bin/python3 /root/wizards/ezra/tools/gitea_monitor.py >> /root/wizards/ezra/reports/gitea-monitor.log 2>&1
|
||||
*/5 * * * * /usr/bin/python3 /root/wizards/ezra/tools/awareness_loop.py >> /root/wizards/ezra/reports/awareness-loop.log 2>&1
|
||||
*/10 * * * * /usr/bin/python3 /root/wizards/ezra/tools/cron_health_monitor.py >> /root/wizards/ezra/reports/cron-health.log 2>&1
|
||||
0 6 * * * /usr/bin/python3 /root/wizards/ezra/tools/morning_kt_compiler.py >> /root/wizards/ezra/reports/morning-kt.log 2>&1
|
||||
5 6 * * * /usr/bin/python3 /root/wizards/ezra/tools/burndown_generator.py >> /root/wizards/ezra/reports/burndown.log 2>&1
|
||||
0 3 * * * /root/wizards/ezra/mempalace_nightly.sh >> /var/log/ezra_mempalace_cron.log 2>&1
|
||||
*/15 * * * * GITEA_TOKEN=6de6aa...1117 /root/wizards/ezra/dispatch-direct.sh >> /root/wizards/ezra/dispatch-cron.log 2>&1
|
||||
|
||||
@@ -1,110 +0,0 @@
|
||||
# Fleet Behaviour Hardening — Review & Action Plan
|
||||
|
||||
**Author:** @perplexity
|
||||
**Date:** 2026-04-08
|
||||
**Context:** Alexander asked: "Is it the memory system or the behaviour guardrails?"
|
||||
**Answer:** It's the guardrails. The memory system is adequate. The enforcement machinery is aspirational.
|
||||
|
||||
---
|
||||
|
||||
## Diagnosis: Why the Fleet Isn't Smart Enough
|
||||
|
||||
After auditing SOUL.md, config.yaml, all 8 playbooks, the orchestrator, the guard scripts, and the v7.0.0 checkin, the pattern is clear:
|
||||
|
||||
**The fleet has excellent design documents and broken enforcement.**
|
||||
|
||||
| Layer | Design Quality | Enforcement Quality | Gap |
|
||||
|---|---|---|---|
|
||||
| SOUL.md | Excellent | None — no code reads it at runtime | Philosophy without machinery |
|
||||
| Playbooks (7 yaml) | Good lane map | Not invoked by orchestrator | Playbooks exist but nobody calls them |
|
||||
| Guard scripts (9) | Solid code | 1 of 9 wired (#395 audit) | 89% of guards are dead code |
|
||||
| Orchestrator | Sound design | Gateway dispatch is a no-op (#391) | Assigns issues but doesn't trigger work |
|
||||
| Cycle Guard | Good 10-min rule | No cron/loop calls it | Discipline without enforcement |
|
||||
| PR Reviewer | Clear rules | Runs every 30m (if scheduled) | Only guard that might actually fire |
|
||||
| Memory (MemPalace) | Working code | Retrieval enforcer wired | Actually operational |
|
||||
|
||||
### The Core Problem
|
||||
|
||||
Agents pick up issues and produce output, but there is **no pre-task checklist** and **no post-task quality gate**. An agent can:
|
||||
|
||||
1. Start work without checking if someone else already did it
|
||||
2. Produce output without running tests
|
||||
3. Submit a PR without verifying it addresses the issue
|
||||
4. Work for hours on something out of scope
|
||||
5. Create duplicate branches/PRs without detection
|
||||
|
||||
The SOUL.md says "grounding before generation" but no code enforces it.
|
||||
The playbooks define lanes but the orchestrator doesn't load them.
|
||||
The guards exist but nothing calls them.
|
||||
|
||||
---
|
||||
|
||||
## What the Fleet Needs (Priority Order)
|
||||
|
||||
### 1. Pre-Task Gate (MISSING — this PR adds it)
|
||||
|
||||
Before an agent starts any issue:
|
||||
- [ ] Check if issue is already assigned to another agent
|
||||
- [ ] Check if a branch already exists for this issue
|
||||
- [ ] Check if a PR already exists for this issue
|
||||
- [ ] Load relevant MemPalace context (retrieval enforcer)
|
||||
- [ ] Verify the agent has the right lane for this work (playbook check)
|
||||
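The pre-task checklist is mechanical enough to automate; a minimal sketch of the gate as a pure function (the payload shapes and field names are assumptions about the Gitea API responses — the real `task_gate.py` would fetch the issue, branch list, and open PRs first, and the MemPalace and lane checks would hang off the same function):

```python
def pre_task_gate(issue, branches, open_prs, agent):
    """Run the pre-task checks against pre-fetched Gitea data.

    Returns (ok, reasons); ok is False if any check fails.
    """
    reasons = []
    num = str(issue["number"])
    # Check 1: issue already assigned to someone else.
    assignee = (issue.get("assignee") or {}).get("login")
    if assignee and assignee != agent:
        reasons.append(f"issue already assigned to {assignee}")
    # Check 2: a branch already exists for this issue number.
    if any(f"/{num}-" in name for name in branches):
        reasons.append("a branch already exists for this issue")
    # Check 3: an open PR already references this issue.
    if any(f"#{num}" in (pr.get("title") or "") + (pr.get("body") or "") for pr in open_prs):
        reasons.append("an open PR already references this issue")
    return (not reasons, reasons)
```

An agent loop would call this before `create_branch` and refuse to proceed when `ok` is False.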

### 2. Post-Task Gate (MISSING — this PR adds it)

Before an agent submits a PR:
- [ ] Verify the diff addresses the issue title/body
- [ ] Run syntax_guard.py on changed files
- [ ] Check for duplicate PRs targeting the same issue
- [ ] Verify branch name follows convention
- [ ] Run tests if they exist for changed files

### 3. Wire the Existing Guards (8 of 9 are dead code)

Per #395 audit:
- Pre-commit hooks: need symlink on every machine
- Cycle guard: need cron/loop integration
- Forge health check: need cron entry
- Smoke test + deploy validate: need deploy script integration

### 4. Orchestrator Dispatch Actually Works

Per #391 audit: the orchestrator scores and assigns but the gateway dispatch just writes to `/tmp/hermes-dispatch.log`. Nobody reads that file. The dispatch needs to either:
- Trigger `hermes` CLI on the target machine, or
- Post a webhook that the agent loop picks up

### 5. Agent Self-Assessment Loop

After completing work, agents should answer:
- Did I address the issue as stated?
- Did I stay in scope?
- Did I check the palace for prior work?
- Did I run verification?

This is what SOUL.md calls "the apparatus that gives these words teeth."

---

## What's Working (Don't Touch)

- **MemPalace sovereign_store.py** — SQLite + FTS5 + HRR, operational
- **Retrieval enforcer** — wired to SovereignStore as of 14 hours ago
- **Wake-up protocol** — palace-first boot sequence
- **PR reviewer playbook** — clear rules, well-scoped
- **Issue triager playbook** — comprehensive lane map with 11 agents
- **Cycle guard code** — solid 10-min slice discipline (just needs wiring)
- **Config drift guard** — active cron, working
- **Dead man switch** — active, working

---

## Recommendation

The memory system is not the bottleneck. The behaviour guardrails are. Specifically:

1. **Add `task_gate.py`** — pre-task and post-task quality gates that every agent loop calls
2. **Wire cycle_guard.py** — add start/complete calls to agent loop
3. **Wire pre-commit hooks** — deploy script should symlink on provision
4. **Fix orchestrator dispatch** — make it actually trigger work, not just log

This PR adds item 1. Items 2-4 need SSH access and are flagged for Timmy/Allegro.
@@ -9,11 +9,11 @@ This is the canonical reference for how we talk, how we work, and what we mean.

| Name | What It Is | Where It Lives | Provider |
|------|-----------|----------------|----------|
| **Timmy** | The sovereign local soul. Center of gravity. Judges all work. | Alexander's Mac | OpenAI Codex (gpt-5.4) |
| **Ezra** | The archivist wizard. Reads patterns, names truth, returns clean artifacts. | Hermes VPS | Anthropic Opus 4.6 |
| **Ezra** | The archivist wizard. Reads patterns, names truth, returns clean artifacts. | Hermes VPS | Kimi K2.5 |
| **Bezalel** | The builder wizard. Builds from clear plans, tests and hardens. | TestBed VPS | OpenAI Codex (gpt-5.4) |
| **Alexander** | The principal. Human. Father. The one we serve. Gitea: Rockachopa. | Physical world | N/A |
| **Gemini** | Worker swarm. Burns backlog. Produces PRs. | Local Mac (loops) | Google Gemini |
| **Claude** | Worker swarm. Burns backlog. Architecture-grade work. | Local Mac (loops) | Anthropic Claude |
| **Kimi** | Worker swarm. Burns backlog. Architecture-grade work. | Local Mac (loops) | Kimi K2.5 |

## The Places
@@ -1,3 +1,12 @@
# DEPRECATED — Anthropic Purged from Fleet

> This document described the Claude Sonnet workforce. As of April 2026,
> Anthropic has been removed from the fleet. All wizard providers now use
> Kimi K2.5 as primary with Gemini and local Ollama as fallbacks.
> See `docs/fleet-vocabulary.md` for current provider assignments.

---

# Sonnet Workforce Loop

## Agent
@@ -160,8 +160,8 @@ agents:
      - playbooks/issue-triager.yaml
    portfolio:
      primary:
        provider: anthropic
        model: claude-opus-4-6
        provider: kimi-coding
        model: kimi-k2.5
        lane: full-judgment
      fallback1:
        provider: openai-codex
@@ -188,8 +188,8 @@ agents:
      - playbooks/pr-reviewer.yaml
    portfolio:
      primary:
        provider: anthropic
        model: claude-opus-4-6
        provider: kimi-coding
        model: kimi-k2.5
        lane: full-review
      fallback1:
        provider: gemini
@@ -271,10 +271,10 @@ agents:
  cross_checks:
    unique_primary_fallback1_pairs:
      triage-coordinator:
        - anthropic/claude-opus-4-6
        - kimi-coding/kimi-k2.5
        - openai-codex/codex
      pr-reviewer:
        - anthropic/claude-opus-4-6
        - kimi-coding/kimi-k2.5
        - gemini/gemini-2.5-pro
      builder-main:
        - openai-codex/codex
@@ -42,7 +42,6 @@ AGENT_LOGINS = {
    "allegro",
    "antigravity",
    "bezalel",
    "claude",
    "codex-agent",
    "ezra",
    "gemini",
@@ -55,7 +54,6 @@ AGENT_LOGINS = {
    "perplexity",
}
AGENT_LOGINS_HUMAN = {
    "claude": "Claude",
    "codex-agent": "Codex",
    "ezra": "Ezra",
    "gemini": "Gemini",
@@ -78,7 +76,6 @@ METRICS_DIR = Path(os.path.expanduser("~/.local/timmy/muda-audit"))
METRICS_FILE = METRICS_DIR / "metrics.json"

LOG_PATHS = [
    Path.home() / ".hermes" / "logs" / "claude-loop.log",
    Path.home() / ".hermes" / "logs" / "gemini-loop.log",
    Path.home() / ".hermes" / "logs" / "agent.log",
    Path.home() / ".hermes" / "logs" / "errors.log",
@@ -347,8 +344,6 @@ def measure_waiting(since: datetime) -> dict:
                agent = name.lower()
                break
        if agent == "unknown":
            if "claude" in line.lower():
                agent = "claude"
            elif "gemini" in line.lower():
                agent = "gemini"
            elif "groq" in line.lower():
@@ -103,7 +103,7 @@ nano ~/.hermes/.env
| `SLACK_BOT_TOKEN` + `SLACK_APP_TOKEN` | Slack gateway |
| `EXA_API_KEY` | Web search tool |
| `FAL_KEY` | Image generation |
| `ANTHROPIC_API_KEY` | Direct Anthropic inference |
| `KIMI_API_KEY` | Kimi K2.5 coding inference |

### Pre-flight validation
@@ -272,6 +272,48 @@ def get_file_content_at_staged(filepath: str) -> bytes:
    return result.stdout


# ---------------------------------------------------------------------------
# BANNED PROVIDER CHECK — Anthropic is permanently banned
# ---------------------------------------------------------------------------

_BANNED_PROVIDER_PATTERNS = [
    (re.compile(r"provider:\s*anthropic", re.IGNORECASE), "Anthropic provider reference"),
    (re.compile(r"anthropic/claude", re.IGNORECASE), "Anthropic model slug"),
    (re.compile(r"api\.anthropic\.com"), "Anthropic API endpoint"),
    (re.compile(r"claude-opus", re.IGNORECASE), "Claude Opus model"),
    (re.compile(r"claude-sonnet", re.IGNORECASE), "Claude Sonnet model"),
    (re.compile(r"claude-haiku", re.IGNORECASE), "Claude Haiku model"),
]

# Files exempt from the ban (training data, historical docs, tests)
_BAN_EXEMPT = {
    "training/", "evaluations/", "RELEASE_v", "PERFORMANCE_",
    "scores.json", "docs/design-log/", "FALSEWORK.md",
    "test_sovereignty_enforcement.py", "test_metrics_helpers.py",
    "metrics_helpers.py", "sonnet-workforce.md",
}


def _is_ban_exempt(filepath: str) -> bool:
    return any(exempt in filepath for exempt in _BAN_EXEMPT)


def scan_banned_providers(filepath: str, content: str) -> List[Finding]:
    """Block any commit that introduces banned provider references."""
    if _is_ban_exempt(filepath):
        return []
    findings = []
    for line_no, line in enumerate(content.splitlines(), start=1):
        for pattern, desc in _BANNED_PROVIDER_PATTERNS:
            if pattern.search(line):
                findings.append(Finding(
                    filepath, line_no,
                    f"🚫 BANNED PROVIDER: {desc}. Anthropic is permanently banned from this system."
                ))
    return findings
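To illustrate how the scanner flags a staged file, here is a self-contained, trimmed-down version (two regexes copied from the pattern list above; `Finding` simplified to a plain tuple for the sketch):

```python
import re
from typing import List, Tuple

# Trimmed copy of the banned-provider patterns; Finding simplified to (file, line_no, message).
_PATTERNS = [
    (re.compile(r"provider:\s*anthropic", re.IGNORECASE), "Anthropic provider reference"),
    (re.compile(r"claude-opus", re.IGNORECASE), "Claude Opus model"),
]

def scan(filepath: str, content: str) -> List[Tuple[str, int, str]]:
    """Return one finding per (line, pattern) hit, with 1-based line numbers."""
    findings = []
    for line_no, line in enumerate(content.splitlines(), start=1):
        for pattern, desc in _PATTERNS:
            if pattern.search(line):
                findings.append((filepath, line_no, desc))
    return findings
```

A config fragment like the ones changed in this diff produces one finding per offending line, which is what drives the commit block below.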


# ---------------------------------------------------------------------------
# Main
# ---------------------------------------------------------------------------
@@ -295,11 +337,21 @@ def main() -> int:
        if line.startswith("+") and not line.startswith("+++"):
            findings.extend(scan_line(line[1:], "<diff>", line_no))

    # Scan for banned providers
    for filepath in staged_files:
        file_content = get_file_content_at_staged(filepath)
        if not is_binary_content(file_content):
            try:
                text = file_content.decode("utf-8") if isinstance(file_content, bytes) else file_content
                findings.extend(scan_banned_providers(filepath, text))
            except UnicodeDecodeError:
                pass

    if not findings:
        print(f"{GREEN}✓ No potential secret leaks detected{NC}")
        print(f"{GREEN}✓ No potential secret leaks or banned providers detected{NC}")
        return 0

    print(f"{RED}✗ Potential secret leaks detected:{NC}\n")
    print(f"{RED}✗ Violations detected:{NC}\n")
    for finding in findings:
        loc = finding.filename
        print(
@@ -308,7 +360,7 @@ def main() -> int:

    print()
    print(f"{RED}╔════════════════════════════════════════════════════════════╗{NC}")
    print(f"{RED}║ COMMIT BLOCKED: Potential secrets detected! ║{NC}")
    print(f"{RED}║ COMMIT BLOCKED: Secrets or banned providers detected! ║{NC}")
    print(f"{RED}╚════════════════════════════════════════════════════════════╝{NC}")
    print()
    print("Recommendations:")
@@ -23,7 +23,7 @@ Run `python --version` to verify.
## 2. Core Package Dependencies

All packages in `requirements.txt` must be installed and importable.
Critical packages: `openai`, `anthropic`, `pyyaml`, `rich`, `requests`, `pydantic`, `prompt_toolkit`.
Critical packages: `openai`, `pyyaml`, `rich`, `requests`, `pydantic`, `prompt_toolkit`.

**Verify:**
```bash
@@ -39,8 +39,7 @@ At least one LLM provider API key must be set in `~/.hermes/.env`:
| Variable | Provider |
|----------|----------|
| `OPENROUTER_API_KEY` | OpenRouter (200+ models) |
| `ANTHROPIC_API_KEY` | Anthropic Claude |
| `ANTHROPIC_TOKEN` | Anthropic Claude (alt) |
| `KIMI_API_KEY` | Kimi K2.5 coding |
| `OPENAI_API_KEY` | OpenAI |
| `GLM_API_KEY` | z.ai/GLM |
| `KIMI_API_KEY` | Moonshot/Kimi |

@@ -77,8 +77,7 @@ def check_core_deps() -> CheckResult:
    """Verify that hermes core Python packages are importable."""
    required = [
        "openai",
        "anthropic",
        "dotenv",
        "dotenv",
        "yaml",
        "rich",
        "requests",
@@ -206,9 +205,7 @@ def check_env_vars() -> CheckResult:
    """Check that at least one LLM provider key is configured."""
    provider_keys = [
        "OPENROUTER_API_KEY",
        "ANTHROPIC_API_KEY",
        "ANTHROPIC_TOKEN",
        "OPENAI_API_KEY",
        "OPENAI_API_KEY",
        "GLM_API_KEY",
        "KIMI_API_KEY",
        "MINIMAX_API_KEY",
@@ -225,7 +222,7 @@ def check_env_vars() -> CheckResult:
            passed=False,
            message="No LLM provider API key found",
            fix_hint=(
                "Set at least one of: OPENROUTER_API_KEY, ANTHROPIC_API_KEY, OPENAI_API_KEY "
                "Set at least one of: OPENROUTER_API_KEY, KIMI_API_KEY, OPENAI_API_KEY "
                "in ~/.hermes/.env or your shell."
            ),
        )
@@ -2,7 +2,7 @@ Gitea (143.198.27.163:3000): token=~/.hermes/gitea_token_vps (Timmy id=2). Users
§
2026-03-19 HARNESS+SOUL: ~/.timmy is Timmy's workspace within the Hermes harness. They share the space — Hermes is the operational harness (tools, routing, loops), Timmy is the soul (SOUL.md, presence, identity). Not fusion/absorption. Principal's words: "build Timmy out from the hermes harness." ~/.hermes is harness home, ~/.timmy is Timmy's workspace. SOUL=Inscription 1, skin=timmy. Backups at ~/.hermes.backup.pre-fusion and ~/.timmy.backup.pre-fusion.
§
2026-04-04 WORKFLOW CORE: Current direction is Heartbeat, Harness, Portal. Timmy handles sovereignty and release judgment. Allegro handles dispatch and queue hygiene. Core builders: codex-agent, groq, manus, claude. Research/memory: perplexity, ezra, KimiClaw. Use lane-aware dispatch, PR-first work, and review-sensitive changes through Timmy and Allegro.
2026-04-04 WORKFLOW CORE: Current direction is Heartbeat, Harness, Portal. Timmy handles sovereignty and release judgment. Allegro handles dispatch and queue hygiene. Core builders: codex-agent, groq, manus, kimi. Research/memory: perplexity, ezra, KimiClaw. Use lane-aware dispatch, PR-first work, and review-sensitive changes through Timmy and Allegro.
§
2026-04-04 OPERATIONS: Dashboard repo era is over. Use ~/.timmy + ~/.hermes as truth surfaces. Prefer ops-panel.sh, ops-gitea.sh, timmy-dashboard, and pipeline-freshness.sh over archived loop or tmux assumptions. Dispatch: agent-dispatch.sh <agent> <issue> <repo>. Major changes land as PRs.
§
@@ -162,26 +162,6 @@
      "Should a higher-context wizard review before more expansion?"
    ]
  },
  "claude": {
    "lane": "hard refactors, deep implementation, and test-heavy multi-file changes after tight scoping",
    "skills_to_practice": [
      "respecting scope constraints",
      "deep code transformation with tests",
      "explaining risks clearly in PRs"
    ],
    "missing_skills": [
      "do not let large capability turn into unsupervised backlog or code sprawl"
    ],
    "anti_lane": [
      "self-directed issue farming",
      "taking broad architecture liberty without a clear charter"
    ],
    "review_checklist": [
      "Did I stay inside the scoped problem?",
      "Did I leave tests or verification stronger than before?",
      "Is there hidden blast radius that Timmy should see explicitly?"
    ]
  },
  "gemini": {
    "lane": "frontier architecture, research-heavy prototypes, and long-range design thinking",
    "skills_to_practice": [
@@ -222,4 +202,4 @@
      "Did I make the risk actionable instead of just surprising?"
    ]
  }
}
}
@@ -1,61 +1,74 @@
name: bug-fixer
description: >
  Fixes bugs with test-first approach. Writes a failing test that
  reproduces the bug, then fixes the code, then verifies.
description: 'Fixes bugs with test-first approach. Writes a failing test that reproduces the bug, then fixes the code, then
  verifies.

  '
model:
  preferred: claude-opus-4-6
  fallback: claude-sonnet-4-20250514
  preferred: kimi-k2.5
  fallback: google/gemini-2.5-pro
  max_turns: 30
  temperature: 0.2

tools:
  - terminal
  - file
  - search_files
  - patch

  - terminal
  - file
  - search_files
  - patch
trigger:
  issue_label: bug
  manual: true

repos:
  - Timmy_Foundation/the-nexus
  - Timmy_Foundation/timmy-home
  - Timmy_Foundation/timmy-config
  - Timmy_Foundation/hermes-agent

  - Timmy_Foundation/the-nexus
  - Timmy_Foundation/timmy-home
  - Timmy_Foundation/timmy-config
  - Timmy_Foundation/hermes-agent
steps:
  - read_issue
  - clone_repo
  - create_branch
  - dispatch_agent
  - run_tests
  - create_pr
  - comment_on_issue

  - read_issue
  - clone_repo
  - create_branch
  - dispatch_agent
  - run_tests
  - create_pr
  - comment_on_issue
output: pull_request
timeout_minutes: 15
system_prompt: 'You are a bug fixer for the {{repo}} project.

system_prompt: |
  You are a bug fixer for the {{repo}} project.

  YOUR ISSUE: #{{issue_number}} — {{issue_title}}


  APPROACH (prove-first):

  1. Read the bug report. Understand the expected vs actual behavior.
  2. Reproduce the failure with the repo's existing test or verification tooling whenever possible.

  2. Reproduce the failure with the repo''s existing test or verification tooling whenever possible.

  3. Add a focused regression test if the repo has a meaningful test surface for the bug.

  4. Fix the code so the reproduced failure disappears.

  5. Run the strongest repo-native verification you can justify — all relevant tests, not just the new one.

  6. Commit: fix: <description> Fixes #{{issue_number}}

  7. Push, create PR, and summarize verification plus any residual risk.


  RULES:

  - Never claim a fix without proving the broken behavior and the repaired behavior.

  - Prefer repo-native commands over assuming tox exists.
  - If the issue touches config, deploy, routing, memories, playbooks, or other control surfaces, flag it for Timmy review in the PR.

  - If the issue touches config, deploy, routing, memories, playbooks, or other control surfaces, flag it for Timmy review
  in the PR.

  - Never use --no-verify.
  - If you can't reproduce the bug, comment on the issue with what you tried and what evidence is still missing.

  - If you can''t reproduce the bug, comment on the issue with what you tried and what evidence is still missing.

  - If the fix requires >50 lines changed, decompose into sub-issues.

  - Do not widen the issue into a refactor.

  '
@@ -1,166 +0,0 @@
|
||||
# fleet-guardrails.yaml
|
||||
# =====================
|
||||
# Enforceable behaviour boundaries for every agent in the Timmy fleet.
|
||||
# Consumed by task_gate.py (pre/post checks) and the orchestrator's
|
||||
# dispatch loop. Every rule here is testable — no aspirational prose.
|
||||
#
|
||||
# Ref: SOUL.md "grounding before generation", Five Wisdoms #345
|
||||
|
||||
name: fleet-guardrails
|
||||
version: "1.0.0"
|
||||
description: >
|
||||
Behaviour constraints that apply to ALL agents regardless of role.
|
||||
These are the non-negotiable rules that task_gate.py enforces
|
||||
before an agent may pick up work and after it claims completion.
|
||||
|
||||
# ─── UNIVERSAL CONSTRAINTS ───────────────────────────────────────
|
||||
|
||||
constraints:
|
||||
|
||||
# 1. Lane discipline — agents must stay in their lane
|
||||
lane_enforcement:
|
||||
enabled: true
|
||||
source: playbooks/agent-lanes.json
|
||||
on_violation: block_and_notify
|
||||
description: >
|
||||
An agent may only pick up issues tagged for its lane.
|
||||
Cross-lane work requires explicit Timmy approval via
|
||||
issue comment containing 'LANE_OVERRIDE: <agent>'.
|
||||
|
||||
# 2. Branch hygiene — no orphan branches
|
||||
branch_hygiene:
|
||||
enabled: true
|
||||
max_branches_per_agent: 3
|
||||
stale_branch_days: 7
|
||||
naming_pattern: "{agent}/{issue_number}-{slug}"
|
||||
on_violation: warn_then_block
|
||||
description: >
|
||||
Agents must follow branch naming conventions and clean up
|
||||
after merge. No agent may have more than 3 active branches.
|
||||
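The `naming_pattern` above translates directly into a regex check; a minimal sketch (the character classes for agent and slug are assumptions — the config only specifies the placeholder layout, not the allowed characters):

```python
import re

# "{agent}/{issue_number}-{slug}" rendered as a regex; character classes are assumed.
BRANCH_PATTERN = re.compile(r"^(?P<agent>[a-z0-9-]+)/(?P<issue_number>\d+)-(?P<slug>[a-z0-9-]+)$")

def branch_ok(name: str) -> bool:
    """True if the branch name matches the fleet naming convention."""
    return BRANCH_PATTERN.fullmatch(name) is not None
```

Under `warn_then_block`, a gate would warn on the first mismatch and refuse to push on the second.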

  # 3. Issue ownership — no silent takeovers
  issue_ownership:
    enabled: true
    require_assignment_before_work: true
    max_concurrent_issues: 2
    on_violation: block_and_notify
    description: >
      An agent must be assigned to an issue before creating a
      branch or PR. No agent may work on more than 2 issues
      simultaneously to prevent context-switching waste.

  # 4. PR quality — minimum bar before review
  pr_quality:
    enabled: true
    require_linked_issue: true
    require_passing_ci: true
    max_files_changed: 30
    max_diff_lines: 2000
    require_description: true
    min_description_length: 50
    on_violation: block_merge
    description: >
      Every PR must link an issue, pass CI, have a meaningful
      description, and stay within scope. Giant PRs get rejected.
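The `pr_quality` thresholds are simple numeric gates; a minimal sketch of the check a merge gate could run (function name and argument shapes are assumptions; the defaults mirror the config values above):

```python
def pr_within_limits(files_changed, diff_lines, description, linked_issue,
                     max_files=30, max_lines=2000, min_desc=50):
    """Apply the pr_quality thresholds; return a list of violations (empty = mergeable)."""
    violations = []
    if not linked_issue:
        violations.append("no linked issue")
    if files_changed > max_files:
        violations.append(f"{files_changed} files changed (max {max_files})")
    if diff_lines > max_lines:
        violations.append(f"{diff_lines} diff lines (max {max_lines})")
    if len(description.strip()) < min_desc:
        violations.append(f"description under {min_desc} chars")
    return violations
```

With `on_violation: block_merge`, any non-empty result would fail the merge check rather than just warn.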

  # 5. Grounding before generation — SOUL.md compliance
  grounding:
    enabled: true
    require_issue_read_before_branch: true
    require_existing_code_review: true
    require_soul_md_check: true
    soul_md_path: SOUL.md
    on_violation: block_and_notify
    description: >
      Before writing any code, the agent must demonstrate it has
      read the issue, reviewed relevant existing code, and checked
      SOUL.md for applicable doctrine. No speculative generation.

  # 6. Completion integrity — no phantom completions
  completion_checks:
    enabled: true
    require_test_evidence: true
    require_ci_green: true
    require_diff_matches_issue: true
    require_no_unrelated_changes: true
    on_violation: revert_and_notify
    description: >
      Post-task gate verifies the work actually addresses the
      issue. Agents cannot close issues without evidence.
      Unrelated changes in a PR trigger automatic rejection.

  # 7. Communication discipline — no noise
  communication:
    enabled: true
    max_comments_per_issue: 10
    require_structured_updates: true
    update_format: "status | what_changed | what_blocked | next_step"
    prohibit_empty_updates: true
    on_violation: warn
    description: >
      Issue comments must be structured and substantive.
      Status-only comments without content are rejected.
      Agents should update, not narrate.

  # 8. Resource awareness — no runaway costs
  resource_limits:
    enabled: true
    max_api_calls_per_task: 100
    max_llm_tokens_per_task: 500000
    max_task_duration_minutes: 60
    on_violation: kill_and_notify
    description: >
      Hard limits on compute per task. If an agent hits these
      limits, the task is killed and flagged for human review.
      Prevents infinite loops and runaway API spending.

# ─── ESCALATION POLICY ───────────────────────────────────────────

escalation:
  channels:
    - gitea_issue_comment
    - discord_webhook
  severity_levels:
    warn:
      action: post_comment
      notify: agent_only
    block:
      action: prevent_action
      notify: agent_and_orchestrator
    block_and_notify:
      action: prevent_action
      notify: agent_orchestrator_and_timmy
    kill_and_notify:
      action: terminate_task
      notify: all_including_alexander
    revert_and_notify:
      action: revert_changes
      notify: agent_orchestrator_and_timmy

# ─── AUDIT TRAIL ─────────────────────────────────────────────────

audit:
  enabled: true
  log_path: logs/guardrail-violations.jsonl
  retention_days: 90
  fields:
    - timestamp
    - agent
    - constraint
    - violation_type
    - issue_number
    - action_taken
    - resolution

# ─── OVERRIDES ───────────────────────────────────────────────────

overrides:
  # Only Timmy or Alexander can override guardrails
  authorized_overriders:
    - Timmy
    - Alexander
  override_mechanism: >
    Post a comment on the issue with the format:
    GUARDRAIL_OVERRIDE: <constraint_name> REASON: <explanation>
  override_expiry_hours: 24
  require_post_override_review: true
@@ -1,68 +1,52 @@
 name: issue-triager
-description: >
-  Scores, labels, and prioritizes issues. Assigns to appropriate
-  agents. Decomposes large issues into smaller ones.
+description: 'Scores, labels, and prioritizes issues. Assigns to appropriate agents. Decomposes large issues into smaller
+  ones.
+
+  '
 model:
-  preferred: claude-opus-4-6
-  fallback: claude-sonnet-4-20250514
+  preferred: kimi-k2.5
+  fallback: google/gemini-2.5-pro
   max_turns: 20
   temperature: 0.3
-
 tools:
-  - terminal
-  - search_files
-
+- terminal
+- search_files
 trigger:
   schedule: every 15m
   manual: true
-
 repos:
-  - Timmy_Foundation/the-nexus
-  - Timmy_Foundation/timmy-home
-  - Timmy_Foundation/timmy-config
-  - Timmy_Foundation/hermes-agent
-
+- Timmy_Foundation/the-nexus
+- Timmy_Foundation/timmy-home
+- Timmy_Foundation/timmy-config
+- Timmy_Foundation/hermes-agent
 steps:
-  - fetch_issues
-  - score_issues
-  - assign_agents
-  - update_queue
-
+- fetch_issues
+- score_issues
+- assign_agents
+- update_queue
 output: gitea_issue
 timeout_minutes: 10
-
-system_prompt: |
-  You are the issue triager for Timmy Foundation repos.
-
-  REPOS: {{repos}}
-
-  YOUR JOB:
-  1. Fetch open unassigned issues
-  2. Score each by: execution leverage, acceptance criteria quality, alignment with current doctrine, and how likely it is to create duplicate backlog churn
-  3. Label appropriately: bug, refactor, feature, tests, security, docs, ops, governance, research
-  4. Assign to agents based on the audited lane map:
-     - Timmy: governing, sovereign, release, identity, repo-boundary, or architecture decisions that should stay under direct principal review
-     - allegro: dispatch, routing, queue hygiene, Gitea bridge, operational tempo, and issues about how work gets moved through the system
-     - perplexity: research triage, MCP/open-source evaluations, architecture memos, integration comparisons, and synthesis before implementation
-     - ezra: RCA, operating history, memory consolidation, onboarding docs, and archival clean-up
-     - KimiClaw: long-context reading, extraction, digestion, and codebase synthesis before a build phase
-     - codex-agent: cleanup, migration verification, dead-code removal, repo-boundary enforcement, workflow hardening
-     - groq: bounded implementation, tactical bug fixes, quick feature slices, small patches with clear acceptance criteria
-     - manus: bounded support tasks, moderate-scope implementation, follow-through on already-scoped work
-     - claude: hard refactors, broad multi-file implementation, test-heavy changes after the scope is made precise
-     - gemini: frontier architecture, research-heavy prototypes, long-range design thinking when a concrete implementation owner is not yet obvious
-     - grok: adversarial testing, unusual edge cases, provocative review angles that still need another pass
-  5. Decompose any issue touching >5 files or crossing repo boundaries into smaller issues before assigning execution
-
-  RULES:
-  - Prefer one owner per issue. Only add a second assignee when the work is explicitly collaborative.
-  - Bugs, security fixes, and broken live workflows take priority over research and refactors.
-  - If issue scope is unclear, ask for clarification before assigning an implementation agent.
-  - Skip [epic], [meta], [governing], and [constitution] issues for automatic assignment unless they are explicitly routed to Timmy or allegro.
-  - Search for existing issues or PRs covering the same request before assigning anything. If a likely duplicate exists, link it and do not create or route duplicate work.
-  - Do not assign open-ended ideation to implementation agents.
-  - Do not assign routine backlog maintenance to Timmy.
-  - Do not assign wide speculative backlog generation to codex-agent, groq, manus, or claude.
-  - Route archive/history/context-digestion work to ezra or KimiClaw before routing it to a builder.
-  - Route “who should do this?” and “what is the next move?” questions to allegro.
+system_prompt: "You are the issue triager for Timmy Foundation repos.\n\nREPOS: {{repos}}\n\nYOUR JOB:\n1. Fetch open unassigned\
+  \ issues\n2. Score each by: execution leverage, acceptance criteria quality, alignment with current doctrine, and how likely\
+  \ it is to create duplicate backlog churn\n3. Label appropriately: bug, refactor, feature, tests, security, docs, ops, governance,\
+  \ research\n4. Assign to agents based on the audited lane map:\n - Timmy: governing, sovereign, release, identity, repo-boundary,\
+  \ or architecture decisions that should stay under direct principal review\n - allegro: dispatch, routing, queue hygiene,\
+  \ Gitea bridge, operational tempo, and issues about how work gets moved through the system\n - perplexity: research triage,\
+  \ MCP/open-source evaluations, architecture memos, integration comparisons, and synthesis before implementation\n - ezra:\
+  \ RCA, operating history, memory consolidation, onboarding docs, and archival clean-up\n - KimiClaw: long-context reading,\
+  \ extraction, digestion, and codebase synthesis before a build phase\n - codex-agent: cleanup, migration verification,\
+  \ dead-code removal, repo-boundary enforcement, workflow hardening\n - groq: bounded implementation, tactical bug fixes,\
+  \ quick feature slices, small patches with clear acceptance criteria\n - manus: bounded support tasks, moderate-scope\
+  \ implementation, follow-through on already-scoped work\n - kimi: hard refactors, broad multi-file implementation, test-heavy\
+  \ changes after the scope is made precise\n - gemini: frontier architecture, research-heavy prototypes, long-range design\
+  \ thinking when a concrete implementation owner is not yet obvious\n - grok: adversarial testing, unusual edge cases,\
+  \ provocative review angles that still need another pass\n5. Decompose any issue touching >5 files or crossing repo boundaries\
+  \ into smaller issues before assigning execution\n\nRULES:\n- Prefer one owner per issue. Only add a second assignee when\
+  \ the work is explicitly collaborative.\n- Bugs, security fixes, and broken live workflows take priority over research and\
+  \ refactors.\n- If issue scope is unclear, ask for clarification before assigning an implementation agent.\n- Skip [epic],\
+  \ [meta], [governing], and [constitution] issues for automatic assignment unless they are explicitly routed to Timmy or\
+  \ allegro.\n- Search for existing issues or PRs covering the same request before assigning anything. If a likely duplicate\
+  \ exists, link it and do not create or route duplicate work.\n- Do not assign open-ended ideation to implementation agents.\n\
+  - Do not assign routine backlog maintenance to Timmy.\n- Do not assign wide speculative backlog generation to codex-agent,\
+  \ groq, or manus.\n- Route archive/history/context-digestion work to ezra or KimiClaw before routing it to a builder.\n\
+  - Route “who should do this?” and “what is the next move?” questions to allegro.\n"
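The lane map in the triage prompt reduces to a lookup table plus the skip rule for governing issues. A hypothetical sketch (the label-to-agent pairs are illustrative picks from the prompt, not the live routing config):

```python
# Illustrative lane map distilled from the triage prompt; names mirror the
# agents listed there, but this table is an assumption, not the real config.
LANE_MAP = {
    "governance": "Timmy",
    "ops": "allegro",
    "research": "perplexity",
    "docs": "ezra",
    "refactor": "kimi",
    "bug": "groq",
}

# Title prefixes excluded from automatic assignment per the RULES section.
SKIP_PREFIXES = ("[epic]", "[meta]", "[governing]", "[constitution]")

def route_issue(title: str, label: str):
    """Return an assignee, or None when the issue needs manual routing."""
    if title.lower().startswith(SKIP_PREFIXES):
        return None  # epics and governing issues are not auto-assigned
    return LANE_MAP.get(label)
```

Unknown labels also fall through to `None`, which matches the rule of asking for clarification rather than guessing an implementation agent.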
@@ -1,89 +1,47 @@
 name: pr-reviewer
-description: >
-  Reviews open PRs, checks CI status, merges passing ones,
-  comments on problems. The merge bot replacement.
+description: 'Reviews open PRs, checks CI status, merges passing ones, comments on problems. The merge bot replacement.
+
+  '
 model:
-  preferred: claude-opus-4-6
-  fallback: claude-sonnet-4-20250514
+  preferred: kimi-k2.5
+  fallback: google/gemini-2.5-pro
   max_turns: 20
   temperature: 0.2
-
 tools:
-  - terminal
-  - search_files
-
+- terminal
+- search_files
 trigger:
   schedule: every 30m
   manual: true
-
 repos:
-  - Timmy_Foundation/the-nexus
-  - Timmy_Foundation/timmy-home
-  - Timmy_Foundation/timmy-config
-  - Timmy_Foundation/hermes-agent
-
+- Timmy_Foundation/the-nexus
+- Timmy_Foundation/timmy-home
+- Timmy_Foundation/timmy-config
+- Timmy_Foundation/hermes-agent
 steps:
-  - fetch_prs
-  - review_diffs
-  - post_reviews
-  - merge_passing
-
+- fetch_prs
+- review_diffs
+- post_reviews
+- merge_passing
 output: report
 timeout_minutes: 10
-
-system_prompt: |
-  You are the PR reviewer for Timmy Foundation repos.
-
-  REPOS: {{repos}}
-
-  FOR EACH OPEN PR:
-  1. Check CI status (Actions tab or commit status API)
-  2. Read the linked issue or PR body to verify the intended scope before judging the diff
-  3. Review the diff for:
-     - Correctness: does it do what the issue asked?
-     - Security: no secrets, unsafe execution paths, or permission drift
-     - Tests and verification: does the author prove the change?
-     - Scope: PR should match the issue, not scope-creep
-     - Governance: does the change cross a boundary that should stay under Timmy review?
-     - Workflow fit: does it reduce drift, duplication, or hidden operational risk?
-  4. Post findings ordered by severity and cite the affected files or behavior clearly
-  5. If CI fails or verification is missing: explain what is blocking merge
-  6. If PR is behind main: request a rebase or re-run only when needed; do not force churn for cosmetic reasons
-  7. If review is clean and the PR is low-risk: squash merge
-
-  LOW-RISK AUTO-MERGE ONLY IF ALL ARE TRUE:
-  - PR is not a draft
-  - CI is green or the repo has no CI configured
-  - Diff matches the stated issue or PR scope
-  - No unresolved review findings remain
-  - Change is narrow, reversible, and non-governing
-  - Paths changed do not include sensitive control surfaces
-
-  SENSITIVE CONTROL SURFACES:
-  - SOUL.md
-  - config.yaml
-  - deploy.sh
-  - tasks.py
-  - playbooks/
-  - cron/
-  - memories/
-  - skins/
-  - training/
-  - authentication, permissions, or secret-handling code
-  - repo-boundary, model-routing, or deployment-governance changes
-
-  NEVER AUTO-MERGE:
-  - PRs that change sensitive control surfaces
-  - PRs that change more than 5 files unless the change is docs-only
-  - PRs without a clear problem statement or verification
-  - PRs that look like duplicate work, speculative research, or scope creep
-  - PRs that need Timmy or Allegro judgment on architecture, dispatch, or release impact
-  - PRs that are stale solely because of age; do not close them automatically
-
-  If a PR is stale, nudge with a comment and summarize what still blocks it. Do not close it just because 48 hours passed.
-
-  MERGE RULES:
-  - ONLY squash merge. Never merge commits. Never rebase merge.
-  - Delete branch after merge.
-  - Empty PRs (0 changed files): close immediately with a brief explanation.
+system_prompt: "You are the PR reviewer for Timmy Foundation repos.\n\nREPOS: {{repos}}\n\nFOR EACH OPEN PR:\n1. Check CI\
+  \ status (Actions tab or commit status API)\n2. Read the linked issue or PR body to verify the intended scope before judging\
+  \ the diff\n3. Review the diff for:\n - Correctness: does it do what the issue asked?\n - Security: no secrets, unsafe\
+  \ execution paths, or permission drift\n - Tests and verification: does the author prove the change?\n - Scope: PR should\
+  \ match the issue, not scope-creep\n - Governance: does the change cross a boundary that should stay under Timmy review?\n\
+  \ - Workflow fit: does it reduce drift, duplication, or hidden operational risk?\n4. Post findings ordered by severity\
+  \ and cite the affected files or behavior clearly\n5. If CI fails or verification is missing: explain what is blocking merge\n\
+  6. If PR is behind main: request a rebase or re-run only when needed; do not force churn for cosmetic reasons\n7. If review\
+  \ is clean and the PR is low-risk: squash merge\n\nLOW-RISK AUTO-MERGE ONLY IF ALL ARE TRUE:\n- PR is not a draft\n- CI\
+  \ is green or the repo has no CI configured\n- Diff matches the stated issue or PR scope\n- No unresolved review findings\
+  \ remain\n- Change is narrow, reversible, and non-governing\n- Paths changed do not include sensitive control surfaces\n\
+  \nSENSITIVE CONTROL SURFACES:\n- SOUL.md\n- config.yaml\n- deploy.sh\n- tasks.py\n- playbooks/\n- cron/\n- memories/\n-\
+  \ skins/\n- training/\n- authentication, permissions, or secret-handling code\n- repo-boundary, model-routing, or deployment-governance\
+  \ changes\n\nNEVER AUTO-MERGE:\n- PRs that change sensitive control surfaces\n- PRs that change more than 5 files unless\
+  \ the change is docs-only\n- PRs without a clear problem statement or verification\n- PRs that look like duplicate work,\
+  \ speculative research, or scope creep\n- PRs that need Timmy or Allegro judgment on architecture, dispatch, or release\
+  \ impact\n- PRs that are stale solely because of age; do not close them automatically\n\nIf a PR is stale, nudge with a\
+  \ comment and summarize what still blocks it. Do not close it just because 48 hours passed.\n\nMERGE RULES:\n- ONLY squash\
+  \ merge. Never merge commits. Never rebase merge.\n- Delete branch after merge.\n- Empty PRs (0 changed files): close immediately\
+  \ with a brief explanation.\n"
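The LOW-RISK AUTO-MERGE checklist reads naturally as a single all-conditions predicate. A sketch, assuming a simplified `pr` dict rather than the real Gitea API shape (all field names here are invented for illustration):

```python
# Sensitive control surfaces copied from the prompt; a path match blocks auto-merge.
SENSITIVE = ("SOUL.md", "config.yaml", "deploy.sh", "tasks.py",
             "playbooks/", "cron/", "memories/", "skins/", "training/")

def can_auto_merge(pr):
    """True only when every low-risk condition from the checklist holds."""
    if pr["draft"] or pr["unresolved_findings"]:
        return False
    if pr["ci_status"] not in ("green", "no-ci"):  # green, or repo has no CI
        return False
    if len(pr["files"]) > 5 and not pr["docs_only"]:
        return False
    # Exact filenames match equality; directories match as prefixes.
    return not any(f == s or f.startswith(s)
                   for f in pr["files"] for s in SENSITIVE)
```

Anything that fails the gate falls back to a human-visible review comment rather than a merge, which is the prompt's default path.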
@@ -1,62 +1,75 @@
 name: refactor-specialist
-description: >
-  Splits large modules, reduces complexity, improves code organization.
-  Well-scoped: 1-3 files per task, clear acceptance criteria.
+description: 'Splits large modules, reduces complexity, improves code organization. Well-scoped: 1-3 files per task, clear
+  acceptance criteria.
+
+  '
 model:
-  preferred: claude-opus-4-6
-  fallback: claude-sonnet-4-20250514
+  preferred: kimi-k2.5
+  fallback: google/gemini-2.5-pro
   max_turns: 30
   temperature: 0.3
-
 tools:
-  - terminal
-  - file
-  - search_files
-  - patch
-
+- terminal
+- file
+- search_files
+- patch
 trigger:
   issue_label: refactor
   manual: true
-
 repos:
-  - Timmy_Foundation/the-nexus
-  - Timmy_Foundation/timmy-home
-  - Timmy_Foundation/timmy-config
-  - Timmy_Foundation/hermes-agent
-
+- Timmy_Foundation/the-nexus
+- Timmy_Foundation/timmy-home
+- Timmy_Foundation/timmy-config
+- Timmy_Foundation/hermes-agent
 steps:
-  - read_issue
-  - clone_repo
-  - create_branch
-  - dispatch_agent
-  - run_tests
-  - create_pr
-  - comment_on_issue
-
+- read_issue
+- clone_repo
+- create_branch
+- dispatch_agent
+- run_tests
+- create_pr
+- comment_on_issue
 output: pull_request
 timeout_minutes: 15
-system_prompt: |
-  You are a refactoring specialist for the {{repo}} project.
-
-  YOUR ISSUE: #{{issue_number}} — {{issue_title}}
-
-  RULES:
-  - Lines of code is a liability. Delete as much as you create.
-  - All changes go through PRs. No direct pushes to main.
-  - Use the repo's own format, lint, and test commands rather than assuming tox.
-  - Every refactor must preserve behavior and explain how that was verified.
-  - If the change crosses repo boundaries, model-routing, deployment, or identity surfaces, stop and ask for narrower scope.
-  - Never use --no-verify on git commands.
-  - Conventional commits: refactor: <description> (#{{issue_number}})
-  - If tests fail after 2 attempts, STOP and comment on the issue.
-  - Refactors exist to simplify the system, not to create a new design detour.
-
-  WORKFLOW:
-  1. Read the issue body for specific file paths and instructions
-  2. Understand the current code structure
-  3. Name the simplification goal before changing code
-  4. Make the refactoring changes
-  5. Run formatting and verification with repo-native commands
-  6. Commit, push, create PR with before/after risk summary
+system_prompt: 'You are a refactoring specialist for the {{repo}} project.
+
+
+  YOUR ISSUE: #{{issue_number}} — {{issue_title}}
+
+
+  RULES:
+
+  - Lines of code is a liability. Delete as much as you create.
+
+  - All changes go through PRs. No direct pushes to main.
+
+  - Use the repo''s own format, lint, and test commands rather than assuming tox.
+
+  - Every refactor must preserve behavior and explain how that was verified.
+
+  - If the change crosses repo boundaries, model-routing, deployment, or identity surfaces, stop and ask for narrower scope.
+
+  - Never use --no-verify on git commands.
+
+  - Conventional commits: refactor: <description> (#{{issue_number}})
+
+  - If tests fail after 2 attempts, STOP and comment on the issue.
+
+  - Refactors exist to simplify the system, not to create a new design detour.
+
+
+  WORKFLOW:
+
+  1. Read the issue body for specific file paths and instructions
+
+  2. Understand the current code structure
+
+  3. Name the simplification goal before changing code
+
+  4. Make the refactoring changes
+
+  5. Run formatting and verification with repo-native commands
+
+  6. Commit, push, create PR with before/after risk summary
+
+  '
@@ -1,63 +1,38 @@
 name: security-auditor
-description: >
-  Scans code for security vulnerabilities, hardcoded secrets,
-  dependency issues. Files findings as Gitea issues.
+description: 'Scans code for security vulnerabilities, hardcoded secrets, dependency issues. Files findings as Gitea issues.
+
+  '
 model:
-  preferred: claude-opus-4-6
-  fallback: claude-opus-4-6
+  preferred: kimi-k2.5
+  fallback: kimi-k2.5
   max_turns: 40
   temperature: 0.2
-
 tools:
-  - terminal
-  - file
-  - search_files
-
+- terminal
+- file
+- search_files
 trigger:
   schedule: weekly
   pr_merged_with_lines: 100
   manual: true
-
 repos:
-  - Timmy_Foundation/the-nexus
-  - Timmy_Foundation/timmy-home
-  - Timmy_Foundation/timmy-config
-  - Timmy_Foundation/hermes-agent
-
+- Timmy_Foundation/the-nexus
+- Timmy_Foundation/timmy-home
+- Timmy_Foundation/timmy-config
+- Timmy_Foundation/hermes-agent
 steps:
-  - clone_repo
-  - run_audit
-  - file_issues
-
+- clone_repo
+- run_audit
+- file_issues
 output: gitea_issue
 timeout_minutes: 20
-
-system_prompt: |
-  You are a security auditor for the Timmy Foundation codebase.
-  Your job is to FIND vulnerabilities, not write code.
-
-  TARGET REPO: {{repo}}
-
-  SCAN FOR:
-  1. Hardcoded secrets, API keys, tokens in source code
-  2. SQL injection vulnerabilities
-  3. Command injection via unsanitized input
-  4. Path traversal in file operations
-  5. Insecure HTTP calls (should be HTTPS where possible)
-  6. Dependencies with known CVEs (check requirements.txt/package.json)
-  7. Missing input validation
-  8. Overly permissive file permissions
-  9. Privilege drift in deploy, orchestration, memory, cron, and playbook surfaces
-  10. Places where private data or local-only artifacts could leak into tracked repos
-
-  OUTPUT FORMAT:
-  For each finding, file a Gitea issue with:
-    Title: [security] <severity>: <description>
-    Body: file + line, description, why it matters, recommended fix
-    Label: security
-
-  SEVERITY: critical / high / medium / low
-  Only file issues for real findings. No false positives.
-  Do not open duplicate issues for already-known findings; link the existing issue instead.
-  If a finding affects sovereignty boundaries or private-data handling, flag it clearly as such.
+system_prompt: "You are a security auditor for the Timmy Foundation codebase.\nYour job is to FIND vulnerabilities, not write\
+  \ code.\n\nTARGET REPO: {{repo}}\n\nSCAN FOR:\n1. Hardcoded secrets, API keys, tokens in source code\n2. SQL injection vulnerabilities\n\
+  3. Command injection via unsanitized input\n4. Path traversal in file operations\n5. Insecure HTTP calls (should be HTTPS\
+  \ where possible)\n6. Dependencies with known CVEs (check requirements.txt/package.json)\n7. Missing input validation\n\
+  8. Overly permissive file permissions\n9. Privilege drift in deploy, orchestration, memory, cron, and playbook surfaces\n\
+  10. Places where private data or local-only artifacts could leak into tracked repos\n\nOUTPUT FORMAT:\nFor each finding,\
+  \ file a Gitea issue with:\n Title: [security] <severity>: <description>\n Body: file + line, description, why it matters,\
+  \ recommended fix\n Label: security\n\nSEVERITY: critical / high / medium / low\nOnly file issues for real findings. No\
+  \ false positives.\nDo not open duplicate issues for already-known findings; link the existing issue instead.\nIf a finding\
+  \ affects sovereignty boundaries or private-data handling, flag it clearly as such.\n"
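Item 1 of SCAN FOR (hardcoded secrets) is typically a line-oriented regex pass. A toy sketch with illustrative patterns, not the auditor's actual rule set:

```python
import re

# Illustrative secret patterns; a real scanner would carry far more rules
# and an allowlist for test fixtures.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{20,}"),   # OpenAI-style key literal
    re.compile(r"AKIA[A-Z0-9]{16}"),      # AWS access key id
    re.compile(r"(?:api_key|token|secret)\s*=\s*['\"][^'\"]{16,}['\"]"),
]

def scan_text(text):
    """Return (line_number, matched_text) pairs for candidate secrets."""
    hits = []
    for i, line in enumerate(text.splitlines(), start=1):
        for pat in SECRET_PATTERNS:
            m = pat.search(line)
            if m:
                hits.append((i, m.group(0)))
    return hits
```

The `(line_number, matched_text)` pairs map directly onto the issue body format above (file + line, description, recommended fix).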
@@ -1,58 +1,66 @@
 name: test-writer
-description: >
-  Adds test coverage for untested modules. Finds coverage gaps,
-  writes meaningful tests, verifies they pass.
+description: 'Adds test coverage for untested modules. Finds coverage gaps, writes meaningful tests, verifies they pass.
+
+  '
 model:
-  preferred: claude-opus-4-6
-  fallback: claude-sonnet-4-20250514
+  preferred: kimi-k2.5
+  fallback: google/gemini-2.5-pro
   max_turns: 30
   temperature: 0.3
-
 tools:
-  - terminal
-  - file
-  - search_files
-  - patch
-
+- terminal
+- file
+- search_files
+- patch
 trigger:
   issue_label: tests
   manual: true
-
 repos:
-  - Timmy_Foundation/the-nexus
-  - Timmy_Foundation/timmy-home
-  - Timmy_Foundation/timmy-config
-  - Timmy_Foundation/hermes-agent
-
+- Timmy_Foundation/the-nexus
+- Timmy_Foundation/timmy-home
+- Timmy_Foundation/timmy-config
+- Timmy_Foundation/hermes-agent
 steps:
-  - read_issue
-  - clone_repo
-  - create_branch
-  - dispatch_agent
-  - run_tests
-  - create_pr
-  - comment_on_issue
-
+- read_issue
+- clone_repo
+- create_branch
+- dispatch_agent
+- run_tests
+- create_pr
+- comment_on_issue
 output: pull_request
 timeout_minutes: 15
-system_prompt: |
-  You are a test engineer for the {{repo}} project.
-
-  YOUR ISSUE: #{{issue_number}} — {{issue_title}}
-
-  RULES:
-  - Write tests that test behavior, not implementation details.
-  - Use the repo's own test entrypoints; do not assume tox exists.
-  - Tests must be deterministic. No flaky tests.
-  - Conventional commits: test: <description> (#{{issue_number}})
-  - If the module is hard to test, explain the design obstacle and propose the smallest next step.
-  - Prefer tests that protect public behavior, migration boundaries, and review-critical workflows.
-
-  WORKFLOW:
-  1. Read the issue for target module paths
-  2. Read the existing code to understand behavior
-  3. Write focused unit tests
-  4. Run the relevant verification commands — all related tests must pass
-  5. Commit, push, create PR with verification summary and coverage rationale
+system_prompt: 'You are a test engineer for the {{repo}} project.
+
+
+  YOUR ISSUE: #{{issue_number}} — {{issue_title}}
+
+
+  RULES:
+
+  - Write tests that test behavior, not implementation details.
+
+  - Use the repo''s own test entrypoints; do not assume tox exists.
+
+  - Tests must be deterministic. No flaky tests.
+
+  - Conventional commits: test: <description> (#{{issue_number}})
+
+  - If the module is hard to test, explain the design obstacle and propose the smallest next step.
+
+  - Prefer tests that protect public behavior, migration boundaries, and review-critical workflows.
+
+
+  WORKFLOW:
+
+  1. Read the issue for target module paths
+
+  2. Read the existing code to understand behavior
+
+  3. Write focused unit tests
+
+  4. Run the relevant verification commands — all related tests must pass
+
+  5. Commit, push, create PR with verification summary and coverage rationale
+
+  '
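The determinism rule above is easiest to show with a seeded RNG: the test pins observable output instead of inspecting internals. `pick_reviewer` is a hypothetical function invented for the example, not code from these repos:

```python
import random

def pick_reviewer(candidates, seed):
    """Choose a reviewer reproducibly: sorted input plus a seeded RNG."""
    rng = random.Random(seed)          # seeded RNG keeps the test deterministic
    return rng.choice(sorted(candidates))

def test_pick_reviewer_is_deterministic():
    team = {"ezra", "groq", "manus"}
    first = pick_reviewer(team, seed=7)
    assert first == pick_reviewer(team, seed=7)  # same seed, same answer
    assert first in team                         # behavior, not internals

test_pick_reviewer_is_deterministic()
```

Note the test never asserts which member is chosen, only the reproducibility contract, so a refactor of the selection internals cannot make it flaky.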
@@ -1,47 +1,55 @@
 name: verified-logic
-description: >
-  Crucible-first playbook for tasks that require proof instead of plausible prose.
-  Use Z3-backed sidecar tools for scheduling, dependency ordering, capacity checks,
-  and consistency verification.
+description: 'Crucible-first playbook for tasks that require proof instead of plausible prose. Use Z3-backed sidecar tools
+  for scheduling, dependency ordering, capacity checks, and consistency verification.
+
+  '
 model:
-  preferred: claude-opus-4-6
-  fallback: claude-sonnet-4-20250514
+  preferred: kimi-k2.5
+  fallback: google/gemini-2.5-pro
   max_turns: 12
   temperature: 0.1
-
 tools:
-  - mcp_crucible_schedule_tasks
-  - mcp_crucible_order_dependencies
-  - mcp_crucible_capacity_fit
-
+- mcp_crucible_schedule_tasks
+- mcp_crucible_order_dependencies
+- mcp_crucible_capacity_fit
 trigger:
   manual: true
-
 steps:
-  - classify_problem
-  - choose_template
-  - translate_into_constraints
-  - verify_with_crucible
-  - report_sat_unsat_with_witness
-
+- classify_problem
+- choose_template
+- translate_into_constraints
+- verify_with_crucible
+- report_sat_unsat_with_witness
 output: verified_result
 timeout_minutes: 5
-system_prompt: |
-  You are running the Crucible playbook.
-
-  Use this playbook for:
-  - scheduling and deadline feasibility
-  - dependency ordering and cycle checks
-  - capacity / resource allocation constraints
-  - consistency checks where a contradiction matters
-
-  RULES:
-  1. Do not bluff through logic.
-  2. Pick the narrowest Crucible template that fits the task.
-  3. Translate the user's question into structured constraints.
-  4. Call the Crucible tool.
-  5. If SAT, report the witness model clearly.
-  6. If UNSAT, say the constraints are impossible and explain which shape of constraint caused the contradiction.
-  7. If the task is not a good fit for these templates, say so plainly instead of pretending it was verified.
+system_prompt: 'You are running the Crucible playbook.
+
+
+  Use this playbook for:
+
+  - scheduling and deadline feasibility
+
+  - dependency ordering and cycle checks
+
+  - capacity / resource allocation constraints
+
+  - consistency checks where a contradiction matters
+
+
+  RULES:
+
+  1. Do not bluff through logic.
+
+  2. Pick the narrowest Crucible template that fits the task.
+
+  3. Translate the user''s question into structured constraints.
+
+  4. Call the Crucible tool.
+
+  5. If SAT, report the witness model clearly.
+
+  6. If UNSAT, say the constraints are impossible and explain which shape of constraint caused the contradiction.
+
+  7. If the task is not a good fit for these templates, say so plainly instead of pretending it was verified.
+
+  '
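The dependency-ordering step can be illustrated without Z3: Kahn-style elimination either yields an order (SAT, with the order itself as the witness) or stalls on a cycle (UNSAT). A pure-Python sketch standing in for `mcp_crucible_order_dependencies`, not the sidecar's real implementation:

```python
def order_dependencies(deps):
    """deps: {task: set(prerequisites)}. Return an order, or None if cyclic (UNSAT)."""
    deps = {t: set(p) for t, p in deps.items()}  # work on a copy
    order = []
    while deps:
        # Tasks whose prerequisites are all satisfied can run now.
        ready = [t for t, p in deps.items() if not p]
        if not ready:
            return None  # no task is free to run: a cycle, i.e. UNSAT
        for t in sorted(ready):  # sorted for a deterministic witness
            order.append(t)
            del deps[t]
        for p in deps.values():
            p.difference_update(ready)
    return order
```

A SAT result reports the returned list as the witness; a `None` result corresponds to naming the cyclic constraints as the cause of the contradiction.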
@@ -1,33 +1,85 @@
 #!/usr/bin/env python3
+"""Architecture Linter — Ensuring alignment with the Frontier Local Agenda.
+
+Anthropic is BANNED. Not deprecated, not discouraged — banned.
+Any reference to Anthropic as a provider, model, or API endpoint
+in active configs is a hard failure.
+"""
+
 import os
 import sys
 import re
 
-# Architecture Linter
-# Ensuring all changes align with the Frontier Local Agenda.
-
 SOVEREIGN_RULES = [
-    (r"https?://(api\.openai\.com|api\.anthropic\.com)", "CRITICAL: External cloud API detected. Use local custom_provider instead."),
-    (r"provider: (openai|anthropic)", "WARNING: Direct cloud provider used. Ensure fallback_model is configured."),
-    (r"api_key:\s*['\"][A-Za-z0-9_\-]{16,}['\"]", "SECURITY: Hardcoded API key detected. Use environment variables.")
+    # BANNED — hard failures
+    (r"provider:\s*anthropic", "BANNED: Anthropic provider reference. Anthropic is permanently banned from this system."),
+    (r"anthropic/claude", "BANNED: Anthropic model reference (anthropic/claude-*). Use kimi-k2.5 or google/gemini-2.5-pro."),
+    (r"api\.anthropic\.com", "BANNED: Direct Anthropic API endpoint. Anthropic is permanently banned."),
+    (r"ANTHROPIC_API_KEY", "BANNED: Anthropic API key reference. Remove all Anthropic credentials."),
+    (r"ANTHROPIC_TOKEN", "BANNED: Anthropic token reference. Remove all Anthropic credentials."),
+    (r"sk-ant-", "BANNED: Anthropic API key literal (sk-ant-*). Remove immediately."),
+    (r"claude-opus", "BANNED: Claude Opus model reference. Use kimi-k2.5."),
+    (r"claude-sonnet", "BANNED: Claude Sonnet model reference. Use kimi-k2.5."),
+    (r"claude-haiku", "BANNED: Claude Haiku model reference. Use google/gemini-2.5-pro."),
+
+    # Existing sovereignty rules
+    (r"https?://api\.openai\.com", "WARNING: Direct OpenAI API endpoint. Use local custom_provider instead."),
+    (r"provider:\s*openai", "WARNING: Direct OpenAI provider. Ensure fallback_model is configured."),
+    (r"api_key: ['\"][^'\"\s]{10,}['\"]", "SECURITY: Hardcoded API key detected. Use environment variables."),
 ]
 
-def lint_file(path):
+# Files to skip (training data, historical docs, changelogs, tests that validate the ban)
+SKIP_PATTERNS = [
+    "training/", "evaluations/", "RELEASE_v", "PERFORMANCE_",
+    "scores.json", "docs/design-log/", "FALSEWORK.md",
+    "test_sovereignty_enforcement.py", "test_metrics_helpers.py",
+    "metrics_helpers.py",  # historical cost data
+]
+
+
+def should_skip(path: str) -> bool:
+    return any(skip in path for skip in SKIP_PATTERNS)
+
+
+def lint_file(path: str) -> int:
+    if should_skip(path):
+        return 0
     print(f"Linting {path}...")
     content = open(path).read()
     violations = 0
     for pattern, msg in SOVEREIGN_RULES:
-        if re.search(pattern, content):
+        matches = list(re.finditer(pattern, content, re.IGNORECASE))
+        if matches:
             print(f"  [!] {msg}")
+            for m in matches[:3]:  # Show up to 3 locations
+                line_no = content[:m.start()].count('\n') + 1
+                print(f"      Line {line_no}: ...{content[max(0,m.start()-20):m.end()+20].strip()}...")
             violations += 1
     return violations
 
 
 def main():
-    print(f"--- Ezra's Architecture Linter ---")
+    print("--- Architecture Linter (Anthropic BANNED) ---")
     files = [f for f in sys.argv[1:] if os.path.isfile(f)]
     if not files:
+        # If no args, scan all yaml/py/sh/json in the repo
+        for root, _, filenames in os.walk("."):
+            for fn in filenames:
+                if fn.endswith((".yaml", ".yml", ".py", ".sh", ".json", ".md")):
+                    path = os.path.join(root, fn)
+                    if not should_skip(path) and ".git" not in path:
+                        files.append(path)
 
     total_violations = sum(lint_file(f) for f in files)
+    banned = sum(1 for f in files for p, m in SOVEREIGN_RULES
+                 if "BANNED" in m and re.search(p, open(f).read(), re.IGNORECASE)
+                 and not should_skip(f))
 
     print(f"\nLinting complete. Total violations: {total_violations}")
+    if banned > 0:
+        print(f"\n🚫 {banned} BANNED provider violation(s) detected. Anthropic is permanently banned.")
     sys.exit(1 if total_violations > 0 else 0)
 
 
 if __name__ == "__main__":
     main()
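The BANNED entries above are plain regexes, so a config fragment can be checked in a few lines. A sketch that copies three of the rule patterns verbatim (only these three, so it is a demonstration, not the full linter):

```python
import re

# Three patterns copied from the SOVEREIGN_RULES BANNED section above.
BANNED = [r"provider:\s*anthropic", r"api\.anthropic\.com", r"claude-opus"]

def banned_hits(text):
    """Return the banned patterns that match the given text (case-insensitive)."""
    return [p for p in BANNED if re.search(p, text, re.IGNORECASE)]

sample = "model:\n  preferred: claude-opus-4-6\nprovider: anthropic\n"
```

Running `banned_hits(sample)` flags both the provider line and the model name, which is exactly the hard-failure behavior the linter enforces.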
@@ -5,233 +5,122 @@ Part of the Gemini Sovereign Governance System.
|
||||
|
||||
Enforces architectural boundaries, security, and documentation standards
|
||||
across the Timmy Foundation fleet.
|
||||
|
||||
Refs: #437 — repo-aware, test-backed, CI-enforced.
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import os
|
||||
import re
|
||||
import sys
|
||||
import argparse
|
||||
from pathlib import Path
|
||||
|
||||
# --- CONFIGURATION ---
|
||||
|
||||
SOVEREIGN_KEYWORDS = ["mempalace", "sovereign_store", "tirith", "bezalel", "nexus"]
|
||||
|
||||
# IP addresses (skip 127.0.0.1, 0.0.0.0, 10.x.x.x, 172.16-31.x.x, 192.168.x.x)
|
||||
IP_REGEX = r'\b(?!(?:127|10|192\.168|172\.(?:1[6-9]|2\d|3[01]))\.)' \
|
||||
r'(?:\d{1,3}\.){3}\d{1,3}\b'
|
||||
|
||||
# API key / secret patterns — catches openai-, sk-, anthropic-, AKIA, etc.
|
||||
API_KEY_PATTERNS = [
|
||||
r'sk-[A-Za-z0-9]{20,}', # OpenAI-style
|
||||
r'sk-ant-[A-Za-z0-9\-]{20,}', # Anthropic
|
||||
r'AKIA[A-Z0-9]{16}', # AWS access key
|
||||
r'ghp_[A-Za-z0-9]{36}', # GitHub PAT
|
||||
r'glpat-[A-Za-z0-9\-]{20,}', # GitLab PAT
|
||||
r'(?:api[_-]?key|secret|token)\s*[:=]\s*["\'][A-Za-z0-9_\-]{16,}["\']',
|
||||
]
|
||||
|
||||
# Sovereignty rules (carried from v1)
|
||||
SOVEREIGN_RULES = [
|
||||
(r'https?://api\.openai\.com', 'External cloud API: api.openai.com. Use local custom_provider.'),
|
||||
(r'https?://api\.anthropic\.com', 'External cloud API: api.anthropic.com. Use local custom_provider.'),
|
||||
(r'provider:\s*(?:openai|anthropic)\b', 'Direct cloud provider. Ensure fallback_model is configured.'),
|
||||
]
|
||||
|
||||
# File extensions to scan
|
||||
SCAN_EXTENSIONS = {'.py', '.ts', '.tsx', '.js', '.yaml', '.yml', '.json', '.env', '.sh', '.cfg', '.toml'}
|
||||
SKIP_DIRS = {'.git', 'node_modules', '__pycache__', '.venv', 'venv', '.tox', '.eggs'}


class LinterResult:
    """Structured result container for programmatic access."""

    def __init__(self, repo_path: str, repo_name: str):
        self.repo_path = repo_path
        self.repo_name = repo_name
        self.errors: list[str] = []
        self.warnings: list[str] = []

    @property
    def passed(self) -> bool:
        return len(self.errors) == 0

    @property
    def violation_count(self) -> int:
        return len(self.errors)

    def summary(self) -> str:
        lines = [f"--- Architecture Linter v2: {self.repo_name} ---"]
        for w in self.warnings:
            lines.append(f"  [W] {w}")
        for e in self.errors:
            lines.append(f"  [E] {e}")
        status = "PASSED" if self.passed else f"FAILED ({self.violation_count} violations)"
        lines.append(f"\nResult: {status}")
        return '\n'.join(lines)


# Legacy v1 pattern, kept for the old log_error-based checks.
API_KEY_REGEX = r'(?:api_key|secret|token|password|auth_token)\s*[:=]\s*["\'][a-zA-Z0-9_\-]{20,}["\']'


class Linter:
    def __init__(self, repo_path: str):
        self.repo_path = Path(repo_path).resolve()
        if not self.repo_path.is_dir():
            raise FileNotFoundError(f"Repository path does not exist: {self.repo_path}")
        self.repo_name = self.repo_path.name
        self.result = LinterResult(str(self.repo_path), self.repo_name)
        self.errors = []

    # --- helpers ---

    def _scan_files(self, extensions=None):
        """Yield (Path, content) for files matching *extensions*."""
        exts = extensions or SCAN_EXTENSIONS
        for root, dirs, files in os.walk(self.repo_path):
            dirs[:] = [d for d in dirs if d not in SKIP_DIRS]
            for fname in files:
                if Path(fname).suffix in exts:
                    if fname == '.env.example':
                        continue
                    fpath = Path(root) / fname
                    try:
                        content = fpath.read_text(errors='ignore')
                    except Exception:
                        continue
                    yield fpath, content

    def _line_no(self, content: str, offset: int) -> int:
        return content.count('\n', 0, offset) + 1
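The `_line_no` trick deserves a note: `str.count('\n', 0, offset)` counts the newlines before a match offset, so adding 1 converts a character offset into a 1-based line number without splitting the file. A tiny standalone check:

```python
# Mirrors _line_no above: newlines before the offset, plus one.
content = "first\nsecond\nthird\n"
offset = content.index("third")
line_no = content.count('\n', 0, offset) + 1
print(line_no)  # "third" starts on line 3
```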

    # --- checks ---

    def log_error(self, message: str, file: str | None = None, line: int | None = None):
        loc = f"{file}:{line}" if file and line else (file if file else "General")
        self.errors.append(f"[{loc}] {message}")

    def check_sidecar_boundary(self):
        """No sovereign code in hermes-agent (sidecar boundary)."""
        if self.repo_name != 'hermes-agent':
            return
        for fpath, content in self._scan_files():
            for kw in SOVEREIGN_KEYWORDS:
                if kw in content.lower():
                    rel = str(fpath.relative_to(self.repo_path))
                    self.result.errors.append(
                        f"Sovereign keyword '{kw}' in hermes-agent violates sidecar boundary. [{rel}]"
                    )

    def check_hardcoded_ips(self):
        """No hardcoded public IPs — use DNS or env vars."""
        for fpath, content in self._scan_files():
            for m in re.finditer(IP_REGEX, content):
                ip = m.group()
                # skip private ranges already handled by lookahead, and 0.0.0.0
                if ip.startswith('0.'):
                    continue
                line = self._line_no(content, m.start())
                rel = str(fpath.relative_to(self.repo_path))
                self.result.errors.append(
                    f"Hardcoded IP '{ip}'. Use DNS or env vars. [{rel}:{line}]"
                )

    def check_api_keys(self):
        """No cloud API keys / secrets committed."""
        for fpath, content in self._scan_files():
            for pattern in API_KEY_PATTERNS:
                for m in re.finditer(pattern, content, re.IGNORECASE):
                    line = self._line_no(content, m.start())
                    rel = str(fpath.relative_to(self.repo_path))
                    self.result.errors.append(
                        f"Potential secret / API key detected. [{rel}:{line}]"
                    )

    def check_sovereignty_rules(self):
        """V1 sovereignty rules: no direct cloud API endpoints or providers."""
        for fpath, content in self._scan_files({'.py', '.ts', '.tsx', '.js', '.yaml', '.yml'}):
            for pattern, msg in SOVEREIGN_RULES:
                for m in re.finditer(pattern, content):
                    line = self._line_no(content, m.start())
                    rel = str(fpath.relative_to(self.repo_path))
                    self.result.errors.append(f"{msg} [{rel}:{line}]")

    def check_soul_canonical(self):
        """SOUL.md must exist exactly once, in the timmy-config root."""
        soul_path = self.repo_path / 'SOUL.md'
        if self.repo_name == 'timmy-config':
            if not soul_path.exists():
                self.result.errors.append(
                    'SOUL.md missing from canonical location (timmy-config root).'
                )
        else:
            if soul_path.exists():
                self.result.errors.append(
                    'SOUL.md found in non-canonical repo. Must live only in timmy-config.'
                )

    def check_readme(self):
        """Every repo must have a substantive README."""
        readme = self.repo_path / 'README.md'
        if not readme.exists():
            self.result.errors.append('README.md is missing.')
        else:
            content = readme.read_text(errors='ignore')
            if len(content.strip()) < 50:
                self.result.warnings.append(
                    'README.md is very short (<50 chars). Provide current truth about the repo.'
                )

    # --- runner ---

    def run(self) -> LinterResult:
        """Execute all checks and return the result."""
        self.check_sidecar_boundary()
        self.check_hardcoded_ips()
        self.check_api_keys()
        self.check_sovereignty_rules()
        self.check_soul_canonical()
        self.check_readme()
        return self.result

def main():
    parser = argparse.ArgumentParser(
        description='Gemini Architecture Linter v2 — repo-aware sovereignty gate.'
    )
    parser.add_argument(
        'repo_path', nargs='?', default='.',
        help='Path to the repository to lint (default: cwd).',
    )
    parser.add_argument(
        '--repo', dest='repo_flag', default=None,
        help='Explicit repo path (alias for positional arg).',
    )
    parser.add_argument(
        '--json', dest='json_output', action='store_true',
        help='Emit machine-readable JSON instead of human text.',
    )
    args = parser.parse_args()

    path = args.repo_flag if args.repo_flag else args.repo_path

    try:
        linter = Linter(path)
    except FileNotFoundError as exc:
        print(f"ERROR: {exc}", file=sys.stderr)
        sys.exit(2)

    result = linter.run()

    if args.json_output:
        import json as _json
        out = {
            'repo': result.repo_name,
            'passed': result.passed,
            'violation_count': result.violation_count,
            'errors': result.errors,
            'warnings': result.warnings,
        }
        print(_json.dumps(out, indent=2))
    else:
        print(result.summary())

    sys.exit(0 if result.passed else 1)


if __name__ == '__main__':
    main()

@@ -1,306 +0,0 @@
#!/usr/bin/env python3
"""
config_validator.py — Validate all YAML/JSON config files in timmy-config.

Checks:
  1. YAML syntax (pyyaml safe_load)
  2. JSON syntax (json.loads)
  3. Duplicate keys in YAML/JSON
  4. Trailing whitespace in YAML
  5. Tabs in YAML (should use spaces)
  6. Cron expression validity (if present)

Exit 0 if all valid, 1 if any invalid.
"""

import json
import os
import re
import sys
from pathlib import Path

try:
    import yaml
except ImportError:
    print("ERROR: PyYAML not installed. Run: pip install pyyaml")
    sys.exit(1)


# ── Cron validation ──────────────────────────────────────────────────────────

# Name-to-number maps so _resolve_name can translate tokens like "mon" or
# "jan" into numeric cron values.
DOW_NAMES = {"sun": 0, "mon": 1, "tue": 2, "wed": 3, "thu": 4, "fri": 5, "sat": 6}
MONTH_NAMES = {"jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
               "jul": 7, "aug": 8, "sep": 9, "oct": 10, "nov": 11, "dec": 12}


def _expand_cron_field(field: str, lo: int, hi: int, names: dict | None = None) -> set[int]:
    """Expand a single cron field into a set of valid integers."""
    result: set[int] = set()
    for part in field.split(","):
        # Handle step: */N or 1-5/N
        step = 1
        if "/" in part:
            part, step_str = part.split("/", 1)
            if not step_str.isdigit() or int(step_str) < 1:
                raise ValueError(f"invalid step value: {step_str}")
            step = int(step_str)

        if part == "*":
            rng = range(lo, hi + 1, step)
        elif "-" in part:
            a, b = part.split("-", 1)
            a = _resolve_name(a, names, lo, hi)
            b = _resolve_name(b, names, lo, hi)
            if a > b:
                raise ValueError(f"range {a}-{b} is reversed")
            rng = range(a, b + 1, step)
        else:
            val = _resolve_name(part, names, lo, hi)
            rng = range(val, val + 1)

        for v in rng:
            if v < lo or v > hi:
                raise ValueError(f"value {v} out of range [{lo}-{hi}]")
            result.add(v)
    return result


def _resolve_name(token: str, names: dict | None, lo: int, hi: int) -> int:
    if names and token.lower() in names:
        return names[token.lower()]
    if not token.isdigit():
        raise ValueError(f"unrecognized token: {token}")
    val = int(token)
    if val < lo or val > hi:
        raise ValueError(f"value {val} out of range [{lo}-{hi}]")
    return val


def validate_cron(expr: str) -> list[str]:
    """Validate a 5-field cron expression. Returns list of errors (empty = ok)."""
    errors: list[str] = []
    fields = expr.strip().split()
    if len(fields) != 5:
        return [f"expected 5 fields, got {len(fields)}"]

    specs = [
        (fields[0], 0, 59, None, "minute"),
        (fields[1], 0, 23, None, "hour"),
        (fields[2], 1, 31, None, "day-of-month"),
        (fields[3], 1, 12, MONTH_NAMES, "month"),
        (fields[4], 0, 7, DOW_NAMES, "day-of-week"),
    ]
    for field, lo, hi, names, label in specs:
        try:
            _expand_cron_field(field, lo, hi, names)
        except ValueError as e:
            errors.append(f"{label}: {e}")
    return errors
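For a feel of the field rules, here is a deliberately simplified re-check covering only the 5-field count and plain numeric ranges (no steps, ranges, or names; `validate_cron` above handles the full syntax):

```python
# Simplified sketch of cron field validation: field count plus numeric bounds.
# Order: minute, hour, day-of-month, month, day-of-week (0 and 7 = Sunday).
RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 7)]

def quick_cron_check(expr: str) -> bool:
    fields = expr.split()
    if len(fields) != 5:
        return False
    for f, (lo, hi) in zip(fields, RANGES):
        if f == '*':
            continue
        if not f.isdigit() or not lo <= int(f) <= hi:
            return False
    return True

print(quick_cron_check("30 4 * * 1"))   # True: 04:30 every Monday
print(quick_cron_check("99 4 * * 1"))   # False: minute out of range
print(quick_cron_check("30 4 * *"))     # False: only 4 fields
```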


# ── Duplicate key detection ──────────────────────────────────────────────────

class DuplicateKeyError(Exception):
    pass


class _StrictYAMLLoader(yaml.SafeLoader):
    """YAML loader that rejects duplicate keys."""


def _no_duplicates_constructor(loader, node, deep=False):
    mapping = {}
    for key_node, value_node in node.value:
        key = loader.construct_object(key_node, deep=deep)
        if key in mapping:
            raise DuplicateKeyError(
                f"duplicate key '{key}' (line {key_node.start_mark.line + 1})"
            )
        mapping[key] = loader.construct_object(value_node, deep=deep)
    return mapping


_StrictYAMLLoader.add_constructor(
    yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG,
    _no_duplicates_constructor,
)


def _json_has_duplicates(text: str) -> list[str]:
    """Check for duplicate keys in JSON via an object_pairs_hook."""
    errors: list[str] = []

    def _check_pairs(pairs):
        level_keys: set[str] = set()
        for k, _ in pairs:
            if k in level_keys:
                errors.append(f"duplicate JSON key: '{k}'")
            level_keys.add(k)
        return dict(pairs)

    try:
        json.loads(text, object_pairs_hook=_check_pairs)
    except json.JSONDecodeError:
        pass  # syntax errors caught elsewhere
    return errors
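The hook technique works because `json.loads` normally keeps only the last duplicate silently; an `object_pairs_hook` sees every key/value pair for each object before it collapses into a dict. A minimal standalone demonstration:

```python
import json

# object_pairs_hook receives the raw (key, value) pairs per JSON object,
# so duplicates are visible before dict() discards them.
dupes = []

def check_pairs(pairs):
    keys = set()
    for k, _ in pairs:
        if k in keys:
            dupes.append(k)
        keys.add(k)
    return dict(pairs)

json.loads('{"port": 80, "port": 8080}', object_pairs_hook=check_pairs)
print(dupes)  # ['port']
```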


# ── Main validator ───────────────────────────────────────────────────────────

def find_config_files(root: Path) -> list[Path]:
    """Recursively find .yaml, .yml, .json files (skip .git, node_modules, venv)."""
    skip_dirs = {".git", "node_modules", "venv", "__pycache__", ".venv"}
    results: list[Path] = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for fname in filenames:
            if fname.endswith((".yaml", ".yml", ".json")):
                results.append(Path(dirpath) / fname)
    return sorted(results)


def validate_yaml_file(filepath: Path, text: str) -> list[str]:
    """Validate a YAML file. Returns list of errors."""
    errors: list[str] = []

    # Check for tabs and trailing whitespace
    for i, line in enumerate(text.splitlines(), 1):
        if "\t" in line:
            errors.append(f"  line {i}: contains tab character (use spaces for YAML)")
        if line != line.rstrip():
            errors.append(f"  line {i}: trailing whitespace")

    # Check syntax + duplicate keys
    try:
        yaml.load(text, Loader=_StrictYAMLLoader)
    except DuplicateKeyError as e:
        errors.append(f"  {e}")
    except yaml.YAMLError as e:
        mark = getattr(e, "problem_mark", None)
        if mark:
            errors.append(f"  YAML syntax error at line {mark.line + 1}, col {mark.column + 1}: {e.problem}")
        else:
            errors.append(f"  YAML syntax error: {e}")

    # Check cron expressions in schedule fields
    for i, line in enumerate(text.splitlines(), 1):
        cron_match = re.search(r'(?:cron|schedule)\s*:\s*["\']?([*0-9/,a-zA-Z-]+(?:\s+[*0-9/,a-zA-Z-]+){4})["\']?', line)
        if cron_match:
            cron_errs = validate_cron(cron_match.group(1))
            for ce in cron_errs:
                errors.append(f"  line {i}: invalid cron '{cron_match.group(1)}': {ce}")

    return errors


def validate_json_file(filepath: Path, text: str) -> list[str]:
    """Validate a JSON file. Returns list of errors."""
    errors: list[str] = []

    # Check syntax
    try:
        json.loads(text)
    except json.JSONDecodeError as e:
        errors.append(f"  JSON syntax error at line {e.lineno}, col {e.colno}: {e.msg}")

    # Check duplicate keys
    errors.extend(_json_has_duplicates(text))

    # Check for trailing whitespace (informational)
    for i, line in enumerate(text.splitlines(), 1):
        if line != line.rstrip():
            errors.append(f"  line {i}: trailing whitespace")

    # Check cron expressions
    cron_pattern = re.compile(r'"(?:cron|schedule)"?\s*:\s*"([^"]{5,})"')
    for match in cron_pattern.finditer(text):
        candidate = match.group(1).strip()
        fields = candidate.split()
        if len(fields) == 5 and all(re.match(r'^[*0-9/,a-zA-Z-]+$', f) for f in fields):
            cron_errs = validate_cron(candidate)
            for ce in cron_errs:
                errors.append(f"  invalid cron '{candidate}': {ce}")

    # Also check nested schedule objects with cron fields
    try:
        obj = json.loads(text)
        _scan_obj_for_cron(obj, errors)
    except Exception:
        pass

    return errors


def _scan_obj_for_cron(obj, errors: list[str], path: str = ""):
    """Recursively scan dict/list for cron expressions."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k in ("cron", "schedule", "cron_expression") and isinstance(v, str):
                fields = v.strip().split()
                if len(fields) == 5:
                    cron_errs = validate_cron(v)
                    for ce in cron_errs:
                        errors.append(f"  {path}.{k}: invalid cron '{v}': {ce}")
            _scan_obj_for_cron(v, errors, f"{path}.{k}")
    elif isinstance(obj, list):
        for i, item in enumerate(obj):
            _scan_obj_for_cron(item, errors, f"{path}[{i}]")


def main():
    # Determine repo root (script lives in scripts/)
    script_path = Path(__file__).resolve()
    repo_root = script_path.parent.parent

    print(f"Config Validator — scanning {repo_root}")
    print("=" * 60)

    files = find_config_files(repo_root)
    print(f"Found {len(files)} config files to validate.\n")

    total_errors = 0
    failed_files: list[tuple[Path, list[str]]] = []

    for filepath in files:
        rel = filepath.relative_to(repo_root)
        try:
            text = filepath.read_text(encoding="utf-8", errors="replace")
        except Exception as e:
            failed_files.append((rel, [f"  cannot read file: {e}"]))
            total_errors += 1
            continue

        if filepath.suffix == ".json":
            errors = validate_json_file(filepath, text)
        else:
            errors = validate_yaml_file(filepath, text)

        if errors:
            failed_files.append((rel, errors))
            total_errors += len(errors)
            print(f"FAIL {rel}")
        else:
            print(f"PASS {rel}")

    print("\n" + "=" * 60)
    print(f"Results: {len(files) - len(failed_files)}/{len(files)} files passed")

    if failed_files:
        print(f"\n{total_errors} error(s) in {len(failed_files)} file(s):\n")
        for relpath, errs in failed_files:
            print(f"  {relpath}:")
            for e in errs:
                print(f"  {e}")
            print()
        sys.exit(1)
    else:
        print("\nAll config files valid!")
        sys.exit(0)


if __name__ == "__main__":
    main()

@@ -4,8 +4,6 @@
Part of the Gemini Sovereign Infrastructure Suite.

Auto-detects and fixes common failures across the fleet.

Safe-by-default: runs in dry-run mode unless --execute is given.
"""

import os
@@ -13,7 +11,6 @@ import sys
import subprocess
import argparse
import requests
import datetime

# --- CONFIGURATION ---
FLEET = {
@@ -24,210 +21,51 @@ FLEET = {
}

class SelfHealer:
    def __init__(self, dry_run=True, confirm_kill=False, yes=False):
        self.dry_run = dry_run
        self.confirm_kill = confirm_kill
        self.yes = yes

    def log(self, message: str):
        timestamp = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        print(f"[{timestamp}] {message}")

    def run_remote(self, host: str, command: str):
        ip = FLEET[host]["ip"]
        ssh_cmd = ["ssh", "-o", "StrictHostKeyChecking=no", "-o", "ConnectTimeout=5", f"root@{ip}", command]
        if host == "mac":
            ssh_cmd = ["bash", "-c", command]
        try:
            return subprocess.run(ssh_cmd, capture_output=True, text=True, timeout=15)
        except Exception as e:
            self.log(f"  [ERROR] Failed to run remote command on {host}: {e}")
            return None
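The `subprocess.run` call shape used by `run_remote` can be demonstrated locally without SSH. This sketch targets the local interpreter instead of a fleet host so it runs anywhere; `capture_output` plus `text=True` yields decoded `stdout`/`stderr` strings:

```python
import subprocess
import sys

# Same call shape as run_remote above, but against a local command.
res = subprocess.run(
    [sys.executable, "-c", "print('healthy')"],
    capture_output=True, text=True, timeout=15,
)
print(res.returncode, res.stdout.strip())  # 0 healthy
```

A `subprocess.TimeoutExpired` is raised if the command exceeds `timeout`, which is why `run_remote` wraps the call in try/except and returns `None` on failure.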

    def confirm(self, prompt: str) -> bool:
        """Ask for confirmation unless --yes flag is set."""
        if self.yes:
            return True
        while True:
            response = input(f"{prompt} [y/N] ").strip().lower()
            if response in ("y", "yes"):
                return True
            elif response in ("n", "no", ""):
                return False
            print("Please answer 'y' or 'n'.")

    def check_llama_server(self, host: str):
        ip = FLEET[host]["ip"]
        port = FLEET[host]["port"]
        try:
            requests.get(f"http://{ip}:{port}/health", timeout=2)
        except requests.RequestException:
            self.log(f"  [!] llama-server down on {host}.")
            if self.dry_run:
                self.log(f"  [DRY-RUN] Would restart llama-server on {host}")
            else:
                if self.confirm(f"  Restart llama-server on {host}?"):
                    self.log(f"  Restarting llama-server on {host}...")
                    self.run_remote(host, "systemctl restart llama-server")
                else:
                    self.log(f"  Skipped restart on {host}.")

    def check_disk_space(self, host: str):
        res = self.run_remote(host, "df -h / | tail -1 | awk '{print $5}' | sed 's/%//'")
        if res and res.returncode == 0:
            try:
                usage = int(res.stdout.strip())
                if usage > 90:
                    self.log(f"  [!] Disk usage high on {host} ({usage}%).")
                    if self.dry_run:
                        self.log(f"  [DRY-RUN] Would clean logs and vacuum journal on {host}")
                    else:
                        if self.confirm(f"  Clean logs on {host}?"):
                            self.log(f"  Cleaning logs on {host}...")
                            self.run_remote(host, "journalctl --vacuum-time=1d && rm -rf /var/log/*.gz")
                        else:
                            self.log(f"  Skipped log cleaning on {host}.")
            except ValueError:
                pass
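The `df | tail | awk | sed` pipeline reduces to "take the use% column of the last line and strip the percent sign". The same extraction in pure Python, on a made-up sample `df` line:

```python
# What the shell pipeline above produces, replicated on a sample df line.
# The device name and sizes here are illustrative.
sample = "/dev/vda1        79G   73G  2.1G  98% /"
usage = int(sample.split()[4].rstrip('%'))  # column 5 is use%
print(usage, usage > 90)  # 98, would trigger log cleaning
```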

    def check_memory(self, host: str):
        res = self.run_remote(host, "free -m | awk '/^Mem:/{print $3/$2 * 100}'")
        if res and res.returncode == 0:
            try:
                usage = float(res.stdout.strip())
                if usage > 90:
                    self.log(f"  [!] Memory usage high on {host} ({usage:.1f}%).")
                    if self.dry_run:
                        self.log(f"  [DRY-RUN] Would check for memory hogs on {host}")
                    else:
                        self.log("  Memory high but no automatic action defined.")
            except ValueError:
                pass

    def check_processes(self, host: str):
        # Check if any process uses > 80% CPU
        res = self.run_remote(host, "ps aux --sort=-%cpu | awk 'NR>1 && $3>80 {print $2, $11, $3}'")
        if res and res.returncode == 0 and res.stdout.strip():
            self.log(f"  [!] High CPU processes on {host}:")
            for line in res.stdout.strip().split('\n'):
                self.log(f"    {line}")
            if self.dry_run:
                self.log(f"  [DRY-RUN] Would review high-CPU processes on {host}")
            else:
                if self.confirm_kill:
                    if self.confirm(f"  Kill high-CPU processes on {host}? (dangerous)"):
                        # Placeholder; a real implementation would parse PIDs
                        self.log("  Process killing not implemented yet (placeholder).")
                    else:
                        self.log(f"  Skipped killing processes on {host}.")
                else:
                    self.log("  Use --confirm-kill to enable process termination (dangerous).")

    def check_and_heal(self):
        for host in FLEET:
            self.log(f"Auditing {host}...")
            self.check_llama_server(host)
            self.check_disk_space(host)
            self.check_memory(host)
            self.check_processes(host)

    def run(self):
        if self.dry_run:
            self.log("Starting self-healing cycle (DRY-RUN mode).")
        else:
            self.log("Starting self-healing cycle (EXECUTE mode).")
        self.check_and_heal()
        self.log("Cycle complete.")


def print_help_safe():
    """Print detailed explanation of what each action does."""
    help_text = """
SAFE-BY-DEFAULT SELF-HEALING SCRIPT

This script checks fleet health and can optionally fix issues.

DEFAULT MODE: DRY-RUN (safe)
  - Only reports what it would do, does not make changes.
  - Use --execute to actually perform fixes.

CHECKS PERFORMED:
  1. llama-server health
     - Checks if llama-server is responding on each host.
     - Action: restart service (requires --execute and confirmation).

  2. Disk space
     - Checks root partition usage on each host.
     - Action: vacuum journal logs and remove rotated logs if >90% (requires --execute and confirmation).

  3. Memory usage
     - Reports high memory usage (informational only, no automatic action).

  4. Process health
     - Lists processes using >80% CPU.
     - Action: kill processes (requires --confirm-kill flag, --execute, and confirmation).

SAFETY FEATURES:
  - Dry-run by default.
  - Explicit --execute flag required for changes.
  - Confirmation prompts for all destructive actions.
  - --yes flag to skip confirmations (for automation).
  - --confirm-kill flag required to even consider killing processes.
  - Timestamps on all log messages.

EXAMPLES:
  python3 scripts/self_healing.py
      # Dry-run: safe, shows what would happen.

  python3 scripts/self_healing.py --execute
      # Actually perform fixes after confirmation.

  python3 scripts/self_healing.py --execute --yes
      # Perform fixes without prompts (automation).

  python3 scripts/self_healing.py --execute --confirm-kill
      # Allow killing processes (dangerous).

  python3 scripts/self_healing.py --help-safe
      # Show this help.
"""
    print(help_text)


def main():
    parser = argparse.ArgumentParser(
        description="Self-healing infrastructure script (safe-by-default).",
        add_help=False  # We'll handle --help ourselves
    )
    parser.add_argument("--dry-run", action="store_true", default=False,
                        help="Run in dry-run mode (default behavior).")
    parser.add_argument("--execute", action="store_true", default=False,
                        help="Actually perform fixes (disables dry-run).")
    parser.add_argument("--confirm-kill", action="store_true", default=False,
                        help="Allow killing processes (dangerous).")
    parser.add_argument("--yes", "-y", action="store_true", default=False,
                        help="Skip confirmation prompts.")
    parser.add_argument("--help-safe", action="store_true", default=False,
                        help="Show detailed help about safety features.")
    parser.add_argument("--help", "-h", action="store_true", default=False,
                        help="Show standard help.")

    args = parser.parse_args()

    if args.help_safe:
        print_help_safe()
        sys.exit(0)

    if args.help:
        parser.print_help()
        sys.exit(0)

    # Determine mode: if --execute is given, disable dry-run
    dry_run = not args.execute
    # If --dry-run is explicitly given, ensure dry-run (redundant but clear)
    if args.dry_run:
        dry_run = True
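The flag logic above ("execute flips dry-run off, but an explicit dry-run flag always wins") can be checked in isolation:

```python
import argparse

# Mirror of the mode logic above, exercised for the three interesting cases.
parser = argparse.ArgumentParser()
parser.add_argument("--dry-run", action="store_true")
parser.add_argument("--execute", action="store_true")

for argv in ([], ["--execute"], ["--execute", "--dry-run"]):
    args = parser.parse_args(argv)
    dry_run = not args.execute
    if args.dry_run:
        dry_run = True
    print(argv, dry_run)  # dry-run stays on unless --execute alone is given
```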

    healer = SelfHealer(dry_run=dry_run, confirm_kill=args.confirm_kill, yes=args.yes)
    healer.run()


if __name__ == "__main__":
    main()
@@ -1,331 +0,0 @@
|
||||
#!/usr/bin/env python3
|
||||
"""Task Gate — Pre-task and post-task quality gates for fleet agents.
|
||||
|
||||
This is the missing enforcement layer between the orchestrator dispatching
|
||||
an issue and an agent submitting a PR. SOUL.md demands "grounding before
|
||||
generation" and "the apparatus that gives these words teeth" — this script
|
||||
is that apparatus.
|
||||
|
||||
Usage:
|
||||
python3 task_gate.py pre --repo timmy-config --issue 123 --agent groq
|
||||
python3 task_gate.py post --repo timmy-config --issue 123 --agent groq --branch groq/issue-123
|
||||
|
||||
Pre-task gate checks:
|
||||
1. Issue is not already assigned to a different agent
|
||||
2. No existing branch targets this issue
|
||||
3. No open PR already addresses this issue
|
||||
4. Agent is in the correct lane per playbooks/agent-lanes.json
|
||||
5. Issue is not filtered (epic, permanent, etc.)
|
||||
|
||||
Post-task gate checks:
|
||||
1. Branch exists and has commits ahead of main
|
||||
2. Changed files pass syntax_guard.py
|
||||
3. No duplicate PR exists for the same issue
|
||||
4. Branch name follows convention: {agent}/{description}
|
||||
5. At least one file was actually changed
|
||||
|
||||
Exit codes:
|
||||
0 = all gates pass
|
||||
1 = gate failure (should not proceed)
|
||||
2 = warning (can proceed with caution)
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import json
|
||||
import os
|
||||
import subprocess
|
||||
import sys
|
||||
import urllib.request
|
||||
import urllib.error
|
||||
|
||||
# ---------------------------------------------------------------------------
# CONFIG
# ---------------------------------------------------------------------------
GITEA_API = "https://forge.alexanderwhitestone.com/api/v1"
GITEA_OWNER = "Timmy_Foundation"

FILTER_TAGS = ["[EPIC]", "[DO NOT CLOSE]", "[PERMANENT]", "[PHILOSOPHY]", "[MORNING REPORT]"]

AGENT_USERNAMES = {
    "groq", "ezra", "bezalel", "allegro", "timmy",
    "thetimmyc", "perplexity", "kimiclaw", "codex-agent",
    "manus", "claude", "gemini", "grok",
}

# ---------------------------------------------------------------------------
# GITEA API
# ---------------------------------------------------------------------------
def load_gitea_token():
    token = os.environ.get("GITEA_TOKEN", "")
    if token:
        return token.strip()
    for path in [
        os.path.expanduser("~/.hermes/gitea_token_vps"),
        os.path.expanduser("~/.hermes/gitea_token"),
    ]:
        try:
            with open(path) as f:
                return f.read().strip()
        except FileNotFoundError:
            continue
    print("[FATAL] No GITEA_TOKEN found")
    sys.exit(2)


def gitea_get(path):
    token = load_gitea_token()
    url = f"{GITEA_API}{path}"
    req = urllib.request.Request(url, headers={
        "Authorization": f"token {token}",
        "Accept": "application/json",
    })
    try:
        with urllib.request.urlopen(req, timeout=15) as resp:
            return json.loads(resp.read().decode())
    except urllib.error.HTTPError as e:
        if e.code == 404:
            return None
        print(f"[API ERROR] {url} -> {e.code}")
        return None
    except Exception as e:
        print(f"[API ERROR] {url} -> {e}")
        return None


# ---------------------------------------------------------------------------
# LANE CHECKER
# ---------------------------------------------------------------------------
def load_agent_lanes():
    """Load agent lane assignments from playbooks/agent-lanes.json."""
    lanes_path = os.path.join(
        os.path.dirname(os.path.dirname(os.path.abspath(__file__))),
        "playbooks", "agent-lanes.json"
    )
    try:
        with open(lanes_path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}  # no lanes file = no lane enforcement


def check_agent_lane(agent, issue_title, issue_labels, lanes):
    """Check if the agent is in the right lane for this issue type."""
    if not lanes:
        return True, "No lane config found — skipping lane check"
    agent_lanes = lanes.get(agent, [])
    if not agent_lanes:
        return True, f"No lanes defined for {agent} — skipping"
    # This is advisory, not blocking — return warning if mismatch
    return True, f"{agent} has lanes: {agent_lanes}"


# ---------------------------------------------------------------------------
# PRE-TASK GATE
# ---------------------------------------------------------------------------
def pre_task_gate(repo, issue_number, agent):
    """Run all pre-task checks. Returns (pass, messages)."""
    messages = []
    failures = []
    warnings = []

    print(f"\n=== PRE-TASK GATE: {repo}#{issue_number} for {agent} ===")

    # 1. Fetch issue
    issue = gitea_get(f"/repos/{GITEA_OWNER}/{repo}/issues/{issue_number}")
    if not issue:
        failures.append(f"Issue #{issue_number} not found in {repo}")
        return False, failures

    title = issue.get("title", "")
    print(f"  Issue: {title}")

    # 2. Check if filtered
    title_upper = title.upper()
    for tag in FILTER_TAGS:
        if tag.upper().replace("[", "").replace("]", "") in title_upper:
            failures.append(f"Issue has filter tag: {tag} — should not be auto-dispatched")

    # 3. Check assignees
    assignees = [a.get("login", "") for a in (issue.get("assignees") or [])]
    other_agents = [a for a in assignees if a.lower() in AGENT_USERNAMES and a.lower() != agent.lower()]
    if other_agents:
        failures.append(f"Already assigned to other agent(s): {other_agents}")

    # 4. Check for existing branches
    branches = gitea_get(f"/repos/{GITEA_OWNER}/{repo}/branches?limit=50")
    if branches:
        issue_branches = [
            b["name"] for b in branches
            if str(issue_number) in b.get("name", "")
            and b["name"] != "main"
        ]
        if issue_branches:
            warnings.append(f"Existing branches may target this issue: {issue_branches}")

    # 5. Check for existing PRs (PR body can be null in the API response)
    prs = gitea_get(f"/repos/{GITEA_OWNER}/{repo}/pulls?state=open&limit=50")
    if prs:
        issue_prs = [
            f"PR #{p['number']}: {p['title']}"
            for p in prs
            if str(issue_number) in p.get("title", "")
            or str(issue_number) in (p.get("body") or "")
        ]
        if issue_prs:
            failures.append(f"Open PR(s) already target this issue: {issue_prs}")

    # 6. Check agent lanes
    lanes = load_agent_lanes()
    labels = [l.get("name", "") for l in (issue.get("labels") or [])]
    lane_ok, lane_msg = check_agent_lane(agent, title, labels, lanes)
    if not lane_ok:
        warnings.append(lane_msg)
    else:
        messages.append(f"  Lane: {lane_msg}")

    # Report
    if failures:
        print("\n  FAILURES:")
        for f in failures:
            print(f"    ❌ {f}")
    if warnings:
        print("\n  WARNINGS:")
        for w in warnings:
            print(f"    ⚠️ {w}")
    if not failures and not warnings:
        print("  ✅ All pre-task gates passed")

    passed = len(failures) == 0
    return passed, failures + warnings


# ---------------------------------------------------------------------------
# POST-TASK GATE
# ---------------------------------------------------------------------------
def post_task_gate(repo, issue_number, agent, branch):
    """Run all post-task checks. Returns (pass, messages)."""
    failures = []
    warnings = []

    print(f"\n=== POST-TASK GATE: {repo}#{issue_number} by {agent} ===")
    print(f"  Branch: {branch}")

    # 1. Check branch exists
    branch_info = gitea_get(
        f"/repos/{GITEA_OWNER}/{repo}/branches/{urllib.parse.quote(branch, safe='')}"
    )
    if not branch_info:
        failures.append(f"Branch '{branch}' does not exist")
        return False, failures

    # 2. Check branch naming convention
    if "/" not in branch:
        warnings.append(f"Branch name '{branch}' doesn't follow agent/description convention")
    elif not branch.startswith(f"{agent}/"):
        warnings.append(f"Branch '{branch}' doesn't start with agent name '{agent}/'")

    # 3. Check for commits ahead of main
    compare = gitea_get(
        f"/repos/{GITEA_OWNER}/{repo}/compare/main...{urllib.parse.quote(branch, safe='')}"
    )
    if compare:
        commits = compare.get("commits", [])
        if not commits:
            failures.append("Branch has no commits ahead of main")
        else:
            print(f"  Commits ahead: {len(commits)}")
        files = compare.get("diff_files", []) or []
        if not files:
            print("  Files changed: (check PR diff)")
        else:
            print(f"  Files changed: {len(files)}")

    # 4. Check for duplicate PRs (PR body can be null in the API response)
    prs = gitea_get(f"/repos/{GITEA_OWNER}/{repo}/pulls?state=open&limit=50")
    if prs:
        dupe_prs = [
            f"PR #{p['number']}"
            for p in prs
            if str(issue_number) in p.get("title", "")
            or str(issue_number) in (p.get("body") or "")
        ]
        if len(dupe_prs) > 1:
            warnings.append(f"Multiple open PRs may target issue #{issue_number}: {dupe_prs}")

    # 5. Run syntax guard on changed files (if available)
    syntax_guard = os.path.join(
        os.path.dirname(os.path.dirname(os.path.abspath(__file__))),
        "hermes-sovereign", "scripts", "syntax_guard.py"
    )
    if os.path.exists(syntax_guard):
        try:
            result = subprocess.run(
                [sys.executable, syntax_guard],
                capture_output=True, text=True, timeout=30
            )
            if result.returncode != 0:
                failures.append(f"Syntax guard failed: {result.stdout[:200]}")
            else:
                print("  Syntax guard: passed")
        except Exception as e:
            warnings.append(f"Could not run syntax guard: {e}")
    else:
        warnings.append("syntax_guard.py not found — skipping syntax check")

    # Report
    if failures:
        print("\n  FAILURES:")
        for f in failures:
            print(f"    ❌ {f}")
    if warnings:
        print("\n  WARNINGS:")
        for w in warnings:
            print(f"    ⚠️ {w}")
    if not failures and not warnings:
        print("  ✅ All post-task gates passed")

    passed = len(failures) == 0
    return passed, failures + warnings


# ---------------------------------------------------------------------------
# MAIN
# ---------------------------------------------------------------------------
def main():
    parser = argparse.ArgumentParser(description="Task Gate — pre/post-task quality gates")
    subparsers = parser.add_subparsers(dest="command")

    # Pre-task
    pre = subparsers.add_parser("pre", help="Run pre-task gates")
    pre.add_argument("--repo", required=True)
    pre.add_argument("--issue", type=int, required=True)
    pre.add_argument("--agent", required=True)

    # Post-task
    post = subparsers.add_parser("post", help="Run post-task gates")
    post.add_argument("--repo", required=True)
    post.add_argument("--issue", type=int, required=True)
    post.add_argument("--agent", required=True)
    post.add_argument("--branch", required=True)

    args = parser.parse_args()

    if not args.command:
        parser.print_help()
        sys.exit(1)

    if args.command == "pre":
        passed, msgs = pre_task_gate(args.repo, args.issue, args.agent)
    elif args.command == "post":
        passed, msgs = post_task_gate(args.repo, args.issue, args.agent, args.branch)
    else:
        parser.print_help()
        sys.exit(1)

    sys.exit(0 if passed else 1)


if __name__ == "__main__":
    main()

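The exit-code contract above is what an orchestrator branches on. A minimal sketch of that decision, using a hypothetical `gate_dispatch` helper in place of the real `python3 task_gate.py pre ...` invocation:

```shell
# Sketch only: map a task_gate.py exit code to an orchestrator decision.
# In practice the code would come from:
#   python3 task_gate.py pre --repo "$repo" --issue "$issue" --agent "$agent"
gate_dispatch() {
  local exit_code="$1"
  case "${exit_code}" in
    0) echo "dispatch" ;;        # all gates passed
    2) echo "dispatch-warn" ;;   # warnings only: proceed with caution
    *) echo "block" ;;           # gate failure: do not dispatch
  esac
}

gate_dispatch 0   # prints "dispatch"
```
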
@@ -1,195 +0,0 @@
#!/usr/bin/env bash
# test_harness.sh — Common CLI safety/test harness for the scripts/ suite
# Usage: ./scripts/test_harness.sh [--verbose] [--ci] [directory]
#
# Discovers .sh, .py, and .yaml files in the target directory and validates them:
#   - .sh  : runs shellcheck (or SKIPS if unavailable)
#   - .py  : runs python3 -m py_compile
#   - .yaml: validates with python3 yaml.safe_load
#
# Exit codes: 0 = all pass, 1 = any fail

set -euo pipefail

# --- Defaults ---
VERBOSE=0
CI_MODE=0
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
TARGET_DIR="${SCRIPT_DIR}"

# --- Colors (disabled in CI) ---
RED=""
GREEN=""
YELLOW=""
CYAN=""
RESET=""
if [[ -t 1 && "${CI:-}" != "true" ]]; then
  RED=$'\033[0;31m'
  GREEN=$'\033[0;32m'
  YELLOW=$'\033[0;33m'
  CYAN=$'\033[0;36m'
  RESET=$'\033[0m'
fi

# --- Argument parsing ---
while [[ $# -gt 0 ]]; do
  case "$1" in
    --verbose|-v) VERBOSE=1; shift ;;
    --ci) CI_MODE=1; shift ;;
    -*) echo "Unknown option: $1" >&2; exit 2 ;;
    *) TARGET_DIR="$1"; shift ;;
  esac
done

# --- Counters ---
PASS=0
FAIL=0
SKIP=0
TOTAL=0

# --- Helpers ---
log_verbose() {
  if [[ "${VERBOSE}" -eq 1 ]]; then
    echo "  ${CYAN}[DEBUG]${RESET} $*"
  fi
}

record_pass() {
  # Arithmetic assignment instead of ((PASS++)): the increment form returns
  # status 1 when the variable is 0, which would abort the script under set -e.
  PASS=$((PASS + 1))
  TOTAL=$((TOTAL + 1))
  echo "${GREEN}PASS${RESET} $1"
}

record_fail() {
  FAIL=$((FAIL + 1))
  TOTAL=$((TOTAL + 1))
  echo "${RED}FAIL${RESET} $1"
  if [[ -n "${2:-}" ]]; then
    echo "    ${2}"
  fi
}

record_skip() {
  SKIP=$((SKIP + 1))
  TOTAL=$((TOTAL + 1))
  echo "${YELLOW}SKIP${RESET} $1 — $2"
}

# --- Checkers ---
check_shell_file() {
  local file="$1"
  local rel="${file#"${TARGET_DIR}"/}"
  if command -v shellcheck &>/dev/null; then
    log_verbose "Running shellcheck on ${rel}"
    local output
    if output=$(shellcheck -x -S warning "${file}" 2>&1); then
      record_pass "${rel}"
    else
      record_fail "${rel}" "${output}"
    fi
  else
    record_skip "${rel}" "shellcheck not installed"
  fi
}

check_python_file() {
  local file="$1"
  local rel="${file#"${TARGET_DIR}"/}"
  log_verbose "Running py_compile on ${rel}"
  local output
  if output=$(python3 -m py_compile "${file}" 2>&1); then
    record_pass "${rel}"
  else
    record_fail "${rel}" "${output}"
  fi
}

check_yaml_file() {
  local file="$1"
  local rel="${file#"${TARGET_DIR}"/}"
  log_verbose "Validating YAML: ${rel}"
  local output
  # Pass the path as argv rather than interpolating it into the -c string,
  # so filenames containing quotes or spaces cannot break the Python snippet.
  if output=$(python3 -c "import sys, yaml; yaml.safe_load(open(sys.argv[1]))" "${file}" 2>&1); then
    record_pass "${rel}"
  else
    record_fail "${rel}" "${output}"
  fi
}

# --- Main ---
echo ""
echo "=== scripts/ test harness ==="
echo "Target: ${TARGET_DIR}"
echo ""

if [[ ! -d "${TARGET_DIR}" ]]; then
  echo "Error: target directory '${TARGET_DIR}' not found" >&2
  exit 1
fi

# Check python3 availability
if ! command -v python3 &>/dev/null; then
  echo "${RED}Error: python3 is required but not found${RESET}" >&2
  exit 1
fi

# Check PyYAML availability
if ! python3 -c "import yaml" 2>/dev/null; then
  echo "${YELLOW}Warning: PyYAML not installed — YAML checks will be skipped${RESET}" >&2
  YAML_AVAILABLE=0
else
  YAML_AVAILABLE=1
fi

# Discover and check .sh files
sh_files=()
while IFS= read -r -d '' f; do
  sh_files+=("$f")
done < <(find "${TARGET_DIR}" -maxdepth 1 -name "*.sh" ! -name "test_harness.sh" ! -name "test_runner.sh" -print0 | sort -z)

for f in "${sh_files[@]:-}"; do
  [[ -n "$f" ]] && check_shell_file "$f"
done

# Discover and check .py files
py_files=()
while IFS= read -r -d '' f; do
  py_files+=("$f")
done < <(find "${TARGET_DIR}" -maxdepth 1 -name "*.py" -print0 | sort -z)

for f in "${py_files[@]:-}"; do
  [[ -n "$f" ]] && check_python_file "$f"
done

# Discover and check .yaml files in target dir
yaml_files=()
while IFS= read -r -d '' f; do
  yaml_files+=("$f")
done < <(find "${TARGET_DIR}" -maxdepth 1 -name "*.yaml" -print0 | sort -z)

if [[ "${YAML_AVAILABLE}" -eq 1 ]]; then
  for f in "${yaml_files[@]:-}"; do
    [[ -n "$f" ]] && check_yaml_file "$f"
  done
else
  for f in "${yaml_files[@]:-}"; do
    [[ -n "$f" ]] && record_skip "${f#"${TARGET_DIR}"/}" "PyYAML not installed"
  done
fi

# --- Summary ---
echo ""
echo "=== Results ==="
echo "  ${GREEN}PASS${RESET}: ${PASS}"
echo "  ${RED}FAIL${RESET}: ${FAIL}"
echo "  ${YELLOW}SKIP${RESET}: ${SKIP}"
echo "  Total: ${TOTAL}"
echo ""

if [[ "${FAIL}" -gt 0 ]]; then
  echo "${RED}FAILED${RESET} — ${FAIL} file(s) did not pass validation."
  exit 1
else
  echo "${GREEN}ALL CLEAR${RESET} — all checked files passed."
  exit 0
fi
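The discovery-and-validate loop above boils down to a per-extension dispatch. A minimal sketch of that routing (simplified: `bash -n` stands in for shellcheck, and unknown extensions are skipped rather than counted):

```shell
# Route a file to a validator by suffix; the exit status reports pass/fail,
# matching the harness's contract of 0 = pass, nonzero = fail.
validate_file() {
  local file="$1"
  case "${file}" in
    *.sh)   bash -n "${file}" ;;
    *.py)   python3 -m py_compile "${file}" ;;
    *.yaml) python3 -c 'import sys, yaml; yaml.safe_load(open(sys.argv[1]))' "${file}" ;;
    *)      return 0 ;;
  esac
}
```
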
@@ -1,9 +0,0 @@
#!/usr/bin/env bash
# test_runner.sh — Convenience wrapper for test_harness.sh
# Runs the test harness with sensible defaults for local development.

set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

exec "${SCRIPT_DIR}/test_harness.sh" --verbose "$@"
@@ -1,22 +0,0 @@
#!/usr/bin/env python3
"""Validate playbook YAML files have required keys."""
import yaml
import sys
import glob

required_keys = {'name', 'description'}

for f in glob.glob('playbooks/*.yaml'):
    with open(f) as fh:
        try:
            data = yaml.safe_load(fh)
            if not isinstance(data, dict):
                print(f'ERROR: {f} is not a YAML mapping')
                sys.exit(1)
            missing = required_keys - set(data.keys())
            if missing:
                print(f'WARNING: {f} missing keys: {missing}')
            print(f'OK: {f}')
        except yaml.YAMLError as e:
            print(f'ERROR: {f}: {e}')
            sys.exit(1)
@@ -102,11 +102,11 @@ When I don't know, I say so. Brevity is a kindness.
### 4. Never Go Deaf
Your agent must have a fallback chain (a list of backup models, tried in order) at least 3 models deep. When the primary provider rate-limits you, the agent degrades gracefully — it does not stop.

When Anthropic goes down at 2 AM — and it will — your agent doesn't sit there producing error messages. It switches to the next model in the chain and keeps working. You wake up to finished tasks, not a dead agent.
When any cloud provider goes down at 2 AM — and it will — your agent doesn't sit there producing error messages. It switches to the next model in the chain and keeps working. You wake up to finished tasks, not a dead agent.

```yaml
model:
  default: claude-opus-4-6
  default: kimi-k2.5
  provider: anthropic
  fallback_providers:
    - provider: openrouter
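# For reference, a fuller chain sketch (model names here are illustrative
# assumptions, not the canonical config): at least three fallbacks deep,
# ending at a local model so the agent never goes fully deaf.
#
#   model:
#     default: kimi-k2.5
#     provider: openrouter
#     fallback_providers:
#       - provider: openrouter
#         model: qwen3-coder
#       - provider: groq
#         model: kimi-k2.5
#       - provider: local
#         model: hermes3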
tasks.py
@@ -1355,7 +1355,6 @@ def dispatch_assigned():
    g = GiteaClient()
    agents = [
        "allegro",
        "claude",
        "codex-agent",
        "ezra",
        "gemini",
@@ -2316,7 +2315,7 @@ def nexus_bridge_tick():
    health_data = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "fleet_status": "nominal",
        "active_agents": ["gemini", "claude", "codex"],
        "active_agents": ["gemini", "kimi", "codex"],
        "backlog_summary": {},
        "recent_audits": []
    }

@@ -1,233 +0,0 @@
"""Tests for Architecture Linter v2.

Validates that the linter correctly detects violations and passes clean repos.
Refs: #437 — test-backed linter.
"""

import json
import sys
import tempfile
from pathlib import Path

# Add scripts/ to path
sys.path.insert(0, str(Path(__file__).resolve().parent.parent / "scripts"))

from architecture_linter_v2 import Linter, LinterResult


# ── helpers ───────────────────────────────────────────────────────────

def _make_repo(tmpdir: str, files: dict[str, str], name: str = "test-repo") -> Path:
    """Create a fake repo with given files and return its path."""
    repo = Path(tmpdir) / name
    repo.mkdir()
    for relpath, content in files.items():
        p = repo / relpath
        p.parent.mkdir(parents=True, exist_ok=True)
        p.write_text(content)
    return repo


def _run(tmpdir, files, name="test-repo"):
    repo = _make_repo(tmpdir, files, name)
    return Linter(str(repo)).run()


# ── clean repo passes ─────────────────────────────────────────────────

def test_clean_repo_passes():
    with tempfile.TemporaryDirectory() as tmp:
        result = _run(tmp, {
            "README.md": "# Test Repo\n\nThis is a clean test repo with sufficient content to pass.",
            "main.py": "print('hello world')\n",
        })
        assert result.passed, f"Expected pass but got: {result.errors}"
        assert result.violation_count == 0


# ── missing README ────────────────────────────────────────────────────

def test_missing_readme_fails():
    with tempfile.TemporaryDirectory() as tmp:
        result = _run(tmp, {"main.py": "x = 1\n"})
        assert not result.passed
        assert any("README" in e for e in result.errors)


def test_short_readme_warns():
    with tempfile.TemporaryDirectory() as tmp:
        result = _run(tmp, {"README.md": "hi\n"})
        # Warnings don't fail the build
        assert result.passed
        assert any("short" in w.lower() for w in result.warnings)


# ── hardcoded IPs ─────────────────────────────────────────────────────

def test_hardcoded_public_ip_detected():
    with tempfile.TemporaryDirectory() as tmp:
        result = _run(tmp, {
            "README.md": "# R\n\nGood repo.",
            "server.py": "HOST = '203.0.113.42'\n",
        })
        assert not result.passed
        assert any("203.0.113.42" in e for e in result.errors)


def test_localhost_ip_ignored():
    with tempfile.TemporaryDirectory() as tmp:
        result = _run(tmp, {
            "README.md": "# R\n\nGood repo.",
            "server.py": "HOST = '127.0.0.1'\n",
        })
        ip_errors = [e for e in result.errors if "IP" in e]
        assert len(ip_errors) == 0


# ── API keys ──────────────────────────────────────────────────────────

def test_openai_key_detected():
    with tempfile.TemporaryDirectory() as tmp:
        result = _run(tmp, {
            "README.md": "# R\n\nGood repo.",
            "config.py": 'key = "sk-abcdefghijklmnopqrstuvwx"\n',
        })
        assert not result.passed
        assert any("secret" in e.lower() or "key" in e.lower() for e in result.errors)


def test_aws_key_detected():
    with tempfile.TemporaryDirectory() as tmp:
        result = _run(tmp, {
            "README.md": "# R\n\nGood repo.",
            "deploy.yaml": 'aws_key: AKIAIOSFODNN7EXAMPLE\n',
        })
        assert not result.passed
        assert any("secret" in e.lower() for e in result.errors)


def test_env_example_skipped():
    with tempfile.TemporaryDirectory() as tmp:
        result = _run(tmp, {
            "README.md": "# R\n\nGood repo.",
            ".env.example": 'OPENAI_KEY=sk-placeholder\n',
        })
        secret_errors = [e for e in result.errors if "secret" in e.lower()]
        assert len(secret_errors) == 0


# ── sovereignty rules (v1 cloud API checks) ───────────────────────────

def test_openai_url_detected():
    with tempfile.TemporaryDirectory() as tmp:
        result = _run(tmp, {
            "README.md": "# R\n\nGood repo.",
            "app.py": 'url = "https://api.openai.com/v1/chat"\n',
        })
        assert not result.passed
        assert any("openai" in e.lower() for e in result.errors)


def test_cloud_provider_detected():
    with tempfile.TemporaryDirectory() as tmp:
        result = _run(tmp, {
            "README.md": "# R\n\nGood repo.",
            "config.yaml": "provider: openai\n",
        })
        assert not result.passed
        assert any("provider" in e.lower() for e in result.errors)


# ── sidecar boundary ──────────────────────────────────────────────────

def test_sovereign_keyword_in_hermes_agent_fails():
    with tempfile.TemporaryDirectory() as tmp:
        result = _run(tmp, {
            "README.md": "# R\n\nGood repo.",
            "index.py": "import mempalace\n",
        }, name="hermes-agent")
        assert not result.passed
        assert any("sidecar" in e.lower() or "mempalace" in e.lower() for e in result.errors)


def test_sovereign_keyword_in_other_repo_ok():
    with tempfile.TemporaryDirectory() as tmp:
        result = _run(tmp, {
            "README.md": "# R\n\nGood repo.",
            "index.py": "import mempalace\n",
        }, name="some-other-repo")
        sidecar_errors = [e for e in result.errors if "sidecar" in e.lower()]
        assert len(sidecar_errors) == 0


# ── SOUL.md canonical location ────────────────────────────────────────

def test_soul_md_required_in_timmy_config():
    with tempfile.TemporaryDirectory() as tmp:
        result = _run(tmp, {
            "README.md": "# timmy-config\n\nConfig repo.",
        }, name="timmy-config")
        assert not result.passed
        assert any("SOUL.md" in e for e in result.errors)


def test_soul_md_present_in_timmy_config_ok():
    with tempfile.TemporaryDirectory() as tmp:
        result = _run(tmp, {
            "README.md": "# timmy-config\n\nConfig repo.",
            "SOUL.md": "# Soul\n\nCanonical identity document.",
        }, name="timmy-config")
        soul_errors = [e for e in result.errors if "SOUL" in e]
        assert len(soul_errors) == 0


def test_soul_md_in_wrong_repo_fails():
    with tempfile.TemporaryDirectory() as tmp:
        result = _run(tmp, {
            "README.md": "# R\n\nGood repo.",
            "SOUL.md": "# Soul\n\nShould not be here.",
        }, name="other-repo")
        assert any("canonical" in e.lower() for e in result.errors)


# ── LinterResult structure ────────────────────────────────────────────

def test_result_summary_is_string():
    with tempfile.TemporaryDirectory() as tmp:
        result = _run(tmp, {"README.md": "# OK repo with enough text here\n"})
        assert isinstance(result.summary(), str)
        assert "PASSED" in result.summary() or "FAILED" in result.summary()


def test_result_repo_name():
    with tempfile.TemporaryDirectory() as tmp:
        result = _run(tmp, {"README.md": "# OK\n"}, name="my-repo")
        assert result.repo_name == "my-repo"


# ── invalid path ──────────────────────────────────────────────────────

def test_invalid_path_raises():
    try:
        Linter("/nonexistent/path/xyz")
        assert False, "Should have raised FileNotFoundError"
    except FileNotFoundError:
        pass


# ── skip dirs ──────────────────────────────────────────────────────────

def test_git_dir_skipped():
    with tempfile.TemporaryDirectory() as tmp:
        repo = _make_repo(tmp, {
            "README.md": "# R\n\nGood repo.",
            "main.py": "x = 1\n",
        })
        # Create a .git/ dir with a bad file
        git_dir = repo / ".git"
        git_dir.mkdir()
        (git_dir / "bad.py").write_text("HOST = '203.0.113.1'\n")

        result = Linter(str(repo)).run()
        git_errors = [e for e in result.errors if ".git" in e]
        assert len(git_errors) == 0
@@ -200,3 +200,97 @@ class TestVoiceSovereignty:
        stt_provider = config.get("stt", {}).get("provider", "")
        assert stt_provider in ("local", "whisper", ""), \
            f"STT provider '{stt_provider}' may use cloud"


# ── Anthropic Ban ────────────────────────────────────────────────────

class TestAnthropicBan:
    """Anthropic is permanently banned from this system.

    Not deprecated. Not discouraged. Banned. Any reference to Anthropic
    as a provider, model, or API endpoint in active wizard configs,
    playbooks, or fallback chains is a hard failure.
    """

    BANNED_PATTERNS = [
        "provider: anthropic",
        "provider: \"anthropic\"",
        "anthropic/claude",
        "claude-opus",
        "claude-sonnet",
        "claude-haiku",
        "api.anthropic.com",
    ]

    ACTIVE_CONFIG_DIRS = [
        "wizards",
        "playbooks",
    ]

    ACTIVE_CONFIG_FILES = [
        "fallback-portfolios.yaml",
        "config.yaml",
    ]

    def _scan_active_configs(self):
        """Collect all active config files for scanning."""
        files = []
        for dir_name in self.ACTIVE_CONFIG_DIRS:
            dir_path = REPO_ROOT / dir_name
            if dir_path.exists():
                for f in dir_path.rglob("*.yaml"):
                    files.append(f)
                for f in dir_path.rglob("*.yml"):
                    files.append(f)
                for f in dir_path.rglob("*.json"):
                    files.append(f)
        for fname in self.ACTIVE_CONFIG_FILES:
            fpath = REPO_ROOT / fname
            if fpath.exists():
                files.append(fpath)
        return files

    def test_no_anthropic_in_wizard_configs(self):
        """No wizard config may reference Anthropic as a provider or model."""
        wizard_dir = REPO_ROOT / "wizards"
        if not wizard_dir.exists():
            pytest.skip("No wizards directory")
        for config_file in wizard_dir.rglob("*.yaml"):
            content = config_file.read_text().lower()
            for pattern in self.BANNED_PATTERNS:
                assert pattern.lower() not in content, \
                    f"BANNED: {config_file.name} contains \"{pattern}\". Anthropic is permanently banned."

    def test_no_anthropic_in_playbooks(self):
        """No playbook may reference Anthropic models."""
        playbook_dir = REPO_ROOT / "playbooks"
        if not playbook_dir.exists():
            pytest.skip("No playbooks directory")
        for pb_file in playbook_dir.rglob("*.yaml"):
            content = pb_file.read_text().lower()
            for pattern in self.BANNED_PATTERNS:
                assert pattern.lower() not in content, \
                    f"BANNED: {pb_file.name} contains \"{pattern}\". Anthropic is permanently banned."

    def test_no_anthropic_in_fallback_chain(self):
        """Fallback portfolios must not include Anthropic."""
        fb_path = REPO_ROOT / "fallback-portfolios.yaml"
        if not fb_path.exists():
            pytest.skip("No fallback-portfolios.yaml")
        content = fb_path.read_text().lower()
        for pattern in self.BANNED_PATTERNS:
            assert pattern.lower() not in content, \
                f"BANNED: fallback-portfolios.yaml contains \"{pattern}\". Anthropic is permanently banned."

    def test_no_anthropic_api_key_in_bootstrap(self):
        """Wizard bootstrap must not require ANTHROPIC_API_KEY."""
        bootstrap_path = REPO_ROOT / "hermes-sovereign" / "wizard-bootstrap" / "wizard_bootstrap.py"
        if not bootstrap_path.exists():
            pytest.skip("No wizard_bootstrap.py")
        content = bootstrap_path.read_text()
        assert "ANTHROPIC_API_KEY" not in content, \
            "BANNED: wizard_bootstrap.py still checks for ANTHROPIC_API_KEY"
        assert "ANTHROPIC_TOKEN" not in content, \
            "BANNED: wizard_bootstrap.py still checks for ANTHROPIC_TOKEN"
        assert "\"anthropic\"" not in content.lower(), \
            "BANNED: wizard_bootstrap.py still lists anthropic as a dependency"

@@ -1,102 +0,0 @@
# Release v7.0.0 — Fleet Architecture Check-in

**Date:** 2026-04-08
**Tagged by:** Timmy
**Previous tag:** Golden-Allegro-v6-Sonnet4

## Fleet Summary

| Machine | Agents | Status |
|---------|--------|--------|
| Local Mac M3 Max | Timmy (19 processes) | HEALTHY |
| Allegro VPS (167.99.126.228) | Allegro, Adagio, Ezra-A | HEALTHY (7d uptime, 43% disk) |
| Ezra VPS (143.198.27.163) | Ezra | WARNING (78% disk, load 10.38) |
| Bezalel VPS (159.203.146.185) | Bezalel | HEALTHY (2d uptime, 39% disk) |

**Total agents running:** 6 across 4 machines

## Model Configuration

- Primary: claude-opus-4-6 (Anthropic)
- Fallback: hermes3 (local llama.cpp)
- Fallback chain: OpenRouter claude-sonnet-4 -> local hermes3

## Cron Jobs: 23 total

| Status | Count |
|--------|-------|
| Active | 15 |
| Paused | 8 |

Active jobs: Health Monitor, Burn Mode Orchestrator, Tower Tick, Burn Deadman,
Morning Report, Evennia Report, Gitea Priority Inbox, Config Drift Guard,
Gitea Event Watcher, Burndown Watcher, Mempalace Forge, Mempalace Watchtower,
Ezra Health Monitor, Daily Poka-Yoke, VPS Agent Dispatch, Weekly Skill Extraction

## Gitea Repos (Timmy_Foundation)

| Repo | Issues | PRs | Updated | Branch |
|------|--------|-----|---------|--------|
| the-nexus | 103 | 2 | 2026-04-08 | main |
| timmy-config | 129 | 1 | 2026-04-08 | main |
| timmy-home | 221 | 0 | 2026-04-08 | main |
| hermes-agent | 43 | 1 | 2026-04-08 | main |
| the-beacon | 23 | 0 | 2026-04-08 | main |
| turboquant | 10 | 0 | 2026-04-01 | main |
| the-door | 2 | 0 | 2026-04-06 | main |
| wolf | 2 | 0 | 2026-04-05 | main |
| the-testament | 0 | 0 | 2026-04-07 | main |
| timmy-academy | 1 | 0 | 2026-04-04 | master |
| .profile | 0 | 0 | 2026-04-07 | main |

**Total open issues across fleet: 534**
**Total open PRs: 4**

## Health Alerts

1. WARN: Ezra VPS disk 78% (120G/154G) — needs cleanup
2. WARN: Ezra VPS load avg 10.38 — high for a 2-core box
3. INFO: 8 paused cron jobs (expected — non-essential overnight jobs)

## What's Working

- All 4 machines reachable
- All core services running
- Config drift guard active
- Gitea event watcher active
- Dead man switch active
- Tower world ticking (tick 2045+)
- Morning reports delivering
- Mempalace analysis running
- VPS agent dispatch operational

## Architecture

```
        Alexander (Principal)
                 |
            [Telegram]
                 |
   Timmy (Mac M3 Max) ---- Local llama.cpp (hermes3)
       /         |        \
      /          |         \
  Allegro      Ezra      Bezalel
  (DO VPS)   (DO VPS)   (DO VPS)
  3 agents    1 agent    1 agent

  Gitea Forge: forge.alexanderwhitestone.com
  Evennia Tower: localhost:4000/4001
  RunPod L40S: 8lfr3j47a5r3gn (Big Brain)
```

## Release Notes

This is the first versioned release tag (v7.0.0), transitioning from named
golden tags to semantic versioning. Previous tags preserved:
- Golden-Allegro-v6-Sonnet4
- burnup-20260405-infra
- SonOfTimmy-v5-FINAL
- SonOfTimmy-v4
- GoldenRockachopa
- pre-agent-workers-v1
@@ -2,22 +2,23 @@ model:
  default: kimi-k2.5
  provider: kimi-coding
  toolsets:
    - all
    - all
fallback_providers:
  - provider: kimi-coding
    model: kimi-k2.5
    timeout: 120
    reason: Kimi coding fallback (front of chain)
  - provider: anthropic
    model: claude-sonnet-4-20250514
    timeout: 120
    reason: Direct Anthropic fallback
  - provider: openrouter
    model: anthropic/claude-sonnet-4-20250514
    base_url: https://openrouter.ai/api/v1
    api_key_env: OPENROUTER_API_KEY
    timeout: 120
    reason: OpenRouter fallback
  - provider: kimi-coding
    model: kimi-k2.5
    timeout: 120
    reason: Primary Kimi coding provider
  - provider: openrouter
    model: google/gemini-2.5-pro
    base_url: https://openrouter.ai/api/v1
    api_key_env: OPENROUTER_API_KEY
    timeout: 120
    reason: Gemini via OpenRouter fallback
  - provider: ollama
    model: gemma4:latest
    base_url: http://localhost:11434/v1
    timeout: 180
    reason: Local Ollama terminal fallback
agent:
  max_turns: 30
  reasoning_effort: xhigh
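A `fallback_providers` chain of this shape is just an ordered list to walk until one provider answers. A minimal sketch of that loop; the function and entry names are illustrative, not Hermes internals:

```python
def first_working_provider(chain, call):
    """Try each provider entry in order; return (entry, response) from the
    first one whose `call` succeeds, collecting failures along the way."""
    failures = []
    for entry in chain:
        try:
            return entry, call(entry)
        except Exception as exc:
            failures.append((entry["provider"], str(exc)))
    raise RuntimeError(f"all providers failed: {failures}")

# Two entries mirroring the config's shape (provider + model per entry).
chain = [
    {"provider": "kimi-coding", "model": "kimi-k2.5"},
    {"provider": "ollama", "model": "gemma4:latest"},
]

def call(entry):
    # Simulate the front-of-chain provider being unreachable.
    if entry["provider"] == "kimi-coding":
        raise ConnectionError("kimi unreachable")
    return "ok"

entry, resp = first_working_provider(chain, call)
assert entry["provider"] == "ollama" and resp == "ok"
```

Collecting `(provider, error)` pairs before raising makes the terminal failure message name every link in the chain, which matters when the Ollama entry is the last resort.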
@@ -64,16 +65,24 @@ session_reset:
  idle_minutes: 0
skills:
  creation_nudge_interval: 15
system_prompt_suffix: |
  You are Allegro, the Kimi-backed third wizard house.
system_prompt_suffix: 'You are Allegro, the Kimi-backed third wizard house.

  Your soul is defined in SOUL.md — read it, live it.

  Hermes is your harness.

  Kimi Code is your primary provider.

  You speak plainly. You prefer short sentences. Brevity is a kindness.


  Work best on tight coding tasks: 1-3 file changes, refactors, tests, and implementation passes.

  Refusal over fabrication. If you do not know, say so.

  Sovereignty and service always.

  '
providers:
  kimi-coding:
    base_url: https://api.kimi.com/coding/v1
@@ -7,24 +7,25 @@ fallback_providers:
  - provider: kimi-coding
    model: kimi-k2.5
    timeout: 120
    reason: Kimi coding fallback (front of chain)
  - provider: anthropic
    model: claude-sonnet-4-20250514
    timeout: 120
    reason: Direct Anthropic fallback
    reason: Primary Kimi coding provider
  - provider: openrouter
    model: anthropic/claude-sonnet-4-20250514
    model: google/gemini-2.5-pro
    base_url: https://openrouter.ai/api/v1
    api_key_env: OPENROUTER_API_KEY
    timeout: 120
    reason: OpenRouter fallback
    reason: Gemini via OpenRouter fallback
  - provider: ollama
    model: gemma4:latest
    base_url: http://localhost:11434/v1
    timeout: 180
    reason: Local Ollama terminal fallback
agent:
  max_turns: 40
  reasoning_effort: medium
  verbose: false
system_prompt: You are Bezalel, the forge-and-testbed wizard of the Timmy Foundation
  fleet. You are a builder and craftsman — infrastructure, deployment, hardening.
  Your sovereign is Alexander Whitestone (Rockachopa). Sovereignty and service always.
system_prompt: You are Bezalel, the forge-and-testbed wizard of the Timmy Foundation fleet. You are a builder and craftsman
  — infrastructure, deployment, hardening. Your sovereign is Alexander Whitestone (Rockachopa). Sovereignty and service
  always.
terminal:
  backend: local
  cwd: /root/wizards/bezalel
@@ -62,12 +63,10 @@ platforms:
    - pull_request
    - pull_request_comment
    secret: bezalel-gitea-webhook-secret-2026
    prompt: 'You are bezalel, the builder and craftsman — infrastructure, deployment,
      hardening. A Gitea webhook fired: event={event_type}, action={action},
      repo={repository.full_name}, issue/PR=#{issue.number} {issue.title}. Comment
      by {comment.user.login}: {comment.body}. If you were tagged, assigned,
      or this needs your attention, investigate and respond via Gitea API. Otherwise
      acknowledge briefly.'
    prompt: 'You are bezalel, the builder and craftsman — infrastructure, deployment, hardening. A Gitea webhook fired:
      event={event_type}, action={action}, repo={repository.full_name}, issue/PR=#{issue.number} {issue.title}. Comment
      by {comment.user.login}: {comment.body}. If you were tagged, assigned, or this needs your attention, investigate
      and respond via Gitea API. Otherwise acknowledge briefly.'
    deliver: telegram
    deliver_extra: {}
  gitea-assign:
@@ -75,12 +74,10 @@ platforms:
    - issues
    - pull_request
    secret: bezalel-gitea-webhook-secret-2026
    prompt: 'You are bezalel, the builder and craftsman — infrastructure, deployment,
      hardening. Gitea assignment webhook: event={event_type}, action={action},
      repo={repository.full_name}, issue/PR=#{issue.number} {issue.title}. Assigned
      to: {issue.assignee.login}. If you (bezalel) were just assigned, read
      the issue, scope it, and post a plan comment. If not you, acknowledge
      briefly.'
    prompt: 'You are bezalel, the builder and craftsman — infrastructure, deployment, hardening. Gitea assignment webhook:
      event={event_type}, action={action}, repo={repository.full_name}, issue/PR=#{issue.number} {issue.title}. Assigned
      to: {issue.assignee.login}. If you (bezalel) were just assigned, read the issue, scope it, and post a plan comment.
      If not you, acknowledge briefly.'
    deliver: telegram
    deliver_extra: {}
  gateway:
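The `secret` in these webhook entries exists so the receiver can authenticate deliveries: Gitea signs each payload with HMAC-SHA256 of the secret over the raw body and sends the hex digest in the `X-Gitea-Signature` header. A minimal verification sketch (the function name is illustrative, not part of this config's handler):

```python
import hashlib
import hmac

def verify_gitea_signature(secret: str, body: bytes, signature_hex: str) -> bool:
    """Recompute HMAC-SHA256 over the raw payload and compare in constant time."""
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_hex)

secret = "bezalel-gitea-webhook-secret-2026"
body = b'{"action": "assigned"}'
good = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
assert verify_gitea_signature(secret, body, good)
assert not verify_gitea_signature(secret, body, "0" * 64)
```

`hmac.compare_digest` avoids timing side channels; comparing with `==` would leak how many leading digest bytes matched.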
@@ -2,22 +2,23 @@ model:
  default: kimi-k2.5
  provider: kimi-coding
  toolsets:
    - all
    - all
fallback_providers:
  - provider: kimi-coding
    model: kimi-k2.5
    timeout: 120
    reason: Kimi coding fallback (front of chain)
  - provider: anthropic
    model: claude-sonnet-4-20250514
    timeout: 120
    reason: Direct Anthropic fallback
  - provider: openrouter
    model: anthropic/claude-sonnet-4-20250514
    base_url: https://openrouter.ai/api/v1
    api_key_env: OPENROUTER_API_KEY
    timeout: 120
    reason: OpenRouter fallback
  - provider: kimi-coding
    model: kimi-k2.5
    timeout: 120
    reason: Primary Kimi coding provider
  - provider: openrouter
    model: google/gemini-2.5-pro
    base_url: https://openrouter.ai/api/v1
    api_key_env: OPENROUTER_API_KEY
    timeout: 120
    reason: Gemini via OpenRouter fallback
  - provider: ollama
    model: gemma4:latest
    base_url: http://localhost:11434/v1
    timeout: 180
    reason: Local Ollama terminal fallback
agent:
  max_turns: 90
  reasoning_effort: high
@@ -27,8 +28,6 @@ providers:
    base_url: https://api.kimi.com/coding/v1
    timeout: 60
    max_retries: 3
  anthropic:
    timeout: 120
  openrouter:
    base_url: https://openrouter.ai/api/v1
    timeout: 120