feat: add config validator script

scripts/config_validator.py: standalone validator for all YAML/JSON config files in the repo.

Checks:
- YAML syntax (pyyaml safe_load)
- JSON syntax (json.loads)
- Duplicate keys in YAML/JSON
- Trailing whitespace
- Tabs in YAML (should use spaces)
- Cron expression validity (if present)

Reports PASS/FAIL per file with line numbers. Exit 0 if all valid, 1 if any invalid.
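Two of the checks named in this message (duplicate keys, and tabs or trailing whitespace) can be sketched with the standard library alone. This is a minimal illustration, not the code in scripts/config_validator.py: the function names are hypothetical, and only the JSON side of duplicate-key detection is shown, since the YAML side would need a custom loader.

```python
import json


def find_duplicate_json_keys(text):
    """Return keys that appear more than once inside any single JSON object."""
    duplicates = []

    def record_pairs(pairs):
        # json.loads hands every object to this hook as a list of (key, value)
        # pairs, before duplicates are collapsed into a dict.
        seen = set()
        for key, _ in pairs:
            if key in seen:
                duplicates.append(key)
            seen.add(key)
        return dict(pairs)

    json.loads(text, object_pairs_hook=record_pairs)
    return duplicates


def find_whitespace_issues(text):
    """Return (line_number, problem) tuples for tabs and trailing whitespace."""
    issues = []
    for number, line in enumerate(text.splitlines(), start=1):
        if "\t" in line:
            issues.append((number, "tab"))
        if line != line.rstrip():
            issues.append((number, "trailing whitespace"))
    return issues
```

A caller would presumably report FAIL for a file whenever either function returns a non-empty list, using the line numbers from the second function in the report.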
Merge pull request #424
2026-04-10 07:13:17 -04:00 · 2026-04-10 09:37:46 +00:00 · 2026-04-10 09:37:44 +00:00 · 2026-04-10 09:37:40 +00:00 · 2026-04-10 09:37:38 +00:00 · 2026-04-10 09:36:29 +00:00
36 changed files with 2754 additions and 77 deletions
--- a/.gitea/PULL_REQUEST_TEMPLATE.md
+++ b/.gitea/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1,49 @@
 ## Summary
 <!-- What changed and why. One paragraph max. -->
 ## Linked Issue
 <!-- REQUIRED. Every PR must reference at least one issue. Max 3 issues per PR. -->
 <!-- Closes #ISSUENUM -->
 <!-- Refs #ISSUENUM -->
 ## Acceptance Criteria
 <!-- What specific outcomes does this PR deliver? Check each when proven. -->
 - [ ] Criterion 1
 - [ ] Criterion 2
 ## Proof
 ### What was tested
 <!-- Paste the exact commands, output, log paths, or world-state artifacts that prove the acceptance criteria were met. -->
 <!-- No proof = no merge. See CONTRIBUTING.md for the full standard. -->
 ```
 $ <command you ran>
 <relevant output>
 ```
 ### Visual proof (if applicable)
 <!-- For skin updates, UI changes, dashboard changes: attach screenshot to the PR discussion. -->
 <!-- Name what the screenshot proves. Do not commit binary media unless explicitly required. -->
 ## Risk and Rollback
 <!-- What could go wrong? How do we undo it? -->
 - **Risk level:** low / medium / high
 - **What breaks if this is wrong:**
 - **How to rollback:**
 ## Checklist
 - [ ] Proof meets CONTRIBUTING.md standard (exact commands, output, or artifacts)
 - [ ] Python files pass syntax check (`python -c "import ast; ast.parse(open('file.py').read())"`)
 - [ ] Shell scripts are executable (`chmod +x`)
 - [ ] Branch is up-to-date with base
 - [ ] No more than 3 unrelated issues bundled in this PR
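The syntax check in the checklist handles one file per invocation. A small helper can run the same `ast.parse` check over several files at once. This is a sketch under assumptions: the names `syntax_errors` and `main` are hypothetical and not part of this repo.

```python
import ast


def syntax_errors(paths):
    """Return (path, message) tuples for files that cannot be read or parsed."""
    failures = []
    for path in paths:
        try:
            with open(path, encoding="utf-8") as handle:
                ast.parse(handle.read(), filename=path)
        except (SyntaxError, ValueError, OSError) as exc:
            failures.append((path, str(exc)))
    return failures


def main(paths):
    """Print one FAIL line per broken file; return 1 if any failed, else 0."""
    failed = syntax_errors(paths)
    for path, message in failed:
        print(f"FAIL {path}: {message}")
    return 1 if failed else 0
```

To use it as a script, pass `sys.argv[1:]` to `main` and hand the return value to `sys.exit`.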
--- a/.gitea/workflows/architecture-lint.yml
+++ b/.gitea/workflows/architecture-lint.yml
@@ -0,0 +1,41 @@
 # architecture-lint.yml — CI gate for the Architecture Linter v2
 # Refs: #437 — repo-aware, test-backed, CI-enforced.
 #
 # Runs on every PR to main or master and on every push to main. Validates
 # Python syntax, runs the linter tests, and finally lints the repo itself.
 name: Architecture Lint
 on:
  pull_request:
    branches: [main, master]
  push:
    branches: [main]
 jobs:
  linter-tests:
    name: Linter Tests
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install test deps
        run: pip install pytest
      - name: Compile-check linter
        run: python3 -m py_compile scripts/architecture_linter_v2.py
      - name: Run linter tests
        run: python3 -m pytest tests/test_linter.py -v
  lint-repo:
    name: Lint Repository
    runs-on: ubuntu-latest
    needs: linter-tests
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Run architecture linter
        run: python3 scripts/architecture_linter_v2.py .
--- a/ansible/BANNED_PROVIDERS.yml
+++ b/ansible/BANNED_PROVIDERS.yml
@@ -0,0 +1,47 @@
 # =============================================================================
 # BANNED PROVIDERS — The Timmy Foundation
 # =============================================================================
 # "Anthropic is not only fired, but banned. I don't want these errors
 # cropping up." — Alexander, 2026-04-09
 #
 # This is a HARD BAN. Not deprecated. Not fallback. BANNED.
 # Enforcement: pre-commit hook, linter, Ansible validation, CI tests.
 # =============================================================================
 banned_providers:
  - name: anthropic
    reason: "Permanently banned. SDK access gated despite active quota. Fleet was bricked because golden state pointed to Anthropic Sonnet."
    banned_date: "2026-04-09"
    enforcement: strict  # Ansible playbook FAILS if detected
    models:
      - "claude-sonnet-*"
      - "claude-opus-*"
      - "claude-haiku-*"
      - "claude-*"
    endpoints:
      - "api.anthropic.com"
      - "anthropic/*"  # OpenRouter pattern
    api_keys:
      - "ANTHROPIC_API_KEY"
      - "CLAUDE_API_KEY"
 # Golden state alternative:
 approved_providers:
  - name: kimi-coding
    model: kimi-k2.5
    role: primary
  - name: openrouter
    model: google/gemini-2.5-pro
    role: fallback
  - name: ollama
    model: "gemma4:latest"
    role: terminal_fallback
 # Future evaluation:
 evaluation_candidates:
  - name: mimo-v2-pro
    status: pending
    notes: "Free via Nous Portal for ~2 weeks from 2026-04-07. Add after fallback chain is fixed."
  - name: hermes-4
    status: available
    notes: "Free on Nous Portal. 36B and 70B variants. Home team model."
--- a/ansible/README.md
+++ b/ansible/README.md
@@ -0,0 +1,95 @@
 # Ansible IaC — The Timmy Foundation Fleet
 > One canonical Ansible playbook defines: deadman switch, cron schedule,
 > golden state rollback, agent startup sequence.
 > — KT Final Session 2026-04-08, Priority TWO
 ## Purpose
 This directory contains the **single source of truth** for fleet infrastructure.
 No more ad-hoc recovery implementations. No more overlapping deadman switches.
 No more agents mutating their own configs into oblivion.
 **Everything** goes through Ansible. If it's not in a playbook, it doesn't exist.
 ## Architecture
 ```
 ┌─────────────────────────────────────────────────┐
 │                  Gitea (Source of Truth)          │
 │  timmy-config/ansible/                           │
 │    ├── inventory/hosts.yml    (fleet machines)    │
 │    ├── playbooks/site.yml     (master playbook)   │
 │    ├── roles/                 (reusable roles)    │
 │    └── group_vars/wizards.yml (golden state)      │
 └──────────────────┬──────────────────────────────┘
                   │  PR merge triggers webhook
                   ▼
 ┌─────────────────────────────────────────────────┐
 │              Gitea Webhook Handler                │
 │  scripts/deploy_on_webhook.sh                     │
 │  → ansible-pull on each target machine            │
 └──────────────────┬──────────────────────────────┘
                   │  ansible-pull
                   ▼
 ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐
 │  Timmy   │  │ Allegro  │  │ Bezalel  │  │  Ezra    │
 │  (Mac)   │  │  (VPS)   │  │  (VPS)   │  │  (VPS)   │
 │          │  │          │  │          │  │          │
 │ deadman  │  │ deadman  │  │ deadman  │  │ deadman  │
 │ cron     │  │ cron     │  │ cron     │  │ cron     │
 │ golden   │  │ golden   │  │ golden   │  │ golden   │
 │ req_log  │  │ req_log  │  │ req_log  │  │ req_log  │
 └──────────┘  └──────────┘  └──────────┘  └──────────┘
 ```
 ## Quick Start
 ```bash
 # Deploy everything to all machines
 ansible-playbook -i inventory/hosts.yml playbooks/site.yml
 # Deploy only golden state config
 ansible-playbook -i inventory/hosts.yml playbooks/golden_state.yml
 # Deploy only to a specific wizard
 ansible-playbook -i inventory/hosts.yml playbooks/site.yml --limit bezalel
 # Dry run (check mode)
 ansible-playbook -i inventory/hosts.yml playbooks/site.yml --check --diff
 ```
 ## Golden State Provider Chain
 All wizard configs converge on this provider chain. **Anthropic is BANNED.**
 | Priority | Provider             | Model            | Endpoint                          |
 | -------- | -------------------- | ---------------- | --------------------------------- |
 | 1        | Kimi                 | kimi-k2.5        | https://api.kimi.com/coding/v1    |
 | 2        | Gemini (OpenRouter)  | gemini-2.5-pro   | https://openrouter.ai/api/v1      |
 | 3        | Ollama (local)       | gemma4:latest    | http://localhost:11434/v1         |
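The priority order in the table can be read as a simple fallback loop: try each provider in turn and move on when a call fails. The sketch below illustrates that idea only. `call_with_fallback` and the `send` callback are hypothetical names, and the real harness may select providers differently.

```python
# Provider names, models, and endpoints copied from the table above.
PROVIDER_CHAIN = [
    {"name": "kimi-coding", "model": "kimi-k2.5",
     "base_url": "https://api.kimi.com/coding/v1"},
    {"name": "openrouter", "model": "google/gemini-2.5-pro",
     "base_url": "https://openrouter.ai/api/v1"},
    {"name": "ollama", "model": "gemma4:latest",
     "base_url": "http://localhost:11434/v1"},
]


def call_with_fallback(chain, send):
    """Try providers in priority order; return (provider_name, result).

    `send` is a caller-supplied function that performs the request for one
    provider and raises an exception on failure (an assumption of this sketch).
    """
    errors = []
    for provider in chain:
        try:
            return provider["name"], send(provider)
        except Exception as exc:
            # Remember the failure and fall through to the next provider.
            errors.append(f"{provider['name']}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Raising only after the whole chain is exhausted keeps the local Ollama entry as the terminal fallback, which matches the priority column.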
 ## Roles
 | Role             | Purpose                                                      |
 | ---------------- | ------------------------------------------------------------ |
 | `wizard_base`    | Common wizard setup: directories, thin config, git pull      |
 | `deadman_switch` | Health check → snapshot good config → rollback on death      |
 | `golden_state`   | Deploy and enforce golden state provider chain               |
 | `request_log`    | SQLite telemetry table for every inference call               |
 | `cron_manager`   | Source-controlled cron jobs — no manual crontab edits         |
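As an illustration of what the `request_log` role provides, here is a minimal sketch of the telemetry table with a rotation helper. The column names are taken from the INSERT statement in the deadman script in this PR. The column types, the `id` column, and the function names are assumptions.

```python
import sqlite3

# Column names follow the deadman script's INSERT; types are assumed.
SCHEMA = """
CREATE TABLE IF NOT EXISTS request_log (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp TEXT NOT NULL,
    agent_name TEXT NOT NULL,
    provider TEXT NOT NULL,
    model TEXT,
    endpoint TEXT,
    status TEXT,
    error_message TEXT
)
"""


def log_request(db, agent_name, provider, model, endpoint, status,
                error_message=None, timestamp=None):
    """Insert one row; the timestamp defaults to SQLite's current UTC time."""
    db.execute(
        "INSERT INTO request_log (timestamp, agent_name, provider, model, "
        "endpoint, status, error_message) "
        "VALUES (COALESCE(?, datetime('now')), ?, ?, ?, ?, ?, ?)",
        (timestamp, agent_name, provider, model, endpoint, status, error_message),
    )
    db.commit()


def rotate(db, days):
    """Delete rows older than `days` days and return how many were removed."""
    cursor = db.execute(
        "DELETE FROM request_log WHERE timestamp < datetime('now', ?)",
        (f"-{int(days)} days",),
    )
    db.commit()
    return cursor.rowcount
```

Passing the modifier as a bound parameter avoids the nested quoting that a one-line `python3 -c` cron job needs.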
 ## Rules
 1. **No manual changes.** If it's not in a playbook, it will be overwritten.
 2. **No Anthropic.** Banned. Enforcement is automated. See `BANNED_PROVIDERS.yml`.
 3. **Idempotent.** Every playbook can run 100 times with the same result.
 4. **PR required.** Config changes go through Gitea PR review, then deploy.
 5. **One identity per machine.** No duplicate agents. Fleet audit enforces this.
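Rule 2 relies on glob-style patterns. One way to check a model name against them is `fnmatch` from the standard library, as sketched below. The pattern list mirrors `banned_models_patterns` from the inventory in this PR, while the function name is hypothetical and the actual enforcement code may differ.

```python
from fnmatch import fnmatchcase

# Patterns copied from banned_models_patterns in inventory/hosts.yml.
BANNED_MODEL_PATTERNS = ["claude-*", "anthropic/*", "*sonnet*", "*opus*", "*haiku*"]


def banned_match(model, patterns=BANNED_MODEL_PATTERNS):
    """Return the first pattern that matches `model`, or None if it is allowed."""
    lowered = model.lower()  # compare case-insensitively
    for pattern in patterns:
        if fnmatchcase(lowered, pattern):
            return pattern
    return None
```

Returning the matching pattern rather than a bare boolean lets the failure message say which rule was hit.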
 ## Related Issues
 - timmy-config #442: [P2] Ansible IaC Canonical Playbook
 - timmy-config #444: Wire Deadman Switch ACTION
 - timmy-config #443: Thin Config Pattern
 - timmy-config #446: request_log Telemetry Table
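The deadman switch behaviour described in the Roles table (snapshot the config on a healthy check, roll back on a failed one) can be sketched in a few lines. This is an illustration under assumptions, not the deployed script: the function name and return values are hypothetical, and restart, cooldown, and snapshot pruning are left out.

```python
import shutil
from pathlib import Path


def deadman_action(config_file, snapshot_file, healthy):
    """Snapshot the config when healthy; restore the snapshot when not."""
    config_file, snapshot_file = Path(config_file), Path(snapshot_file)
    if healthy:
        if not config_file.exists():
            return "no-config"
        # Record the current config as "last known good".
        snapshot_file.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(config_file, snapshot_file)
        return "snapshot"
    if snapshot_file.exists():
        # Failed check: put the last known good config back in place.
        shutil.copyfile(snapshot_file, config_file)
        return "rollback"
    return "no-snapshot"
```

The `no-snapshot` result is the point where a caller would fall back to pulling the config from upstream.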
--- a/ansible/ansible.cfg
+++ b/ansible/ansible.cfg
@@ -0,0 +1,21 @@
 [defaults]
 inventory = inventory/hosts.yml
 roles_path = roles
 host_key_checking = False
 retry_files_enabled = False
 stdout_callback = yaml
 forks = 10
 timeout = 30
 # Logging
 log_path = /var/log/ansible/timmy-fleet.log
 [privilege_escalation]
 become = True
 become_method = sudo
 become_user = root
 become_ask_pass = False
 [ssh_connection]
 pipelining = True
 ssh_args = -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no
--- a/ansible/inventory/group_vars/wizards.yml
+++ b/ansible/inventory/group_vars/wizards.yml
@@ -0,0 +1,74 @@
 # =============================================================================
 # Wizard Group Variables — Golden State Configuration
 # =============================================================================
 # These variables are applied to ALL wizards in the fleet.
 # This IS the golden state. If a wizard deviates, Ansible corrects it.
 # =============================================================================
 # --- Deadman Switch ---
 deadman_enabled: true
 deadman_check_interval: 300    # 5 minutes between health checks
 deadman_snapshot_dir: "~/.local/timmy/snapshots"
 deadman_max_snapshots: 10      # Rolling window of good configs
 deadman_restart_cooldown: 60   # Seconds to wait before restart after failure
 deadman_max_restart_attempts: 3
 deadman_escalation_channel: telegram  # Alert Alexander after max attempts
 # --- Thin Config ---
 thin_config_path: "~/.timmy/thin_config.yml"
 thin_config_mode: "0444"       # Read-only — agents CANNOT modify
 upstream_repo: "https://forge.alexanderwhitestone.com/Timmy_Foundation/timmy-config.git"
 upstream_branch: main
 config_pull_on_wake: true
 config_validation_enabled: true
 # --- Agent Settings ---
 agent_max_turns: 30
 agent_reasoning_effort: high
 agent_verbose: false
 agent_approval_mode: auto
 # --- Hermes Harness ---
 hermes_config_dir: "{{ hermes_home }}"
 hermes_bin_dir: "{{ hermes_home }}/bin"
 hermes_skins_dir: "{{ hermes_home }}/skins"
 hermes_playbooks_dir: "{{ hermes_home }}/playbooks"
 hermes_memories_dir: "{{ hermes_home }}/memories"
 # --- Request Log (Telemetry) ---
 request_log_enabled: true
 request_log_path: "~/.local/timmy/request_log.db"
 request_log_rotation_days: 30  # Delete rows older than 30 days
 request_log_sync_to_gitea: false  # Future: push telemetry summaries to Gitea
 # --- Cron Schedule ---
 # All cron jobs are managed here. No manual crontab edits.
 cron_jobs:
  - name: "Deadman health check"
    job: "cd {{ wizard_home }}/workspace/timmy-config && python3 fleet/health_check.py"
    minute: "*/5"
    hour: "*"
    enabled: "{{ deadman_enabled }}"
  - name: "Muda audit"
    job: "cd {{ wizard_home }}/workspace/timmy-config && bash fleet/muda-audit.sh >> /tmp/muda-audit.log 2>&1"
    minute: "0"
    hour: "21"
    weekday: "0"
    enabled: true
  - name: "Config pull from upstream"
    job: "cd {{ wizard_home }}/workspace/timmy-config && git pull --ff-only origin main"
    minute: "*/15"
    hour: "*"
    enabled: "{{ config_pull_on_wake }}"
  - name: "Request log rotation"
    job: "python3 -c \"import os,sqlite3; db=sqlite3.connect(os.path.expanduser('{{ request_log_path }}')); db.execute('DELETE FROM request_log WHERE timestamp < datetime(\\\"now\\\", \\\"-{{ request_log_rotation_days }} days\\\")'); db.commit()\""
    minute: "0"
    hour: "3"
    enabled: "{{ request_log_enabled }}"
 # --- Provider Enforcement ---
 # These are validated on every Ansible run. Any Anthropic reference = failure.
 provider_ban_enforcement: strict  # strict = fail playbook, warn = log only
--- a/ansible/inventory/hosts.yml
+++ b/ansible/inventory/hosts.yml
@@ -0,0 +1,119 @@
 # =============================================================================
 # Fleet Inventory — The Timmy Foundation
 # =============================================================================
 # Source of truth for all machines in the fleet.
 # Update this file when machines are added/removed.
 # All changes go through PR review.
 # =============================================================================
 all:
  children:
    wizards:
      hosts:
        timmy:
          ansible_host: localhost
          ansible_connection: local
          wizard_name: Timmy
          wizard_role: "Primary wizard — soul of the fleet"
          wizard_provider_primary: kimi-coding
          wizard_model_primary: kimi-k2.5
          hermes_port: 8081
          api_port: 8645
          wizard_home: "{{ ansible_env.HOME }}/wizards/timmy"
          hermes_home: "{{ ansible_env.HOME }}/.hermes"
          machine_type: mac
          # Timmy runs on Alexander's M3 Max
          ollama_available: true
        allegro:
          ansible_host: 167.99.126.228
          ansible_user: root
          wizard_name: Allegro
          wizard_role: "Kimi-backed third wizard house — tight coding tasks"
          wizard_provider_primary: kimi-coding
          wizard_model_primary: kimi-k2.5
          hermes_port: 8081
          api_port: 8645
          wizard_home: /root/wizards/allegro
          hermes_home: /root/.hermes
          machine_type: vps
          ollama_available: false
        bezalel:
          ansible_host: 159.203.146.185
          ansible_user: root
          wizard_name: Bezalel
          wizard_role: "Forge-and-testbed wizard — infrastructure, deployment, hardening"
          wizard_provider_primary: kimi-coding
          wizard_model_primary: kimi-k2.5
          hermes_port: 8081
          api_port: 8656
          wizard_home: /root/wizards/bezalel
          hermes_home: /root/.hermes
          machine_type: vps
          ollama_available: false
          # NOTE: The awake Bezalel may be the duplicate.
          # Fleet audit (the-nexus #1144) will resolve identity.
        ezra:
          ansible_host: 143.198.27.163
          ansible_user: root
          wizard_name: Ezra
          wizard_role: "Infrastructure wizard — Gitea, nginx, hosting"
          wizard_provider_primary: kimi-coding
          wizard_model_primary: kimi-k2.5
          hermes_port: 8081
          api_port: 8645
          wizard_home: /root/wizards/ezra
          hermes_home: /root/.hermes
          machine_type: vps
          ollama_available: false
          # NOTE: Currently DOWN — Telegram key revoked, awaiting propagation.
    # Infrastructure hosts (not wizards, but managed by Ansible)
    infrastructure:
      hosts:
        forge:
          ansible_host: 143.198.27.163
          ansible_user: root
          # Gitea runs on the same box as Ezra
          gitea_url: https://forge.alexanderwhitestone.com
          gitea_org: Timmy_Foundation
  vars:
    # Global variables applied to all hosts
    gitea_repo_url: "https://forge.alexanderwhitestone.com/Timmy_Foundation/timmy-config.git"
    gitea_branch: main
    config_base_path: "{{ gitea_repo_url }}"
    timmy_log_dir: "~/.local/timmy/fleet-health"
    request_log_db: "~/.local/timmy/request_log.db"
    # Golden state provider chain — Anthropic is BANNED
    golden_state_providers:
      - name: kimi-coding
        model: kimi-k2.5
        base_url: "https://api.kimi.com/coding/v1"
        timeout: 120
        reason: "Primary — Kimi K2.5 (best value, least friction)"
      - name: openrouter
        model: google/gemini-2.5-pro
        base_url: "https://openrouter.ai/api/v1"
        api_key_env: OPENROUTER_API_KEY
        timeout: 120
        reason: "Fallback — Gemini 2.5 Pro via OpenRouter"
      - name: ollama
        model: "gemma4:latest"
        base_url: "http://localhost:11434/v1"
        timeout: 180
        reason: "Terminal fallback — local Ollama (sovereign, no API needed)"
    # Banned providers — hard enforcement
    banned_providers:
      - anthropic
      - claude
    banned_models_patterns:
      - "claude-*"
      - "anthropic/*"
      - "*sonnet*"
      - "*opus*"
      - "*haiku*"
--- a/ansible/playbooks/agent_startup.yml
+++ b/ansible/playbooks/agent_startup.yml
@@ -0,0 +1,98 @@
 ---
 # =============================================================================
 # agent_startup.yml — Resurrect Wizards from Checked-in Configs
 # =============================================================================
 # Brings wizards back online using golden state configs.
 # Order: pull config → validate → start agent → verify with request_log
 # =============================================================================
 - name: "Agent Startup Sequence"
  hosts: wizards
  become: true
  serial: 1  # One wizard at a time to avoid cascading issues
  tasks:
    - name: "Pull latest config from upstream"
      git:
        repo: "{{ upstream_repo }}"
        dest: "{{ wizard_home }}/workspace/timmy-config"
        version: "{{ upstream_branch }}"
        force: true
      tags: [pull]
    - name: "Deploy golden state config"
      include_role:
        name: golden_state
      tags: [config]
    - name: "Validate config — no banned providers"
      shell: |
        python3 -c "
        import yaml, sys
        with open('{{ wizard_home }}/config.yaml') as f:
            cfg = yaml.safe_load(f)
        banned = {{ banned_providers }}
        for p in cfg.get('fallback_providers', []):
            if p.get('provider', '') in banned:
                print(f'BANNED: {p[\"provider\"]}', file=sys.stderr)
                sys.exit(1)
        model = cfg.get('model', {}).get('provider', '')
        if model in banned:
            print(f'BANNED default provider: {model}', file=sys.stderr)
            sys.exit(1)
        print('Config validated — no banned providers.')
        "
      register: config_valid
      tags: [validate]
    - name: "Ensure hermes-agent service is running"
      systemd:
        name: "hermes-{{ wizard_name | lower }}"
        state: started
        enabled: true
      when: machine_type == 'vps'
      tags: [start]
      ignore_errors: true  # Service may not exist yet on all machines
    - name: "Start hermes agent (Mac — launchctl)"
      shell: |
        # Group the fallback so it runs only when kickstart fails
        launchctl kickstart -k "ai.hermes.{{ wizard_name | lower }}" 2>/dev/null || \
        { cd {{ wizard_home }} && hermes agent start --daemon 2>&1 | tail -5; }
      when: machine_type == 'mac'
      tags: [start]
      ignore_errors: true
    - name: "Wait for agent to come online"
      wait_for:
        host: 127.0.0.1
        port: "{{ api_port }}"
        timeout: 60
        state: started
      tags: [verify]
      ignore_errors: true
    - name: "Verify agent is alive — check request_log for activity"
      shell: |
        sleep 10
        python3 -c "
        import os, sqlite3, sys
        # request_log_path may start with ~, which sqlite3 does not expand
        db = sqlite3.connect(os.path.expanduser('{{ request_log_path }}'))
        cursor = db.execute('''
            SELECT COUNT(*) FROM request_log
            WHERE agent_name = '{{ wizard_name }}'
            AND timestamp > datetime('now', '-5 minutes')
        ''')
        count = cursor.fetchone()[0]
        if count > 0:
            print(f'{{ wizard_name }} is alive — {count} recent inference calls logged.')
        else:
            print(f'WARNING: {{ wizard_name }} started but no telemetry yet.')
        "
      register: agent_status
      tags: [verify]
      ignore_errors: true
    - name: "Report startup status"
      debug:
        msg: "{{ wizard_name }}: {{ agent_status.stdout | default('startup attempted') }}"
      tags: [always]
--- a/ansible/playbooks/cron_schedule.yml
+++ b/ansible/playbooks/cron_schedule.yml
@@ -0,0 +1,15 @@
 ---
 # =============================================================================
 # cron_schedule.yml — Source-Controlled Cron Jobs
 # =============================================================================
 # All cron jobs are defined in group_vars/wizards.yml.
 # This playbook deploys them. No manual crontab edits allowed.
 # =============================================================================
 - name: "Deploy Cron Schedule"
  hosts: wizards
  become: true
  roles:
    - role: cron_manager
      tags: [cron, schedule]
--- a/ansible/playbooks/deadman_switch.yml
+++ b/ansible/playbooks/deadman_switch.yml
@@ -0,0 +1,17 @@
 ---
 # =============================================================================
 # deadman_switch.yml — Deploy Deadman Switch to All Wizards
 # =============================================================================
 # The deadman watch already fires and detects dead agents.
 # This playbook wires the ACTION:
 #   - On healthy check: snapshot current config as "last known good"
 #   - On failed check: rollback config to snapshot, restart agent
 # =============================================================================
 - name: "Deploy Deadman Switch ACTION"
  hosts: wizards
  become: true
  roles:
    - role: deadman_switch
      tags: [deadman, recovery]
--- a/ansible/playbooks/golden_state.yml
+++ b/ansible/playbooks/golden_state.yml
@@ -0,0 +1,30 @@
 ---
 # =============================================================================
 # golden_state.yml — Deploy Golden State Config to All Wizards
 # =============================================================================
 # Enforces the golden state provider chain across the fleet.
 # Removes any Anthropic references. Deploys the approved provider chain.
 # =============================================================================
 - name: "Deploy Golden State Configuration"
  hosts: wizards
  become: true
  roles:
    - role: golden_state
      tags: [golden, config]
  post_tasks:
    - name: "Verify golden state — no banned providers"
      shell: |
        # Concatenate first so grep -c prints one total instead of one count per file
        cat {{ hermes_home }}/config.yaml \
          {{ wizard_home }}/config.yaml 2>/dev/null | \
          grep -ci 'anthropic\|claude-sonnet\|claude-opus\|claude-haiku' || true
      register: banned_count
      changed_when: false
    - name: "Report golden state status"
      debug:
        msg: >
          {{ wizard_name }} golden state: {{ golden_state_providers | map(attribute='name') | list | join(' → ') }}.
          Banned provider references: {{ banned_count.stdout | trim }}.
--- a/ansible/playbooks/request_log.yml
+++ b/ansible/playbooks/request_log.yml
@@ -0,0 +1,15 @@
 ---
 # =============================================================================
 # request_log.yml — Deploy Telemetry Table
 # =============================================================================
 # Creates the request_log SQLite table on all machines.
 # Every inference call writes a row. No exceptions. No summarizing.
 # =============================================================================
 - name: "Deploy Request Log Telemetry"
  hosts: wizards
  become: true
  roles:
    - role: request_log
      tags: [telemetry, logging]
--- a/ansible/playbooks/site.yml
+++ b/ansible/playbooks/site.yml
@@ -0,0 +1,72 @@
 ---
 # =============================================================================
 # site.yml — Master Playbook for the Timmy Foundation Fleet
 # =============================================================================
 # This is the ONE playbook that defines the entire fleet state.
 # Run this and every machine converges to golden state.
 #
 # Usage:
 #   ansible-playbook -i inventory/hosts.yml playbooks/site.yml
 #   ansible-playbook -i inventory/hosts.yml playbooks/site.yml --limit bezalel
 #   ansible-playbook -i inventory/hosts.yml playbooks/site.yml --check --diff
 # =============================================================================
 - name: "Timmy Foundation Fleet — Full Convergence"
  hosts: wizards
  become: true
  pre_tasks:
    - name: "Validate no banned providers in golden state"
      assert:
        that:
          - "item.name not in banned_providers"
        fail_msg: "BANNED PROVIDER DETECTED: {{ item.name }} — Anthropic is permanently banned."
        quiet: true
      loop: "{{ golden_state_providers }}"
      tags: [always]
    - name: "Display target wizard"
      debug:
        msg: "Deploying to {{ wizard_name }} ({{ wizard_role }}) on {{ ansible_host }}"
      tags: [always]
  roles:
    - role: wizard_base
      tags: [base, setup]
    - role: golden_state
      tags: [golden, config]
    - role: deadman_switch
      tags: [deadman, recovery]
    - role: request_log
      tags: [telemetry, logging]
    - role: cron_manager
      tags: [cron, schedule]
  post_tasks:
    - name: "Final validation — scan for banned providers"
      shell: |
        grep -ri 'anthropic\|claude-sonnet\|claude-opus\|claude-haiku' \
          {{ hermes_home }}/config.yaml \
          {{ wizard_home }}/config.yaml \
          {{ thin_config_path }} 2>/dev/null || true
      register: banned_scan
      changed_when: false
      tags: [validation]
    - name: "FAIL if banned providers found in deployed config"
      fail:
        msg: |
          BANNED PROVIDER DETECTED IN DEPLOYED CONFIG:
          {{ banned_scan.stdout }}
          Anthropic is permanently banned. Fix the config and re-deploy.
      when: banned_scan.stdout | length > 0
      tags: [validation]
    - name: "Deployment complete"
      debug:
        msg: "{{ wizard_name }} converged to golden state. Provider chain: {{ golden_state_providers | map(attribute='name') | list | join(' → ') }}"
      tags: [always]
--- a/ansible/roles/cron_manager/tasks/main.yml
+++ b/ansible/roles/cron_manager/tasks/main.yml
@@ -0,0 +1,55 @@
 ---
 # =============================================================================
 # cron_manager/tasks — Source-Controlled Cron Jobs
 # =============================================================================
 # All cron jobs are defined in group_vars/wizards.yml.
 # No manual crontab edits. This is the only way to manage cron.
 # =============================================================================
 - name: "Deploy managed cron jobs"
  cron:
    name: "{{ item.name }}"
    job: "{{ item.job }}"
    minute: "{{ item.minute | default('*') }}"
    hour: "{{ item.hour | default('*') }}"
    day: "{{ item.day | default('*') }}"
    month: "{{ item.month | default('*') }}"
    weekday: "{{ item.weekday | default('*') }}"
    state: "{{ 'present' if (item.enabled | bool) else 'absent' }}"
    user: "{{ ansible_user | default('root') }}"
  loop: "{{ cron_jobs }}"
  when: cron_jobs is defined
 - name: "Deploy deadman switch cron (fallback if systemd timer unavailable)"
  cron:
    name: "Deadman switch — {{ wizard_name }}"
    job: "{{ wizard_home }}/deadman_action.sh >> {{ timmy_log_dir }}/deadman-{{ wizard_name }}.log 2>&1"
    minute: "*/5"
    hour: "*"
    state: present
    user: "{{ ansible_user | default('root') }}"
  when: deadman_enabled and machine_type != 'vps'
  # VPS machines use systemd timers instead
 - name: "Remove legacy cron jobs (cleanup)"
  cron:
    name: "{{ item }}"
    state: absent
    user: "{{ ansible_user | default('root') }}"
  loop:
    - "legacy-deadman-watch"
    - "old-health-check"
    - "backup-deadman"
  ignore_errors: true
 - name: "List active cron jobs"
  shell: "crontab -l 2>/dev/null | grep -v '^#' | grep -v '^$' || echo 'No cron jobs found.'"
  register: active_crons
  changed_when: false
 - name: "Report cron status"
  debug:
    msg: |
      {{ wizard_name }} cron jobs deployed.
      Active:
      {{ active_crons.stdout }}
--- a/ansible/roles/deadman_switch/tasks/main.yml
+++ b/ansible/roles/deadman_switch/tasks/main.yml
@@ -0,0 +1,70 @@
 ---
 # =============================================================================
 # deadman_switch/tasks — Wire the Deadman Switch ACTION
 # =============================================================================
 # The watch fires. This makes it DO something:
 #   - On healthy check: snapshot current config as "last known good"
 #   - On failed check: rollback to last known good, restart agent
 # =============================================================================
 - name: "Create snapshot directory"
  file:
    path: "{{ deadman_snapshot_dir }}"
    state: directory
    mode: "0755"
 - name: "Deploy deadman switch script"
  template:
    src: deadman_action.sh.j2
    dest: "{{ wizard_home }}/deadman_action.sh"
    mode: "0755"
 - name: "Deploy deadman systemd service"
  template:
    src: deadman_switch.service.j2
    dest: "/etc/systemd/system/deadman-{{ wizard_name | lower }}.service"
    mode: "0644"
  when: machine_type == 'vps'
  notify: "Enable deadman service"
 - name: "Deploy deadman systemd timer"
  template:
    src: deadman_switch.timer.j2
    dest: "/etc/systemd/system/deadman-{{ wizard_name | lower }}.timer"
    mode: "0644"
  when: machine_type == 'vps'
  notify: "Enable deadman timer"
 - name: "Deploy deadman launchd plist (Mac)"
  template:
    src: deadman_switch.plist.j2
    dest: "{{ ansible_env.HOME }}/Library/LaunchAgents/com.timmy.deadman.{{ wizard_name | lower }}.plist"
    mode: "0644"
  when: machine_type == 'mac'
  notify: "Load deadman plist"
 - name: "Take initial config snapshot"
  copy:
    src: "{{ wizard_home }}/config.yaml"
    dest: "{{ deadman_snapshot_dir }}/config.yaml.known_good"
    remote_src: true
    mode: "0444"
  ignore_errors: true
 handlers:
  - name: "Enable deadman service"
    systemd:
      name: "deadman-{{ wizard_name | lower }}.service"
      daemon_reload: true
      enabled: true
  - name: "Enable deadman timer"
    systemd:
      name: "deadman-{{ wizard_name | lower }}.timer"
      daemon_reload: true
      enabled: true
      state: started
  - name: "Load deadman plist"
    shell: "launchctl load {{ ansible_env.HOME }}/Library/LaunchAgents/com.timmy.deadman.{{ wizard_name | lower }}.plist"
    ignore_errors: true
--- a/ansible/roles/deadman_switch/templates/deadman_action.sh.j2
+++ b/ansible/roles/deadman_switch/templates/deadman_action.sh.j2
@@ -0,0 +1,153 @@
 #!/usr/bin/env bash
 # =============================================================================
 # Deadman Switch ACTION — {{ wizard_name }}
 # =============================================================================
 # Generated by Ansible on {{ ansible_date_time.iso8601 }}
 # DO NOT EDIT MANUALLY.
 #
 # On healthy check: snapshot current config as "last known good"
 # On failed check: rollback config to last known good, restart agent
 # =============================================================================
 set -euo pipefail
 WIZARD_NAME="{{ wizard_name }}"
 WIZARD_HOME="{{ wizard_home }}"
 CONFIG_FILE="{{ wizard_home }}/config.yaml"
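 # The configured paths start with "~", which bash does not expand inside
 # double quotes, so swap a leading "~" for the home directory explicitly.
 USER_HOME=~
 SNAPSHOT_DIR="{{ deadman_snapshot_dir }}"
 SNAPSHOT_DIR="${SNAPSHOT_DIR/#\~/$USER_HOME}"
 SNAPSHOT_FILE="${SNAPSHOT_DIR}/config.yaml.known_good"
 REQUEST_LOG_DB="{{ request_log_path }}"
 REQUEST_LOG_DB="${REQUEST_LOG_DB/#\~/$USER_HOME}"
 LOG_DIR="{{ timmy_log_dir }}"
 LOG_DIR="${LOG_DIR/#\~/$USER_HOME}"
 LOG_FILE="${LOG_DIR}/deadman-${WIZARD_NAME}.log"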
 MAX_SNAPSHOTS={{ deadman_max_snapshots }}
 RESTART_COOLDOWN={{ deadman_restart_cooldown }}
 MAX_RESTART_ATTEMPTS={{ deadman_max_restart_attempts }}
 COOLDOWN_FILE="${LOG_DIR}/deadman_cooldown_${WIZARD_NAME}"
 SERVICE_NAME="hermes-{{ wizard_name | lower }}"
 # Ensure directories exist
 mkdir -p "${SNAPSHOT_DIR}" "${LOG_DIR}"
 log() {
    echo "[$(date -u +%Y-%m-%dT%H:%M:%SZ)] [deadman] [${WIZARD_NAME}] $*" >> "${LOG_FILE}"
    echo "[deadman] [${WIZARD_NAME}] $*"
 }
 log_telemetry() {
    local status="$1"
    local message="$2"
    if [ -f "${REQUEST_LOG_DB}" ]; then
        sqlite3 "${REQUEST_LOG_DB}" "INSERT INTO request_log (timestamp, agent_name, provider, model, endpoint, status, error_message) VALUES (datetime('now'), '${WIZARD_NAME}', 'deadman_switch', 'N/A', 'health_check', '${status}', '${message}');" 2>/dev/null || true
    fi
 }
 snapshot_config() {
    if [ -f "${CONFIG_FILE}" ]; then
        cp "${CONFIG_FILE}" "${SNAPSHOT_FILE}"
        # Keep rolling history
        cp "${CONFIG_FILE}" "${SNAPSHOT_DIR}/config.yaml.$(date +%s)"
        # Prune old snapshots
        ls -t "${SNAPSHOT_DIR}"/config.yaml.[0-9]* 2>/dev/null | tail -n +$((MAX_SNAPSHOTS + 1)) | xargs rm -f 2>/dev/null
        log "Config snapshot saved."
    fi
 }
 rollback_config() {
    if [ -f "${SNAPSHOT_FILE}" ]; then
        log "Rolling back config to last known good..."
        cp "${SNAPSHOT_FILE}" "${CONFIG_FILE}"
        log "Config rolled back."
        log_telemetry "fallback" "Config rolled back to last known good by deadman switch"
    else
        log "ERROR: No known good snapshot found. Pulling from upstream..."
        cd "${WIZARD_HOME}/workspace/timmy-config" 2>/dev/null && \
            git pull --ff-only origin {{ upstream_branch }} 2>/dev/null && \
            cp "wizards/{{ wizard_name | lower }}/config.yaml" "${CONFIG_FILE}" && \
            log "Config restored from upstream." || \
            log "CRITICAL: Cannot restore config from any source."
    fi
 }
 restart_agent() {
    # Check cooldown
    if [ -f "${COOLDOWN_FILE}" ]; then
        local last_restart
        last_restart=$(cat "${COOLDOWN_FILE}")
        local now
        now=$(date +%s)
        local elapsed=$((now - last_restart))
        if [ "${elapsed}" -lt "${RESTART_COOLDOWN}" ]; then
            log "Restart cooldown active (${elapsed}s / ${RESTART_COOLDOWN}s). Skipping."
            return 1
        fi
    fi
    log "Restarting ${SERVICE_NAME}..."
    date +%s > "${COOLDOWN_FILE}"
 {% if machine_type == 'vps' %}
    if systemctl restart "${SERVICE_NAME}" 2>/dev/null; then
        log "Agent restarted via systemd."
    else
        log "ERROR: systemd restart failed."
        log_telemetry "error" "Agent restart failed (systemd)"
        return 1
    fi
 {% else %}
    if launchctl kickstart -k "ai.hermes.{{ wizard_name | lower }}" 2>/dev/null; then
        log "Agent restarted via launchctl."
    elif (cd "${WIZARD_HOME}" && hermes agent start --daemon 2>/dev/null); then
        log "Agent restarted via hermes CLI."
    else
        log "ERROR: All restart methods failed."
        log_telemetry "error" "Agent restart failed (all methods)"
        return 1
    fi
 {% endif %}
    # Only reached when one of the restart methods succeeded.
    log_telemetry "success" "Agent restarted by deadman switch"
 }
 # --- Health Check ---
 check_health() {
    # Check 1: Is the agent process running?
 {% if machine_type == 'vps' %}
    if ! systemctl is-active --quiet "${SERVICE_NAME}" 2>/dev/null; then
        if ! pgrep -f "hermes" > /dev/null 2>/dev/null; then
            log "FAIL: Agent process not running."
            return 1
        fi
    fi
 {% else %}
    if ! pgrep -f "hermes" > /dev/null 2>/dev/null; then
        log "FAIL: Agent process not running."
        return 1
    fi
 {% endif %}
    # Check 2: Is the API port responding?
    if ! timeout 10 bash -c "echo > /dev/tcp/127.0.0.1/{{ api_port }}" 2>/dev/null; then
        log "FAIL: API port {{ api_port }} not responding."
        return 1
    fi
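    # Check 3: Does the config contain banned providers?
    # Text after '#' is ignored so that the golden-state header comments, which
    # name the banned provider, do not trigger a rollback on their own.
    if grep -qiE '^[^#]*(anthropic|claude-sonnet|claude-opus|claude-haiku)' "${CONFIG_FILE}" 2>/dev/null; then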
        log "FAIL: Config contains banned provider (Anthropic). Rolling back."
        return 1
    fi
    return 0
 }
 # --- Main ---
 main() {
    log "Health check starting..."
    if check_health; then
        log "HEALTHY — snapshotting config."
        snapshot_config
        log_telemetry "success" "Health check passed"
    else
        log "UNHEALTHY — initiating recovery."
        log_telemetry "error" "Health check failed — initiating rollback"
        rollback_config
        restart_agent || log "Restart skipped or failed; see log entries above."
    fi
    log "Health check complete."
 }
 main "$@"
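Review note: the restart cooldown in `restart_agent` comes down to one comparison between the epoch timestamp stored in the cooldown file and the current time. The following is a minimal Python sketch of that decision, meant only for reasoning about edge cases. The function name `cooldown_remaining` and its parameters are illustrative and are not part of the role.

```python
import time
from typing import Optional


def cooldown_remaining(last_restart: Optional[int], cooldown: int,
                       now: Optional[int] = None) -> int:
    """Return the seconds left before another restart is allowed (0 = allowed)."""
    if last_restart is None:
        # No cooldown file yet, so this would be the first restart.
        return 0
    if now is None:
        now = int(time.time())
    elapsed = now - last_restart
    # Mirrors the shell test: skip while elapsed < cooldown.
    return max(0, cooldown - elapsed)
```

With a 300 second cooldown, a restart recorded at epoch 1000 blocks a second restart at epoch 1100 (200 seconds remain) and allows one at epoch 1300.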
--- a/ansible/roles/deadman_switch/templates/deadman_switch.plist.j2
+++ b/ansible/roles/deadman_switch/templates/deadman_switch.plist.j2
@@ -0,0 +1,22 @@
 <?xml version="1.0" encoding="UTF-8"?>
 <!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
 <!-- Deadman Switch — {{ wizard_name }}. Generated by Ansible. DO NOT EDIT MANUALLY. -->
 <plist version="1.0">
 <dict>
    <key>Label</key>
    <string>com.timmy.deadman.{{ wizard_name | lower }}</string>
    <key>ProgramArguments</key>
    <array>
        <string>/bin/bash</string>
        <string>{{ wizard_home }}/deadman_action.sh</string>
    </array>
    <key>StartInterval</key>
    <integer>{{ deadman_check_interval }}</integer>
    <key>RunAtLoad</key>
    <true/>
    <key>StandardOutPath</key>
    <string>{{ timmy_log_dir }}/deadman-{{ wizard_name }}.log</string>
    <key>StandardErrorPath</key>
    <string>{{ timmy_log_dir }}/deadman-{{ wizard_name }}.log</string>
 </dict>
 </plist>
--- a/ansible/roles/deadman_switch/templates/deadman_switch.service.j2
+++ b/ansible/roles/deadman_switch/templates/deadman_switch.service.j2
@@ -0,0 +1,16 @@
 # Deadman Switch — {{ wizard_name }}
 # Generated by Ansible. DO NOT EDIT MANUALLY.
 [Unit]
 Description=Deadman Switch for {{ wizard_name }} wizard
 After=network.target
 [Service]
 Type=oneshot
 ExecStart={{ wizard_home }}/deadman_action.sh
 User={{ ansible_user | default('root') }}
 StandardOutput=append:{{ timmy_log_dir }}/deadman-{{ wizard_name }}.log
 StandardError=append:{{ timmy_log_dir }}/deadman-{{ wizard_name }}.log
 [Install]
 WantedBy=multi-user.target
--- a/ansible/roles/deadman_switch/templates/deadman_switch.timer.j2
+++ b/ansible/roles/deadman_switch/templates/deadman_switch.timer.j2
@@ -0,0 +1,14 @@
 # Deadman Switch Timer — {{ wizard_name }}
 # Generated by Ansible. DO NOT EDIT MANUALLY.
 # Runs every {{ deadman_check_interval // 60 }} minutes.
 [Unit]
 Description=Deadman Switch Timer for {{ wizard_name }} wizard
 [Timer]
 OnBootSec=60
 OnUnitActiveSec={{ deadman_check_interval }}s
 AccuracySec=30s
 [Install]
 WantedBy=timers.target
--- a/ansible/roles/golden_state/defaults/main.yml
+++ b/ansible/roles/golden_state/defaults/main.yml
@@ -0,0 +1,6 @@
 ---
 # golden_state defaults
 # The golden_state_providers list is defined in group_vars/wizards.yml
 # and inventory/hosts.yml (global vars).
 golden_state_enforce: true
 golden_state_backup_before_deploy: true
--- a/ansible/roles/golden_state/tasks/main.yml
+++ b/ansible/roles/golden_state/tasks/main.yml
@@ -0,0 +1,46 @@
 ---
 # =============================================================================
 # golden_state/tasks — Deploy and enforce golden state provider chain
 # =============================================================================
 - name: "Backup current config before golden state deploy"
  copy:
    src: "{{ wizard_home }}/config.yaml"
    dest: "{{ wizard_home }}/config.yaml.pre-golden-{{ ansible_date_time.epoch }}"
    remote_src: true
  when: golden_state_backup_before_deploy
  ignore_errors: true
 - name: "Deploy golden state wizard config"
  template:
    src: "../../wizard_base/templates/wizard_config.yaml.j2"
    dest: "{{ wizard_home }}/config.yaml"
    mode: "0644"
    backup: true
  notify:
    - "Restart hermes agent (systemd)"
    - "Restart hermes agent (launchctl)"
 - name: "Scan for banned providers in all config files"
  shell: |
    FOUND=0
    for f in {{ wizard_home }}/config.yaml {{ hermes_home }}/config.yaml; do
      if [ -f "$f" ]; then
        if grep -qiE '^[^#]*(anthropic|claude-sonnet|claude-opus|claude-haiku)' "$f"; then
          echo "BANNED PROVIDER in $f:"
          grep -niE '^[^#]*(anthropic|claude-sonnet|claude-opus|claude-haiku)' "$f"
          FOUND=1
        fi
      fi
    done
    exit $FOUND
  register: provider_scan
  changed_when: false
  failed_when: provider_scan.rc != 0 and provider_ban_enforcement == 'strict'
 - name: "Report golden state deployment"
  debug:
    msg: >
      {{ wizard_name }} golden state deployed.
      Provider chain: {{ golden_state_providers | map(attribute='name') | list | join(' → ') }}.
      Banned provider scan: {{ 'CLEAN' if provider_scan.rc == 0 else 'VIOLATIONS FOUND' }}.
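Review note: the banned-provider scan is a case-insensitive pattern match over the deployed config files. Because the golden-state template names the banned provider in its own header comments, it helps to see how a scan behaves when comment text is skipped. The sketch below is a Python illustration under that assumption. `find_banned_lines` is a hypothetical helper, not something this PR ships, and its comment handling is deliberately naive.

```python
import re

# Same terms as the shell scan, matched case-insensitively.
BANNED = re.compile(r"anthropic|claude-sonnet|claude-opus|claude-haiku", re.IGNORECASE)


def find_banned_lines(text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs where a banned term appears outside a comment."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Naive comment stripping: everything after the first '#' is dropped,
        # even if that '#' sits inside a quoted string.
        code = line.split("#", 1)[0]
        if BANNED.search(code):
            hits.append((lineno, line))
    return hits


sample = (
    "# Anthropic is PERMANENTLY BANNED.\n"
    "model:\n"
    "  provider: anthropic\n"
    "  default: kimi  # not claude-opus\n"
)
hits = find_banned_lines(sample)
```

Only the third line of `sample` is reported. The header comment and the trailing comment on the fourth line are ignored.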
--- a/ansible/roles/request_log/files/request_log_schema.sql
+++ b/ansible/roles/request_log/files/request_log_schema.sql
@@ -0,0 +1,64 @@
 -- =============================================================================
 -- request_log — Inference Telemetry Table
 -- =============================================================================
 -- Every agent writes to this table BEFORE and AFTER every inference call.
 -- No exceptions. No summarizing. No describing what you would log.
 -- Actually write the row.
 --
 -- Source: KT Bezalel Architecture Session 2026-04-08
 -- =============================================================================
 CREATE TABLE IF NOT EXISTS request_log (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp TEXT NOT NULL DEFAULT (datetime('now')),
    agent_name TEXT NOT NULL,
    provider TEXT NOT NULL,
    model TEXT NOT NULL,
    endpoint TEXT NOT NULL,
    tokens_in INTEGER,
    tokens_out INTEGER,
    latency_ms INTEGER,
    status TEXT NOT NULL,  -- 'success', 'error', 'timeout', 'fallback'
    error_message TEXT
 );
 -- Index for common queries
 CREATE INDEX IF NOT EXISTS idx_request_log_agent
    ON request_log (agent_name, timestamp);
 CREATE INDEX IF NOT EXISTS idx_request_log_provider
    ON request_log (provider, timestamp);
 CREATE INDEX IF NOT EXISTS idx_request_log_status
    ON request_log (status, timestamp);
 -- View: recent activity per agent (last hour)
 CREATE VIEW IF NOT EXISTS v_recent_activity AS
    SELECT
        agent_name,
        provider,
        model,
        status,
        COUNT(*) as call_count,
        AVG(latency_ms) as avg_latency_ms,
        SUM(tokens_in) as total_tokens_in,
        SUM(tokens_out) as total_tokens_out
    FROM request_log
    WHERE timestamp > datetime('now', '-1 hour')
    GROUP BY agent_name, provider, model, status;
 -- View: provider reliability (last 24 hours)
 CREATE VIEW IF NOT EXISTS v_provider_reliability AS
    SELECT
        provider,
        model,
        COUNT(*) as total_calls,
        SUM(CASE WHEN status = 'success' THEN 1 ELSE 0 END) as successes,
        SUM(CASE WHEN status = 'error' THEN 1 ELSE 0 END) as errors,
        SUM(CASE WHEN status = 'timeout' THEN 1 ELSE 0 END) as timeouts,
        SUM(CASE WHEN status = 'fallback' THEN 1 ELSE 0 END) as fallbacks,
        ROUND(100.0 * SUM(CASE WHEN status = 'success' THEN 1 ELSE 0 END) / COUNT(*), 1) as success_rate,
        AVG(latency_ms) as avg_latency_ms
    FROM request_log
    WHERE timestamp > datetime('now', '-24 hours')
    GROUP BY provider, model;
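Review note: a short Python sketch of how an agent might write rows to this table and read the reliability view back. The schema text is copied from the file above and trimmed to one view. The helper `log_request` and the values `example-agent`, `example-provider`, and `example-model` are placeholders made up for illustration. Parameterized inserts are used so that error messages containing quotes cannot break the statement.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS request_log (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp TEXT NOT NULL DEFAULT (datetime('now')),
    agent_name TEXT NOT NULL,
    provider TEXT NOT NULL,
    model TEXT NOT NULL,
    endpoint TEXT NOT NULL,
    tokens_in INTEGER,
    tokens_out INTEGER,
    latency_ms INTEGER,
    status TEXT NOT NULL,
    error_message TEXT
);
CREATE VIEW IF NOT EXISTS v_provider_reliability AS
    SELECT provider, model,
           COUNT(*) AS total_calls,
           ROUND(100.0 * SUM(CASE WHEN status = 'success' THEN 1 ELSE 0 END)
                 / COUNT(*), 1) AS success_rate
    FROM request_log
    WHERE timestamp > datetime('now', '-24 hours')
    GROUP BY provider, model;
"""


def log_request(conn, agent, provider, model, endpoint, status,
                latency_ms=None, error=None):
    """Insert one telemetry row using bound parameters."""
    conn.execute(
        "INSERT INTO request_log "
        "(agent_name, provider, model, endpoint, status, latency_ms, error_message) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        (agent, provider, model, endpoint, status, latency_ms, error),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database, for the example only
conn.executescript(SCHEMA)
for status in ("success", "success", "success", "error"):
    log_request(conn, "example-agent", "example-provider", "example-model",
                "chat", status, latency_ms=120)

rows = conn.execute(
    "SELECT provider, model, total_calls, success_rate FROM v_provider_reliability"
).fetchall()
```

Three successes out of four calls yield a `success_rate` of 75.0 for the single provider and model pair.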
--- a/ansible/roles/request_log/tasks/main.yml
+++ b/ansible/roles/request_log/tasks/main.yml
@@ -0,0 +1,50 @@
 ---
 # =============================================================================
 # request_log/tasks — Deploy Telemetry Table
 # =============================================================================
 # "This is non-negotiable infrastructure. Without it, we cannot verify
 # if any agent actually executed what it claims."
 # — KT Bezalel 2026-04-08
 # =============================================================================
 - name: "Create telemetry directory"
  file:
    path: "{{ request_log_path | dirname }}"
    state: directory
    mode: "0755"
 - name: "Deploy request_log schema"
  copy:
    src: request_log_schema.sql
    dest: "{{ wizard_home }}/request_log_schema.sql"
    mode: "0644"
 - name: "Initialize request_log database"
  shell: |
    sqlite3 "{{ request_log_path }}" < "{{ wizard_home }}/request_log_schema.sql"
  args:
    creates: "{{ request_log_path }}"
 - name: "Verify request_log table exists"
  shell: |
    sqlite3 "{{ request_log_path }}" ".tables" | grep -q "request_log"
  register: table_check
  changed_when: false
 - name: "Verify request_log schema matches"
  shell: |
    sqlite3 "{{ request_log_path }}" ".schema request_log" | grep -q "agent_name"
  register: schema_check
  changed_when: false
 - name: "Set permissions on request_log database"
  file:
    path: "{{ request_log_path }}"
    mode: "0644"
 - name: "Report request_log status"
  debug:
    msg: >
      {{ wizard_name }} request_log: {{ request_log_path }}
      — table exists: {{ table_check.rc == 0 }}
      — schema valid: {{ schema_check.rc == 0 }}
--- a/ansible/roles/wizard_base/defaults/main.yml
+++ b/ansible/roles/wizard_base/defaults/main.yml
@@ -0,0 +1,6 @@
 ---
 # wizard_base defaults
 wizard_user: "{{ ansible_user | default('root') }}"
 wizard_group: "{{ ansible_user | default('root') }}"
 timmy_base_dir: "~/.local/timmy"
 timmy_config_repo: "https://forge.alexanderwhitestone.com/Timmy_Foundation/timmy-config.git"
--- a/ansible/roles/wizard_base/handlers/main.yml
+++ b/ansible/roles/wizard_base/handlers/main.yml
@@ -0,0 +1,11 @@
 ---
 - name: "Restart hermes agent (systemd)"
  systemd:
    name: "hermes-{{ wizard_name | lower }}"
    state: restarted
  when: machine_type == 'vps'
 - name: "Restart hermes agent (launchctl)"
  shell: "launchctl kickstart -k ai.hermes.{{ wizard_name | lower }}"
  when: machine_type == 'mac'
  ignore_errors: true
--- a/ansible/roles/wizard_base/tasks/main.yml
+++ b/ansible/roles/wizard_base/tasks/main.yml
@@ -0,0 +1,69 @@
 ---
 # =============================================================================
 # wizard_base/tasks — Common wizard setup
 # =============================================================================
 - name: "Create wizard directories"
  file:
    path: "{{ item }}"
    state: directory
    mode: "0755"
  loop:
    - "{{ wizard_home }}"
    - "{{ wizard_home }}/workspace"
    - "{{ hermes_home }}"
    - "{{ hermes_home }}/bin"
    - "{{ hermes_home }}/skins"
    - "{{ hermes_home }}/playbooks"
    - "{{ hermes_home }}/memories"
    - "~/.local/timmy"
    - "~/.local/timmy/fleet-health"
    - "~/.local/timmy/snapshots"
    - "~/.timmy"
 - name: "Clone/update timmy-config"
  git:
    repo: "{{ upstream_repo }}"
    dest: "{{ wizard_home }}/workspace/timmy-config"
    version: "{{ upstream_branch }}"
    force: false
    update: true
  ignore_errors: true  # May fail on first run if no SSH key
 - name: "Deploy SOUL.md"
  copy:
    src: "{{ wizard_home }}/workspace/timmy-config/SOUL.md"
    dest: "~/.timmy/SOUL.md"
    remote_src: true
    mode: "0644"
  ignore_errors: true
 - name: "Deploy thin config (immutable pointer to upstream)"
  template:
    src: thin_config.yml.j2
    dest: "{{ thin_config_path }}"
    mode: "{{ thin_config_mode }}"
  tags: [thin_config]
 - name: "Ensure Python3 and pip are available"
  package:
    name:
      - python3
      - python3-pip
    state: present
  when: machine_type == 'vps'
  ignore_errors: true
 - name: "Ensure PyYAML is installed (for config validation)"
  pip:
    name: pyyaml
    state: present
  when: machine_type == 'vps'
  ignore_errors: true
 - name: "Create Ansible log directory"
  file:
    path: /var/log/ansible
    state: directory
    mode: "0755"
  ignore_errors: true
--- a/ansible/roles/wizard_base/templates/thin_config.yml.j2
+++ b/ansible/roles/wizard_base/templates/thin_config.yml.j2
@@ -0,0 +1,41 @@
 # =============================================================================
 # Thin Config — {{ wizard_name }}
 # =============================================================================
 # THIS FILE IS READ-ONLY. Agents CANNOT modify it.
 # It contains only pointers to upstream. The actual config lives in Gitea.
 #
 # Agent wakes up → pulls config from upstream → loads → runs.
 # If anything tries to mutate this → fails gracefully → pulls fresh on restart.
 #
 # Only way to permanently change config: commit to Gitea, merge PR, Ansible deploys.
 #
 # Generated by Ansible on {{ ansible_date_time.iso8601 }}
 # DO NOT EDIT MANUALLY.
 # =============================================================================
 identity:
  wizard_name: "{{ wizard_name }}"
  wizard_role: "{{ wizard_role }}"
  machine: "{{ inventory_hostname }}"
 upstream:
  repo: "{{ upstream_repo }}"
  branch: "{{ upstream_branch }}"
  config_path: "wizards/{{ wizard_name | lower }}/config.yaml"
  pull_on_wake: {{ config_pull_on_wake | lower }}
 recovery:
  deadman_enabled: {{ deadman_enabled | lower }}
  snapshot_dir: "{{ deadman_snapshot_dir }}"
  restart_cooldown: {{ deadman_restart_cooldown }}
  max_restart_attempts: {{ deadman_max_restart_attempts }}
  escalation_channel: "{{ deadman_escalation_channel }}"
 telemetry:
  request_log_path: "{{ request_log_path }}"
  request_log_enabled: {{ request_log_enabled | lower }}
 local_overrides:
  # Runtime overrides go here. They are EPHEMERAL — not persisted across restarts.
  # On restart, this section is reset to empty.
  {}
--- a/ansible/roles/wizard_base/templates/wizard_config.yaml.j2
+++ b/ansible/roles/wizard_base/templates/wizard_config.yaml.j2
@@ -0,0 +1,115 @@
 # =============================================================================
 # {{ wizard_name }} — Wizard Configuration (Golden State)
 # =============================================================================
 # Generated by Ansible on {{ ansible_date_time.iso8601 }}
 # DO NOT EDIT MANUALLY. Changes go through Gitea PR → Ansible deploy.
 #
 # Provider chain: {{ golden_state_providers | map(attribute='name') | list | join(' → ') }}
 # Anthropic is PERMANENTLY BANNED.
 # =============================================================================
 model:
  default: {{ wizard_model_primary }}
  provider: {{ wizard_provider_primary }}
  context_length: 65536
  base_url: {{ golden_state_providers[0].base_url }}
 toolsets:
  - all
 fallback_providers:
 {% for provider in golden_state_providers %}
  - provider: {{ provider.name }}
    model: {{ provider.model }}
 {% if provider.base_url is defined %}
    base_url: {{ provider.base_url }}
 {% endif %}
 {% if provider.api_key_env is defined %}
    api_key_env: {{ provider.api_key_env }}
 {% endif %}
    timeout: {{ provider.timeout | default(60) }}
    reason: "{{ provider.reason }}"
 {% endfor %}
 agent:
  max_turns: {{ agent_max_turns }}
  reasoning_effort: {{ agent_reasoning_effort }}
  verbose: {{ agent_verbose | lower }}
 terminal:
  backend: local
  cwd: .
  timeout: 180
  persistent_shell: true
 browser:
  inactivity_timeout: 120
  command_timeout: 30
  record_sessions: false
 display:
  compact: false
  personality: ''
  resume_display: full
  busy_input_mode: interrupt
  bell_on_complete: false
  show_reasoning: false
  streaming: false
  show_cost: false
  tool_progress: all
 memory:
  memory_enabled: true
  user_profile_enabled: true
  memory_char_limit: 2200
  user_char_limit: 1375
  nudge_interval: 10
  flush_min_turns: 6
 approvals:
  mode: {{ agent_approval_mode }}
 security:
  redact_secrets: true
  tirith_enabled: false
 platforms:
  api_server:
    enabled: true
    extra:
      host: 127.0.0.1
      port: {{ api_port }}
 session_reset:
  mode: none
  idle_minutes: 0
 skills:
  creation_nudge_interval: 15
 system_prompt_suffix: |
  You are {{ wizard_name }}, {{ wizard_role }}.
  Your soul is defined in SOUL.md — read it, live it.
  Hermes is your harness.
  {{ golden_state_providers[0].name }} is your primary provider.
  Refusal over fabrication. If you do not know, say so.
  Sovereignty and service always.
 providers:
 {% for provider in golden_state_providers %}
  {{ provider.name }}:
 {% if provider.base_url is defined %}
    base_url: {{ provider.base_url }}
 {% endif %}
    timeout: {{ provider.timeout | default(60) }}
 {% if provider.name == 'kimi-coding' %}
    max_retries: 3
 {% endif %}
 {% endfor %}
 # =============================================================================
 # BANNED PROVIDERS — DO NOT ADD
 # =============================================================================
 # The following providers are PERMANENTLY BANNED:
 # - anthropic (any model: claude-sonnet, claude-opus, claude-haiku)
 # Enforcement: pre-commit hook, linter, Ansible validation, this comment.
 # Adding any banned provider will cause Ansible deployment to FAIL.
 # =============================================================================
--- a/ansible/scripts/deploy_on_webhook.sh
+++ b/ansible/scripts/deploy_on_webhook.sh
@@ -0,0 +1,75 @@
 #!/usr/bin/env bash
 # =============================================================================
 # Gitea Webhook Handler — Trigger Ansible Deploy on Merge
 # =============================================================================
 # This script is called by the Gitea webhook when a PR is merged
 # to the main branch of timmy-config.
 #
 # Setup:
 #   1. Add webhook in Gitea: Settings → Webhooks → Add Webhook
 #   2. URL: http://localhost:9000/hooks/deploy-timmy-config
 #   3. Events: Pull Request (merged only)
 #   4. Secret: <configured in Gitea>
 #
 # This script runs ansible-pull to update the local machine.
 # For fleet-wide deploys, each machine runs ansible-pull independently.
 # =============================================================================
 set -euo pipefail
 REPO="https://forge.alexanderwhitestone.com/Timmy_Foundation/timmy-config.git"
 BRANCH="main"
 ANSIBLE_DIR="ansible"
 LOG_FILE="/var/log/ansible/webhook-deploy.log"
 LOCK_FILE="/tmp/ansible-deploy.lock"
 log() {
    echo "[$(date -u +%Y-%m-%dT%H:%M:%SZ)] [webhook] $*" | tee -a "${LOG_FILE}"
 }
 # Prevent concurrent deploys
 if [ -f "${LOCK_FILE}" ]; then
    LOCK_AGE=$(( $(date +%s) - $(stat -c %Y "${LOCK_FILE}" 2>/dev/null || echo 0) ))
    if [ "${LOCK_AGE}" -lt 300 ]; then
        log "Deploy already in progress (lock age: ${LOCK_AGE}s). Skipping."
        exit 0
    else
        log "Stale lock file (${LOCK_AGE}s old). Removing."
        rm -f "${LOCK_FILE}"
    fi
 fi
 trap 'rm -f "${LOCK_FILE}"' EXIT
 touch "${LOCK_FILE}"
 log "Webhook triggered. Starting ansible-pull..."
 # Pull latest config
 cd /tmp
 rm -rf timmy-config-deploy
 git clone --depth 1 --branch "${BRANCH}" "${REPO}" timmy-config-deploy 2>&1 | tee -a "${LOG_FILE}"
 cd timmy-config-deploy/${ANSIBLE_DIR}
 # Run Ansible against localhost
 log "Running Ansible playbook..."
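 # Capture the exit code explicitly: with `set -e` and `pipefail`, a failing
 # pipeline would otherwise abort the script before the result is logged.
 RESULT=0
 ansible-playbook \
    -i inventory/hosts.yml \
    playbooks/site.yml \
    --limit "$(hostname)" \
    --diff \
    2>&1 | tee -a "${LOG_FILE}" || RESULT=$?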
 if [ ${RESULT} -eq 0 ]; then
    log "Deploy successful."
 else
    log "ERROR: Deploy failed with exit code ${RESULT}."
 fi
 # Cleanup
 rm -rf /tmp/timmy-config-deploy
 log "Webhook handler complete."
 exit ${RESULT}
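Review note: the handler checks for the lock file and then creates it in a separate step, so two webhook deliveries arriving at the same moment could both pass the check. The sketch below shows a swapped-in alternative, not what the script does: an atomic create with the same 300 second stale threshold, written in Python. `try_acquire` is a hypothetical name and the lock path is a temporary file created for the demonstration.

```python
import os
import tempfile
import time


def try_acquire(lock_path, max_age=300, now=None):
    """Create the lock atomically; take over a lock older than max_age seconds."""
    now = time.time() if now is None else now
    flags = os.O_CREAT | os.O_EXCL | os.O_WRONLY
    try:
        fd = os.open(lock_path, flags)  # fails if the file already exists
    except FileExistsError:
        age = now - os.stat(lock_path).st_mtime
        if age < max_age:
            return False                # a deploy is (probably) still running
        os.unlink(lock_path)            # stale lock: remove and retry once
        fd = os.open(lock_path, flags)
    os.close(fd)
    return True


with tempfile.TemporaryDirectory() as tmp:
    lock = os.path.join(tmp, "ansible-deploy.lock")
    first = try_acquire(lock)                         # no lock yet
    second = try_acquire(lock)                        # fresh lock is held
    stolen = try_acquire(lock, now=time.time() + 301) # lock looks stale
```

The takeover path can still race with another process between `stat` and `unlink`, so this narrows the window rather than closing it.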
--- a/ansible/scripts/validate_config.py
+++ b/ansible/scripts/validate_config.py
@@ -0,0 +1,155 @@
 #!/usr/bin/env python3
 """
 Config Validator — The Timmy Foundation
 Validates wizard configs against golden state rules.
 Run before any config deploy to catch violations early.
 Usage:
    python3 validate_config.py <config_file>
    python3 validate_config.py --all  # Validate all wizard configs
 Exit codes:
    0 — All validations passed
    1 — Validation errors found (a missing file or YAML parse error is reported as a validation error)
    2 — Usage error (no arguments given)
 """
 import sys
 import os
 import yaml
 import fnmatch
 from pathlib import Path
 # === BANNED PROVIDERS — HARD POLICY ===
 BANNED_PROVIDERS = {"anthropic", "claude"}
 BANNED_MODEL_PATTERNS = [
    "claude-*",
    "anthropic/*",
    "*sonnet*",
    "*opus*",
    "*haiku*",
 ]
 # === REQUIRED FIELDS ===
 REQUIRED_FIELDS = {
    "model": ["default", "provider"],
    "fallback_providers": None,  # Must exist as a list
 }
 def is_banned_model(model_name: str) -> bool:
    """Check if a model name matches any banned pattern."""
    model_lower = model_name.lower()
    for pattern in BANNED_MODEL_PATTERNS:
        if fnmatch.fnmatch(model_lower, pattern):
            return True
    return False
 def validate_config(config_path: str) -> list[str]:
    """Validate a wizard config file. Returns list of error strings."""
    errors = []
    try:
        with open(config_path) as f:
            cfg = yaml.safe_load(f)
    except FileNotFoundError:
        return [f"File not found: {config_path}"]
    except yaml.YAMLError as e:
        return [f"YAML parse error: {e}"]
    if not cfg:
        return ["Config file is empty"]
    if not isinstance(cfg, dict):
        return ["Config root must be a mapping"]
    # Check required fields
    for section, fields in REQUIRED_FIELDS.items():
        if section not in cfg:
            errors.append(f"Missing required section: {section}")
        elif fields:
            for field in fields:
                if field not in cfg[section]:
                    errors.append(f"Missing required field: {section}.{field}")
    # Check default provider
    default_provider = cfg.get("model", {}).get("provider", "")
    if default_provider.lower() in BANNED_PROVIDERS:
        errors.append(f"BANNED default provider: {default_provider}")
    default_model = cfg.get("model", {}).get("default", "")
    if is_banned_model(default_model):
        errors.append(f"BANNED default model: {default_model}")
    # Check fallback providers
    for i, fb in enumerate(cfg.get("fallback_providers", [])):
        provider = fb.get("provider", "")
        model = fb.get("model", "")
        if provider.lower() in BANNED_PROVIDERS:
            errors.append(f"BANNED fallback provider [{i}]: {provider}")
        if is_banned_model(model):
            errors.append(f"BANNED fallback model [{i}]: {model}")
    # Check providers section
    for name, provider_cfg in cfg.get("providers", {}).items():
        if name.lower() in BANNED_PROVIDERS:
            errors.append(f"BANNED provider in providers section: {name}")
        base_url = str(provider_cfg.get("base_url", ""))
        if "anthropic" in base_url.lower():
            errors.append(f"BANNED URL in provider {name}: {base_url}")
    # Check system prompt for banned references
    prompt = cfg.get("system_prompt_suffix", "")
    if isinstance(prompt, str):
        for banned in BANNED_PROVIDERS:
            if banned in prompt.lower():
                errors.append(f"BANNED provider referenced in system_prompt_suffix: {banned}")
    return errors
 def main():
    if len(sys.argv) < 2:
        print(f"Usage: {sys.argv[0]} <config_file> | --all")
        sys.exit(2)
    if sys.argv[1] == "--all":
        # Validate all wizard configs in the repo
        repo_root = Path(__file__).parent.parent.parent
        wizard_dir = repo_root / "wizards"
        all_errors = {}
        for wizard_path in sorted(wizard_dir.iterdir()):
            config_file = wizard_path / "config.yaml"
            if config_file.exists():
                errors = validate_config(str(config_file))
                if errors:
                    all_errors[wizard_path.name] = errors
        if all_errors:
            print("VALIDATION FAILED:")
            for wizard, errors in all_errors.items():
                print(f"\n  {wizard}:")
                for err in errors:
                    print(f"    - {err}")
            sys.exit(1)
        else:
            print("All wizard configs passed validation.")
            sys.exit(0)
    else:
        config_path = sys.argv[1]
        errors = validate_config(config_path)
        if errors:
            print(f"VALIDATION FAILED for {config_path}:")
            for err in errors:
                print(f"  - {err}")
            sys.exit(1)
        else:
            print(f"PASSED: {config_path}")
            sys.exit(0)
 if __name__ == "__main__":
    main()
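Review note: the model patterns are shell-style globs, so their reach is wider than the provider names suggest. The sketch below reuses the pattern list from the validator and runs it over a few sample names. It calls `fnmatch.fnmatchcase` instead of the validator's `fnmatch.fnmatch` so the result does not depend on platform case normalization. `claude-sonnet-4`, `anthropic/claude-3`, and `octopus-7b` are made-up sample names. `hermes3` and `gemma4:latest` appear elsewhere in this PR.

```python
import fnmatch

BANNED_MODEL_PATTERNS = ["claude-*", "anthropic/*", "*sonnet*", "*opus*", "*haiku*"]


def is_banned_model(model_name: str) -> bool:
    """Return True if the lowercased name matches any banned glob pattern."""
    model_lower = model_name.lower()
    return any(fnmatch.fnmatchcase(model_lower, p) for p in BANNED_MODEL_PATTERNS)


samples = ["claude-sonnet-4", "anthropic/claude-3", "Hermes3",
           "gemma4:latest", "octopus-7b"]
results = {name: is_banned_model(name) for name in samples}
```

The `*opus*` pattern also matches `octopus-7b`, since the glob matches any name containing that substring. That may or may not be the intended strictness.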
--- a/bin/deadman-fallback.py
+++ b/bin/deadman-fallback.py
@@ -0,0 +1,264 @@
 #!/usr/bin/env python3
 """
 Dead Man Switch Fallback Engine

 When the dead man switch triggers (zero commits for 2+ hours, model down,
 Gitea unreachable, etc.), this script diagnoses the failure and applies
 common sense fallbacks automatically.

 Fallback chain:
 1. Primary model (Anthropic) down -> switch config to local-llama.cpp
 2. Gitea unreachable -> cache issues locally, retry on recovery
 3. VPS agents down -> alert + lazarus protocol
 4. Local llama.cpp down -> try Ollama, then alert-only mode
 5. All inference dead -> safe mode (cron pauses, alert Alexander)

 Each fallback is reversible. Recovery auto-restores the previous config.
 """
 import os
 import sys
 import json
 import subprocess
 import time
 import yaml
 import shutil
 from pathlib import Path
 from datetime import datetime, timedelta

 HERMES_HOME = Path(os.environ.get("HERMES_HOME", os.path.expanduser("~/.hermes")))
 CONFIG_PATH = HERMES_HOME / "config.yaml"
 FALLBACK_STATE = HERMES_HOME / "deadman-fallback-state.json"
 BACKUP_CONFIG = HERMES_HOME / "config.yaml.pre-fallback"
 FORGE_URL = "https://forge.alexanderwhitestone.com"

 def load_config():
     with open(CONFIG_PATH) as f:
         return yaml.safe_load(f)

 def save_config(cfg):
     with open(CONFIG_PATH, "w") as f:
         yaml.dump(cfg, f, default_flow_style=False)

 def load_state():
     if FALLBACK_STATE.exists():
         with open(FALLBACK_STATE) as f:
             return json.load(f)
     return {"active_fallbacks": [], "last_check": None, "recovery_pending": False}

 def save_state(state):
     state["last_check"] = datetime.now().isoformat()
     with open(FALLBACK_STATE, "w") as f:
         json.dump(state, f, indent=2)

 def run(cmd, timeout=10):
     try:
         r = subprocess.run(cmd, shell=True, capture_output=True, text=True, timeout=timeout)
         return r.returncode, r.stdout.strip(), r.stderr.strip()
     except subprocess.TimeoutExpired:
         return -1, "", "timeout"
     except Exception as e:
         return -1, "", str(e)

 # ─── HEALTH CHECKS ───

 def check_anthropic():
     """Can we reach Anthropic API?"""
     key = os.environ.get("ANTHROPIC_API_KEY", "")
     if not key:
         # Check multiple .env locations
         for env_path in [HERMES_HOME / ".env", Path.home() / ".hermes" / ".env"]:
             if env_path.exists():
                 for line in open(env_path):
                     line = line.strip()
                     if line.startswith("ANTHROPIC_API_KEY="):
                         key = line.split("=", 1)[1].strip().strip('"').strip("'")
                         break
             if key:
                 break
     if not key:
         return False, "no API key"
     code, out, err = run(
         f'curl -s -o /dev/null -w "%{{http_code}}" -H "x-api-key: {key}" '
         f'-H "anthropic-version: 2023-06-01" '
         f'https://api.anthropic.com/v1/messages -X POST '
         f'-H "content-type: application/json" '
         f'-d \'{{"model":"claude-haiku-4-5-20251001","max_tokens":1,"messages":[{{"role":"user","content":"ping"}}]}}\' ',
         timeout=15
     )
     if code == 0 and out in ("200", "429"):
         return True, f"HTTP {out}"
     return False, f"HTTP {out} err={err[:80]}"

    92|def check_local_llama():
    93|    """Is local llama.cpp serving?"""
    94|    code, out, err = run("curl -s http://localhost:8081/v1/models", timeout=5)
    95|    if code == 0 and "hermes" in out.lower():
    96|        return True, "serving"
    97|    return False, f"exit={code}"
    98|
    99|def check_ollama():
   100|    """Is Ollama running?"""
   101|    code, out, err = run("curl -s http://localhost:11434/api/tags", timeout=5)
   102|    if code == 0 and "models" in out:
   103|        return True, "running"
   104|    return False, f"exit={code}"
   105|
   106|def check_gitea():
   107|    """Can we reach the Forge?"""
   108|    token_path = Path.home() / ".config" / "gitea" / "timmy-token"
   109|    if not token_path.exists():
   110|        return False, "no token"
   111|    token = token_path.read_text().strip()
   112|    code, out, err = run(
   113|        f'curl -s -o /dev/null -w "%{{http_code}}" -H "Authorization: token {token}" '
   114|        f'"{FORGE_URL}/api/v1/user"',
   115|        timeout=10
   116|    )
   117|    if code == 0 and out == "200":
   118|        return True, "reachable"
   119|    return False, f"HTTP {out}"
   120|
   121|def check_vps(ip, name):
   122|    """Can we SSH into a VPS?"""
   123|    code, out, err = run(f"ssh -o ConnectTimeout=5 root@{ip} 'echo alive'", timeout=10)
   124|    if code == 0 and "alive" in out:
   125|        return True, "alive"
   126|    return False, "unreachable"
   127|
   128|# ─── FALLBACK ACTIONS ───
   129|
   130|def fallback_to_local_model(cfg):
   131|    """Switch primary model from Anthropic to local llama.cpp"""
   132|    if not BACKUP_CONFIG.exists():
   133|        shutil.copy2(CONFIG_PATH, BACKUP_CONFIG)
   134|    
   135|    cfg["model"]["provider"] = "local-llama.cpp"
   136|    cfg["model"]["default"] = "hermes3"
   137|    save_config(cfg)
   138|    return "Switched primary model to local-llama.cpp/hermes3"
   139|
   140|def fallback_to_ollama(cfg):
   141|    """Switch to Ollama if llama.cpp is also down"""
   142|    if not BACKUP_CONFIG.exists():
   143|        shutil.copy2(CONFIG_PATH, BACKUP_CONFIG)
   144|    
   145|    cfg["model"]["provider"] = "ollama"
   146|    cfg["model"]["default"] = "gemma4:latest"
   147|    save_config(cfg)
   148|    return "Switched primary model to ollama/gemma4:latest"
   149|
   150|def enter_safe_mode(state):
   151|    """Pause all non-essential cron jobs, alert Alexander"""
   152|    state["safe_mode"] = True
   153|    state["safe_mode_entered"] = datetime.now().isoformat()
   154|    save_state(state)
   155|    return "SAFE MODE: All inference down. Cron jobs should be paused. Alert Alexander."
   156|
   157|def restore_config():
   158|    """Restore pre-fallback config when primary recovers"""
   159|    if BACKUP_CONFIG.exists():
   160|        shutil.copy2(BACKUP_CONFIG, CONFIG_PATH)
   161|        BACKUP_CONFIG.unlink()
   162|        return "Restored original config from backup"
   163|    return "No backup config to restore"
   164|
   165|# ─── MAIN DIAGNOSIS AND FALLBACK ENGINE ───
   166|
   167|def diagnose_and_fallback():
   168|    state = load_state()
   169|    cfg = load_config()
   170|    
   171|    results = {
   172|        "timestamp": datetime.now().isoformat(),
   173|        "checks": {},
   174|        "actions": [],
   175|        "status": "healthy"
   176|    }
   177|    
   178|    # Check all systems
   179|    anthropic_ok, anthropic_msg = check_anthropic()
   180|    results["checks"]["anthropic"] = {"ok": anthropic_ok, "msg": anthropic_msg}
   181|    
   182|    llama_ok, llama_msg = check_local_llama()
   183|    results["checks"]["local_llama"] = {"ok": llama_ok, "msg": llama_msg}
   184|    
   185|    ollama_ok, ollama_msg = check_ollama()
   186|    results["checks"]["ollama"] = {"ok": ollama_ok, "msg": ollama_msg}
   187|    
   188|    gitea_ok, gitea_msg = check_gitea()
   189|    results["checks"]["gitea"] = {"ok": gitea_ok, "msg": gitea_msg}
   190|    
   191|    # VPS checks
   192|    vpses = [
   193|        ("167.99.126.228", "Allegro"),
   194|        ("143.198.27.163", "Ezra"),
   195|        ("159.203.146.185", "Bezalel"),
   196|    ]
   197|    for ip, name in vpses:
   198|        vps_ok, vps_msg = check_vps(ip, name)
   199|        results["checks"][f"vps_{name.lower()}"] = {"ok": vps_ok, "msg": vps_msg}
   200|    
   201|    current_provider = cfg.get("model", {}).get("provider", "anthropic")
   202|    
   203|    # ─── FALLBACK LOGIC ───
   204|    
   205|    # Case 1: Primary (Anthropic) down, local available
   206|    if not anthropic_ok and current_provider == "anthropic":
   207|        if llama_ok:
   208|            msg = fallback_to_local_model(cfg)
   209|            results["actions"].append(msg)
   210|            state["active_fallbacks"].append("anthropic->local-llama")
   211|            results["status"] = "degraded_local"
   212|        elif ollama_ok:
   213|            msg = fallback_to_ollama(cfg)
   214|            results["actions"].append(msg)
   215|            state["active_fallbacks"].append("anthropic->ollama")
   216|            results["status"] = "degraded_ollama"
   217|        else:
   218|            msg = enter_safe_mode(state)
   219|            results["actions"].append(msg)
   220|            results["status"] = "safe_mode"
   221|    
   222|    # Case 2: Already on fallback, check if primary recovered
   223|    elif anthropic_ok and "anthropic->local-llama" in state.get("active_fallbacks", []):
   224|        msg = restore_config()
   225|        results["actions"].append(msg)
   226|        state["active_fallbacks"].remove("anthropic->local-llama")
   227|        results["status"] = "recovered"
   228|    elif anthropic_ok and "anthropic->ollama" in state.get("active_fallbacks", []):
   229|        msg = restore_config()
   230|        results["actions"].append(msg)
   231|        state["active_fallbacks"].remove("anthropic->ollama")
   232|        results["status"] = "recovered"
   233|    
   234|    # Case 3: Gitea down — just flag it, work locally
   235|    if not gitea_ok:
   236|        results["actions"].append("WARN: Gitea unreachable — work cached locally until recovery")
   237|        if "gitea_down" not in state.get("active_fallbacks", []):
   238|            state["active_fallbacks"].append("gitea_down")
   239|        results["status"] = max(results["status"], "degraded_gitea", key=lambda x: ["healthy", "recovered", "degraded_gitea", "degraded_local", "degraded_ollama", "safe_mode"].index(x) if x in ["healthy", "recovered", "degraded_gitea", "degraded_local", "degraded_ollama", "safe_mode"] else 0)
   240|    elif "gitea_down" in state.get("active_fallbacks", []):
   241|        state["active_fallbacks"].remove("gitea_down")
   242|        results["actions"].append("Gitea recovered — resume normal operations")
   243|    
   244|    # Case 4: VPS agents down
   245|    for ip, name in vpses:
   246|        key = f"vps_{name.lower()}"
   247|        if not results["checks"][key]["ok"]:
   248|            results["actions"].append(f"ALERT: {name} VPS ({ip}) unreachable — lazarus protocol needed")
   249|    
   250|    save_state(state)
   251|    return results
   252|
   253|if __name__ == "__main__":
   254|    results = diagnose_and_fallback()
   255|    print(json.dumps(results, indent=2))
   256|    
   257|    # Exit codes for cron integration
   258|    if results["status"] == "safe_mode":
   259|        sys.exit(2)
   260|    elif results["status"].startswith("degraded"):
   261|        sys.exit(1)
   262|    else:
   263|        sys.exit(0)
   264|
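
Review note on `run()` and the `check_*` helpers above: each command is assembled with f-strings and executed with `shell=True`, so the API key and the Gitea token pass through shell parsing. A minimal sketch of an argv-based alternative is shown below. `build_curl_argv` and `run_argv` are hypothetical helper names, not part of this PR, and the sketch assumes `curl` is on `PATH`. The secret is still visible in the process list, so this only removes the shell-quoting risk.

```python
import subprocess


def build_curl_argv(url, headers, body=None):
    """Build a curl argument list that prints only the HTTP status code.

    Hypothetical helper: values are passed as separate argv entries, so
    quotes or semicolons inside a header value are never shell-interpreted.
    """
    argv = ["curl", "-s", "-o", "/dev/null", "-w", "%{http_code}"]
    for name, value in headers.items():
        argv += ["-H", f"{name}: {value}"]
    if body is not None:
        argv += ["-X", "POST", "-d", body]
    argv.append(url)
    return argv


def run_argv(argv, timeout=30):
    """Run a command without a shell; returns (returncode, stdout, stderr)."""
    try:
        r = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
        return r.returncode, r.stdout.strip(), r.stderr.strip()
    except subprocess.TimeoutExpired:
        return -1, "", "timeout"
    except OSError as e:  # e.g. the binary is not installed
        return -1, "", str(e)
```

With helpers like these, `check_gitea()` could call `run_argv(build_curl_argv(f"{FORGE_URL}/api/v1/user", {"Authorization": f"token {token}"}), timeout=10)` and keep its existing return-value handling.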
--- a/scripts/architecture_linter.py
+++ b/scripts/architecture_linter.py
@@ -9,7 +9,7 @@ import re
 SOVEREIGN_RULES = [
    (r"https?://(api\.openai\.com|api\.anthropic\.com)", "CRITICAL: External cloud API detected. Use local custom_provider instead."),
    (r"provider: (openai|anthropic)", "WARNING: Direct cloud provider used. Ensure fallback_model is configured."),
-    (r"api_key: ['"][^'"\s]{10,}['"]", "SECURITY: Hardcoded API key detected. Use environment variables.")
+    (r"api_key:\s*['\"][A-Za-z0-9_\-]{16,}['\"]", "SECURITY: Hardcoded API key detected. Use environment variables.")
 ]
 def lint_file(path):
--- a/scripts/architecture_linter_v2.py
+++ b/scripts/architecture_linter_v2.py
@@ -5,122 +5,233 @@ Part of the Gemini Sovereign Governance System.
 Enforces architectural boundaries, security, and documentation standards
 across the Timmy Foundation fleet.
 Refs: #437 — repo-aware, test-backed, CI-enforced.
 """
 import argparse
 import os
 import re
 import sys
 import argparse
 from pathlib import Path
 # --- CONFIGURATION ---
 SOVEREIGN_KEYWORDS = ["mempalace", "sovereign_store", "tirith", "bezalel", "nexus"]
-IP_REGEX = r'\b(?:\d{1,3}\.){3}\d{1,3}\b'
+
-API_KEY_REGEX = r'(?:api_key|secret|token|password|auth_token)\s*[:=]\s*["\'][a-zA-Z0-9_\-]{20,}["\']'
+# IP addresses (skip 127.0.0.1, 0.0.0.0, 10.x.x.x, 172.16-31.x.x, 192.168.x.x)
 IP_REGEX = r'\b(?!(?:127|10|192\.168|172\.(?:1[6-9]|2\d|3[01]))\.)' \
           r'(?:\d{1,3}\.){3}\d{1,3}\b'
 # API key / secret patterns — catches openai-, sk-, anthropic-, AKIA, etc.
 API_KEY_PATTERNS = [
    r'sk-[A-Za-z0-9]{20,}',               # OpenAI-style
    r'sk-ant-[A-Za-z0-9\-]{20,}',          # Anthropic
    r'AKIA[A-Z0-9]{16}',                    # AWS access key
    r'ghp_[A-Za-z0-9]{36}',                # GitHub PAT
    r'glpat-[A-Za-z0-9\-]{20,}',           # GitLab PAT
    r'(?:api[_-]?key|secret|token)\s*[:=]\s*["\'][A-Za-z0-9_\-]{16,}["\']',
 ]
 # Sovereignty rules (carried from v1)
 SOVEREIGN_RULES = [
    (r'https?://api\.openai\.com', 'External cloud API: api.openai.com. Use local custom_provider.'),
    (r'https?://api\.anthropic\.com', 'External cloud API: api.anthropic.com. Use local custom_provider.'),
    (r'provider:\s*(?:openai|anthropic)\b', 'Direct cloud provider. Ensure fallback_model is configured.'),
 ]
 # File extensions to scan
 SCAN_EXTENSIONS = {'.py', '.ts', '.tsx', '.js', '.yaml', '.yml', '.json', '.env', '.sh', '.cfg', '.toml'}
 SKIP_DIRS = {'.git', 'node_modules', '__pycache__', '.venv', 'venv', '.tox', '.eggs'}
 class LinterResult:
    """Structured result container for programmatic access."""
    def __init__(self, repo_path: str, repo_name: str):
        self.repo_path = repo_path
        self.repo_name = repo_name
        self.errors: list[str] = []
        self.warnings: list[str] = []
    @property
    def passed(self) -> bool:
        return len(self.errors) == 0
    @property
    def violation_count(self) -> int:
        return len(self.errors)
    def summary(self) -> str:
        lines = [f"--- Architecture Linter v2: {self.repo_name} ---"]
        for w in self.warnings:
            lines.append(f"  [W] {w}")
        for e in self.errors:
            lines.append(f"  [E] {e}")
        status = "PASSED" if self.passed else f"FAILED ({self.violation_count} violations)"
        lines.append(f"\nResult: {status}")
        return '\n'.join(lines)
 class Linter:
    def __init__(self, repo_path: str):
        self.repo_path = Path(repo_path).resolve()
        if not self.repo_path.is_dir():
            raise FileNotFoundError(f"Repository path does not exist: {self.repo_path}")
        self.repo_name = self.repo_path.name
-        self.errors = []
+        self.result = LinterResult(str(self.repo_path), self.repo_name)
-    def log_error(self, message: str, file: str = None, line: int = None):
+    # --- helpers ---
-        loc = f"{file}:{line}" if file and line else (file if file else "General")
+
-        self.errors.append(f"[{loc}] {message}")
+    def _scan_files(self, extensions=None):
        """Yield (Path, content) for files matching *extensions*."""
        exts = extensions or SCAN_EXTENSIONS
        for root, dirs, files in os.walk(self.repo_path):
            dirs[:] = [d for d in dirs if d not in SKIP_DIRS]
            for fname in files:
                if Path(fname).suffix in exts:
                    if fname == '.env.example':
                        continue
                    fpath = Path(root) / fname
                    try:
                        content = fpath.read_text(errors='ignore')
                    except Exception:
                        continue
                    yield fpath, content
    def _line_no(self, content: str, offset: int) -> int:
        return content.count('\n', 0, offset) + 1
    # --- checks ---
    def check_sidecar_boundary(self):
-        """Rule 1: No sovereign code in hermes-agent (sidecar boundary)"""
+        """No sovereign code in hermes-agent (sidecar boundary)."""
-        if self.repo_name == "hermes-agent":
+        if self.repo_name != 'hermes-agent':
-            for root, _, files in os.walk(self.repo_path):
+            return
-                if "node_modules" in root or ".git" in root:
+        for fpath, content in self._scan_files():
-                    continue
-                for file in files:
-                    if file.endswith((".py", ".ts", ".js", ".tsx")):
-                        path = Path(root) / file
-                        content = path.read_text(errors="ignore")
            for kw in SOVEREIGN_KEYWORDS:
                if kw in content.lower():
-                                # Exception: imports or comments might be okay, but we're strict for now
+                    rel = str(fpath.relative_to(self.repo_path))
-                                self.log_error(f"Sovereign keyword '{kw}' found in hermes-agent. Violates sidecar boundary.", str(path.relative_to(self.repo_path)))
+                    self.result.errors.append(
                        f"Sovereign keyword '{kw}' in hermes-agent violates sidecar boundary. [{rel}]"
                    )
    def check_hardcoded_ips(self):
-        """Rule 2: No hardcoded IPs (use domain names)"""
+        """No hardcoded public IPs — use DNS or env vars."""
-        for root, _, files in os.walk(self.repo_path):
+        for fpath, content in self._scan_files():
-            if "node_modules" in root or ".git" in root:
+            for m in re.finditer(IP_REGEX, content):
                ip = m.group()
                # skip private ranges already handled by lookahead, and 0.0.0.0
                if ip.startswith('0.'):
                    continue
-            for file in files:
+                line = self._line_no(content, m.start())
-                if file.endswith((".py", ".ts", ".js", ".tsx", ".yaml", ".yml", ".json")):
+                rel = str(fpath.relative_to(self.repo_path))
-                    path = Path(root) / file
+                self.result.errors.append(
-                    content = path.read_text(errors="ignore")
+                    f"Hardcoded IP '{ip}'. Use DNS or env vars. [{rel}:{line}]"
-                    matches = re.finditer(IP_REGEX, content)
+                )
-                    for match in matches:
-                        ip = match.group()
-                        if ip in ["127.0.0.1", "0.0.0.0"]:
-                            continue
-                        line_no = content.count('\n', 0, match.start()) + 1
-                        self.log_error(f"Hardcoded IP address '{ip}' found. Use domain names or environment variables.", str(path.relative_to(self.repo_path)), line_no)
    def check_api_keys(self):
-        """Rule 3: No cloud API keys committed to repos"""
+        """No cloud API keys / secrets committed."""
-        for root, _, files in os.walk(self.repo_path):
+        for fpath, content in self._scan_files():
-            if "node_modules" in root or ".git" in root:
+            for pattern in API_KEY_PATTERNS:
-                continue
+                for m in re.finditer(pattern, content, re.IGNORECASE):
-            for file in files:
+                    line = self._line_no(content, m.start())
-                if file.endswith((".py", ".ts", ".js", ".tsx", ".yaml", ".yml", ".json", ".env")):
+                    rel = str(fpath.relative_to(self.repo_path))
-                    if file == ".env.example":
+                    self.result.errors.append(
-                        continue
+                        f"Potential secret / API key detected. [{rel}:{line}]"
-                    path = Path(root) / file
+                    )
-                    content = path.read_text(errors="ignore")
+
-                    matches = re.finditer(API_KEY_REGEX, content, re.IGNORECASE)
+    def check_sovereignty_rules(self):
-                    for match in matches:
+        """V1 sovereignty rules: no direct cloud API endpoints or providers."""
-                        line_no = content.count('\n', 0, match.start()) + 1
+        for fpath, content in self._scan_files({'.py', '.ts', '.tsx', '.js', '.yaml', '.yml'}):
-                        self.log_error("Potential API key or secret found in code.", str(path.relative_to(self.repo_path)), line_no)
+            for pattern, msg in SOVEREIGN_RULES:
                for m in re.finditer(pattern, content):
                    line = self._line_no(content, m.start())
                    rel = str(fpath.relative_to(self.repo_path))
                    self.result.errors.append(f"{msg} [{rel}:{line}]")
    def check_soul_canonical(self):
-        """Rule 4: SOUL.md exists and is canonical in exactly one location"""
+        """SOUL.md must exist exactly in timmy-config root."""
-        soul_path = self.repo_path / "SOUL.md"
+        soul_path = self.repo_path / 'SOUL.md'
-        if self.repo_name == "timmy-config":
+        if self.repo_name == 'timmy-config':
            if not soul_path.exists():
-                self.log_error("SOUL.md is missing from the canonical location (timmy-config root).")
+                self.result.errors.append(
                    'SOUL.md missing from canonical location (timmy-config root).'
                )
        else:
            if soul_path.exists():
-                self.log_error("SOUL.md found in non-canonical repo. It should only live in timmy-config.")
+                self.result.errors.append(
                    'SOUL.md found in non-canonical repo. Must live only in timmy-config.'
                )
    def check_readme(self):
-        """Rule 5: Every repo has a README with current truth"""
+        """Every repo must have a substantive README."""
-        readme_path = self.repo_path / "README.md"
+        readme = self.repo_path / 'README.md'
-        if not readme_path.exists():
+        if not readme.exists():
-            self.log_error("README.md is missing.")
+            self.result.errors.append('README.md is missing.')
        else:
-            content = readme_path.read_text(errors="ignore")
+            content = readme.read_text(errors='ignore')
            if len(content.strip()) < 50:
-                self.log_error("README.md is too short or empty. Provide current truth about the repo.")
+                self.result.warnings.append(
                    'README.md is very short (<50 chars). Provide current truth about the repo.'
                )
-    def run(self):
+    # --- runner ---
-        print(f"--- Gemini Linter: Auditing {self.repo_name} ---")
+
    def run(self) -> LinterResult:
        """Execute all checks and return the result."""
        self.check_sidecar_boundary()
        self.check_hardcoded_ips()
        self.check_api_keys()
        self.check_sovereignty_rules()
        self.check_soul_canonical()
        self.check_readme()
        return self.result
-        if self.errors:
-            print(f"\n[FAILURE] Found {len(self.errors)} architectural violations:")
-            for err in self.errors:
-                print(f"  - {err}")
-            return False
-        else:
-            print("\n[SUCCESS] Architecture is sound. Sovereignty maintained.")
-            return True
 def main():
-    parser = argparse.ArgumentParser(description="Gemini Architecture Linter v2")
+    parser = argparse.ArgumentParser(
-    parser.add_argument("repo_path", nargs="?", default=".", help="Path to the repository to lint")
+        description='Gemini Architecture Linter v2 — repo-aware sovereignty gate.'
    )
    parser.add_argument(
        'repo_path', nargs='?', default='.',
        help='Path to the repository to lint (default: cwd).',
    )
    parser.add_argument(
        '--repo', dest='repo_flag', default=None,
        help='Explicit repo path (alias for positional arg).',
    )
    parser.add_argument(
        '--json', dest='json_output', action='store_true',
        help='Emit machine-readable JSON instead of human text.',
    )
    args = parser.parse_args()
-    linter = Linter(args.repo_path)
+    path = args.repo_flag if args.repo_flag else args.repo_path
-    success = linter.run()
-    sys.exit(0 if success else 1)
-if __name__ == "__main__":
+    try:
        linter = Linter(path)
    except FileNotFoundError as exc:
        print(f"ERROR: {exc}", file=sys.stderr)
        sys.exit(2)
    result = linter.run()
    if args.json_output:
        import json as _json
        out = {
            'repo': result.repo_name,
            'passed': result.passed,
            'violation_count': result.violation_count,
            'errors': result.errors,
            'warnings': result.warnings,
        }
        print(_json.dumps(out, indent=2))
    else:
        print(result.summary())
    sys.exit(0 if result.passed else 1)
 if __name__ == '__main__':
    main()
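
Review note on `check_hardcoded_ips`: `IP_REGEX` matches any four dot-separated groups of one to three digits, so strings such as `999.1.2.3` would also be reported. The sketch below keeps the regex from this diff and adds a validity filter using the standard `ipaddress` module. `find_public_ip_candidates` is a hypothetical name, not part of the PR. The `is_private` attribute is deliberately not used, since it may classify documentation ranges such as the address in `tests/test_linter.py` differently from the regex.

```python
import ipaddress
import re

# Same pattern as in architecture_linter_v2.py: the lookahead skips
# 127.x, 10.x, 192.168.x and 172.16-31.x prefixes.
IP_REGEX = r'\b(?!(?:127|10|192\.168|172\.(?:1[6-9]|2\d|3[01]))\.)' \
           r'(?:\d{1,3}\.){3}\d{1,3}\b'


def find_public_ip_candidates(text):
    """Return regex matches that are also syntactically valid IPv4 addresses."""
    hits = []
    for m in re.finditer(IP_REGEX, text):
        candidate = m.group()
        if candidate.startswith('0.'):
            continue  # mirrors the linter's 0.x.x.x skip
        try:
            ipaddress.IPv4Address(candidate)
        except ValueError:
            continue  # an octet is out of range, so this is not an IPv4 address
        hits.append(candidate)
    return hits
```

Whether out-of-range matches should be dropped or still reported is a policy choice; the filter only shows where such a check could sit.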
--- a/scripts/config_validator.py
+++ b/scripts/config_validator.py
@@ -0,0 +1,306 @@
 #!/usr/bin/env python3
 """
 config_validator.py — Validate all YAML/JSON config files in timmy-config.
 Checks:
  1. YAML syntax (pyyaml safe_load)
  2. JSON syntax (json.loads)
  3. Duplicate keys in YAML/JSON
  4. Trailing whitespace in YAML/JSON
  5. Tabs in YAML (should use spaces)
  6. Cron expression validity (if present)
 Exit 0 if all valid, 1 if any invalid.
 """
 import json
 import os
 import re
 import sys
 from pathlib import Path
 try:
    import yaml
 except ImportError:
    print("ERROR: PyYAML not installed. Run: pip install pyyaml", file=sys.stderr)
    sys.exit(1)
 # ── Cron validation ──────────────────────────────────────────────────────────
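 # Name -> number maps; _resolve_name() indexes these, so they must be dicts, not sets.
 DOW_NAMES = {"sun": 0, "mon": 1, "tue": 2, "wed": 3, "thu": 4, "fri": 5, "sat": 6}
 MONTH_NAMES = {"jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
               "jul": 7, "aug": 8, "sep": 9, "oct": 10, "nov": 11, "dec": 12}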
 def _expand_cron_field(field: str, lo: int, hi: int, names: dict | None = None) -> set[int]:
    """Expand a single cron field into a set of valid integers."""
    result: set[int] = set()
    for part in field.split(","):
        # Handle step: */N or 1-5/N
        step = 1
        if "/" in part:
            part, step_str = part.split("/", 1)
            if not step_str.isdigit() or int(step_str) < 1:
                raise ValueError(f"invalid step value: {step_str}")
            step = int(step_str)
        if part == "*":
            rng = range(lo, hi + 1, step)
        elif "-" in part:
            a, b = part.split("-", 1)
            a = _resolve_name(a, names, lo, hi)
            b = _resolve_name(b, names, lo, hi)
            if a > b:
                raise ValueError(f"range {a}-{b} is reversed")
            rng = range(a, b + 1, step)
        else:
            val = _resolve_name(part, names, lo, hi)
            rng = range(val, val + 1)
        for v in rng:
            if v < lo or v > hi:
                raise ValueError(f"value {v} out of range [{lo}-{hi}]")
            result.add(v)
    return result
 def _resolve_name(token: str, names: dict | None, lo: int, hi: int) -> int:
    if names and token.lower() in names:
        return names[token.lower()]
    if not token.isdigit():
        raise ValueError(f"unrecognized token: {token}")
    val = int(token)
    if val < lo or val > hi:
        raise ValueError(f"value {val} out of range [{lo}-{hi}]")
    return val
 def validate_cron(expr: str) -> list[str]:
    """Validate a 5-field cron expression. Returns list of errors (empty = ok)."""
    errors: list[str] = []
    fields = expr.strip().split()
    if len(fields) != 5:
        return [f"expected 5 fields, got {len(fields)}"]
    specs = [
        (fields[0], 0, 59, None, "minute"),
        (fields[1], 0, 23, None, "hour"),
        (fields[2], 1, 31, None, "day-of-month"),
        (fields[3], 1, 12, MONTH_NAMES, "month"),
        (fields[4], 0, 7, DOW_NAMES, "day-of-week"),
    ]
    for field, lo, hi, names, label in specs:
        try:
            _expand_cron_field(field, lo, hi, names)
        except ValueError as e:
            errors.append(f"{label}: {e}")
    return errors
 # ── Duplicate key detection ──────────────────────────────────────────────────
 class DuplicateKeyError(Exception):
    pass
 class _StrictYAMLLoader(yaml.SafeLoader):
    """YAML loader that rejects duplicate keys."""
    pass
 def _no_duplicates_constructor(loader, node, deep=False):
    mapping = {}
    for key_node, value_node in node.value:
        key = loader.construct_object(key_node, deep=deep)
        if key in mapping:
            raise DuplicateKeyError(
                f"duplicate key '{key}' (line {key_node.start_mark.line + 1})"
            )
        mapping[key] = loader.construct_object(value_node, deep=deep)
    return mapping
 _StrictYAMLLoader.add_constructor(
    yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG,
    _no_duplicates_constructor,
 )
 def _json_has_duplicates(text: str) -> list[str]:
    """Check for duplicate keys in JSON by parsing with an object_pairs_hook."""
    errors: list[str] = []
    def _check_pairs(pairs):
        level_keys: set[str] = set()
        for k, _ in pairs:
            if k in level_keys:
                errors.append(f"duplicate JSON key: '{k}'")
            level_keys.add(k)
        return dict(pairs)
    try:
        json.loads(text, object_pairs_hook=_check_pairs)
    except json.JSONDecodeError:
        pass  # syntax errors caught elsewhere
    return errors
 # ── Main validator ───────────────────────────────────────────────────────────
 def find_config_files(root: Path) -> list[Path]:
    """Recursively find .yaml, .yml, .json files (skip .git, node_modules, venv)."""
    skip_dirs = {".git", "node_modules", "venv", "__pycache__", ".venv"}
    results: list[Path] = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for fname in filenames:
            if fname.endswith((".yaml", ".yml", ".json")):
                results.append(Path(dirpath) / fname)
    return sorted(results)
 def validate_yaml_file(filepath: Path, text: str) -> list[str]:
    """Validate a YAML file. Returns list of errors."""
    errors: list[str] = []
    # Check for tabs
    for i, line in enumerate(text.splitlines(), 1):
        if "\t" in line:
            errors.append(f"  line {i}: contains tab character (use spaces for YAML)")
        if line != line.rstrip():
            errors.append(f"  line {i}: trailing whitespace")
    # Check syntax + duplicate keys
    try:
        yaml.load(text, Loader=_StrictYAMLLoader)
    except DuplicateKeyError as e:
        errors.append(f"  {e}")
    except yaml.YAMLError as e:
        mark = getattr(e, "problem_mark", None)
        if mark:
            errors.append(f"  YAML syntax error at line {mark.line + 1}, col {mark.column + 1}: {e.problem}")
        else:
            errors.append(f"  YAML syntax error: {e}")
    # Check cron expressions in schedule fields
    for i, line in enumerate(text.splitlines(), 1):
        cron_match = re.search(r'(?:cron|schedule)\s*:\s*["\']?([*0-9/,a-zA-Z-]+(?:\s+[*0-9/,a-zA-Z-]+){4})["\']?', line)
        if cron_match:
            cron_errs = validate_cron(cron_match.group(1))
            for ce in cron_errs:
                errors.append(f"  line {i}: invalid cron '{cron_match.group(1)}': {ce}")
    return errors
 def validate_json_file(filepath: Path, text: str) -> list[str]:
    """Validate a JSON file. Returns list of errors."""
    errors: list[str] = []
    # Check syntax
    try:
        json.loads(text)
    except json.JSONDecodeError as e:
        errors.append(f"  JSON syntax error at line {e.lineno}, col {e.colno}: {e.msg}")
    # Check duplicate keys
    dup_errors = _json_has_duplicates(text)
    errors.extend(dup_errors)
    # Check for trailing whitespace (counted as an error, like every other finding)
    for i, line in enumerate(text.splitlines(), 1):
        if line != line.rstrip():
            errors.append(f"  line {i}: trailing whitespace")
    # Check cron expressions
    cron_pattern = re.compile(r'"(?:cron|schedule)"?\s*:\s*"([^"]{5,})"')
    for match in cron_pattern.finditer(text):
        candidate = match.group(1).strip()
        fields = candidate.split()
        if len(fields) == 5 and all(re.match(r'^[*0-9/,a-zA-Z-]+$', f) for f in fields):
            cron_errs = validate_cron(candidate)
            for ce in cron_errs:
                errors.append(f"  invalid cron '{candidate}': {ce}")
    # Also check nested schedule objects with cron fields
    try:
        obj = json.loads(text)
        _scan_obj_for_cron(obj, errors)
    except Exception:
        pass
    return errors
 def _scan_obj_for_cron(obj, errors: list[str], path: str = ""):
    """Recursively scan dict/list for cron expressions."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k in ("cron", "schedule", "cron_expression") and isinstance(v, str):
                fields = v.strip().split()
                if len(fields) == 5:
                    cron_errs = validate_cron(v)
                    for ce in cron_errs:
                        errors.append(f"  {path}.{k}: invalid cron '{v}': {ce}")
            _scan_obj_for_cron(v, errors, f"{path}.{k}")
    elif isinstance(obj, list):
        for i, item in enumerate(obj):
            _scan_obj_for_cron(item, errors, f"{path}[{i}]")
 def main():
    # Determine repo root (script lives in scripts/)
    script_path = Path(__file__).resolve()
    repo_root = script_path.parent.parent
    print(f"Config Validator — scanning {repo_root}")
    print("=" * 60)
    files = find_config_files(repo_root)
    print(f"Found {len(files)} config files to validate.\n")
    total_errors = 0
    failed_files: list[tuple[Path, list[str]]] = []
    for filepath in files:
        rel = filepath.relative_to(repo_root)
        try:
            text = filepath.read_text(encoding="utf-8", errors="replace")
        except Exception as e:
            failed_files.append((rel, [f"  cannot read file: {e}"]))
            total_errors += 1
            continue
        if filepath.suffix == ".json":
            errors = validate_json_file(filepath, text)
        else:
            errors = validate_yaml_file(filepath, text)
        if errors:
            failed_files.append((rel, errors))
            total_errors += len(errors)
            print(f"FAIL  {rel}")
        else:
            print(f"PASS  {rel}")
    print("\n" + "=" * 60)
    print(f"Results: {len(files) - len(failed_files)}/{len(files)} files passed")
    if failed_files:
        print(f"\n{total_errors} error(s) in {len(failed_files)} file(s):\n")
        for relpath, errs in failed_files:
            print(f"  {relpath}:")
            for e in errs:
                print(f"    {e}")
        print()
        sys.exit(1)
    else:
        print("\nAll config files valid!")
        sys.exit(0)
 if __name__ == "__main__":
    main()
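
Review note on `_json_has_duplicates`: the `object_pairs_hook` technique it relies on can be exercised in isolation. The sketch below is a standalone illustration with a hypothetical function name, not the code from this PR. It relies on `json.loads` passing every decoded object to the hook as a list of `(key, value)` pairs before duplicate keys are collapsed.

```python
import json


def find_duplicate_json_keys(text):
    """Return the keys that appear more than once within a single JSON object."""
    duplicates = []

    def check_pairs(pairs):
        seen = set()
        for key, _ in pairs:
            if key in seen:
                duplicates.append(key)
            seen.add(key)
        return dict(pairs)  # keep normal dict semantics for the caller

    json.loads(text, object_pairs_hook=check_pairs)
    return duplicates
```

Unlike the PR's function, this sketch lets `json.JSONDecodeError` propagate, which keeps syntax errors and duplicate-key findings separate at the call site.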
--- a/tests/test_linter.py
+++ b/tests/test_linter.py
@@ -0,0 +1,233 @@
 """Tests for Architecture Linter v2.
 Validates that the linter correctly detects violations and passes clean repos.
 Refs: #437 — test-backed linter.
 """
 import json
 import sys
 import tempfile
 from pathlib import Path
 # Add scripts/ to path
 sys.path.insert(0, str(Path(__file__).resolve().parent.parent / "scripts"))
 from architecture_linter_v2 import Linter, LinterResult
 # ── helpers ───────────────────────────────────────────────────────────
 def _make_repo(tmpdir: str, files: dict[str, str], name: str = "test-repo") -> Path:
    """Create a fake repo with given files and return its path."""
    repo = Path(tmpdir) / name
    repo.mkdir()
    for relpath, content in files.items():
        p = repo / relpath
        p.parent.mkdir(parents=True, exist_ok=True)
        p.write_text(content)
    return repo
 def _run(tmpdir, files, name="test-repo"):
    repo = _make_repo(tmpdir, files, name)
    return Linter(str(repo)).run()
 # ── clean repo passes ─────────────────────────────────────────────────
 def test_clean_repo_passes():
    with tempfile.TemporaryDirectory() as tmp:
        result = _run(tmp, {
            "README.md": "# Test Repo\n\nThis is a clean test repo with sufficient content to pass.",
            "main.py": "print('hello world')\n",
        })
        assert result.passed, f"Expected pass but got: {result.errors}"
        assert result.violation_count == 0
 # ── missing README ────────────────────────────────────────────────────
 def test_missing_readme_fails():
    with tempfile.TemporaryDirectory() as tmp:
        result = _run(tmp, {"main.py": "x = 1\n"})
        assert not result.passed
        assert any("README" in e for e in result.errors)
 def test_short_readme_warns():
    with tempfile.TemporaryDirectory() as tmp:
        result = _run(tmp, {"README.md": "hi\n"})
        # Warnings don't fail the build
        assert result.passed
        assert any("short" in w.lower() for w in result.warnings)
 # ── hardcoded IPs ─────────────────────────────────────────────────────
 def test_hardcoded_public_ip_detected():
    with tempfile.TemporaryDirectory() as tmp:
        result = _run(tmp, {
            "README.md": "# R\n\nGood repo.",
            "server.py": "HOST = '203.0.113.42'\n",
        })
        assert not result.passed
        assert any("203.0.113.42" in e for e in result.errors)
 def test_localhost_ip_ignored():
    with tempfile.TemporaryDirectory() as tmp:
        result = _run(tmp, {
            "README.md": "# R\n\nGood repo.",
            "server.py": "HOST = '127.0.0.1'\n",
        })
        ip_errors = [e for e in result.errors if "IP" in e]
        assert len(ip_errors) == 0
 # ── API keys ──────────────────────────────────────────────────────────
 def test_openai_key_detected():
    with tempfile.TemporaryDirectory() as tmp:
        result = _run(tmp, {
            "README.md": "# R\n\nGood repo.",
            "config.py": 'key = "sk-abcdefghijklmnopqrstuvwx"\n',
        })
        assert not result.passed
        assert any("secret" in e.lower() or "key" in e.lower() for e in result.errors)
 def test_aws_key_detected():
    with tempfile.TemporaryDirectory() as tmp:
        result = _run(tmp, {
            "README.md": "# R\n\nGood repo.",
            "deploy.yaml": 'aws_key: AKIAIOSFODNN7EXAMPLE\n',
        })
        assert not result.passed
        assert any("secret" in e.lower() for e in result.errors)
 def test_env_example_skipped():
    with tempfile.TemporaryDirectory() as tmp:
        result = _run(tmp, {
            "README.md": "# R\n\nGood repo.",
            ".env.example": 'OPENAI_KEY=sk-placeholder\n',
        })
        secret_errors = [e for e in result.errors if "secret" in e.lower()]
        assert len(secret_errors) == 0
 # ── sovereignty rules (v1 cloud API checks) ───────────────────────────
 def test_openai_url_detected():
    with tempfile.TemporaryDirectory() as tmp:
        result = _run(tmp, {
            "README.md": "# R\n\nGood repo.",
            "app.py": 'url = "https://api.openai.com/v1/chat"\n',
        })
        assert not result.passed
        assert any("openai" in e.lower() for e in result.errors)
 def test_cloud_provider_detected():
    with tempfile.TemporaryDirectory() as tmp:
        result = _run(tmp, {
            "README.md": "# R\n\nGood repo.",
            "config.yaml": "provider: openai\n",
        })
        assert not result.passed
        assert any("provider" in e.lower() for e in result.errors)
 # ── sidecar boundary ──────────────────────────────────────────────────
 def test_sovereign_keyword_in_hermes_agent_fails():
    with tempfile.TemporaryDirectory() as tmp:
        result = _run(tmp, {
            "README.md": "# R\n\nGood repo.",
            "index.py": "import mempalace\n",
        }, name="hermes-agent")
        assert not result.passed
        assert any("sidecar" in e.lower() or "mempalace" in e.lower() for e in result.errors)
 def test_sovereign_keyword_in_other_repo_ok():
    with tempfile.TemporaryDirectory() as tmp:
        result = _run(tmp, {
            "README.md": "# R\n\nGood repo.",
            "index.py": "import mempalace\n",
        }, name="some-other-repo")
        sidecar_errors = [e for e in result.errors if "sidecar" in e.lower()]
        assert len(sidecar_errors) == 0
 # ── SOUL.md canonical location ────────────────────────────────────────
 def test_soul_md_required_in_timmy_config():
    with tempfile.TemporaryDirectory() as tmp:
        result = _run(tmp, {
            "README.md": "# timmy-config\n\nConfig repo.",
        }, name="timmy-config")
        assert not result.passed
        assert any("SOUL.md" in e for e in result.errors)
 def test_soul_md_present_in_timmy_config_ok():
    with tempfile.TemporaryDirectory() as tmp:
        result = _run(tmp, {
            "README.md": "# timmy-config\n\nConfig repo.",
            "SOUL.md": "# Soul\n\nCanonical identity document.",
        }, name="timmy-config")
        soul_errors = [e for e in result.errors if "SOUL" in e]
        assert len(soul_errors) == 0
 def test_soul_md_in_wrong_repo_fails():
    with tempfile.TemporaryDirectory() as tmp:
        result = _run(tmp, {
            "README.md": "# R\n\nGood repo.",
            "SOUL.md": "# Soul\n\nShould not be here.",
        }, name="other-repo")
        assert any("canonical" in e.lower() for e in result.errors)
 # ── LinterResult structure ────────────────────────────────────────────
 def test_result_summary_is_string():
    with tempfile.TemporaryDirectory() as tmp:
        result = _run(tmp, {"README.md": "# OK repo with enough text here\n"})
        assert isinstance(result.summary(), str)
        assert "PASSED" in result.summary() or "FAILED" in result.summary()
 def test_result_repo_name():
    with tempfile.TemporaryDirectory() as tmp:
        result = _run(tmp, {"README.md": "# OK\n"}, name="my-repo")
        assert result.repo_name == "my-repo"
 # ── invalid path ──────────────────────────────────────────────────────
 def test_invalid_path_raises():
    try:
        Linter("/nonexistent/path/xyz")
        assert False, "Should have raised FileNotFoundError"
    except FileNotFoundError:
        pass
 # ── skip dirs ──────────────────────────────────────────────────────────
 def test_git_dir_skipped():
    with tempfile.TemporaryDirectory() as tmp:
        repo = _make_repo(tmp, {
            "README.md": "# R\n\nGood repo.",
            "main.py": "x = 1\n",
        })
        # Create a .git/ dir with a bad file
        git_dir = repo / ".git"
        git_dir.mkdir()
        (git_dir / "bad.py").write_text("HOST = '203.0.113.1'\n")
        result = Linter(str(repo)).run()
        git_errors = [e for e in result.errors if ".git" in e]
        assert len(git_errors) == 0
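Review note: the IP and skip-directory tests above pin down behaviour without showing the rule itself, since `architecture_linter_v2.py` is not part of this hunk. The sketch below is one possible shape for such a rule, not the linter's actual code. The function name `find_hardcoded_ips` and the message format are hypothetical; the skipped directory names come from the commit message for #437. The sketch exempts only loopback and unspecified addresses. `ipaddress` reports `is_private` as true for the documentation range `203.0.113.0/24`, so a rule that exempted private addresses would not flag the address used in `test_hardcoded_public_ip_detected`.

```python
import ipaddress
import re
import tempfile
from pathlib import Path

# Directory names listed in the #437 commit message as skipped.
SKIP_DIRS = {".git", "node_modules", "__pycache__"}
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def find_hardcoded_ips(repo: Path) -> list[str]:
    """Return 'path:line: hardcoded IP x.x.x.x' for each non-loopback IPv4 literal."""
    errors: list[str] = []
    for path in sorted(repo.rglob("*")):
        rel = path.relative_to(repo)
        # Skip directories themselves and anything under a skipped directory.
        if not path.is_file() or SKIP_DIRS.intersection(rel.parts):
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for candidate in IPV4_RE.findall(line):
                try:
                    addr = ipaddress.ip_address(candidate)
                except ValueError:
                    continue  # looks like an IP but is not valid, e.g. 999.1.1.1
                if addr.is_loopback or addr.is_unspecified:
                    continue
                errors.append(f"{rel}:{lineno}: hardcoded IP {candidate}")
    return errors


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        (repo / "server.py").write_text("HOST = '203.0.113.42'\n")
        print(find_hardcoded_ips(repo))
```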
--- a/v7.0.0-checkin.md
+++ b/v7.0.0-checkin.md
@@ -0,0 +1,102 @@
 # Release v7.0.0: Fleet Architecture Checkin

 **Date:** 2026-04-08
 **Tagged by:** Timmy
 **Previous tag:** Golden-Allegro-v6-Sonnet4

 ## Fleet Summary

 | Machine | Agents | Status |
 |---------|--------|--------|
 | Local Mac M3 Max | Timmy (19 processes) | HEALTHY |
 | Allegro VPS (167.99.126.228) | Allegro, Adagio, Ezra-A | HEALTHY (7d uptime, 43% disk) |
 | Ezra VPS (143.198.27.163) | Ezra | WARNING (78% disk, load 10.38) |
 | Bezalel VPS (159.203.146.185) | Bezalel | HEALTHY (2d uptime, 39% disk) |

 **Total agents running:** 6 across 4 machines

 ## Model Configuration

 - Primary: claude-opus-4-6 (Anthropic)
 - Fallback: hermes3 (local-llama.cpp)
 - Fallback chain: OpenRouter claude-sonnet-4 -> local hermes3

 ## Cron Jobs: 23 total

 | Status | Count |
 |--------|-------|
 | Active | 15 |
 | Paused | 8 |

 Active jobs: Health Monitor, Burn Mode Orchestrator, Tower Tick, Burn Deadman,
 Morning Report, Evennia Report, Gitea Priority Inbox, Config Drift Guard,
 Gitea Event Watcher, Burndown Watcher, Mempalace Forge, Mempalace Watchtower,
 Ezra Health Monitor, Daily Poka-Yoke, VPS Agent Dispatch, Weekly Skill Extraction

 ## Gitea Repos (Timmy_Foundation)

 | Repo | Issues | PRs | Updated | Branch |
 |------|--------|-----|---------|--------|
 | the-nexus | 103 | 2 | 2026-04-08 | main |
 | timmy-config | 129 | 1 | 2026-04-08 | main |
 | timmy-home | 221 | 0 | 2026-04-08 | main |
 | hermes-agent | 43 | 1 | 2026-04-08 | main |
 | the-beacon | 23 | 0 | 2026-04-08 | main |
 | turboquant | 10 | 0 | 2026-04-01 | main |
 | the-door | 2 | 0 | 2026-04-06 | main |
 | wolf | 2 | 0 | 2026-04-05 | main |
 | the-testament | 0 | 0 | 2026-04-07 | main |
 | timmy-academy | 1 | 0 | 2026-04-04 | master |
 | .profile | 0 | 0 | 2026-04-07 | main |

 **Total open issues across fleet: 534**
 **Total open PRs: 4**

 ## Health Alerts

 1. WARN: Ezra VPS disk 78% (120G/154G), needs cleanup
 2. WARN: Ezra VPS load avg 10.38, high for 2-core box
 3. INFO: 8 paused cron jobs (expected: non-essential overnight jobs)

 ## What's Working

 - All 4 machines reachable
 - All core services running
 - Config drift guard active
 - Gitea event watcher active
 - Dead man switch active
 - Tower world ticking (tick 2045+)
 - Morning reports delivering
 - Mempalace analysis running
 - VPS agent dispatch operational

 ## Architecture

 ```
         Alexander (Principal)
               |
          [Telegram]
               |
     Timmy (Mac M3 Max) ---- Local llama.cpp (hermes3)
          /    |    \
         /     |     \
   Allegro   Ezra   Bezalel
   (DO VPS)  (DO VPS) (DO VPS)
   3 agents  1 agent  1 agent

   Gitea Forge: forge.alexanderwhitestone.com
   Evennia Tower: localhost:4000/4001
   RunPod L40S: 8lfr3j47a5r3gn (Big Brain)
 ```

 ## Release Notes

 This is the first versioned release tag (v7.0.0), transitioning from named
 golden tags to semantic versioning. Previous tags preserved:
 - Golden-Allegro-v6-Sonnet4
 - burnup-20260405-infra
 - SonOfTimmy-v5-FINAL
 - SonOfTimmy-v4
 - GoldenRockachopa
 - pre-agent-workers-v1

Author	SHA1	Message	Date
Alexander Whitestone	49b0b9d207	feat: add config validator script. scripts/config_validator.py: standalone validator for all YAML/JSON config files in the repo. Checks: YAML syntax (pyyaml safe_load); JSON syntax (json.loads); duplicate keys in YAML/JSON; trailing whitespace; tabs in YAML (should use spaces); cron expression validity (if present). Reports PASS/FAIL per file with line numbers. Exit 0 if all valid, 1 if any invalid. [Some checks failed. Architecture Lint / Linter Tests (pull_request): successful in 7s; PR Checklist / pr-checklist (pull_request): failing after 1m8s; Architecture Lint / Lint Repository (pull_request): failing after 7s]	2026-04-10 07:13:17 -04:00
Alexander Whitestone	3fd4223e1e	Merge pull request #424. Merged PR #424. [Some checks failed. Architecture Lint / Linter Tests (push): successful in 9s; Architecture Lint / Lint Repository (push): failing after 6s]	2026-04-10 09:37:46 +00:00
Alexander Whitestone	d8f88bed16	Merge pull request #449. Merged PR #449. [Some checks failed. Architecture Lint / Linter Tests (push): cancelled; Architecture Lint / Lint Repository (push): cancelled]	2026-04-10 09:37:44 +00:00
Timmy Time	a01935825c	Merge branch 'main' into timmy/v7.0.0-checkin. [Some checks failed. Architecture Lint / Linter Tests (pull_request): successful in 7s; PR Checklist / pr-checklist (pull_request): failing after 1m12s; Architecture Lint / Lint Repository (pull_request): failing after 7s]	2026-04-10 09:37:40 +00:00
Timmy Time	544f2a9729	Merge branch 'main' into ansible-iac. [Some checks failed. Architecture Lint / Linter Tests (pull_request): successful in 8s; PR Checklist / pr-checklist (pull_request): failing after 1m43s; Architecture Lint / Lint Repository (pull_request): failing after 6s]	2026-04-10 09:37:38 +00:00
Alexander Whitestone	fa9e83ac95	Merge pull request #425. Merged PR #425. [Some checks failed. Architecture Lint / Linter Tests (push): cancelled; Architecture Lint / Lint Repository (push): cancelled]	2026-04-10 09:36:29 +00:00
Timmy Time	28317cbde9	Merge branch 'main' into timmy/v7.0.0-checkin. [Some checks failed. Architecture Lint / Linter Tests (pull_request): successful in 7s; PR Checklist / pr-checklist (pull_request): failing after 1m11s; Architecture Lint / Lint Repository (pull_request): failing after 6s]	2026-04-10 09:36:27 +00:00
Timmy Time	6e5f1f6a22	Merge branch 'main' into timmy/deadman-fallback. [Some checks failed. Architecture Lint / Linter Tests (pull_request): successful in 8s; PR Checklist / pr-checklist (pull_request): failing after 1m11s; Architecture Lint / Lint Repository (pull_request): failing after 6s]	2026-04-10 09:36:25 +00:00
Alexander Whitestone	2677e1c796	Merge pull request #453. Merged PR #453. [Some checks failed. Architecture Lint / Linter Tests (push): cancelled; Architecture Lint / Lint Repository (push): cancelled]	2026-04-10 09:36:22 +00:00
Timmy Time	e124ff8b05	Merge branch 'main' into ansible-iac. [Some checks failed. PR Checklist / pr-checklist (pull_request): failing after 1m40s]	2026-04-10 09:36:21 +00:00
Timmy Time	836849ffeb	Merge branch 'main' into burn/20260409-1926-linter-v2. [Some checks failed. Architecture Lint / Linter Tests (pull_request): successful in 10s; PR Checklist / pr-checklist (pull_request): failing after 1m13s; Architecture Lint / Lint Repository (pull_request): failing after 6s]	2026-04-10 09:36:17 +00:00
Timmy Time	eb7ca1f96f	Merge pull request 'burn: Add proof-driven PR template (closes #451)' (#454) from burn/20260410-0018-451-pr-template into main. Merge PR #454: burn: Add proof-driven PR template (closes #451)	2026-04-10 09:35:25 +00:00
Alexander Whitestone	641db62112	burn: Add proof-driven PR template (.gitea/PULL_REQUEST_TEMPLATE.md). Closes #451. Enforces the CONTRIBUTING.md proof standard at PR authoring time: summary, linked issue, acceptance criteria, proof evidence, risk and rollback. Aligns with existing bin/pr-checklist.py CI gate. [All checks were successful. PR Checklist / pr-checklist (pull_request): successful in 1m9s]	2026-04-10 00:20:37 -04:00
Alexander Whitestone	b38871d4cd	Merge pull request #439. Merged PR #439	2026-04-10 03:43:52 +00:00
timmy-bot	ee025957d9	fix: architecture_linter_v2 - repo-aware, test-backed, CI-enforced (#437). Fix broken API_KEY_REGEX in linter_v2.py (was invalid regex causing runtime crash); fix syntax error in architecture_linter.py (malformed character class); add --repo flag and --json output to linter_v2; add LinterResult class for structured programmatic access; port v1 sovereignty rules (cloud API endpoint/provider checks) into v2; skip .git, node_modules, __pycache__ dirs; skip .env.example files; add tests/test_linter.py (19 tests covering all checks); add .gitea/workflows/architecture-lint.yml for CI enforcement; all files pass python3 -m py_compile. Refs: #437. [Some checks failed. Architecture Lint / Linter Tests (pull_request): successful in 28s; PR Checklist / pr-checklist (pull_request): successful in 4m25s; Architecture Lint / Lint Repository (pull_request): failing after 21s]	2026-04-09 19:29:33 -04:00
Perplexity	7ec45642eb	feat(ansible): Canonical IaC playbook for fleet management. Implements the Ansible Infrastructure as Code story from KT 2026-04-08. One canonical Ansible playbook defines: deadman switch (snapshot good config on health, rollback+restart on death); golden state config deployment (Anthropic BANNED, Kimi→Gemini→Ollama); cron schedule (source-controlled, no manual crontab edits); agent startup sequence (pull→validate→start→verify); request_log telemetry table (every inference call logged); thin config pattern (immutable local pointer to upstream); Gitea webhook handler (deploy on merge); config validator (rejects banned providers). Fleet inventory: Timmy (Mac), Allegro (VPS), Bezalel (VPS), Ezra (VPS). Roles: wizard_base, golden_state, deadman_switch, request_log, cron_manager. Addresses: timmy-config #442, #443, #444, #445, #446. References: KT Final 2026-04-08 P2, KT Bezalel 2026-04-08 #1-#5. [Some checks failed. PR Checklist / pr-checklist (pull_request): failing after 1m27s]	2026-04-09 22:25:31 +00:00
Timmy Time	763e35f47a	feat: dead man switch config fallback engine. Automatic fallback chain: Anthropic -> local-llama.cpp -> Ollama -> safe mode. Auto-recovery when primary returns. Reversible config changes with backup. [Some checks failed. PR Checklist / pr-checklist (pull_request): failing after 2m53s is reported for a31f58000b; this commit: failing after 3m11s]	2026-04-08 21:54:42 +00:00
Timmy Time	a31f58000b	v7.0.0: Fleet architecture checkin, 6 agents alive, release tagging begins. [Some checks failed. PR Checklist / pr-checklist (pull_request): failing after 2m53s]	2026-04-08 21:44:53 +00:00
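
Review note: commit 763e35f47a describes a fallback chain (Anthropic -> local-llama.cpp -> Ollama -> safe mode) with auto-recovery, but its code is not shown on this page. The sketch below illustrates only the selection logic, on the assumption that a health probe exists for each provider. The names `CHAIN`, `SAFE_MODE`, and `select_provider` are hypothetical and the provider strings simply mirror the commit message. Backup and rollback of config files are not modelled.

```python
from typing import Callable, Sequence

# Order taken from the commit message; the identifiers themselves are illustrative.
CHAIN: Sequence[str] = ("anthropic", "local-llama.cpp", "ollama")
SAFE_MODE = "safe-mode"


def select_provider(is_healthy: Callable[[str], bool],
                    chain: Sequence[str] = CHAIN) -> str:
    """Return the first healthy provider in priority order, else safe mode."""
    for name in chain:
        if is_healthy(name):
            return name
    return SAFE_MODE
```

Because every call walks the chain from the top, recovery needs no extra state in this sketch: once the primary's probe succeeds again, the next call selects it.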