# GENOME.md — timmy-config

## Project Overview

timmy-config is the sovereign configuration sidecar that makes Timmy Timmy. It houses the SOUL, skills, playbooks, memories, skins, operational scripts, Ansible playbooks, training data, and cron jobs — applied as an overlay to the Hermes harness without forking hermes-agent code.

As a sidecar, timmy-config lives outside the main Hermes codebase yet drives its behavior through configuration files, deployed scripts, and orchestrated workflows. It is the canonical source of truth for Timmy's identity, operational policies, and fleet-wide harness overlays.

Key statistics:

  • 260 source files, 76 test files, 38 config files
  • ~67K total lines of code and configuration
  • Last commit: sprint/issue-858 branch, 415 total commits

## Architecture Diagram

```mermaid
graph TD
    A[timmy-config root] --> B[config/]
    A --> C[bin/]
    A --> D[scripts/]
    A --> E[ansible/]
    A --> F[training/]
    A --> G[playbooks/]
    A --> H[wizards/]
    A --> I[pipeline/]

    B --> B1[config.yaml<br/>base config]
    B --> B2[config.*.yaml<br/>env overlays]
    B --> B3[config_overlay.py<br/>programmatic merge]

    C --> C1[deploy-allegro-house.sh]
    C --> C2[hermes-startup.sh]
    C --> C3[provider-health-monitor.py]
    C --> C4[soul_eval_gate.py]

    D --> D1[config_drift.py]
    D --> D2[provision_wizard.py]
    D --> D3[architecture_linter.py]

    E --> E1[site.yml<br/>wizard base]
    E --> E2[deadman_switch.yml]
    E --> E3[golden_state.yml]

    F --> F1[training_pair_provenance.py]
    F --> F2[validate_provenance.py]

    G --> G1[issue-triager.yaml]
    G --> G2[pr-reviewer.yaml]
    G --> G3[security-auditor.yaml]

    H --> H1[allegro/config.yaml]
    H --> H2[bezalel/config.yaml]
    H --> H3[ezra/config.yaml]

    I --> I1[orchestrator.py]
    I --> I2[nightly_scheduler.py]
    I --> I3[quality_gate.py]

    subgraph "Hermes Runtime"
        M[hermes-agent] --> N[timmy-home<br/>runtime]
        N --> O[Wizard houses<br/>/root/wizards/*]
    end

    H --> M
    C --> M
    D --> M
    I --> M

    style M fill:#e1f5e1
    style N fill:#fff3e0
```

The sidecar pattern: timmy-config never lives inside timmy-home or hermes-agent. Instead, its files are deployed into those runtimes:

  • Wizard configs (wizards/*/config.yaml) are copied into /root/wizards/<wizard>/home/
  • Bin scripts are symlinked or copied into ~/bin and sourced by agent startup
  • Playbooks and training data are referenced via path overlays

## Entry Points and Data Flow

### Primary Entry Points

  1. Deployment entry: deploy.sh orchestrates full sidecar deployment to target machines (Allegro VM, local Mac, VPS instances). Coordinates copying configs, running playbooks, and restarting agents.

  2. Configuration overlay: config.yaml is the root configuration consumed by Hermes at startup. Combined with env-specific overlays (config.dev.yaml, config.prod.yaml, config.gateway.yaml) via config_overlay.py.

  3. Wizard identity bootstrap: wizards/<wizard>/config.yaml defines model routing, fallback chains, and toolset permissions. Applied when a wizard house is first provisioned.

  4. Orchestration CLI: bin/timmy-orchestrator.sh wraps pipeline/orchestrator.py to dispatch work across the fleet.

  5. Task execution: tasks.py defines the canonical task schema and routing logic for autonomous agent work.

  6. Gitea integration: gitea_client.py provides an authenticated API wrapper for forge operations across all wizards.

  7. Health & monitoring: bin/provider-health-monitor.py and bin/pane-watchdog.sh run as daemons to maintain liveness.

  8. Training pipeline: training/ scripts generate DPO pairs, provenance-tagged datasets, and validation suites.

### Data Flow

```
config/ (source-of-truth)
  ↓ deploy.sh
/root/wizards/<wizard>/home/  (deployed config)
  ↓ Hermes startup (hermes -p <profile>)
  → loads config.yaml → merges overlay → applies fallbacks
  → reads SOUL.md → sets identity
  → loads skills from ~/.hermes/skills/
  → activates toolset list
  → begins conversation loop
```

Monitoring daemons under scripts/ independently poll for config drift, token budgets, and agent health, writing to logs/ and optionally raising Gitea issues.


## Key Abstractions

Sidecar: A configuration repository that is never imported as a Python module but whose files are deployed into a target runtime. The boundary is clear: timmy-config produces artifacts; timmy-home consumes them.

Wizard house: A sovereign runtime directory (/root/wizards/<name>/home) containing the deployed config, SOUL, sessions, and logs. Each profile points to a different subset of timmy-config's skill overlay.

Config overlay: A layered YAML merge: config.yaml (base) → environment overlay (dev/cron/gateway/prod) → programmatic patches from config_overlay.py. The result is the final Hermes profile.
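
A minimal sketch of that layering, assuming a recursive dict merge in which overlay values win and nested dicts merge key-by-key (the real conflict rules live in config_overlay.py):

```python
# Sketch of the layered merge; overlay values replace base scalars,
# nested dicts merge recursively. Actual rules (e.g. list concatenation
# vs. replacement) are defined by config_overlay.py.
import yaml

def deep_merge(base: dict, overlay: dict) -> dict:
    """Recursively merge overlay into base; overlay wins on conflicts."""
    merged = dict(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

def load_profile(base_path: str, overlay_path: str, patches: dict) -> dict:
    """config.yaml -> env overlay -> programmatic patches, in that order."""
    with open(base_path) as f:
        profile = yaml.safe_load(f)
    with open(overlay_path) as f:
        profile = deep_merge(profile, yaml.safe_load(f))
    return deep_merge(profile, patches)
```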

Provider fallback chain: Declared in config as the ordered list fallback_providers: [{provider, model}, …]. When the primary provider fails or exhausts its quota, the chain advances to the next entry.
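
A hedged sketch of the walk; provider and model names are stand-ins, and ProviderError/call_provider stub the real backend layer:

```python
# Walk a declared fallback chain until one provider answers.
class ProviderError(Exception):
    """Raised when a provider fails or its quota is exhausted."""

def call_provider(provider: str, model: str, prompt: str) -> str:
    """Stand-in for the real backend call (HTTP to Ollama, provider SDK, ...)."""
    raise ProviderError(f"{provider}/{model} unavailable")

def complete_with_fallback(prompt: str, chain: list[dict]) -> str:
    """Try each chain entry in order; raise only if all fail."""
    last_error = None
    for entry in chain:
        try:
            return call_provider(entry["provider"], entry["model"], prompt)
        except ProviderError as err:
            last_error = err
    raise RuntimeError(f"all providers in chain failed: {last_error}")

# Shape as declared in config: fallback_providers: [{provider, model}, ...]
fallback_providers = [
    {"provider": "anthropic", "model": "claude-sonnet"},
    {"provider": "ollama", "model": "llama3"},
]
```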

Task schema: Defined in tasks.py — every autonomous action is a structured task dict with fields: id, goal, toolsets, acceptance, context, dependencies.
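
An illustrative task dict with the fields named above, plus a minimal validation pass; the field values are hypothetical, and the canonical rules live in tasks.py:

```python
# Example task dict; values are illustrative, schema fields from tasks.py.
task = {
    "id": "task-0042",
    "goal": "triage new issues on the forge",
    "toolsets": ["gitea", "shell"],
    "acceptance": "every open issue has a label and a priority comment",
    "context": {"repo": "timmy/timmy-config"},
    "dependencies": [],  # ids of tasks that must complete first
}

REQUIRED_FIELDS = {"id", "goal", "toolsets", "acceptance", "context", "dependencies"}

def validate(task: dict) -> None:
    """Minimal checks: required keys present, no self-dependency."""
    missing = REQUIRED_FIELDS - task.keys()
    if missing:
        raise ValueError(f"task missing fields: {sorted(missing)}")
    if task["id"] in task["dependencies"]:
        raise ValueError("task cannot depend on itself")
```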

Training provenance: All training data pairs are tagged with source commit, generation script hash, and licensing metadata. Enforced by training/provenance.py and scripts/backfill_training_provenance.py.
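
A sketch of what such a tag could look like; field names are illustrative, and the enforced schema lives in training/provenance.py:

```python
# Tag a training pair with its source commit, generator hash, and license.
import hashlib
import subprocess

def provenance_tag(script_path: str, license_id: str) -> dict:
    commit = subprocess.check_output(
        ["git", "rev-parse", "HEAD"], text=True
    ).strip()
    with open(script_path, "rb") as f:
        script_hash = hashlib.sha256(f.read()).hexdigest()
    return {
        "source_commit": commit,
        "generator_sha256": script_hash,
        "license": license_id,
    }
```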

Fleet operator: Human-in-the-loop who monitors bin/ health checks, runs muda-audit.sh, and updates skill sets via skill_installer.py.


## API Surface

Internal APIs (within timmy-config tooling):

  • config_overlay.py: merge_config(base, overlay, patches) → merged dict
  • gitea_client.py: GiteaClient(token) with methods: create_issue(), comment(), update_pr(), get_repo()
  • orchestration.py: Orchestrator.dispatch(task) → job_id, Orchestrator.status(job_id) → state
  • tasks.py: Task.from_dict(), Task.validate(), Task.to_yaml()

External APIs (timmy-config touches):

  • Gitea REST API (forge.alexanderwhitestone.com) — tokens stored in wizard .env files, used by gitea_client.py and bin scripts
  • Hermes agent CLI (hermes -p <profile> chat --yolo) — invoked by bin wrappers
  • Ollama/local model servers — contacted via provider backends
  • Ansible (for fleet-wide rollouts) — ansible/ playbooks called by deploy scripts

Key file: gitea_client.py encapsulates all Gitea HTTP calls with token auth. It is not imported as a library by hermes-agent — it is executed as a subprocess from bin scripts.
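
As a sketch of that wrapper's shape, here is a minimal client against the standard Gitea REST API (/api/v1, token auth) covering two of the methods listed above; the real implementation is gitea_client.py:

```python
# Minimal Gitea client sketch; method names mirror the internal API
# surface above, endpoints follow Gitea's standard /api/v1 routes.
import requests

class GiteaClient:
    def __init__(self, token: str,
                 base_url: str = "https://forge.alexanderwhitestone.com"):
        self.session = requests.Session()
        self.session.headers["Authorization"] = f"token {token}"
        self.api = f"{base_url}/api/v1"

    def create_issue(self, owner: str, repo: str,
                     title: str, body: str = "") -> dict:
        resp = self.session.post(
            f"{self.api}/repos/{owner}/{repo}/issues",
            json={"title": title, "body": body},
        )
        resp.raise_for_status()
        return resp.json()

    def comment(self, owner: str, repo: str, index: int, body: str) -> dict:
        resp = self.session.post(
            f"{self.api}/repos/{owner}/{repo}/issues/{index}/comments",
            json={"body": body},
        )
        resp.raise_for_status()
        return resp.json()
```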


## Test Coverage Gaps

Coverage is estimated at 28% across 140 source modules. Significant gaps exist in:

  • Configuration layer: config_overlay.py has no unit tests for deep-merge edge cases (e.g. conflicting keys, list concatenation vs. replacement)
  • Orchestration: pipeline/orchestrator.py lacks integration tests for dependency resolution and failure recovery
  • Task schema: tasks.py validation rules untested for circular dependency detection
  • Gitea wrapper: No tests for HTTP error handling, rate limit backoff, or malformed response recovery
  • Health monitors: bin/provider-health-monitor.py and bin/pane-watchdog.sh have smoke tests but no coverage of edge-case failure scenarios
  • Training provenance: training/provenance.py and scripts/backfill_training_provenance.py lack schema validation tests
  • Wizard bootstrap: hermes-sovereign/wizard-bootstrap/wizard_bootstrap.py has minimal test coverage for permission scenarios
  • Deployment scripts: deploy.sh and bin/deploy-allegro-house.sh are shell scripts with no automated validation

High-priority targets for smoke tests:

  1. config_overlay.py — merge correctness under all conflict modes (see the sketch after this list)
  2. gitea_client.py — error paths and retry logic
  3. orchestration.py — end-to-end task dispatch lifecycle
  4. tasks.py — validation of malformed task definitions
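
A sketch of target 1, assuming the merge_config(base, overlay, patches) signature from the API surface above; the asserted conflict rules (overlay wins, nested dicts merge) are plausible defaults and should be aligned with the documented behavior:

```python
# Smoke tests for merge correctness under conflict; run with pytest.
from config_overlay import merge_config

def test_overlay_scalar_wins():
    merged = merge_config({"model": "a"}, {"model": "b"}, {})
    assert merged["model"] == "b"

def test_nested_dicts_merge():
    base = {"providers": {"primary": "a", "timeout": 30}}
    overlay = {"providers": {"primary": "b"}}
    merged = merge_config(base, overlay, {})
    assert merged["providers"] == {"primary": "b", "timeout": 30}

def test_patches_apply_last():
    merged = merge_config({"k": 1}, {"k": 2}, {"k": 3})
    assert merged["k"] == 3
```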

## Security Considerations

  • ⚠️ Uses subprocess/os.system — ensure command injection protection in all bin/ wrapper scripts (validate inputs, use shlex.quote; see the sketch after this list)
  • ⚠️ Secrets/passwords referenced — confirm .env files are never checked into git; periodically run grep -r "password" across bin/ and scripts/
  • ⚠️ SQL usage detected — ensure parameterized queries, never interpolate raw values
  • ⚠️ Makes HTTP requests — validate URLs, use HTTPS for forge, pin known-good hosts
  • Access control: wizard house directories must be 700 root-owned; .env files 600
  • Ansible playbooks use become: true — review privilege escalation boundaries
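
A minimal sketch of the injection-safe subprocess pattern for bin/ wrappers; the service unit and log path are hypothetical:

```python
import shlex
import subprocess

def restart_wizard(wizard_name: str) -> None:
    # Preferred: pass an argument vector so no shell ever parses the input.
    # (The wizard@ service unit is hypothetical.)
    subprocess.run(["systemctl", "restart", f"wizard@{wizard_name}"], check=True)

def tail_wizard_log(wizard_name: str) -> str:
    # If a shell string is unavoidable, quote every interpolated value.
    cmd = f"tail -n 50 /root/wizards/{shlex.quote(wizard_name)}/home/logs/agent.log"
    return subprocess.check_output(cmd, shell=True, text=True)
```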

The sidecar boundary itself is a security control: timmy-config never runs inside the agent; it only deploys artifacts. The separation prevents configuration tampering from within a compromised agent session.


## Performance Characteristics

  • Startup latency: Hermes agent loads ~38 YAML config files on startup (config/, wizards/*/config.yaml, overlays); in practice this takes under 2s on modern hardware. The merged config_overlay.py output could be cached to a pickle to reduce the cold-start penalty (see the sketch after this list).
  • Token budgeting: bin/token-tracker.py and bin/token-optimizer.py watch LLM token consumption — the sidecar embeds pruning policies to keep per-turn context under model limits.
  • Parallel dispatch: pipeline/orchestrator.py uses Python asyncio for concurrent task submission across multiple agents. GIL-bound CPU work rare; most time is I/O (HTTP to providers, subprocess spawn).
  • Filesystem churn: training/ data generation scripts process large datasets (Twitter archive, scene descriptions). Recommend streaming pipelines to avoid loading entire manifests into RAM.
  • Monitoring overhead: Health check scripts run every 30–60s. Each performs lightweight HTTP HEAD requests; aggregate load is negligible.
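
A sketch of the pickle cache suggested under startup latency; paths are illustrative, and the cache file should get the same restrictive ownership treatment as other deployed artifacts:

```python
# Reuse the merged profile unless any source YAML is newer than the cache.
import os
import pickle
from typing import Callable

CACHE_PATH = "/tmp/timmy-profile.pkl"  # illustrative; keep root-owned in practice

def load_cached_profile(sources: list[str], build: Callable[[], dict]) -> dict:
    """build() performs the full YAML load + overlay merge."""
    newest_source = max(os.path.getmtime(p) for p in sources)
    if os.path.exists(CACHE_PATH) and os.path.getmtime(CACHE_PATH) >= newest_source:
        with open(CACHE_PATH, "rb") as f:
            return pickle.load(f)
    profile = build()
    with open(CACHE_PATH, "wb") as f:
        pickle.dump(profile, f)
    return profile
```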

Generated by Codebase Genome Pipeline. Review and update manually.