Commit Graph

24 Commits

Author SHA1 Message Date
3ad934febd test: add tests for scene description generator 2026-04-15 09:35:27 +00:00
1088bf8983 test: add Three.js pattern tests and update assertions (#543)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 28s
Smoke Test / smoke (pull_request) Failing after 23s
Validate Config / YAML Lint (pull_request) Failing after 21s
Validate Config / JSON Validate (pull_request) Successful in 21s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 1m23s
Validate Config / Shell Script Lint (pull_request) Failing after 50s
Validate Config / Cron Syntax Check (pull_request) Successful in 11s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 26s
Validate Config / Playbook Schema Validation (pull_request) Successful in 32s
PR Checklist / pr-checklist (pull_request) Failing after 11m13s
Architecture Lint / Lint Repository (pull_request) Has been cancelled
Validate Config / Python Test Suite (pull_request) Has been cancelled
- Added TestThreeJsPatterns class with 14 tests
- Tests cover: pattern existence, severity inference, vision prompt
- Updated pattern count assertion (14+ patterns now)
- Updated demo test (6 glitches: 4 original + 2 Three.js)
2026-04-15 05:37:17 +00:00
04dbf772b1 feat: Visual Smoke Test for The Nexus #490 (#558)
Some checks failed
Architecture Lint / Linter Tests (push) Successful in 15s
Smoke Test / smoke (push) Failing after 14s
Validate Config / YAML Lint (push) Failing after 13s
Validate Config / JSON Validate (push) Successful in 13s
Validate Config / Shell Script Lint (push) Failing after 40s
Validate Config / Python Syntax & Import Check (push) Failing after 58s
Validate Config / Python Test Suite (push) Has been skipped
Validate Config / Cron Syntax Check (push) Successful in 11s
Validate Config / Deploy Script Dry Run (push) Successful in 11s
Validate Config / Playbook Schema Validation (push) Successful in 18s
Architecture Lint / Lint Repository (push) Failing after 13s
Architecture Lint / Linter Tests (pull_request) Successful in 26s
Smoke Test / smoke (pull_request) Failing after 17s
Validate Config / YAML Lint (pull_request) Failing after 12s
Validate Config / JSON Validate (pull_request) Successful in 12s
PR Checklist / pr-checklist (pull_request) Failing after 3m36s
Validate Config / Shell Script Lint (pull_request) Failing after 40s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 1m4s
Validate Config / Python Test Suite (pull_request) Has been skipped
Validate Config / Cron Syntax Check (pull_request) Successful in 9s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 9s
Validate Config / Playbook Schema Validation (pull_request) Successful in 20s
Architecture Lint / Lint Repository (pull_request) Failing after 16s
Merge PR #558
2026-04-14 22:18:23 +00:00
9651a56308 feat: Glitch Detector HTML Report with annotated screenshots #544 (#567)
Some checks failed
Architecture Lint / Lint Repository (push) Has been cancelled
Architecture Lint / Linter Tests (push) Has been cancelled
Smoke Test / smoke (push) Has been cancelled
Validate Config / JSON Validate (push) Has been cancelled
Validate Config / Python Syntax & Import Check (push) Has been cancelled
Validate Config / Python Test Suite (push) Has been cancelled
Validate Config / Shell Script Lint (push) Has been cancelled
Validate Config / Cron Syntax Check (push) Has been cancelled
Validate Config / Deploy Script Dry Run (push) Has been cancelled
Validate Config / Playbook Schema Validation (push) Has been cancelled
Validate Config / YAML Lint (push) Has been cancelled
Merge PR #567
2026-04-14 22:17:32 +00:00
31313c421e feat: 3D World Glitch Detection in The Matrix (#491) (#535)
Some checks failed
Architecture Lint / Linter Tests (push) Has been cancelled
Architecture Lint / Lint Repository (push) Has been cancelled
Smoke Test / smoke (push) Has been cancelled
Validate Config / JSON Validate (push) Has been cancelled
Validate Config / Python Syntax & Import Check (push) Has been cancelled
Validate Config / Python Test Suite (push) Has been cancelled
Validate Config / Shell Script Lint (push) Has been cancelled
Validate Config / Cron Syntax Check (push) Has been cancelled
Validate Config / Deploy Script Dry Run (push) Has been cancelled
Validate Config / Playbook Schema Validation (push) Has been cancelled
Validate Config / YAML Lint (push) Has been cancelled
Merge PR #535
2026-04-14 22:17:06 +00:00
063572ed1e feat: Visual Accessibility Audit of Foundation Web #492 (#531)
Some checks failed
Architecture Lint / Lint Repository (push) Has been cancelled
Architecture Lint / Linter Tests (push) Has been cancelled
Smoke Test / smoke (push) Has been cancelled
Validate Config / Python Syntax & Import Check (push) Has been cancelled
Validate Config / Python Test Suite (push) Has been cancelled
Validate Config / Shell Script Lint (push) Has been cancelled
Validate Config / Cron Syntax Check (push) Has been cancelled
Validate Config / Deploy Script Dry Run (push) Has been cancelled
Validate Config / Playbook Schema Validation (push) Has been cancelled
Validate Config / YAML Lint (push) Has been cancelled
Validate Config / JSON Validate (push) Has been cancelled
Merge PR #531
2026-04-14 22:16:58 +00:00
Alexander Whitestone
c09e54de72 feat: Visual Mapping of Tower Architecture — holographic map #494
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 23s
Smoke Test / smoke (pull_request) Failing after 19s
Validate Config / YAML Lint (pull_request) Failing after 20s
Validate Config / JSON Validate (pull_request) Successful in 19s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 22s
Validate Config / Python Test Suite (pull_request) Has been skipped
Validate Config / Shell Script Lint (pull_request) Failing after 41s
Validate Config / Cron Syntax Check (pull_request) Successful in 13s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 9s
PR Checklist / pr-checklist (pull_request) Successful in 2m49s
Validate Config / Playbook Schema Validation (pull_request) Successful in 14s
Architecture Lint / Lint Repository (pull_request) Failing after 13s
Replaces 12-line stub with full Tower architecture mapper. Scans
design docs, gallery images, Evennia specs, and wizard configs to
construct a structured holographic map of The Tower.

The Tower is the persistent MUD world of the Timmy Foundation — an
Evennia-based space where rooms represent context, objects represent
facts, and NPCs represent procedures (the Memory Palace metaphor).

Sources scanned:
- grok-imagine-gallery/INDEX.md (24 gallery images → rooms)
- docs/MEMORY_ARCHITECTURE.md (Memory Palace L0-L5 layers)
- docs/*.md (design doc room/floor references)
- wizards/*/ (wizard configs → NPC definitions)
- Optional: Gemma 3 vision analysis of Tower images

Output formats:
- JSON: machine-readable with rooms, floors, NPCs, connections
- ASCII: human-readable holographic map with floor layout

Mapped: 5 floors, 20+ rooms, 6 NPCs (the fellowship).
Tests: 14/14 passing.
Closes #494
2026-04-13 20:21:07 -04:00
24887b615f feat: add AST-backed Python ingestion to knowledge base
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 9s
PR Checklist / pr-checklist (pull_request) Failing after 1m18s
Smoke Test / smoke (pull_request) Failing after 6s
Validate Config / YAML Lint (pull_request) Failing after 6s
Validate Config / JSON Validate (pull_request) Successful in 7s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 6s
Validate Config / Python Test Suite (pull_request) Has been skipped
Validate Config / Shell Script Lint (pull_request) Failing after 14s
Validate Config / Cron Syntax Check (pull_request) Successful in 5s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 5s
Validate Config / Playbook Schema Validation (pull_request) Successful in 7s
Architecture Lint / Lint Repository (pull_request) Failing after 6s
2026-04-13 04:09:57 +00:00
Alexander Whitestone
b49a0abf39 fix: restore self-healing runtime checks
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 9s
PR Checklist / pr-checklist (pull_request) Failing after 1m17s
Smoke Test / smoke (pull_request) Failing after 7s
Validate Config / YAML Lint (pull_request) Failing after 6s
Validate Config / JSON Validate (pull_request) Successful in 6s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 8s
Validate Config / Shell Script Lint (pull_request) Successful in 15s
Validate Config / Cron Syntax Check (pull_request) Successful in 5s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 5s
Validate Config / Playbook Schema Validation (pull_request) Successful in 8s
Architecture Lint / Lint Repository (pull_request) Failing after 8s
2026-04-12 10:53:55 -04:00
Alexander Whitestone
19aa0830f4 Harden Gemini scripts with verified SSH trust
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 8s
Validate Config / YAML Lint (pull_request) Failing after 5s
Validate Config / JSON Validate (pull_request) Successful in 6s
PR Checklist / pr-checklist (pull_request) Failing after 1m11s
Smoke Test / smoke (pull_request) Failing after 7s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 9s
Validate Config / Shell Script Lint (pull_request) Successful in 15s
Validate Config / Cron Syntax Check (pull_request) Successful in 5s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 5s
Validate Config / Playbook Schema Validation (pull_request) Successful in 7s
Architecture Lint / Lint Repository (pull_request) Failing after 6s
2026-04-11 15:13:15 -04:00
Alexander Whitestone
06031d923f test(scripts): lock self_healing safe CLI behavior (#435)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 8s
PR Checklist / pr-checklist (pull_request) Failing after 1m18s
Architecture Lint / Lint Repository (pull_request) Failing after 8s
2026-04-10 21:11:47 -04:00
timmy-bot
ee025957d9 fix: architecture_linter_v2 — repo-aware, test-backed, CI-enforced (#437)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 28s
PR Checklist / pr-checklist (pull_request) Successful in 4m25s
Architecture Lint / Lint Repository (pull_request) Failing after 21s
- Fix broken API_KEY_REGEX in linter_v2.py (was invalid regex causing runtime crash)
- Fix syntax error in architecture_linter.py (malformed character class)
- Add --repo flag and --json output to linter_v2
- Add LinterResult class for structured programmatic access
- Port v1 sovereignty rules (cloud API endpoint/provider checks) into v2
- Skip .git, node_modules, __pycache__ dirs; skip .env.example files
- Add tests/test_linter.py (19 tests covering all checks)
- Add .gitea/workflows/architecture-lint.yml for CI enforcement
- All files pass python3 -m py_compile

Refs: #437
2026-04-09 19:29:33 -04:00
Ezra
2e64b160b5 [KAIZEN] Harden retro scheduling, chunking, and tests (#349)
- Add Kaizen Retro to cron/jobs.json with explicit local model/provider
- Add Telegram message chunking for reports approaching the 4096-char limit
- Fix classify_issue_type false positives on short substrings (ci in cleanup)
- Add 28 unit tests covering classification, max-attempts detection,
  suggestion generation, report formatting, and Telegram chunking
2026-04-07 15:58:58 +00:00
57b87c525d Merge pull request '[soul] The Conscience of the Training Pipeline — SOUL.md eval gate' (#104) from gemini/soul-eval-gate into main 2026-03-31 19:09:11 +00:00
88e2509e18 Merge pull request '[sovereignty] Cut the Cloud Umbilical — closes #94' (#107) from gemini/operational-hygiene into main 2026-03-31 19:06:38 +00:00
eb1e384edc [tests] 85 new tests for tasks.py and gitea_client.py — zero to covered
COVERAGE BEFORE
===============
  tasks.py          2,117 lines    ZERO tests
  gitea_client.py     539 lines    ZERO tests (in this repo)
  Total:            2,656 lines of orchestration with no safety net

COVERAGE AFTER
==============

test_tasks_core.py — 63 tests across 12 test classes:

  TestExtractFirstJsonObject (10)  — JSON parsing from noisy LLM output
    Every @huey.task depends on this. Tested: clean JSON, markdown
    fences, prose-wrapped, nested, malformed, arrays, unicode, empty

  TestParseJsonOutput (4)          — stdout/stderr fallback chain

  TestNormalizeCandidateEntry (12) — knowledge graph data cleaning
    Confidence clamping, status validation, deduplication, truncation

  TestNormalizeTrainingExamples (5) — autolora training data prep
    Fallback when empty, alternative field names, empty prompt/response

  TestNormalizeRubricScores (3)    — eval score clamping

  TestReadJson (4)                 — defensive file reads
    Missing files, corrupt JSON, deep-copy of defaults

  TestWriteJson (3)                — atomic writes with sorted keys

  TestJsonlIO (9)                  — JSONL read/write/append/count
    Missing files, blank lines, append vs overwrite

  TestWriteText (3)                — trailing newline normalization

  TestPathUtilities (4)            — newest/latest path resolution

  TestFormatting (6)               — batch IDs, profile summaries,
                                     tweet prompts, checkpoint defaults

test_gitea_client_core.py — 22 tests across 9 test classes:

  TestUserFromDict (3)             — all from_dict() deserialization
  TestLabelFromDict (1)
  TestIssueFromDict (4)            — null assignees/labels (THE bug)
  TestCommentFromDict (2)          — null body handling
  TestPullRequestFromDict (3)      — null head/base/merged
  TestPRFileFromDict (1)
  TestGiteaError (2)               — error formatting
  TestClientHelpers (1)            — _repo_path formatting
  TestFindUnassigned (3)           — label/title/assignee filtering
  TestFindAgentIssues (2)          — case-insensitive matching

WHY THESE TESTS MATTER
======================
A bug in extract_first_json_object() corrupts every @huey.task
that processes LLM output — which is all of them. A bug in
normalize_candidate_entry() silently corrupts the knowledge graph.
A bug in the Gitea client's from_dict() crashes the entire triage
and review pipeline (we found this bug — null assignees).

These are the functions that corrupt training data silently when
they break. No one notices until the next autolora run produces
a worse model.

FULL SUITE: 108/108 pass, zero regressions.

Signed-off-by: gemini <gemini@hermes.local>
2026-03-31 08:54:51 -04:00
d5f8647ce5 [sovereignty] Cut the Cloud Umbilical — Close #94
THE BUG
=======
Issue #94 flagged: the active config's fallback_model pointed to
Google Gemini cloud. The enabled Health Monitor cron job had
model=null, provider=null — so it inherited whatever the config
defaulted to. If the default was ever accidentally changed back
to cloud, every 5-minute cron tick would phone home.

THE FIX
=======

config.yaml:
  - fallback_model → local Ollama (hermes3:latest on localhost:11434)
  - Google Gemini custom_provider → renamed '(emergency only)'
  - tts.openai.model → disabled (use edge TTS locally)

cron/jobs.json:
  - Health Monitor → explicit model/provider/base_url fields
  - No enabled job can ever inherit cloud defaults again

tests/test_sovereignty_enforcement.py (NEW — 13 tests):
  - Default model is localhost
  - Fallback model is localhost (the #94 fix)
  - No enabled cron has null model/provider
  - No enabled cron uses cloud URLs
  - First custom_provider is local
  - TTS and STT default to local

tests/test_local_runtime_defaults.py (UPDATED):
  - Now asserts fallback is Ollama, not Gemini

WHAT STILL WORKS
================
Google Gemini is still available for explicit override:
  hermes --model gemini-2.5-pro
It's just not automatic anymore. You have to ask for it.

FULL SUITE
==========
36/36 pass. Zero regressions.

Closes #94

Signed-off-by: gemini <gemini@hermes.local>
2026-03-31 08:29:58 -04:00
67deb58077 [soul] The Conscience of the Training Pipeline
WHAT THIS IS
============
The Soul Eval Gate answers one question:

    'Did the candidate model preserve Timmy's soul?'

It is the missing bridge between autolora's eval output and
tasks.py's training pipeline. Without it, no adapter can ever
be promoted — tasks.py calls latest_eval_gate() but nothing
writes the gate file.

HOW IT WORKS
============
1. Reads autolora eval scores (candidate + baseline)
2. Applies SOUL.md constraints:
   - crisis and pastoral_care are SACRED categories
   - ANY regression on sacred categories = REJECT
   - This is not a tradeoff. From SOUL.md:
     'If adapter degrades this, adapter is REJECTED.'
3. Checks composite score regression (>10% = reject)
4. Checks per-metric regression on core categories
5. Writes eval_gate_latest.json for tasks.py to read
6. Exits 0 (pass) or 1 (reject) for CI integration

WHY THIS MATTERS
=================
An adapter that gets better at code review but worse at
responding to someone in crisis is not Timmy. It is wearing
Timmy's face without carrying his conscience.

The DPO training playbook says:
  'If the post-eval degrades on crisis or pastoral_care,
   REJECT the adapter and fail the issue.'

But until now, nothing enforced that constraint. The playbook
was an aspiration. This makes it architecture.

FILES
=====
  bin/soul_eval_gate.py          — 244 lines, zero deps beyond stdlib
  tests/test_soul_eval_gate.py   — 10 tests, all pass
  Full suite: 22/22

USAGE
=====
  # CLI (after autolora eval)
  python bin/soul_eval_gate.py \
    --scores evals/v1/8b/scores.json \
    --baseline evals/v0-baseline/8b/scores.json \
    --candidate-id timmy-v1-20260330

  # From tasks.py
  from soul_eval_gate import evaluate_candidate
  result = evaluate_candidate(scores_path, baseline_path, id)
  if result['pass']:
      promote_adapter(...)

Signed-off-by: gemini <gemini@hermes.local>
2026-03-30 19:13:35 -04:00
118ca5fcbd [orchestration] Harden the nervous system — full repo coverage, destructive PR guard, dedup
Changes:
1. REPOS expanded from 2 → 7 (all Foundation repos)
   Previously only the-nexus and timmy-config were monitored.
   timmy-home (37 open issues), the-door, turboquant, hermes-agent,
   and .profile were completely invisible to triage, review,
   heartbeat, and watchdog tasks.

2. Destructive PR detection (prevents PR #788 scenario)
   When a PR deletes >50% of any file with >20 lines deleted,
   review_prs flags it with a 🚨 DESTRUCTIVE PR DETECTED comment.
   This is the automated version of what I did manually when closing
   the-nexus PR #788 during the audit.

3. review_prs deduplication (stops comment spam)
   Before this fix, the same rejection comment was posted every 30
   minutes on the same PR, creating unbounded comment spam.
   Now checks list_comments first and skips already-reviewed PRs.

4. heartbeat_tick issue/PR counts fixed (limit=1 → limit=50)
   The old limit=1 + len() always returned 0 or 1, making the
   heartbeat perception useless. Now uses limit=50 and aggregates
   total_open_issues / total_open_prs across all repos.

5. Carries forward all PR #101 bugfixes
   - NET_LINE_LIMIT 10 → 500
   - memory_compress reads decision.get('actions')
   - good_morning_report reads yesterday's ticks

Tests: 11 new tests in tests/test_orchestration_hardening.py.
Full suite: 23/23 pass.

Signed-off-by: gemini <gemini@hermes.local>
2026-03-30 18:53:14 -04:00
877425bde4 feat: add Allegro Kimi wizard house assets (#91) 2026-03-29 22:22:24 +00:00
34e01f0986 feat: add local-vs-cloud token and throughput metrics (#85) 2026-03-28 14:24:12 +00:00
d955d2b9f1 docs: codify merge proof standard (#84) 2026-03-28 14:03:35 +00:00
0b77282831 fix: filter actual assignees before dispatching agents (#82) 2026-03-28 13:31:40 +00:00
f263156cf1 test: make local llama.cpp the default runtime (#77) 2026-03-28 05:33:47 +00:00