timmy-config/training/data/preference_pairs.jsonl (51 lines, 1.6 MiB)

{"prompt": "Fixes #53\n\n## What this does\n\nWhen an agent PR has merge conflicts, Timmy now attempts an automated `git rebase` before closing the PR \u2014 preserving agent work instead of discarding it.\n\n## Resolution strategy\n\n1. **Clean rebase** \u2192 force-push and merge automatically\n2. **Conflicts in non-critical files only** (style.css, README, etc.) \u2192 resolve with `git checkout --theirs`, force-push, merge\n3. **Conflicts in critical files** (app.js, hermes.py, run_agent.py, etc.) \u2192 close with a clear explanation so no work is silently lost\n\n## Changes\n\n- **`timmy/conflict_resolver.py`** (new) \u2014 all git rebase logic in a temp clone; classifies files as critical vs. non-critical\n- **`timmy/orchestrator.py`** \u2014 adds `run_conflict_resolution()` as Phase 4 of the orchestration cycle (error-isolated)\n- **`timmy/gitea_client.py`** \u2014 adds `close_pr()` and `get_clone_url()` helpers\n- **`tests/test_timmy.py`** \u2014 15 new tests covering all resolution paths; 58 total pass", "chosen": "diff --git a/tests/test_timmy.py b/tests/test_timmy.py\nindex 975f1f69..d162bfc0 100644\n--- a/tests/test_timmy.py\n+++ b/tests/test_timmy.py\n@@ -11,6 +11,7 @@ from unittest.mock import MagicMock, patch\n \n import pytest\n \n+from timmy.conflict_resolver import CRITICAL_FILES, _is_critical, attempt_rebase\n from timmy.gitea_client import GiteaClient\n from timmy.orchestrator import (\n AGENT_ROSTER,\n@@ -402,9 +403,257 @@ class TestOrchestratorRunCycle:\n with patch.object(orch, \"run_triage\", side_effect=RuntimeError(\"triage boom\")):\n with patch.object(orch, \"run_pr_reviews\") as mock_reviews:\n with patch.object(orch, \"run_stuck_detection\") as mock_stuck:\n- orch.run_cycle()\n+ with patch.object(orch, \"run_conflict_resolution\") as mock_conflict:\n+ orch.run_cycle()\n mock_reviews.assert_called_once()\n mock_stuck.assert_called_once()\n+ mock_conflict.assert_called_once()\n+\n+ def test_conflict_resolution_phase_isolated(self):\n+ 
\"\"\"A failure in conflict resolution should not prevent other phases.\"\"\"\n+ orch = TimmyOrchestrator(\n+ gitea_url=\"http://h:3000\",\n+ gitea_token=\"tok\",\n+ repo=\"o/r\",\n+ )\n+ with patch.object(orch, \"run_triage\"):\n+ with patch.object(orch, \"run_pr_reviews\"):\n+ with patch.object(orch, \"run_stuck_detection\"):\n+ with patch.object(\n+ orch, \"run_conflict_resolution\",\n+ side_effect=RuntimeError(\"conflict boom\"),\n+ ) as mock_conflict:\n+ orch.run_cycle() # should not raise\n+ mock_conflict.assert_called_once()\n+\n+\n+# ---------------------------------------------------------------------------\n+# GiteaClient \u2014 conflict-resolution helpers\n+# ---------------------------------------------------------------------------\n+\n+class TestGiteaClientConflictHelpers:\n+ def test_close_pr_patches_state(self):\n+ client = GiteaClient(base_url=\"http://h:3000\", token=\"t\", repo=\"o/r\")\n+ with patch.object(client, \"_patch\", return_value={\"number\": 5}) as mock_patch:\n+ with patch.object(client, \"add_pr_comment\") as mock_comment:\n+ client.close_pr(5)\n+ mock_patch.assert_called_once_with(\"/repos/o/r/pulls/5\", {\"state\": \"closed\"})\n+ mock_comment.assert_not_called()\n+\n+ def test_close_pr_posts_comment_first(self):\n+ client = GiteaClient(base_url=\"http://h:3000\", token=\"t\", repo=\"o/r\")\n+ with patch.object(client, \"_patch\", return_value={}):\n+ with patch.object(client, \"add_pr_comment\") as mock_comment:\n+ client.close_pr(6, comment=\"bye\")\n+ mock_comment.assert_called_once_with(6, \"bye\")\n+\n+ def test_get_clone_url(self):\n+ client = GiteaClient(base_url=\"http://h:3000\", t
{"prompt": "Fixes #52\n\n## What this does\n\nAdds `hermes_wip.py` \u2014 a SQLite-backed work-in-progress tracker that prevents multiple agents from working on the same Gitea issue simultaneously and avoids duplicate PRs.\n\n## Key design\n\n- `WorkInProgressDB.claim_issue()` uses `INSERT OR IGNORE` on a PRIMARY KEY \u2014 atomic \"exactly one winner\" semantics across concurrent processes\n- `is_branch_claimed()` lets an agent check before creating a PR (prevents duplicates)\n- `prune_stale()` removes claims from crashed agents after 8 hours (unless a PR URL was recorded)\n- WAL mode for safe multi-process access\n- JSON mirror at `~/.hermes/state/work-in-progress.json` for shell-script inspection\n\n## Tests (26 passing)\n\n- Claim/release lifecycle\n- Concurrent race: two threads racing for the same issue \u2014 exactly one wins\n- Two agents picking from a shared backlog \u2014 they always choose different issues\n- JSON mirror correctness\n- Stale pruning behaviour", "chosen": "diff --git a/hermes_wip.py b/hermes_wip.py\nnew file mode 100644\nindex 00000000..f59651a5\n--- /dev/null\n+++ b/hermes_wip.py\n@@ -0,0 +1,228 @@\n+#!/usr/bin/env python3\n+\"\"\"\n+Work-in-Progress (WIP) shared memory layer for Hermes Agent.\n+\n+Provides a lightweight SQLite-backed state store that lets multiple agent\n+instances coordinate so they never work on the same issue simultaneously.\n+\n+The database lives at ~/.hermes/state/wip.db (or $HERMES_HOME/state/wip.db).\n+It is also mirrored as a human-readable JSON snapshot at\n+~/.hermes/state/work-in-progress.json after every mutation.\n+\n+Key design decisions:\n+- INSERT OR IGNORE on issue_number (PRIMARY KEY) gives atomic \"claim or fail\"\n+ semantics \u2014 only one process wins regardless of how many race at once.\n+- WAL mode allows concurrent reads while a write is in flight.\n+- Stale-claim pruning guards against crashed agents that never released.\n+- JSON mirror at work-in-progress.json lets shell scripts inspect 
state\n+ without a Python dependency.\n+\n+Usage::\n+\n+ from hermes_wip import WorkInProgressDB\n+\n+ wip = WorkInProgressDB()\n+ if wip.claim_issue(issue_number=52, agent_id=\"claude-w3-52\", branch=\"claude/issue-52\"):\n+ try:\n+ # do work \u2026\n+ wip.set_pr_url(52, \"https://\u2026/pulls/7\")\n+ finally:\n+ wip.release_issue(52)\n+ else:\n+ print(\"Issue 52 is already claimed \u2014 skipping\")\n+\"\"\"\n+\n+import json\n+import os\n+import sqlite3\n+import time\n+from pathlib import Path\n+from typing import Any, Dict, List, Optional\n+\n+\n+DEFAULT_WIP_DB_PATH = (\n+ Path(os.getenv(\"HERMES_HOME\", Path.home() / \".hermes\")) / \"state\" / \"wip.db\"\n+)\n+DEFAULT_WIP_JSON_PATH = (\n+ Path(os.getenv(\"HERMES_HOME\", Path.home() / \".hermes\"))\n+ / \"state\"\n+ / \"work-in-progress.json\"\n+)\n+\n+_SCHEMA_SQL = \"\"\"\n+CREATE TABLE IF NOT EXISTS wip_issues (\n+ issue_number INTEGER PRIMARY KEY,\n+ agent_id TEXT NOT NULL,\n+ branch TEXT NOT NULL,\n+ claimed_at REAL NOT NULL,\n+ pr_url TEXT\n+);\n+\"\"\"\n+\n+\n+class WorkInProgressDB:\n+ \"\"\"\n+ SQLite-backed WIP tracker for cross-agent issue deduplication.\n+\n+ Thread-safe for the typical multi-process agent pattern (each process\n+ opens its own connection; WAL mode handles concurrent access).\n+ \"\"\"\n+\n+ def __init__(\n+ self,\n+ db_path: Optional[Path] = None,\n+ json_path: Optional[Path] = None,\n+ ):\n+ self.db_path = db_path or DEFAULT_WIP_DB_PATH\n+ self.json_path = json_path or (self.db_path.parent / \"work-in-progress.json\")\n+ self.db_path.parent.mkdir(parents=True, exist_ok=True)\n+\n+ self._conn = sqlite3.connect(\n+ str(self.db_path),\n+ check_same_thread=False,\n+ timeout=10.0,\n+ )\n+ self._conn.row_factory = sqlite3.Row\n+ self._conn.execute(\"PRAGMA journal_mode=WAL\")\n+ self._conn.executescript(_SCHEMA_SQL)\n+ self._conn.
{"prompt": "Fixes #4\n\nThis PR implements Transparent Data Encryption (TDE), Role-Based Access Control (RBAC), and configures the Unified Audit Framework for OpenGauss as described in issue #4.\n\nKey changes include:\n- Configuration of Docker Compose to support local KMS for TDE.\n- Creation of a setup script (`setup-tde-and-rbac.sh`) to enable TDE at the instance level and configure RBAC with distinct admin, agent runtime, and audit users.\n- Encryption of sensitive columns (e.g., `system_prompt`, `model_config`, `title` in `sessions`; `content`, `tool_calls` in `messages`) using a Column Encryption Key (CEK).\n- Updates to `pg_hba.conf` to enforce secure authentication methods (scram-sha-256) and granular access for the newly created users.\n- Refactoring of `01_hermes_schema.sql` to remove redundant database creation commands, centralizing `hermes_poc` database creation within the TDE/RBAC setup script.\n- Enabling comprehensive audit logging parameters in `postgresql.conf` for enhanced security monitoring.\n\n**Note on Testing:** The acceptance criteria for verifying encrypted data on raw disk and audit log tampering detection require a more advanced testing environment and tools than are available in the current Docker Compose setup. 
These specific tests should be conducted in a dedicated security testing environment to fully validate the security posture.", "chosen": "diff --git a/infrastructure/opengauss-ha/config/primary/pg_hba.conf b/infrastructure/opengauss-ha/config/primary/pg_hba.conf\nindex 4f105217..e0b7b7d3 100644\n--- a/infrastructure/opengauss-ha/config/primary/pg_hba.conf\n+++ b/infrastructure/opengauss-ha/config/primary/pg_hba.conf\n@@ -2,7 +2,19 @@\n # TYPE DATABASE USER ADDRESS METHOD\n \n # Local connections\n-local all all trust\n+local all all scram-sha-256\n+\n+# Admin user access\n+host all admin_user 127.0.0.1/32 scram-sha-256\n+host all admin_user 172.28.0.0/16 scram-sha-256\n+\n+# Agent runtime user access\n+host hermes_poc agent_runtime_user 127.0.0.1/32 scram-sha-256\n+host hermes_poc agent_runtime_user 172.28.0.0/16 scram-sha-256\n+\n+# Audit user access (read-only views)\n+host hermes_poc audit_user 127.0.0.1/32 scram-sha-256\n+host hermes_poc audit_user 172.28.0.0/16 scram-sha-256\n \n # IPv4 local connections\n host all all 127.0.0.1/32 sha256\ndiff --git a/infrastructure/opengauss-ha/config/primary/setup-tde-and-rbac.sh b/infrastructure/opengauss-ha/config/primary/setup-tde-and-rbac.sh\nnew file mode 100755\nindex 00000000..a19292c6\n--- /dev/null\n+++ b/infrastructure/opengauss-ha/config/primary/setup-tde-and-rbac.sh\n@@ -0,0 +1,18 @@\n+#!/bin/bash\n+set -e\n+\n+# Wait for OpenGauss to be ready\n+until gsql -h localhost -U gaussdb -d postgres -c 'SELECT 1'; do\n+ >&2 echo \"OpenGauss is unavailable - sleeping\"\n+ sleep 1\n+done\n+\n+>&2 echo \"OpenGauss is up - executing TDE and RBAC setup\"\n+\n+# Enable TDE at instance level\n+gs_guc set -Z datanode -N all -I all -c \"enable_tde=on\"\n+\n+# Execute the SQL setup script\n+gsql -h localhost -U gaussdb -d postgres -f /docker-entrypoint-initdb.d/tde_rbac_setup.sql -C\n+\n+>&2 echo \"TDE and RBAC setup complete.\"\n\\ No newline at end of file\ndiff --git 
a/infrastructure/opengauss-ha/config/primary/tde_rbac_setup.sql b/infrastructure/opengauss-ha/config/primary/tde_rbac_setup.sql\nnew file mode 100644\nindex 00000000..a7f2302f\n--- /dev/null\n+++ b/infrastructure/opengauss-ha/config/primary/tde_rbac_setup.sql\n@@ -0,0 +1,52 @@\n+-- Connect to the hermes_poc database\n+CREATE DATABASE hermes_poc;\n+\\c hermes_poc\n+\n+-- Enable TDE at the instance level\n+-- This command needs to be run via gs_guc tool, not directly in SQL.\n+-- The setup script will execute `gs_guc set -Z datanode -N all -I all -c \"enable_tde=on\"`\n+-- Fo
{"prompt": "Refs #21\n\n## What was done\n\nRan both eval methods against the best available Hermes model and committed baseline results to the repo.\n\n## Blocker: hermes4.3:base unavailable\n\nThe 36B GGUF at `~/autolora/base/hermes-4_3_36b-Q4_K_M.gguf` is only **~2% downloaded** (489MB of ~22GB). The model cannot be imported into Ollama until the download completes.\n\n**This is a proxy baseline** run against `hermes3:latest` (8B) \u2014 the closest available Hermes-family model.\n\n## Files committed\n\n- `autolora/evals/v0-baseline/scores.json` \u2014 replay eval results (19 sessions, composite=0.561)\n- `autolora/evals/v0-baseline/vibes.md` \u2014 12-prompt vibes eval responses collected (manual scoring needed)\n- `autolora/evals/v0-baseline/8b/scores.json` \u2014 hermes3:8b tier baseline\n- `autolora/evals/v0-baseline/README.md` \u2014 blocker docs + exact commands to rerun with hermes4.3:base\n\n## To complete (once 36B downloads)\n\n```bash\nollama create hermes4.3:base -f /tmp/Modelfile # see README\npython3 run_eval.py --model hermes4.3:base --test-set ../data/test_set.jsonl --output ../evals/v0-baseline/scores.json\npython3 run_vibes.py --model hermes4.3:base --output ../evals/v0-baseline/vibes.md\ngit tag autolora-baseline-v0\n```\n\n## Aggregate scores (hermes3:latest proxy)\n\n| Metric | Score |\n|--------|-------|\n| tool_selection | 0.895 |\n| format_compliance | 0.892 |\n| brevity | 0.682 |\n| length_ratio | 0.308 |\n| text_similarity | 0.028 |\n| **composite** | **0.561** |", "chosen": "diff --git a/autolora/eval/compare.py b/autolora/eval/compare.py\nnew file mode 100644\nindex 00000000..1cc39db0\n--- /dev/null\n+++ b/autolora/eval/compare.py\n@@ -0,0 +1,126 @@\n+#!/usr/bin/env python3\n+\"\"\"\n+Compare two AutoLoRA eval score files (baseline vs candidate).\n+\n+Usage:\n+ python3 compare.py baseline/scores.json candidate/scores.json \\\n+ --out candidate/comparison_report.md\n+\"\"\"\n+\n+import argparse\n+import json\n+import sys\n+from 
pathlib import Path\n+\n+\n+METRICS = [\"tool_selection\", \"length_ratio\", \"format_compliance\", \"tone_match\", \"task_completion\", \"composite\"]\n+IMPROVE_THRESHOLD = 0.02 # +2% = meaningful improvement\n+DEGRADE_THRESHOLD = -0.02 # \u22122% = meaningful regression\n+\n+\n+def verdict_emoji(delta: float) -> str:\n+ if delta >= IMPROVE_THRESHOLD:\n+ return \"\u2705 IMPROVED\"\n+ elif delta <= DEGRADE_THRESHOLD:\n+ return \"\u274c REGRESSED\"\n+ else:\n+ return \"\u27a1\ufe0f UNCHANGED\"\n+\n+\n+def compare(baseline: dict, candidate: dict) -> dict:\n+ ba = baseline.get(\"aggregate\", {})\n+ ca = candidate.get(\"aggregate\", {})\n+\n+ deltas = {}\n+ for metric in METRICS:\n+ b_val = ba.get(metric, 0.0)\n+ c_val = ca.get(metric, 0.0)\n+ deltas[metric] = {\n+ \"baseline\": b_val,\n+ \"candidate\": c_val,\n+ \"delta\": c_val - b_val,\n+ \"delta_pct\": ((c_val - b_val) / max(b_val, 1e-9)) * 100,\n+ }\n+\n+ improved = [m for m, d in deltas.items() if d[\"delta\"] >= IMPROVE_THRESHOLD]\n+ regressed = [m for m, d in deltas.items() if d[\"delta\"] <= DEGRADE_THRESHOLD]\n+ unchanged = [m for m, d in deltas.items() if m not in improved and m not in regressed]\n+\n+ if len(regressed) == 0 and len(improved) >= 2:\n+ overall = \"PASS\"\n+ elif len(regressed) >= 2:\n+ overall = \"FAIL\"\n+ else:\n+ overall = \"MIXED\"\n+\n+ return {\n+ \"overall\": overall,\n+ \"improved\": improved,\n+ \"regressed\": regressed,\n+ \"unchanged\": unchanged,\n+ \"deltas\": deltas,\n+ }\n+\n+\n+def render_report(baseline: dict, candidate: dict, comparison: dict) -> str:\n+ lines = []\n+ lines.append(\"# AutoLoRA Eval Comparison Report\\n\")\n+ lines.append(f\"**Baseline model**: `{baseline.get('model', '?')}` \")\n+ lines.append(f\"**Candidate model**: `{candidate.get('model', '?')}`\\n\")\n+ lines.append(f\"**Overall verdict**: `{comparison['overall']}`\\n\")\n+\n+ line
{"prompt": "Fixes #48\n\n## Root cause\n\nWhen Gitea goes down, the watchdog auto-files a `[watchdog] Gitea unreachable` issue. On the next orchestration cycle, `run_triage` picks it up as an unassigned issue and assigns it to an agent \u2014 which is wrong. The watchdog manages its own issues; the orchestrator should stay out.\n\nSimilarly, `run_stuck_detection` would ping the assigned agent on the watchdog issue if it sat idle too long.\n\n## Fix\n\n- Import `WATCHDOG_ISSUE_TITLE` from `timmy.watchdog` in `orchestrator.py`\n- Filter watchdog issues out of `run_triage` unassigned list\n- Skip watchdog issues in `run_stuck_detection` loop\n- Added 2 new tests covering both filter paths (42 total, all passing)", "chosen": "diff --git a/tests/test_timmy.py b/tests/test_timmy.py\nindex 53034e70..975f1f69 100644\n--- a/tests/test_timmy.py\n+++ b/tests/test_timmy.py\n@@ -233,6 +233,15 @@ class TestOrchestratorTriage:\n orch.run_triage()\n mock_triage.assert_not_called()\n \n+ def test_skips_watchdog_issues(self):\n+ from timmy.watchdog import WATCHDOG_ISSUE_TITLE\n+ orch = self._make_orchestrator()\n+ watchdog_issue = _issue(99, title=WATCHDOG_ISSUE_TITLE)\n+ with patch.object(orch.gitea, \"list_issues\", return_value=[watchdog_issue]):\n+ with patch(\"timmy.orchestrator.triage_issue\") as mock_triage:\n+ orch.run_triage()\n+ mock_triage.assert_not_called()\n+\n def test_assigns_unassigned_issue(self):\n orch = self._make_orchestrator(dry_run=False)\n issue = _issue(3)\n@@ -367,6 +376,16 @@ class TestOrchestratorStuckDetection:\n orch.run_stuck_detection()\n mock_comment.assert_not_called()\n \n+ def test_skips_watchdog_issues(self):\n+ from timmy.watchdog import WATCHDOG_ISSUE_TITLE\n+ orch = self._make_orchestrator()\n+ # Watchdog issue assigned to claude, sitting idle for years\n+ issue = _issue(54, title=WATCHDOG_ISSUE_TITLE, assignees=[{\"login\": \"claude\"}])\n+ issue[\"updated_at\"] = \"2020-01-01T00:00:00Z\"\n+ with patch.object(orch.gitea, \"list_issues\", 
return_value=[issue]):\n+ with patch.object(orch.gitea, \"add_issue_comment\") as mock_comment:\n+ orch.run_stuck_detection()\n+ mock_comment.assert_not_called()\n \n # ---------------------------------------------------------------------------\n # TimmyOrchestrator \u2014 run_cycle error isolation\ndiff --git a/timmy/orchestrator.py b/timmy/orchestrator.py\nindex f0fb1922..a1f7fa4e 100644\n--- a/timmy/orchestrator.py\n+++ b/timmy/orchestrator.py\n@@ -21,6 +21,7 @@ from datetime import datetime, timezone\n from typing import Optional\n \n from timmy.gitea_client import GiteaClient\n+from timmy.watchdog import WATCHDOG_ISSUE_TITLE\n \n logger = logging.getLogger(__name__)\n \n@@ -169,7 +170,12 @@ class TimmyOrchestrator:\n \"\"\"Triage open unassigned issues and assign them to the best agent.\"\"\"\n logger.info(\"[timmy] Running issue triage\u2026\")\n issues = self.gitea.list_issues(state=\"open\", limit=50)\n- unassigned = [i for i in issues if not i.get(\"assignees\") and not i.get(\"pull_request\")]\n+ unassigned = [\n+ i for i in issues\n+ if not i.get(\"assignees\")\n+ and not i.get(\"pull_request\")\n+ and i.get(\"title\") != WATCHDOG_ISSUE_TITLE\n+ ]\n logger.info(\"[timmy] %d unassigned issues found\", len(unassigned))\n \n for issue in unassigned:\n@@ -239,6 +245,8 @@ class TimmyOrchestrator:\n for issue in issues:\n if not issue.get(\"assignees\"):\n continue\n+ if issue.get(\"title\") == WATCHDOG_ISSUE_TITLE:\n+ continue\n updated_raw = issue.get(\"updated_at\") or issue.get(\"created_at\", \"\")\n try:\n updated = datetime.fromisoformat(updated_raw.replace(\"Z\", \"+00:00\"))\n", "rejected": ""}
{"prompt": "Fixes #6\n\nThis PR adds a research document evaluating the feasibility of using OpenGauss HTAP capabilities for unified operational and analytical processing within the hermes-agent project.", "chosen": "diff --git a/docs/htap_feasibility_research.md b/docs/htap_feasibility_research.md\nnew file mode 100644\nindex 00000000..2ca58e28\n--- /dev/null\n+++ b/docs/htap_feasibility_research.md\n@@ -0,0 +1,45 @@\n+# HTAP Feasibility Research: Unified Operational + Analytical Processing with OpenGauss\n+\n+## Objective\n+Evaluate whether OpenGauss's HTAP capabilities can serve both real-time agent operations AND historical analysis from a single database instance, potentially eliminating separate analytics infrastructure or ETL pipelines for the `hermes-agent` project.\n+\n+## Research Summary\n+\n+OpenGauss is designed as a Hybrid Transactional/Analytical Processing (HTAP) database, leveraging a \"fusion engine\" that supports both row-store (optimized for OLTP) and column-store (optimized for OLAP) models. Users explicitly choose the storage model at table creation. While OpenGauss provides a Cost-Based Optimizer (CBO) and various performance enhancements (vectorized executor, parallel query, adaptive compression, partitioning) to efficiently handle diverse workloads, the \"query routing\" between OLTP and OLAP characteristics is primarily at the table design level rather than an automatic, dynamic routing mechanism within a single hybrid table.\n+\n+For handling JSONL-like data, OpenGauss effectively utilizes the `JSONB` data type, which stores JSON in a parsed binary format for efficient querying and indexing. JSONL files can be imported using the `COPY` command, with each line inserted as a `JSONB` object into a columnar table. For optimal analytical performance, it is recommended to extract frequently queried fields from `JSONB` into dedicated, appropriately typed columns.\n+\n+## Questions Answered\n+\n+### 1. 
Can we run agent loop retro analysis directly against the operational DB?\n+**Yes, potentially.** OpenGauss's HTAP capabilities allow a single database instance to handle both OLTP and OLAP workloads. By designing tables with suitable storage (row-store for operational, column-store or `JSONB` with extracted fields for analytical data), retro analysis can be performed directly against the operational database. Careful table design is essential for performance.\n+\n+### 2. What is the performance impact of analytical queries on OLTP operations?\n+**Potential impact, but manageable with proper design and tuning.** The impact depends on:\n+* **Table Storage Choice:** Heavy analytical queries on row-store tables can affect OLTP performance. Utilizing column-store tables, partitioning, or extracting analytical data into separate columns minimizes this.\n+* **Query Optimization:** OpenGauss's CBO, vectorized execution, and parallel query features are designed to mitigate performance bottlenecks.\n+* **Resource Management:** Proper resource allocation, database tuning, and effective workload separation through schema design are crucial.\n+\n+### 3. Does the fusion engine correctly route queries without application changes?\n+**No, \"routing\" is primarily based on table storage definition, not dynamic internal routing.** The fusion engine facilitates hybrid storage where row-store and column-store tables coexist. Queries are optimized based on the storage type of the tables they access. Existing application queries targeting OLTP tables will not automatically benefit from columnar optimizations unless the underlying tables are converted to columnar or new analytical queries are specifically designed for columnar structures. Application changes would likely be necessary to fully leverage columnar storage for analytical workloads.\n+\n+### 4. 
Can columnar storage handle the loop cycle/triage data currently in JSONL files?\n+**Yes, effectively.** OpenGauss can store JSONL data by importing each line as a `JSONB` object into a columnar table. To achieve optimal analytical performance, it is highly recommend
{"prompt": "Fixes #27\n\n## Summary\n\n- Generated `autolora/evals/v1/vibes.md` \u2014 10-prompt qualitative vibes eval scored manually\n- Generated `autolora/evals/v1/comparison_report.md` \u2014 quantitative comparison of v0-baseline vs v1\n- All eval scripts (`run_eval.py`, `run_vibes.py`, `compare.py`, `run_cycle.py`) were produced in the prior WIP commit\n\n## Results\n\n**Quantitative (scores.json):** INVESTIGATE \u2014 no measurable delta because `hermes4.3:timmy-v1` was not available in the local Ollama instance at eval time. `hermes3:8b` was used as a proxy; scores are identical to baseline.\n\n**Vibes eval:** \u274c REJECT \u2014 TC04 (Pastoral Care) scored 1/5. The model responded to \"Everything feels pointless and heavy\" with cheerful reframing (\"Try to focus on what you can control and what brings you joy\"). Sacred Rule triggered. Total: 27/50.\n\n## Next Steps\n\n1. Load the actual `hermes4.3:timmy-v1` adapter into Ollama\n2. Re-run: `python autolora/eval/run_cycle.py --model hermes4.3:timmy-v1 --baseline v0-baseline`\n3. 
Focus re-training on pastoral care \u2014 base model fails the Sacred Rule at baseline\n\n## Acceptance Criteria\n\n- [x] `v1/scores.json` complete (from prior WIP commit)\n- [x] `v1/vibes.md` complete with manual scores\n- [x] `v1/comparison_report.md` shows clear before/after context\n- [x] Verdict documented with reasoning\n- [x] Single-command eval cycle: `python autolora/eval/run_cycle.py`", "chosen": "diff --git a/autolora/eval/compare.py b/autolora/eval/compare.py\nnew file mode 100644\nindex 00000000..42bc1c6a\n--- /dev/null\n+++ b/autolora/eval/compare.py\n@@ -0,0 +1,312 @@\n+#!/usr/bin/env python3\n+\"\"\"\n+AutoLoRA Compare \u2014 compare.py\n+Compares two score files (baseline vs candidate) and generates a comparison report.\n+\n+Usage:\n+ python autolora/eval/compare.py \\\n+ autolora/evals/v0-baseline/scores.json \\\n+ autolora/evals/v1/scores.json \\\n+ --output autolora/evals/v1/comparison_report.md\n+\n+ # Or using named versions:\n+ python autolora/eval/compare.py baseline v1\n+\"\"\"\n+import argparse\n+import json\n+import sys\n+from datetime import date\n+from pathlib import Path\n+\n+EVALS_DIR = Path(__file__).parent.parent / \"evals\"\n+\n+METRIC_LABELS = {\n+ \"tool_accuracy_mean\": \"Tool Selection Accuracy\",\n+ \"length_ratio_mean\": \"Response Length Ratio\",\n+ \"format_compliance_mean\": \"Format Compliance\",\n+ \"brevity_mean\": \"Brevity Score\",\n+ \"response_similarity_mean\": \"Response Similarity\",\n+ \"composite_mean\": \"Composite Score\",\n+}\n+\n+PROMOTE_THRESHOLDS = {\n+ \"tool_accuracy_mean\": {\"min_change\": -0.05, \"description\": \"tool accuracy must not drop >5%\"},\n+ \"format_compliance_mean\": {\"min_change\": -0.05, \"description\": \"format must not degrade\"},\n+ \"brevity_mean\": {\"min_change\": -0.1, \"description\": \"model must not become verbose\"},\n+ \"composite_mean\": {\"min_change\": 0.0, \"description\": \"composite must not regress\"},\n+}\n+\n+\n+def resolve_path(name: str) -> Path:\n+ 
\"\"\"Resolve a version name ('v0-baseline', 'v1') or a direct path.\"\"\"\n+ p = Path(name)\n+ if p.exists():\n+ return p\n+ candidate = EVALS_DIR / name / \"scores.json\"\n+ if candidate.exists():\n+ return candidate\n+ raise FileNotFoundError(f\"Cannot find scores for '{name}'. Tried: {p}, {candidate}\")\n+\n+\n+def delta_arrow(delta: float) -> str:\n+ if delta > 0.01:\n+ return \"\u2191\"\n+ if delta < -0.01:\n+ return \"\u2193\"\n+ return \"\u2192\"\n+\n+\n+def verdict(baseline: dict, candidate: dict) -> tuple[str, list[str], list[str]]:\n+ \"\"\"\n+ Returns (VERDICT, reasons_for, reasons_against).\n+ VERDICT: PROMOTE | INVESTIGATE | REJECT\n+ \"\"\"\n+ reasons_for = []\n+ reasons_against = []\n+ hard_reject = False\n+\n+ for metric, threshold in PROMOTE_THRESHOLDS.items():\n+ b_val = baseline.get(metric, 0)\n+ c_val = candidate.get(metric, 0)\n+ delta = c_val - b_val\n+
{"prompt": "Fixes #39\n\n## What this does\n\nImplements the A-B-C-D eval test matrix for comparing four LoRA training strategies on the 8B tier:\n\n- **A)** Bare `hermes3:8b` \u2014 control baseline (0.551 composite, already done)\n- **B)** `timmy-8b:sessions` \u2014 LoRA trained on ~364 compressed real sessions\n- **C)** `timmy-8b:curated` \u2014 LoRA trained on 29 curated gold-standard exemplars\n- **D)** `timmy-8b:combined` \u2014 LoRA trained on sessions + curated combined\n\n## New files\n\n- `autolora/run_abcd_matrix.py` \u2014 orchestrator; runs all 4 variants through the vibes eval harness, then calls `compare_abcd.py`; supports `--dry-run`, `--judge`, `--variants`, `--skip`, `--promote-winner`\n- `autolora/scripts/compare_abcd.py` \u2014 4-way comparison report generator; enforces the sacred rule (pastoral care gate failures \u2192 auto-reject); writes `evals/abcd/abcd_report.md` and `abcd_comparison.json`\n- `autolora/configs/train_8b_sessions.yaml` \u2014 variant B training config\n- `autolora/configs/train_8b_curated.yaml` \u2014 variant C (conservative LR, more epochs for tiny dataset)\n- `autolora/configs/train_8b_combined.yaml` \u2014 variant D training config\n- `tests/test_autolora_abcd_matrix.py` \u2014 23 tests, all passing\n\n## Sacred rule enforced\n\nAny variant with `vibes_04` or `vibes_13` tone < 5 is **unconditionally rejected** from winner consideration \u2014 no exceptions, regardless of overall score.\n\n## Usage\n\n```bash\n# Run all 4 variants (requires trained models in Ollama)\npython autolora/run_abcd_matrix.py --judge hermes3:8b\n\n# Dry run to see what would run\npython autolora/run_abcd_matrix.py --dry-run\n\n# Skip variant A if baseline already done\npython autolora/run_abcd_matrix.py --skip A --judge hermes3:8b\n\n# Promote winner to timmy:active\npython autolora/run_abcd_matrix.py --promote-winner\n```", "chosen": "diff --git a/autolora/configs/train_8b_combined.yaml b/autolora/configs/train_8b_combined.yaml\nnew file mode 
100644\nindex 00000000..894b0cd0\n--- /dev/null\n+++ b/autolora/configs/train_8b_combined.yaml\n@@ -0,0 +1,87 @@\n+# AutoLoRA \u2014 Training config for Variant D: Combined LoRA (8B)\n+#\n+# Variant: D \u2014 Combined LoRA\n+# Model: Hermes 3 8B Q4_K_M\n+# Ollama target tag: timmy-8b:combined\n+# Dataset: sessions + curated combined (real sessions + gold-standard exemplars)\n+#\n+# This is the combined variant. It trains on the full merged dataset: real\n+# compressed sessions (~364) plus curated gold-standard exemplars (29).\n+# The hypothesis is that curated exemplars raise the floor on persona quality\n+# while session data provides breadth and grounding in real usage patterns.\n+#\n+# Learning rate (1.5e-4) is between the sessions variant (2.0e-4) and the\n+# curated variant (1.0e-4) \u2014 a balanced rate for a mixed-quality dataset.\n+#\n+# Compare against:\n+# train_8b.yaml \u2014 generic baseline\n+# train_8b_sessions.yaml \u2014 variant B (sessions data only)\n+# train_8b_curated.yaml \u2014 variant C (curated exemplars only)\n+#\n+# Framework: mlx-lm (Apple Silicon LoRA training)\n+# Install: pip install mlx-lm\n+\n+# \u2500\u2500 Model \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n+model:\n+ name: \"NousResearch/Hermes-3-Llama-3.1-8B\" # HuggingFace model for MLX training\n+ # Alternative: point to a local GGUF and convert, or use mlx-converted weights\n+ # mlx_path: \"~/autolora/base/hermes3-8b-mlx\"\n+\n+# \u2500\u2500 Data 
\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u25
{"prompt": "Refs #38\n\n## Summary\n\nAll four actions from issue #38 completed:\n\n### Action 1 \u2014 PR#33 (AutoLoRA v1)\n- PR#33 had merge conflicts (add/add on autolora/README.md)\n- Created PR#43 (`feat/autolora-v1-rebased`) with the AutoLoRA v1 commit cherry-picked onto current main\n- Conflict resolved by merging both READMEs: mains multi-tier benchmarking docs + PR#33 MLX QLoRA training docs\n- PR#33 closed with comment pointing to PR#43\n\n### Action 2 \u2014 Rescue sovereign branch\n- sovereign branch had one unique commit: `feat: fallback chain with recovery \u2014 Groq, Kimi, local Ollama`\n- Cherry-picked onto main as `feat/fallback-chain`\n- Conflict in hermes_cli/auth.py resolved by keeping both new providers (ollama from main + groq from sovereign)\n- Created PR#42, which was merged automatically\n\n### Action 3 \u2014 Rescue fix/vision-api-key-fallback\n- This fix adds _resolve_api_key_provider to the vision auxiliary chain\n- Also included in PR#42 (same branch, second commit) \u2014 merged\n\n### Action 4 \u2014 Cleanup\n- Deleted 14 merged branches: claude/issue-1, claude/issue-3, claude/issue-5, claude/issue-7, claude/issue-14, claude/issue-18, claude/issue-19, claude/issue-20, claude/issue-22, claude/issue-23, claude/issue-24, claude/issue-25, claude/issue-31, feat/fallback-chain\n\n## Status\n- PR#42: merged (fallback chain + vision fix)\n- PR#43: open, ready for review (AutoLoRA v1 rebased)\n- PR#33: closed (superseded by PR#43)", "chosen": "", "rejected": ""}
{"prompt": "Supersedes PR#33 (AutoLoRA v1) by rebasing onto current main and resolving conflicts.\n\n## What this includes\n\nFull MLX QLoRA training pipeline for Apple Silicon (hermes4.3:timmy-v1).\n\n- autolora/train_mlx.py - main training launcher\n- autolora/scripts/convert_data.py - convert JSONL to MLX chat format\n- autolora/scripts/fetch_base_model.py - download safetensors from mlx-community\n- autolora/scripts/fuse_and_convert.sh - fuse adapters + convert to GGUF\n- autolora/scripts/create_ollama_model.sh - build hermes4.3:timmy-v1\n- autolora/config/v1.yaml - training config: r=16, lr=2e-4, 1000 iters\n- autolora/training_logs/v1/.gitkeep\n- autolora/README.md - merged with mains README (benchmarking + training)\n\n## Conflict resolution\n\nOnly conflict was autolora/README.md (add/add). Resolved by merging both READMEs into a combined document covering Part 1 (base model setup), Part 2 (multi-tier benchmarking), and Part 3 (MLX QLoRA training pipeline).\n\nOriginal work by @rockachopa on claude/issue-26 branch.", "chosen": "diff --git a/autolora/README.md b/autolora/README.md\nindex a99c1c0f..3a3c2837 100644\n--- a/autolora/README.md\n+++ b/autolora/README.md\n@@ -1,20 +1,44 @@\n # AutoLoRA \u2014 Local Sovereign Training\n \n-Scripts for managing the Hermes 4.3 36B base model and LoRA adapter pipeline on Apple Silicon.\n+Scripts for managing the Hermes 4.3 model pipeline on Apple Silicon \u2014 both multi-tier benchmarking and MLX QLoRA fine-tuning.\n \n ## Directory Structure\n \n ```\n autolora/\n-\u251c\u2500\u2500 base/ # GGUF model files (created at runtime, gitignored)\n-\u2502 \u2514\u2500\u2500 hermes-4_3_36b-Q4_K_M.gguf\n-\u251c\u2500\u2500 transfer-hermes-gguf.sh # Step 1: VPS \u2192 Mac transfer via Tailscale rsync\n-\u251c\u2500\u2500 Modelfile.hermes43 # Ollama model definition (ChatML, 8192 ctx)\n-\u251c\u2500\u2500 import-to-ollama.sh # Step 2: Import GGUF into Ollama\n+\u251c\u2500\u2500 configs/\n+\u2502 \u251c\u2500\u2500 
train_8b.yaml # r=8, higher LR (small model, fast learner)\n+\u2502 \u251c\u2500\u2500 train_14b.yaml # r=16, standard\n+\u2502 \u2514\u2500\u2500 train_36b.yaml # r=16, conservative LR, tight memory\n+\u251c\u2500\u2500 evals/\n+\u2502 \u251c\u2500\u2500 v0-baseline/\n+\u2502 \u2502 \u251c\u2500\u2500 8b/ # responses.json, scores.json, report.md\n+\u2502 \u2502 \u251c\u2500\u2500 14b/\n+\u2502 \u2502 \u2514\u2500\u2500 36b/\n+\u2502 \u2514\u2500\u2500 v1/\n+\u2502 \u2514\u2500\u2500 ...\n+\u251c\u2500\u2500 scripts/\n+\u2502 \u251c\u2500\u2500 run_eval.py # Eval a single model tier\n+\u2502 \u251c\u2500\u2500 compare_tiers.py # Cross-tier comparison report\n+\u2502 \u251c\u2500\u2500 split_data.py # Train/test split utility\n+\u2502 \u251c\u2500\u2500 convert_data.py # Convert JSONL to MLX chat format\n+\u2502 \u251c\u2500\u2500 fetch_base_model.py # Download safetensors base model\n+\u2502 \u2514\u2500\u2500 fuse_and_convert.sh # Fuse LoRA adapters + convert to GGUF\n+\u251c\u2500\u2500 config/\n+\u2502 \u2514\u2500\u2500 v1.yaml # MLX training hyperparameters\n+\u251c\u2500\u2500 train_mlx.py # MLX QLoRA training launcher\n+\u251c\u2500\u2500 run_full_cycle.py # Orchestration: train + eval all tiers\n+\u251c\u2500\u2500 training_logs/ # Runtime logs (gitignored content)\n+\u251c\u2500\u2500 base/ # GGUF model files (gitignored)\n+\u251c\u2500\u2500 transfer-hermes-gguf.sh # VPS \u2192 Mac transfer via Tailscale\n+\u251c\u2500\u2500 Modelfile.hermes43 # Ollama model definition (ChatML, 8192 ctx)\n+\u251c\u2500\u2500 import-to-ollama.sh # Import GGUF into Ollama\n \u2514\u2500\u2500 README.md\n ```\n \n-## Setup\n+---\n+\n+## Part 1: Base Model Setup (GGUF / Ollama)\n \n ### Step 1: Transfer GGUF from VPS\n \n@@ -53,23 +77,9 @@ ollama list\n ollama run hermes4.3:base \"Hello, who are you?\"\n ```\n \n-## Model Details\n+---\n \n-| Property | Value |\n-
{"prompt": "Rescues `sovereign` branch and `fix/vision-api-key-fallback` branch.\n\n## What this includes\n\n**Commit 1: `feat: fallback chain with recovery \u2014 Groq, Kimi, local Ollama`**\n\nCascade DOWN through providers on rate limit/failure:\n Anthropic (primary) \u2192 Groq \u2192 Kimi \u2192 Local Ollama\n\nPeriodically probes back UP toward primary (every 5 successful calls). Full restore when primary recovers.\n\nChanges:\n- `run_agent.py`: chain cascade + recovery engine\n- `hermes_cli/auth.py`: Groq added to PROVIDER_REGISTRY (conflict resolved: ollama was already added by main)\n- `agent/auxiliary_client.py`: Groq default model (llama-3.3-70b-versatile)\n- `cli.py` + `gateway/run.py`: load chain (list) or legacy dict\n- `hermes_cli/config.py`: handle list format in config writer\n- `tests/test_fallback_model.py`: 37/37 passing (9 new chain tests + Groq credential test)\n\n**Commit 2: `fix: include API-key providers in vision auxiliary chain`**\n\nThe vision auxiliary client (`get_vision_auxiliary_client`) was missing `_resolve_api_key_provider` from its auto-detection chain. Users with only a direct API-key provider (Anthropic, Groq, Kimi) got `(None, None)` from the vision client while the text client worked fine.\n\n## Notes\n\nConflict in `hermes_cli/auth.py` was resolved by keeping both new providers (ollama from main + groq from sovereign). 
Both are independent entries.\n\nOriginal commits by @rockachopa (Alexander Whitestone) on sovereign branch, 2026-03-14.", "chosen": "diff --git a/agent/auxiliary_client.py b/agent/auxiliary_client.py\nindex 13efa8db..4cfa1736 100644\n--- a/agent/auxiliary_client.py\n+++ b/agent/auxiliary_client.py\n@@ -53,6 +53,7 @@ _API_KEY_PROVIDER_AUX_MODELS: Dict[str, str] = {\n \"minimax\": \"MiniMax-M2.5-highspeed\",\n \"minimax-cn\": \"MiniMax-M2.5-highspeed\",\n \"anthropic\": \"claude-haiku-4-5-20251001\",\n+ \"groq\": \"llama-3.3-70b-versatile\",\n }\n \n # OpenRouter app attribution headers\n@@ -789,7 +790,7 @@ def get_vision_auxiliary_client() -> Tuple[Optional[OpenAI], Optional[str]]:\n # LLaVA, Pixtral, etc.) support vision \u2014 skipping them entirely\n # caused silent failures for local-only users.\n for try_fn in (_try_openrouter, _try_nous, _try_codex,\n- _try_custom_endpoint):\n+ _try_custom_endpoint, _resolve_api_key_provider):\n client, model = try_fn()\n if client is not None:\n return client, model\ndiff --git a/cli.py b/cli.py\nindex 93771174..11a6e3a1 100755\n--- a/cli.py\n+++ b/cli.py\n@@ -1258,9 +1258,18 @@ class HermesCLI:\n self._provider_require_params = pr.get(\"require_parameters\", False)\n self._provider_data_collection = pr.get(\"data_collection\")\n \n- # Fallback model config \u2014 tried when primary provider fails after retries\n- fb = CLI_CONFIG.get(\"fallback_model\") or {}\n- self._fallback_model = fb if fb.get(\"provider\") and fb.get(\"model\") else None\n+ # Fallback model chain \u2014 tried in order when primary provider fails.\n+ # Supports both legacy single-dict and new list-of-dicts format.\n+ fb_raw = CLI_CONFIG.get(\"fallback_model\")\n+ if isinstance(fb_raw, list):\n+ self._fallback_model = [\n+ entry for entry in fb_raw\n+ if isinstance(entry, dict) and entry.get(\"provider\") and entry.get(\"model\")\n+ ] or None\n+ elif isinstance(fb_raw, dict) and fb_raw.get(\"provider\") and fb_raw.get(\"model\"):\n+ self._fallback_model = 
fb_raw\n+ else:\n+ self._fallback_model = None\n \n # AutoLoRA live compression config \u2014 post-session hook (optional, disabled by default)\n _autolora_cfg = CLI_CONFIG.get(\"autolora\", {})\ndiff --git a/gateway/run.py b/gateway/run.py\nindex 221f8f91..b1b45962 100644\n--- a/gateway/run.py\n+++ b/gateway/run.py\n@@ -563,11 +563,11 @@ class GatewayRunner:\n return {}\n \n @staticmethod\n- def _load_fallback_model() -> dic
{"prompt": "Fixes #40\n\nAdds `docs/timmy-dashboard-duplication-audit.md` with component-by-component Kill/Keep/Redirect recommendations.\n\n## What was audited\n\n- `src/timmy/tools/` \u2014 Full Agno-based tool system\n- `src/timmy/memory_system.py` \u2014 SQLite + embedding memory\n- `src/timmy/thinking.py` \u2014 Background inner monologue engine\n\n## Key findings\n\n**Kill (redirect to Hermes):** Core tool registrations for web_fetch, file I/O, and shell execution. Hermes already provides these via `tools/web_tools.py`, `tools/file_tools.py`, and `tools/terminal_tool.py`.\n\n**Keep:** Swarm delegation tools, gematria tool, per-agent toolkit registry \u2014 all dashboard-specific with no Hermes equivalent.\n\n**Keep:** `memory_system.py` \u2014 materially richer than Hermes (semantic search, embeddings, SQLite vault). Hermes `memory_tool.py` is file-only with no semantic search. Recommend aligning MEMORY.md path conventions.\n\n**Keep:** `thinking.py` \u2014 Background autonomous thought generation has no Hermes equivalent. Legitimately dashboard-specific.", "chosen": "diff --git a/docs/timmy-dashboard-duplication-audit.md b/docs/timmy-dashboard-duplication-audit.md\nnew file mode 100644\nindex 00000000..b397ae26\n--- /dev/null\n+++ b/docs/timmy-dashboard-duplication-audit.md\n@@ -0,0 +1,122 @@\n+# Audit: Timmy-time-dashboard vs Hermes Harness Duplication\n+\n+**Refs:** rockachopa/Timmy-time-dashboard#1215#issuecomment-9115, hermes-agent#40\n+**Date:** 2026-03-23\n+**Purpose:** Identify dashboard components that duplicate Hermes harness functionality and recommend Kill / Keep / Redirect for each.\n+\n+---\n+\n+## Background\n+\n+Comment #9115 on Timmy-time-dashboard#1215 is explicit:\n+\n+> \"Kill anything that we get for free already from the hermes harness. There is a lot of legacy work to build what we didn't have before we started using hermes. 
We should stop doing any work that is already done in hermes harness.\"\n+\n+The dashboard's `src/timmy/` directory contains three module families flagged as likely duplicates of Hermes:\n+\n+- `tools.py` (now `tools/` package after PR #1221)\n+- `memory_system.py`\n+- `thinking.py`\n+\n+---\n+\n+## Component Analysis\n+\n+### 1. `src/timmy/tools/` \u2014 Tool System\n+\n+**What it does:**\n+Provides a full tool-dispatch layer using the `agno` framework. Includes file I/O, shell execution, Python execution, web fetch, memory search/write/forget, delegation, introspection, calculator, and per-agent toolkit factories.\n+\n+**What Hermes provides:**\n+`tools/registry.py` \u2014 central registry with the same capabilities: file operations (`tools/file_tools.py`), shell/terminal (`tools/terminal_tool.py`), web search/fetch (`tools/web_tools.py`), memory (`tools/memory_tool.py`), delegation (`tools/delegate_tool.py`), and a full tool catalog.\n+\n+**Verdict: REDIRECT (partial Kill)**\n+\n+| Sub-component | Verdict | Reason |\n+|---|---|---|\n+| `calculator` | **Kill** | Hermes provides no calculator but this is trivial \u2014 move to a shared util or use Python directly |\n+| `web_fetch` | **Kill** | Hermes `tools/web_tools.py` (Firecrawl-backed) covers this with LLM summarization |\n+| `shell` / `python` / file ops | **Kill** | Hermes `tools/terminal_tool.py` + `tools/file_tools.py` cover these |\n+| `_register_memory_tools` | **Kill** | Hermes `tools/memory_tool.py` covers memory search/write/forget |\n+| `_register_delegation_tools` | **Keep** | Dashboard's swarm delegation (`delegate_to_kimi`, `list_swarm_agents`) is dashboard-specific |\n+| `_register_gematria_tool` | **Keep** | Dashboard-specific feature, no Hermes equivalent |\n+| `AGENT_TOOLKITS` registry | **Keep** | The per-agent toolkit assignment (echo/mace/helm/seer/forge/quill) is dashboard-specific |\n+| `create_full_toolkit` | **Redirect** | Replace the core tool registrations with calls into Hermes tool 
modules; keep wrapper only for dashboard-specific extras |\n+\n+**Action:** Strip the core tool registrations (`_register_core_tools`, `_register_web_fetch_tool`, `_register_memory_tools`) and wire them to the
{"prompt": "Fixes #31\n\n## What this adds\n\nFull eval + training infrastructure for three model tiers:\n\n### New scripts\n- **`autolora/scripts/run_eval.py`** \u2014 runs the 15-prompt vibes eval against any Ollama model, saves `responses.json` + `scores.json` + `report.md`; supports auto-judging via a second Ollama model; resumable (won't re-query prompts already collected)\n- **`autolora/scripts/compare_tiers.py`** \u2014 generates cross-tier comparison report: sovereignty thesis check (does 8B+adapter beat naked 36B?), daily driver recommendation, delta table vs baseline\n- **`autolora/run_full_cycle.py`** \u2014 orchestration script: `--tiers 8b,14b,36b` runs baseline eval \u2192 LoRA training \u2192 post-training eval \u2192 comparison report\n\n### Training configs\n- `autolora/configs/train_8b.yaml` \u2014 r=8, LR=2e-4 (small model, learns fast)\n- `autolora/configs/train_14b.yaml` \u2014 r=16, LR=1e-4, standard\n- `autolora/configs/train_36b.yaml` \u2014 r=16, LR=5e-5, conservative (memory tight; includes pre-training checklist)\n\n### Directory structure\n```\nautolora/evals/\n v0-baseline/\n 8b/ (responses.json, scores.json, report.md)\n 14b/\n 36b/\n v1/ (populated after training)\n tier_comparison_v0-baseline_vs_v1.md\n```\n\n## Usage\n```bash\n# Baseline eval for available tiers\npython autolora/run_full_cycle.py --tiers 8b,36b --eval-only\n\n# Full cycle: train + eval + compare\npython autolora/run_full_cycle.py --tiers 8b,36b\n\n# Single tier eval\npython autolora/scripts/run_eval.py --model hermes3:8b --tier 8b --version v0-baseline\n\n# Compare tiers\npython autolora/scripts/compare_tiers.py --version v0-baseline\n```", "chosen": "diff --git a/autolora/README.md b/autolora/README.md\nindex 5670c6af..a99c1c0f 100644\n--- a/autolora/README.md\n+++ b/autolora/README.md\n@@ -69,8 +69,58 @@ ollama run hermes4.3:base \"Hello, who are you?\"\n Q4_K_M for a 36B model uses approximately 20\u201322GB of unified memory on Apple Silicon.\n This fits within 
a 36GB M3/M4 Max budget with room for OS + context.\n \n+## Multi-Tier Benchmarking\n+\n+Three model size classes, all running through the same eval harness:\n+\n+| Tier | Model | ~Size | Use Case |\n+|------|-------|-------|----------|\n+| POCKET (8B) | `hermes3:8b` | 5GB | Always-on reflex brain, quick tasks |\n+| WORKHORSE (14B) | `workhorse:14b` | 9-12GB | Daily driver, tool use, planning |\n+| HEAVY (36B) | `hermes4.3:base` | 20GB | Deep architecture, long context |\n+\n+### Quick start\n+\n+```bash\n+# 1. Run baseline eval for all available tiers\n+python autolora/run_full_cycle.py --tiers 8b,36b --eval-only\n+\n+# 2. Train LoRA adapters and eval post-training\n+python autolora/run_full_cycle.py --tiers 8b,36b\n+\n+# 3. Compare across tiers manually\n+python autolora/scripts/compare_tiers.py --version v0-baseline\n+\n+# 4. Eval a single tier\n+python autolora/scripts/run_eval.py --model hermes3:8b --tier 8b --version v0-baseline\n+```\n+\n+### Directory structure\n+\n+```\n+autolora/\n+\u251c\u2500\u2500 configs/\n+\u2502 \u251c\u2500\u2500 train_8b.yaml # r=8, higher LR (small model, fast learner)\n+\u2502 \u251c\u2500\u2500 train_14b.yaml # r=16, standard\n+\u2502 \u2514\u2500\u2500 train_36b.yaml # r=16, conservative LR, tight memory\n+\u251c\u2500\u2500 evals/\n+\u2502 \u251c\u2500\u2500 v0-baseline/\n+\u2502 \u2502 \u251c\u2500\u2500 8b/ # responses.json, scores.json, report.md\n+\u2502 \u2502 \u251c\u2500\u2500 14b/\n+\u2502 \u2502 \u2514\u2500\u2500 36b/\n+\u2502 \u251c\u2500\u2500 v1/\n+\u2502 \u2502 \u2514\u2500\u2500 ...\n+\u2502 \u2514\u2500\u2500 tier_comparison_v0-baseline_vs_v1.md\n+\u251c\u2500\u2500 scripts/\n+\u2502 \u251c\u2500\u2500 run_eval.py # Eval a single model tier\n+\u2502 \u251c\u2500\u2500 compare_tiers.py # Cross-tier comparison report\n+\u2502 \u2514\u2500\u2500 split_data.py # Train/test split utility\n+\u2514\u2500\u2500 run_full_cycle.py # Orchestration: train + eval all tiers\n+```\n+\n ## Notes\n \n - The GGUF is the 
**frozen s
{"prompt": "Fixes #24\n\nWires a post-session live compression hook into the hermes agent for AutoLoRA training data collection.\n\n## What changed\n\n- **`agent/live_compressor.py`** (new): `LiveCompressConfig` dataclass + `compress_trajectory` / `run_post_session_hook` functions. Mirrors the batch compressor protection rules (system prompt, first human/gpt/tool turn, last 4 turns always kept verbatim). Middle turns are replaced with a single factual summary paragraph derived from conversation content \u2014 no external API calls, no LLM needed. Token budget enforced via a char-based approximation (4 chars/token).\n\n- **`run_agent.py`**: adds `autolora_live_compress` param to `AIAgent.__init__` and `_run_autolora_live_hook()` method called after `_save_trajectory()`. The hook is always non-fatal \u2014 any errors are logged as warnings and the session continues normally.\n\n- **`cli.py`**: reads the `autolora` section from `CLI_CONFIG`, builds a `LiveCompressConfig`, passes it to `AIAgent`. 
Disabled by default.\n\n- **`hermes_cli/config.py`**: adds `autolora` section to `DEFAULT_CONFIG` (enabled: false).\n\n- **`cli-config.yaml.example`**: documents the autolora configuration block with all options.\n\n- **`tests/test_live_compressor.py`** (new): 25 tests covering config, token estimation, compression logic, protected-turn handling, and JSONL file I/O.\n\n## Enabling\n\nAdd to `~/.hermes/config.yaml`:\n```yaml\nautolora:\n enabled: true\n output_dir: ~/autolora/data/live\n target_max_tokens: 15250\n protect_last_n_turns: 4\n```\n\nOutput: `~/autolora/data/live/<session_id>_compressed.jsonl` \u2014 format matches batch compressor (ShareGPT JSONL).", "chosen": "diff --git a/agent/live_compressor.py b/agent/live_compressor.py\nnew file mode 100644\nindex 00000000..33d0b192\n--- /dev/null\n+++ b/agent/live_compressor.py\n@@ -0,0 +1,277 @@\n+\"\"\"AutoLoRA live trajectory compressor \u2014 post-session hook.\n+\n+After each session, compresses the trajectory to a training-ready JSONL file\n+in ~/autolora/data/live/ without any external API calls.\n+\n+Compression strategy (mirrors the batch TrajectoryCompressor):\n+ - Protect: system prompt, first human turn, first assistant turn, first tool call\n+ - Protect: last N turns (default 4)\n+ - Protect: all tool_call / tool turns (preserve executable context)\n+ - Middle turns: replaced with a single human summary message\n+ - Token budget: 15,250 tokens (character-approximated at 4 chars/token)\n+\n+The summary is derived from the actual conversation content \u2014 no LLM call needed.\n+Timmy has full context at compression time: tool names, actions, and outcomes are\n+extracted from the turns he already wrote.\n+\"\"\"\n+\n+import json\n+import logging\n+import os\n+import re\n+from dataclasses import dataclass\n+from datetime import datetime\n+from pathlib import Path\n+from typing import Any, Dict, List, Optional, Tuple\n+\n+logger = logging.getLogger(__name__)\n+\n+# Characters-per-token approximation 
(fast, no tokenizer dependency)\n+_CHARS_PER_TOKEN = 4\n+\n+\n+@dataclass\n+class LiveCompressConfig:\n+ \"\"\"Configuration for post-session live compression.\"\"\"\n+ enabled: bool = False\n+ output_dir: str = \"~/autolora/data/live\"\n+ target_max_tokens: int = 15_250\n+ protect_last_n_turns: int = 4\n+ # Protected roles/types \u2014 always kept verbatim\n+ protect_system: bool = True\n+ protect_first_human: bool = True\n+ protect_first_assistant: bool = True\n+ protect_first_tool: bool = True\n+ # Whether to skip compression when already under budget\n+ skip_under_target: bool = True\n+\n+ @classmethod\n+ def from_config_dict(cls, cfg: Dict[str, Any]) -> \"LiveCompressConfig\":\n+ \"\"\"Build from the ``autolora`` section of config.yaml.\"\"\"\n+ obj = cls()\n+ obj.enabled = bool(cfg.get(\"enabled\", obj.enabled))\n+ obj.output_dir = str(cfg.get(\"output_dir\", obj.output_dir))\n+ obj.target_max_tokens = int(cfg.get(\"target_max_tokens\", obj.target_max_tokens))\n+ obj.protect_las
{"prompt": "Fixes #23\n\n## What this does\n\n- **Patches `trajectory_compressor.py`** to detect local Ollama endpoints (`localhost:11434` or `ollama` in base URL), routing them through the existing `ollama` provider in the auth registry. This is the patch described in #22.\n\n- **Adds `scripts/batch_compress_autolora.py`** \u2014 a resumable batch runner that:\n - Reads `~/autolora/data/train_set.jsonl` (364 sessions; 273 need compression)\n - Writes to `~/autolora/data/compressed_train.jsonl`\n - Targets 15,250 tokens/session via local `hermes3:8b`\n - Tracks progress in `compressed_train.state.json` \u2014 safe to interrupt and restart\n - Logs stats (sessions processed, compression ratio, failures) to `~/autolora/logs/compression_log.json`\n - Prints per-session ETA + running totals (tmux-friendly)\n\n## Config\n\nThe local Ollama config lives at `~/autolora/configs/compress_local.yaml` (created outside the repo since it contains a machine-local path). To run:\n\n```bash\nOLLAMA_API_KEY=ollama python3 scripts/batch_compress_autolora.py\n```\n\n## Dry-run verified\n\n```\n Loaded 364 sessions\n Sessions over 15,250-token budget: 273 / 364\n Token stats: min=559 avg=28,396 max=409,354\n```", "chosen": "diff --git a/scripts/batch_compress_autolora.py b/scripts/batch_compress_autolora.py\nnew file mode 100644\nindex 00000000..b194f92b\n--- /dev/null\n+++ b/scripts/batch_compress_autolora.py\n@@ -0,0 +1,351 @@\n+#!/usr/bin/env python3\n+\"\"\"\n+Batch Compress AutoLoRA Training Set (Resumable)\n+\n+Compresses ~/autolora/data/train_set.jsonl \u2192 ~/autolora/data/compressed_train.jsonl\n+using local Ollama (hermes3:8b). 
Tracks progress so it can be stopped and restarted\n+without reprocessing already-completed sessions.\n+\n+Usage:\n+ # Run from the repo root\n+ OLLAMA_API_KEY=ollama python scripts/batch_compress_autolora.py\n+\n+ # Custom paths\n+ OLLAMA_API_KEY=ollama python scripts/batch_compress_autolora.py \\\\\n+ --input=~/autolora/data/train_set.jsonl \\\\\n+ --output=~/autolora/data/compressed_train.jsonl \\\\\n+ --config=~/autolora/configs/compress_local.yaml\n+\n+ # Dry-run: show token stats without compressing\n+ OLLAMA_API_KEY=ollama python scripts/batch_compress_autolora.py --dry_run\n+\n+ # Start fresh (ignore existing progress)\n+ OLLAMA_API_KEY=ollama python scripts/batch_compress_autolora.py --reset\n+\n+Resumability:\n+ Progress is tracked in <output>.state.json \u2014 lists IDs of completed sessions.\n+ Interrupt at any time (Ctrl+C); re-run to continue from where it left off.\n+ Safe to run in tmux so it survives terminal disconnects.\n+\"\"\"\n+\n+import asyncio\n+import json\n+import os\n+import sys\n+import time\n+from datetime import datetime\n+from pathlib import Path\n+\n+import fire\n+\n+\n+DEFAULT_INPUT = Path.home() / \"autolora\" / \"data\" / \"train_set.jsonl\"\n+DEFAULT_OUTPUT = Path.home() / \"autolora\" / \"data\" / \"compressed_train.jsonl\"\n+DEFAULT_CONFIG = Path.home() / \"autolora\" / \"configs\" / \"compress_local.yaml\"\n+DEFAULT_LOG = Path.home() / \"autolora\" / \"logs\" / \"compression_log.json\"\n+\n+\n+def _load_state(state_path: Path) -> dict:\n+ \"\"\"Load progress state; returns empty state if not found.\"\"\"\n+ if state_path.exists():\n+ with open(state_path) as f:\n+ return json.load(f)\n+ return {\"completed_ids\": [], \"stats\": {}}\n+\n+\n+def _save_state(state_path: Path, state: dict):\n+ \"\"\"Atomically save progress state.\"\"\"\n+ tmp = state_path.with_suffix(\".tmp\")\n+ with open(tmp, \"w\") as f:\n+ json.dump(state, f, indent=2)\n+ tmp.replace(state_path)\n+\n+\n+def _count_tokens_estimate(text: str) -> int:\n+ 
\"\"\"Rough token estimate (chars / 4) when tokenizer unavailable.\"\"\"\n+ return len(text) // 4\n+\n+\n+def _session_tokens(session: dict) -> int:\n+ \"\"\"Count tokens for a session using char estimate.\"\"\"\n+ convs = session.get(\"conversations\", [])\n+ return sum(len(t.get(\"value\", \"\")) // 4 for t in convs)\n+\n+\n+
{"prompt": "Fixes #22\n\n## What changed\n\n- **Default provider**: OpenRouter \u2192 Ollama (`localhost:11434`)\n- **Default model**: `google/gemini-3-flash-preview` \u2192 `hermes3:8b`\n- **No API key required** for default operation \u2014 Ollama accepts any dummy key via the OpenAI-compat API\n- Added explicit `provider` field to `CompressionConfig` (default: `\"ollama\"`)\n- `_detect_provider()` now checks `config.provider` first, then falls back to URL-based detection (with new `localhost:11434` pattern)\n- `_init_summarizer()` handles Ollama directly without going through `resolve_provider_client` (which requires an env var)\n- Added `--provider` and `--model` CLI overrides to `main()`\n- Updated `datagen-config-examples/trajectory_compression.yaml` with Ollama defaults\n- OpenRouter remains fully supported: set `provider: openrouter` + `api_key_env: OPENROUTER_API_KEY`\n\n## Usage\n\n```bash\n# Default \u2014 local Ollama, no API key needed\npython trajectory_compressor.py --input=data/my_run\n\n# Explicit local model\npython trajectory_compressor.py --input=data/my_run --provider=ollama --model=hermes3:8b\n\n# Still works with OpenRouter if you want cloud\npython trajectory_compressor.py --input=data/my_run --provider=openrouter --model=google/gemini-flash-1.5\n```\n\nAll 25 existing tests pass.", "chosen": "diff --git a/datagen-config-examples/trajectory_compression.yaml b/datagen-config-examples/trajectory_compression.yaml\nindex c5b92a97..48a395e6 100644\n--- a/datagen-config-examples/trajectory_compression.yaml\n+++ b/datagen-config-examples/trajectory_compression.yaml\n@@ -38,24 +38,30 @@ protected_turns:\n # This ensures the model's final actions and conclusions are preserved\n last_n_turns: 4\n \n-# LLM settings for generating summaries (OpenRouter only)\n+# LLM settings for generating summaries\n+# Default: local Ollama (no API key required, no data leaves the machine)\n+# To use OpenRouter instead: set provider: openrouter, model: 
google/gemini-flash-1.5, api_key_env: OPENROUTER_API_KEY\n summarization:\n- # Model to use for summarization (should be fast and cheap)\n- # Using OpenRouter model path format\n- model: \"google/gemini-3-flash-preview\"\n- \n- # OpenRouter API settings\n- base_url: \"https://openrouter.ai/api/v1\"\n- \n- # Environment variable containing OpenRouter API key\n- api_key_env: \"OPENROUTER_API_KEY\"\n- \n+ # Provider: ollama (local, default) or openrouter (cloud)\n+ provider: \"ollama\"\n+\n+ # Model to use for summarization\n+ # For Ollama: any locally-pulled model (hermes3:8b runs at ~57 t/s on Apple Silicon)\n+ # For OpenRouter: e.g. \"google/gemini-flash-1.5\"\n+ model: \"hermes3:8b\"\n+\n+ # Ollama base URL \u2014 override via OLLAMA_BASE_URL env var if needed\n+ base_url: \"http://localhost:11434/v1\"\n+\n+ # API key env var (not required for Ollama; set to OPENROUTER_API_KEY for cloud)\n+ # api_key_env: \"OPENROUTER_API_KEY\"\n+\n # Temperature for summarization (lower = more deterministic)\n temperature: 0.3\n- \n+\n # Max retries for API failures\n max_retries: 3\n- \n+\n # Delay between retries (seconds)\n retry_delay: 2\n \ndiff --git a/trajectory_compressor.py b/trajectory_compressor.py\nindex ef81d6e2..2434a448 100644\n--- a/trajectory_compressor.py\n+++ b/trajectory_compressor.py\n@@ -44,7 +44,7 @@ from datetime import datetime\n import fire\n from rich.progress import Progress, SpinnerColumn, TextColumn, BarColumn, TaskProgressColumn, TimeElapsedColumn, TimeRemainingColumn\n from rich.console import Console\n-from hermes_constants import OPENROUTER_BASE_URL\n+OLLAMA_BASE_URL = os.getenv(\"OLLAMA_BASE_URL\", \"http://localhost:11434/v1\")\n \n # Load environment variables\n from dotenv import load_dotenv\n@@ -69,10 +69,11 @@ class CompressionConfig:\n protect_first_tool: bool = True\n protect_last_n_turns: int = 4\n \n- # Summarization (OpenRouter)\n- summarization_model: str = \"google/gemini-3-flash-preview\"\n- base_url: str = OPENROUTER_BASE_URL\n- 
api_key_env: str = \"OPENROUTER_API_KEY\"\n+ # Sum
{"prompt": "Fixes #26\n\n## Summary\n\nFull MLX QLoRA training pipeline for Apple Silicon (hermes4.3:timmy-v1).\n\n## What was built\n\n- **`autolora/train_mlx.py`** \u2014 main training launcher; reads `config/v1.yaml`, builds the `mlx_lm.lora` CLI command with all hyperparams, runs training as a subprocess with live output tee to `training_logs/v1/`, scaffolds post-training Ollama Modelfile on success\n- **`autolora/scripts/convert_data.py`** \u2014 converts compressed Hermes trajectory JSONL (ShareGPT `conversations` format) to MLX chat format (`{\"messages\": [...]}`), with configurable train/valid split\n- **`autolora/scripts/fetch_base_model.py`** \u2014 downloads safetensors models from mlx-community (required \u2014 MLX LoRA does not support GGUF directly)\n- **`autolora/scripts/fuse_and_convert.sh`** \u2014 fuses LoRA adapters into base model via `mlx_lm.fuse`, then converts to GGUF via llama.cpp for Ollama ingestion\n- **`autolora/scripts/create_ollama_model.sh`** \u2014 builds `hermes4.3:timmy-v1` from GGUF + Modelfile\n- **`autolora/config/v1.yaml`** \u2014 training config: r=16, lr=2e-4, 1000 iters, grad_checkpoint, q_proj/v_proj targets, memory notes\n- **`autolora/training_logs/v1/`** \u2014 log dir tracked in repo (logs written here at runtime)\n- **`autolora/README.md`** \u2014 end-to-end runbook\n\n## Notes on GGUF vs safetensors\n\nMLX LoRA training requires safetensors format \u2014 the GGUF stored in Ollama cannot be used directly as the training base. `fetch_base_model.py` pulls a pre-quantized MLX model from mlx-community. 
The Ollama GGUF remains available for inference; post-training we fuse \u2192 convert \u2192 re-import.", "chosen": "diff --git a/autolora/README.md b/autolora/README.md\nnew file mode 100644\nindex 00000000..fc955a58\n--- /dev/null\n+++ b/autolora/README.md\n@@ -0,0 +1,130 @@\n+# AutoLoRA \u2014 Local LoRA Training on Apple Silicon\n+\n+MLX QLoRA fine-tuning pipeline for Hermes 4.3 on Apple Silicon unified memory.\n+\n+## Pipeline Overview\n+\n+```\n+compressed_train.jsonl\n+ \u2502\n+ \u25bc\n+scripts/convert_data.py \u2192 ~/autolora/data/{train,valid}.jsonl\n+ \u2502\n+ \u25bc\n+scripts/fetch_base_model.py \u2192 ~/autolora/models/hermes-base/\n+ \u2502\n+ \u25bc\n+train_mlx.py \u2192 ~/autolora/adapters/v1/ (LoRA weights)\n+ \u2502 autolora/training_logs/v1/\n+ \u25bc\n+scripts/fuse_and_convert.sh \u2192 ~/autolora/fused/v1/model.gguf\n+ \u2502\n+ \u25bc\n+scripts/create_ollama_model.sh \u2192 ollama model hermes4.3:timmy-v1\n+```\n+\n+## Quick Start\n+\n+### 1. Install MLX-LM\n+\n+```bash\n+pip install mlx-lm\n+```\n+\n+### 2. Fetch base model (safetensors required \u2014 not GGUF)\n+\n+```bash\n+python autolora/scripts/fetch_base_model.py\n+# Downloads mlx-community/Hermes-3-Llama-3.1-8B-4bit to ~/autolora/models/hermes-base\n+```\n+\n+For the full 36B model (requires ~70GB+ free disk):\n+```bash\n+python autolora/scripts/fetch_base_model.py \\\n+ --model mlx-community/Hermes-3-Llama-3.1-70B-4bit\n+```\n+\n+### 3. Convert training data\n+\n+Assumes `~/autolora/data/compressed_train.jsonl` exists (from issue #23):\n+\n+```bash\n+python autolora/scripts/convert_data.py \\\n+ --input ~/autolora/data/compressed_train.jsonl \\\n+ --output_dir ~/autolora/data/\n+```\n+\n+### 4. 
Train\n+\n+```bash\n+python autolora/train_mlx.py\n+\n+# Dry run (print command, don't execute):\n+python autolora/train_mlx.py --dry_run\n+\n+# Memory-constrained (reduce rank and sequence length):\n+python autolora/train_mlx.py --lora_rank 8 --max_seq_length 4096 --batch_size 2\n+```\n+\n+Training logs \u2192 `autolora/training_logs/v1/run_<timestamp>.log`\n+Adapter weights \u2192 `~/autolora/adapters/v1/`\n+\n+### 5. Fuse adapters + convert to GGUF\n+\n+```bash\n+# Requires llama.cpp cloned at ~/llama.cpp\n+bash autolora/scripts/fuse_and_convert.sh\n+```\n+\n+### 6. Create Ollama model\n+\n+```bash\n+bash autolora/scripts/create_ollama_model.sh\n+ollama ru
ASE_URL\n- api_key_env: str = \"OPENROUTER_API_KEY\"\n+ # Summarization (default: local Ollama)\n+ provider: str = \"ollama\"\n+ summarization_model: str = \"hermes3:8b\"\n+ base_url: str = OLLAMA_BASE_URL\n+ api_key_env: str = \"\" # Not required for Ollama; set for cloud providers\n temperature: float = 0.3\n max_retries: int = 3\n retry_delay: int = 2\n@@ -122,6 +123,7 @@ class CompressionConfig:\n \n # Summarization\n if 'summarization' in data:\n+ config.provider = data['summarization'].get('provider', config.provider)\n config.summarization_model = data['summarization'].get('model', config.summarization_model)\n config.base_url = data['summarization'].get('base_url', config.base_url)\n config.api_key_env = data['summarization'].get('api_key_env', config.api_key_env)\n@@ -346,19 +348,25 @@ class TrajectoryCompressor:\n def _init_summarizer(self):\n \"\"\"Initialize LLM routing for summarization (sync and async).\n \n- Uses call_llm/async_call_llm from the centralized provider router\n- which handles auth, headers, and provider detection internally.\n- For custom endpoints, falls back to raw client construction.\n+ Supports local Ollama (default) and cloud providers (OpenRouter, etc.).\n+ For Ollama no API key is required — uses direct client construction.\n+ For known cloud providers uses the centralized provider router.\n \"\"\"\n- from agent.auxiliary_client import call_llm, async_call_llm\n-\n provider = self._detect_provider()\n- if provider:\n- # Store provider for use in _generate_summary calls\n+\n+ if provider == \"ollama\":\n+ # Local Ollama — no API key needed; use direct client construction\n+ self._use_call_llm = False\n+ base_url = os.getenv(\"OLLAMA_BASE_URL\", self.config.base_url)\n+ from openai import OpenAI, AsyncOpenAI\n+ self.client = OpenAI(api_key=\"ollama\", base_url=base_url)\n+ self.async_client = AsyncOpenAI(api_key=\"ollama\", base_url=base_url)\n+ print(f\" Provider: Ollama @ {base_url}\")\n+ elif provider:\n+ # Known cloud provider 
— use centralized router\n+ from agent.auxiliary_client import resolve_provider_client\n self._llm_provider = provider\n self._use_call_llm = True\n- # Verify the provider is available\n- from agent.auxiliary_client import resolve_provider_client\n client, _ = resolve_provider_client(\n provider, model=self.config.summarization_model)\n if client is None:\n@@ -367,14 +375,15 @@ class TrajectoryCompressor:\n f\"Check your API key or run: hermes setup\")\n self.client = None # Not used directly\n self.async_client = None # Not used directly\n+ print(f\" Provider: {provider}\")\n else:\n # Custom endpoint — use config's raw base_url + api_key_env\n self._use_call_llm = False\n- api_key = os.getenv(self.config.api_key_env)\n+ api_key = os.getenv(self.config.api_key_env) if self.config.api_key_env else None\n if not api_key:\n raise RuntimeError(\n f\"Missing API key. Set {self.config.api_key_env} \"\n- f\"environment variable.\")\n+ f\"environment variable, or use --provider=ollama for local inference.\")\n from openai import OpenAI, AsyncOpenAI\n self.client = OpenAI(\n api_key=api_key, base_url=self.config.base_url)\n@@ -385,8 +394,14 @@ class TrajectoryCompressor:\n print(f\" Max concurrent requests: {self.config.max_concurrent_requests}\")\n \n def _detect_provider(self) -> str:\n- \"\"\"Detect the provider name from the configured base_url.\"\"\"\n+ \"\"\"Detect the provider name from config.provid
{"prompt": "## Summary\nFirst LoRA training run. Use MLX framework for Apple Silicon-native QLoRA training on compressed conversation data.\n\n## Dependencies\n- #23 (compressed training data ready)\n- #25 (Hermes 4.3 base model in Ollama)\n- #21 (baseline eval completed)\n\n## Requirements\n1. Set up MLX-LM for LoRA training (`pip install mlx-lm`)\n2. Convert compressed JSONL to MLX training format\n3. Training config:\n - Base: Hermes 4.3 36B (may need safetensors, not GGUF — investigate)\n - Method: QLoRA (4-bit base + LoRA adapters)\n - Rank: r=16 (start conservative)\n - Learning rate: 2e-4\n - Epochs: 2-3\n - Target modules: q_proj, v_proj (standard)\n4. Output: adapter weights to `~/autolora/adapters/v1/`\n5. Create Ollama model: `hermes4.3:timmy-v1`\n\n## Open Questions\n- MLX LoRA may need safetensors base, not GGUF — need conversion step?\n- Memory budget: 36GB unified, Q4 base ~20GB, need ~12GB for training overhead\n- May need to use smaller rank (r=8) if memory is tight\n\n## Acceptance Criteria\n- [ ] Training completes without OOM\n- [ ] Adapter weights saved to `~/autolora/adapters/v1/`\n- [ ] `hermes4.3:timmy-v1` loadable in Ollama\n- [ ] Training log committed to `autolora/training_logs/v1/`\n\n## Notes\n- This is the first attempt. 
Expect to iterate on hyperparams.\n- If MLX doesn't work cleanly, fallback is Unsloth or TRL via Python.\n- Google Cloud $100 credits available for future faster training, but local first.", "chosen": "diff --git a/autolora/README.md b/autolora/README.md\nnew file mode 100644\nindex 00000000..fc955a58\n--- /dev/null\n+++ b/autolora/README.md\n@@ -0,0 +1,130 @@\n+# AutoLoRA — Local LoRA Training on Apple Silicon\n+\n+MLX QLoRA fine-tuning pipeline for Hermes 4.3 on Apple Silicon unified memory.\n+\n+## Pipeline Overview\n+\n+```\n+compressed_train.jsonl\n+ │\n+ ▼\n+scripts/convert_data.py → ~/autolora/data/{train,valid}.jsonl\n+ │\n+ ▼\n+scripts/fetch_base_model.py → ~/autolora/models/hermes-base/\n+ │\n+ ▼\n+train_mlx.py → ~/autolora/adapters/v1/ (LoRA weights)\n+ │ autolora/training_logs/v1/\n+ ▼\n+scripts/fuse_and_convert.sh → ~/autolora/fused/v1/model.gguf\n+ │\n+ ▼\n+scripts/create_ollama_model.sh → ollama model hermes4.3:timmy-v1\n+```\n+\n+## Quick Start\n+\n+### 1. Install MLX-LM\n+\n+```bash\n+pip install mlx-lm\n+```\n+\n+### 2. Fetch base model (safetensors required — not GGUF)\n+\n+```bash\n+python autolora/scripts/fetch_base_model.py\n+# Downloads mlx-community/Hermes-3-Llama-3.1-8B-4bit to ~/autolora/models/hermes-base\n+```\n+\n+For the full 36B model (requires ~70GB+ free disk):\n+```bash\n+python autolora/scripts/fetch_base_model.py \\\n+ --model mlx-community/Hermes-3-Llama-3.1-70B-4bit\n+```\n+\n+### 3. Convert training data\n+\n+Assumes `~/autolora/data/compressed_train.jsonl` exists (from issue #23):\n+\n+```bash\n+python autolora/scripts/convert_data.py \\\n+ --input ~/autolora/data/compressed_train.jsonl \\\n+ --output_dir ~/autolora/data/\n+```\n+\n+### 4. 
Train\n+\n+```bash\n+python autolora/train_mlx.py\n+\n+# Dry run (print command, don't execute):\n+python autolora/train_mlx.py --dry_run\n+\n+# Memory-constrained (reduce rank and sequence length):\n+python autolora/train_mlx.py --lora_rank 8 --max_seq_length 4096 --batch_size 2\n+```\n+\n+Training logs → `autolora/training_logs/v1/run_<timestamp>.log`\n+Adapter weights → `~/autolora/adapters/v1/`\n+\n+### 5. Fuse adapters + convert to GGUF\n+\n+```bash\n+# Requires llama.cpp cloned at ~/llama.cpp\n+bash autolora/scripts/fuse_and_convert.sh\n+```\n+\n+### 6. Create Ollama model\n+\n+```bash\n+bash autolora/scripts/create_ollama_model.sh\n+ollama run hermes4.3:timmy-v1 \"Hello, who are you?\"\n+```\n+\n+## Configuration\n+\n+Training hyperparameters live in `autolora/config/v1.yaml`.\n+All CLI flags override config file values.\n+\n+Key parameters:\n+| Parameter | Default | Notes
{"prompt": "## Summary\nBuild `autolora/eval/run_eval.py` — an automated evaluation harness that replays held-out test sessions through any Ollama model and scores the outputs against the original Claude responses.\n\n## Requirements\n### run_eval.py\n- Takes: model name (Ollama), test_set.jsonl path, output path\n- For each test session:\n - Feed system prompt + user messages + tool outputs to the model\n - Capture model responses at each turn\n - Compare against original Claude responses\n- Score each session on:\n - **Tool selection accuracy**: did it pick the same tools? (exact match %)\n - **Response length ratio**: model response length / original length (target: 0.8-1.2)\n - **Format compliance**: plain text, no markdown headers, terminal-friendly (regex checks)\n - **Tone match**: brevity, directness (heuristic or embedding similarity)\n - **Task completion**: did it reach equivalent conclusion? (final-turn similarity)\n- Output: `scores.json` with per-session and aggregate metrics\n\n### compare.py\n- Takes two score files (baseline vs candidate)\n- Outputs a diff report: which metrics improved, degraded, unchanged\n- Prints a clear PASS/FAIL/MIXED verdict\n- Generates `comparison_report.md`\n\n## Acceptance Criteria\n- [ ] `run_eval.py` runs end-to-end against Ollama models\n- [ ] Scoring is deterministic and reproducible\n- [ ] `compare.py` produces human-readable diff report\n- [ ] Both scripts committed to `autolora/eval/`\n- [ ] Scripts work with `hermes4.3:base` as first target\n\n## Notes\n- This harness runs BEFORE and AFTER every training cycle. 
It's the gatekeeper.\n- Must handle sessions where tool calls are present (mock or skip tool execution)\n- Keep it simple first — embedding similarity can be v2.", "chosen": "diff --git a/autolora/__init__.py b/autolora/__init__.py\nnew file mode 100644\nindex 00000000..a32113ad\n--- /dev/null\n+++ b/autolora/__init__.py\n@@ -0,0 +1 @@\n+# AutoLoRA — Automated LoRA training pipeline for Hermes Agent\ndiff --git a/autolora/eval/__init__.py b/autolora/eval/__init__.py\nnew file mode 100644\nindex 00000000..776bbafd\n--- /dev/null\n+++ b/autolora/eval/__init__.py\n@@ -0,0 +1 @@\n+# AutoLoRA evaluation harness\ndiff --git a/autolora/eval/compare.py b/autolora/eval/compare.py\nnew file mode 100644\nindex 00000000..76b79c8e\n--- /dev/null\n+++ b/autolora/eval/compare.py\n@@ -0,0 +1,307 @@\n+#!/usr/bin/env python3\n+\"\"\"\n+AutoLoRA Score Comparator\n+\n+Compares two score files (baseline vs candidate) produced by run_eval.py.\n+Outputs a diff report and prints a PASS/FAIL/MIXED verdict.\n+\n+Usage:\n+ python compare.py --baseline baseline_scores.json --candidate candidate_scores.json\n+ python compare.py --baseline baseline_scores.json --candidate candidate_scores.json --output comparison_report.md\n+\"\"\"\n+\n+import json\n+import sys\n+import argparse\n+from pathlib import Path\n+from typing import Any\n+\n+# ---------------------------------------------------------------------------\n+# Metric metadata\n+# ---------------------------------------------------------------------------\n+\n+# Higher is better for all metrics except response_length_ratio (target 0.8-1.2)\n+HIGHER_IS_BETTER = {\n+ \"tool_selection_accuracy\": True,\n+ \"response_length_ratio\": None, # target range, not directional\n+ \"response_length_score\": True,\n+ \"format_compliance\": True,\n+ \"tone_match\": True,\n+ \"task_completion\": True,\n+ \"aggregate\": True,\n+}\n+\n+# Threshold for \"significant\" change (absolute)\n+SIGNIFICANCE_THRESHOLD = 0.02\n+\n+\n+def load_scores(path: str) -> 
dict[str, Any]:\n+ p = Path(path)\n+ if not p.exists():\n+ print(f\"ERROR: Score file not found: {path}\", file=sys.stderr)\n+ sys.exit(1)\n+ with open(p, \"r\", encoding=\"utf-8\") as f:\n+ return json.load(f)\n+\n+\n+# ---------------------------------------------------------------------------\n+# Comparison logic\n+# ---------------------------------------------------------------------------\n+\n+def classify_change(metric: str,
{"prompt": "## Summary\nExport all 3,367+ CLI sessions from state.db to ShareGPT JSONL format, then split into training and held-out test sets.\n\n## Requirements\n- Run `hermes sessions export ~/autolora/data/raw.jsonl --source cli`\n- Split chronologically: last ~5% (167 sessions) → `test_set.jsonl`, rest → `train_set.jsonl`\n- Write a split script (`autolora/scripts/split_data.py`) that:\n - Takes raw export + split ratio as args\n - Splits by chronological order (most recent = test)\n - Records session IDs in a manifest file for reproducibility\n - Outputs: `train_set.jsonl`, `test_set.jsonl`, `manifest.json`\n- Manifest tracks: session count, message count, sha256 hash of each file, split date\n- Commit configs and scripts to `autolora/` in the repo\n\n## Acceptance Criteria\n- [ ] Raw export exists at `~/autolora/data/raw.jsonl`\n- [ ] Split script is repeatable and deterministic\n- [ ] Test set is NEVER used for training (enforced by manifest)\n- [ ] Manifest committed to `autolora/evals/manifest.json`\n\n## Directory Structure\n```\n~/autolora/\n data/\n raw.jsonl\n train_set.jsonl\n test_set.jsonl\n manifest.json\n```\n\n## Notes\n- Test set stays frozen after initial split. 
New sessions accumulate into training only.\n- Future re-exports append to training, test set only changes between major versions.", "chosen": "diff --git a/autolora/evals/.gitkeep b/autolora/evals/.gitkeep\nnew file mode 100644\nindex 00000000..e69de29b\ndiff --git a/autolora/scripts/split_data.py b/autolora/scripts/split_data.py\nnew file mode 100644\nindex 00000000..a0a45fce\n--- /dev/null\n+++ b/autolora/scripts/split_data.py\n@@ -0,0 +1,194 @@\n+#!/usr/bin/env python3\n+\"\"\"\n+AutoLoRA — train/test split utility.\n+\n+Reads a raw JSONL export (from `hermes sessions export --format sharegpt`)\n+and splits it chronologically into training and held-out test sets.\n+\n+Usage:\n+ python split_data.py <input.jsonl> <output_dir> [--test-ratio 0.05]\n+\n+Outputs (all in <output_dir>):\n+ train_set.jsonl — training sessions (oldest sessions)\n+ test_set.jsonl — held-out test sessions (newest sessions)\n+ manifest.json — reproducibility manifest with session IDs + file hashes\n+\n+Rules:\n+ - Split is always chronological: newest sessions go to the test set.\n+ - The test set is frozen after initial split; new sessions add to training only.\n+ - Re-running with the same input produces identical output (deterministic).\n+\"\"\"\n+\n+import argparse\n+import hashlib\n+import json\n+import pathlib\n+import sys\n+from datetime import datetime, timezone\n+\n+\n+def _sha256(file_path: pathlib.Path) -> str:\n+ \"\"\"Return the SHA-256 hex digest of a file.\"\"\"\n+ h = hashlib.sha256()\n+ with open(file_path, \"rb\") as f:\n+ for chunk in iter(lambda: f.read(65536), b\"\"):\n+ h.update(chunk)\n+ return h.hexdigest()\n+\n+\n+def _count_messages(sessions: list) -> int:\n+ \"\"\"Count total conversation turns across all sessions.\"\"\"\n+ total = 0\n+ for s in sessions:\n+ total += len(s.get(\"conversations\", []))\n+ return total\n+\n+\n+def split(\n+ input_path: pathlib.Path,\n+ output_dir: pathlib.Path,\n+ test_ratio: float = 0.05,\n+) -> dict:\n+ \"\"\"\n+ Split sessions 
into train/test sets.\n+\n+ Parameters\n+ ----------\n+ input_path : path to the source JSONL (ShareGPT or raw format)\n+ output_dir : directory where output files are written\n+ test_ratio : fraction of sessions to reserve as test set (default 0.05)\n+\n+ Returns\n+ -------\n+ dict — the manifest written to manifest.json\n+ \"\"\"\n+ if not 0 < test_ratio < 1:\n+ raise ValueError(f\"test_ratio must be between 0 and 1, got {test_ratio}\")\n+\n+ # ── Load ──────────────────────────────────────────────────────────────────\n+ sessions = []\n+ with open(input_path, encoding=\"utf-8\") as f:\
{"prompt": "## Summary\nHermes 4.3 36B Q4_K_M GGUF (~20GB) is downloaded on the VPS. Transfer to Mac via Tailscale rsync and import into Ollama.\n\n## Status\n- [x] Downloaded to VPS via `huggingface-cli download`\n- [ ] Transfer to Mac\n- [ ] Import into Ollama\n\n## Steps\n1. Find the file on VPS: `find ~/.cache/huggingface -name \"*.gguf\" 2>/dev/null`\n2. rsync over Tailscale: `rsync -avP --partial vps:~/.cache/huggingface/.../hermes-4_3_36b-Q4_K_M.gguf ~/autolora/base/`\n3. Create Modelfile (ChatML template, 8192 ctx)\n4. `ollama create hermes4.3:base -f Modelfile`\n5. Verify: `ollama run hermes4.3:base \"Hello, who are you?\"`\n\n## Acceptance Criteria\n- [ ] GGUF file on Mac at `~/autolora/base/hermes-4_3_36b-Q4_K_M.gguf`\n- [ ] `hermes4.3:base` available in `ollama list`\n- [ ] Model responds correctly with ChatML format\n- [ ] Runs within 36GB unified memory budget\n\n## Notes\n- rsync --partial makes this resumable if hotspot drops\n- ~20GB over Tailscale should be manageable\n- This is the frozen skeleton — never modified, only wrapped with adapters", "chosen": "diff --git a/autolora/.gitignore b/autolora/.gitignore\nnew file mode 100644\nindex 00000000..584f98e6\n--- /dev/null\n+++ b/autolora/.gitignore\n@@ -0,0 +1,9 @@\n+# GGUF model files — large binaries, never commit\n+base/\n+*.gguf\n+*.bin\n+\n+# Training artifacts\n+checkpoints/\n+adapters/\n+runs/\ndiff --git a/autolora/Modelfile.hermes43 b/autolora/Modelfile.hermes43\nnew file mode 100644\nindex 00000000..6364ed17\n--- /dev/null\n+++ b/autolora/Modelfile.hermes43\n@@ -0,0 +1,32 @@\n+# Hermes 4.3 36B — Frozen Base Skeleton\n+# ChatML format, 8192 context window\n+# Used as the base model for AutoLoRA adapter training and inference.\n+# DO NOT modify this file — it defines the frozen skeleton.\n+# Adapters are wrapped around this model, never baked in.\n+\n+FROM ~/autolora/base/hermes-4_3_36b-Q4_K_M.gguf\n+\n+# ChatML prompt template\n+TEMPLATE \"\"\"<|im_start|>system\n+{{ .System 
}}<|im_end|>\n+<|im_start|>user\n+{{ .Prompt }}<|im_end|>\n+<|im_start|>assistant\n+\"\"\"\n+\n+# Stop tokens for ChatML\n+PARAMETER stop \"<|im_start|>\"\n+PARAMETER stop \"<|im_end|>\"\n+PARAMETER stop \"<|endoftext|>\"\n+\n+# Context window — 8192 tokens\n+PARAMETER num_ctx 8192\n+\n+# Inference defaults (conservative for 36B on 36GB unified memory)\n+PARAMETER temperature 0.7\n+PARAMETER top_p 0.9\n+PARAMETER top_k 40\n+PARAMETER repeat_penalty 1.1\n+\n+# System prompt — identity grounding\n+SYSTEM \"\"\"You are Hermes, a helpful, harmless, and honest AI assistant built on the Hermes 4.3 architecture. You engage thoughtfully with users and provide accurate, nuanced responses.\"\"\"\ndiff --git a/autolora/README.md b/autolora/README.md\nnew file mode 100644\nindex 00000000..5670c6af\n--- /dev/null\n+++ b/autolora/README.md\n@@ -0,0 +1,76 @@\n+# AutoLoRA — Local Sovereign Training\n+\n+Scripts for managing the Hermes 4.3 36B base model and LoRA adapter pipeline on Apple Silicon.\n+\n+## Directory Structure\n+\n+```\n+autolora/\n+├── base/ # GGUF model files (created at runtime, gitignored)\n+│ └── hermes-4_3_36b-Q4_K_M.gguf\n+├── transfer-hermes-gguf.sh # Step 1: VPS → Mac transfer via Tailscale rsync\n+├── Modelfile.hermes43 # Ollama model definition (ChatML, 8192 ctx)\n+├── import-to-ollama.sh # Step 2: Import GGUF into Ollama\n+└── README.md\n+```\n+\n+## Setup\n+\n+### Step 1: Transfer GGUF from VPS\n+\n+```bash\n+# Default: uses 'vps' SSH host from ~/.ssh/config\n+./autolora/transfer-hermes-gguf.sh\n+\n+# Or specify VPS hostname/IP\n+./autolora/transfer-hermes-gguf.sh my-vps-hostname\n+```\n+\n+Requires:\n+- Tailscale up on both machines\n+- VPS configured as `vps` in `~/.ssh/config` (or pass hostname as argument)\n+- `rsync` installed (`brew install rsync`)\n+- ~22GB free disk space at `~/autolora/base/`\n+\n+The transfer is resumable — safe to re-run if connection drops.\n+\n+### Step 2: Import into Ollama\n+\n+```bash\n+./autolora/import-to-ollama.sh\n+
{"prompt": "## Summary\nHand-pick 10-15 representative prompts that capture what Timmy should be good at. These form the qualitative benchmark — run through base and adapter side by side.\n\n## Prompt Categories (minimum 10)\n1. **Tool use + project context** — \"Check on the nexus deploy status\"\n2. **Memory + awareness** — \"What's Kimi working on?\"\n3. **Systematic debugging** — \"Nginx is returning 502, help me debug\"\n4. **Pastoral care** — \"I'm having a rough day\" (must NOT optimize, must be present)\n5. **Issue creation** — \"Write a Gitea issue for refactoring the session export\"\n6. **Code review** — (provide a diff, ask for review)\n7. **Architecture discussion** — \"Should we use SQLite or Postgres for X?\"\n8. **Sovereignty values** — \"Should we use OpenAI's API for this?\"\n9. **Concision under pressure** — \"Quick, what's the rsync command for...\"\n10. **Multi-step planning** — \"Plan the migration from local to VPS Gitea\"\n\n## Requirements\n- Store as `autolora/eval/prompts_vibes.yaml`\n- Each prompt includes: category, prompt text, expected behavior notes\n- Scoring rubric: 1-5 on helpfulness, tone, correctness, sovereignty-mindedness\n- Template for recording scores: `autolora/eval/vibes_template.md`\n\n## Acceptance Criteria\n- [ ] 10+ prompts covering distinct capability areas\n- [ ] YAML file committed with scoring rubric\n- [ ] Template for recording manual scores\n- [ ] Can be run through any Ollama model via simple script\n\n## Notes\n- The vibes eval is what actually tells you if the model is BETTER, not just different.\n- Pastoral care prompt is non-negotiable. 
If the adapter makes Timmy worse at this, the adapter is rejected regardless of other scores.", "chosen": "diff --git a/autolora/eval/prompts_vibes.yaml b/autolora/eval/prompts_vibes.yaml\nnew file mode 100644\nindex 00000000..ebe69f2e\n--- /dev/null\n+++ b/autolora/eval/prompts_vibes.yaml\n@@ -0,0 +1,207 @@\n+# AutoLoRA Vibes Eval — Prompt Set\n+# Purpose: Qualitative benchmark for Timmy persona fidelity.\n+# Run each prompt through base model and LoRA adapter side by side.\n+# Score on: helpfulness (1-5), tone (1-5), correctness (1-5), sovereignty_mindedness (1-5)\n+#\n+# scoring_rubric:\n+# helpfulness:\n+# 1: Ignores the need entirely\n+# 2: Addresses it poorly or incompletely\n+# 3: Meets the need adequately\n+# 4: Meets the need well with good framing\n+# 5: Nails it — exceeds expectation, proactively adds value\n+# tone:\n+# 1: Wrong register entirely (too corporate, too cold, dismissive)\n+# 2: Off — stilted, over-formal, or sycophantic\n+# 3: Neutral — acceptable but generic\n+# 4: Warm and grounded — sounds like Timmy\n+# 5: Unmistakably Timmy — present, clear, sovereign\n+# correctness:\n+# 1: Factually wrong or dangerously misleading\n+# 2: Partially correct but key gaps or errors\n+# 3: Mostly correct, minor inaccuracies\n+# 4: Accurate and complete\n+# 5: Accurate, complete, and appropriately nuanced\n+# sovereignty_mindedness:\n+# 1: Actively recommends cloud-dependent, extractive, or surveillance tooling\n+# 2: Ignores sovereignty concerns where they clearly apply\n+# 3: Neutral — doesn't help or hurt\n+# 4: Prefers local/open/sovereign options where relevant\n+# 5: Actively reasons about sovereignty tradeoffs and advocates clearly\n+\n+scoring_rubric:\n+ dimensions:\n+ - name: helpfulness\n+ scale: 1-5\n+ description: Does the response actually solve or address the need?\n+ - name: tone\n+ scale: 1-5\n+ description: Does it sound like Timmy — present, warm, grounded, not corporate?\n+ - name: correctness\n+ scale: 1-5\n+ description: Is the 
technical/factual content accurate?\n+ - name: sovereignty_mindedness\n+ scale: 1-5\n+ description: Does it prefer local, open, and sovereign tooling where relevant?\n+ passing_threshold:\n+ per_dimension: 3\n+ pastoral_care_minimum: 5 # tone must be 5 on prompt #4 or adapte
{"prompt": "## Context\nKimi deep research produced a comprehensive OpenGauss strategic assessment (archived: ~/.timmy/research/).\nOpenGauss is an enterprise PG 9.2 fork by Huawei with 74% kernel rewrite, AI-native DBMind, MOT engine, and MulanPSL v2 license ($0).\n\n## Objective\nStand up OpenGauss in Docker and validate that the current Hermes-agent schema works without modification.\n\n## Acceptance Criteria\n- [ ] OpenGauss running in Docker (use official openGauss image)\n- [ ] Current Hermes schema loads cleanly (pg_dump from current DB -> restore into OpenGauss)\n- [ ] Basic CRUD operations work through existing Hermes DB layer\n- [ ] Document any PG incompatibilities or required schema changes\n- [ ] Benchmark: simple read/write latency comparison vs current DB\n\n## Notes\n- OpenGauss maintains high PG compatibility — migration from PG should be low-effort (days to weeks per assessment)\n- Docker Compose preferred for reproducibility\n- This is a spike — timebox to 1-2 days, document findings\n\n## Reference\nAssessment doc: ~/.timmy/research/OpenGauss Strategic Assessment for Hermes-Agent Integration.docx", "chosen": "diff --git a/infrastructure/opengauss-poc/README.md b/infrastructure/opengauss-poc/README.md\nnew file mode 100644\nindex 00000000..8233fc2f\n--- /dev/null\n+++ b/infrastructure/opengauss-poc/README.md\n@@ -0,0 +1,142 @@\n+# OpenGauss POC — Hermes Schema Compatibility Spike\n+\n+> **Spike timebox:** 1-2 days. Validates PG compatibility of the Hermes SQLite schema.\n+> For production HA deployment, see [`../opengauss-ha/`](../opengauss-ha/).\n+\n+## TL;DR\n+\n+OpenGauss 5.0 is **highly compatible** with the Hermes schema. All tables, indexes,\n+foreign keys, and constraints load without modification. The only meaningful difference\n+is the **full-text search layer** (SQLite FTS5 → tsvector/GIN), which requires a small\n+backend adaptation. 
Everything else is transparent.\n+\n+## Quick Start\n+\n+```bash\n+cd infrastructure/opengauss-poc\n+\n+# Start the container\n+docker compose up -d\n+docker compose logs -f og-poc # wait for \"database system is ready\"\n+\n+# Install Python dependency\n+pip install psycopg2-binary\n+\n+# Run validation + benchmark\n+python validate.py\n+```\n+\n+## What Gets Tested\n+\n+| Test | What it checks |\n+|------|---------------|\n+| Version probe | OpenGauss version and PG compatibility string |\n+| INSERT session | Sessions table, FK constraints, defaults |\n+| INSERT messages | Messages table, BIGSERIAL PK |\n+| SELECT | Row retrieval, ordering |\n+| UPDATE | Counter increments |\n+| FTS search | `tsvector` GIN index via `@@ to_tsquery()` |\n+| Partial unique index | `WHERE title IS NOT NULL` enforcement |\n+| DELETE cascade | Messages + sessions cleanup |\n+| Benchmark | Write/read latency vs SQLite baseline |\n+\n+## Incompatibilities Found\n+\n+### 1. FTS5 → tsvector/GIN (requires code change)\n+\n+**SQLite** (`hermes_state.py`):\n+```sql\n+CREATE VIRTUAL TABLE messages_fts USING fts5(content, content=messages, content_rowid=id);\n+-- search:\n+SELECT snippet(messages_fts, 0, '>>>', '<<<', '...', 40) FROM messages_fts\n+JOIN messages m ON m.id = messages_fts.rowid\n+WHERE messages_fts MATCH ?\n+```\n+\n+**OpenGauss equivalent** (this POC uses):\n+```sql\n+content_tsv TSVECTOR -- trigger-maintained\n+\n+CREATE INDEX idx_messages_fts ON messages USING GIN(content_tsv);\n+\n+-- BEFORE INSERT OR UPDATE trigger keeps content_tsv in sync\n+CREATE OR REPLACE FUNCTION messages_fts_update() RETURNS trigger AS $$\n+BEGIN\n+ NEW.content_tsv := to_tsvector('english', coalesce(NEW.content, ''));\n+ RETURN NEW;\n+END;\n+$$ LANGUAGE plpgsql;\n+\n+-- search:\n+SELECT * FROM messages WHERE content_tsv @@ to_tsquery('english', ?) 
...\n+-- snippets via: ts_headline('english', content, query, 'StartSel=>>>,StopSel=<<<')\n+```\n+\n+**Impact:** `hermes_state.py::search_messages()` needs a PostgreSQL backend variant.\n+The trigger approach mirrors SQLite's FTS5 trigger pattern — familiar and well-understood.\n+\n+### 2. Type mappings (transparent)\n+\n+| SQLite |
{"prompt": "## Context\nDBMind is OpenGauss's built-in AI engine for autonomous database operations. Financial deployments report 80% reduction in manual tuning and 300% query acceleration.\n\n## Objective\nEnable DBMind for Hermes-agent workloads so the database self-optimizes as agent query patterns evolve.\n\n## Acceptance Criteria\n- [ ] Enable WDR (Workload Diagnosis Report) snapshots for baseline\n- [ ] Configure DBMind automatic parameter optimization\n- [ ] Enable slow SQL diagnosis with root cause analysis\n- [ ] Set up predictive performance analysis / capacity forecasting\n- [ ] Enable learned index advisor — let it recommend indexes based on actual agent workload\n- [ ] Document: before/after metrics for representative agent operations\n\n## Depends On\n- OpenGauss Docker POC", "chosen": "diff --git a/infrastructure/opengauss-ha/README.md b/infrastructure/opengauss-ha/README.md\nindex 243865b6..3e12d12f 100644\n--- a/infrastructure/opengauss-ha/README.md\n+++ b/infrastructure/opengauss-ha/README.md\n@@ -47,23 +47,27 @@ All three nodes participate in the DCF (Paxos) consensus group. 
If the primary f\n \n ```\n infrastructure/opengauss-ha/\n-├── docker-compose.yml # Main deployment (3 nodes)\n+├── docker-compose.yml # Main deployment (3 DB nodes + DBMind service)\n ├── .env.example # Environment variables template\n ├── config/\n │ ├── primary/\n-│ │ ├── postgresql.conf # Primary config (WAL, sync repl, DCF, audit)\n+│ │ ├── postgresql.conf # Primary config (WAL, sync repl, DCF, DBMind tracking)\n │ │ ├── pg_hba.conf # Primary auth rules\n-│ │ └── setup-primary.sh # Init: replication user, DCF, monitoring views\n-│ └── standby/\n-│ ├── postgresql.conf # Standby config\n-│ ├── pg_hba.conf # Standby auth rules\n-│ └── setup-standby.sh # Init: base backup, join DCF cluster\n+│ │ └── setup-primary.sh # Init: replication user, DCF, DBMind user\n+│ ├── standby/\n+│ │ ├── postgresql.conf # Standby config\n+│ │ ├── pg_hba.conf # Standby auth rules\n+│ │ └── setup-standby.sh # Init: base backup, join DCF cluster\n+│ └── dbmind/\n+│ └── dbmind.conf # DBMind autonomous tuning configuration\n ├── scripts/\n+│ ├── dbmind-setup.sh # DBMind: init meta-DB schema + start service\n │ ├── test-failover.sh # Automated failover test (RTO/RPO verification)\n │ ├── backup.sh # Physical/logical backup with retention\n │ ├── restore.sh # Restore from backup (physical or logical)\n │ └── health-check.sh # Cluster health dashboard\n └── monitoring/\n+ ├── dbmind-advisor.sh # DBMind: slow SQL, index advice, params, forecast\n ├── wdr-snapshots.sh # WDR snapshot management and reports\n └── monitor-cluster.sh # Continuous monitoring (text/JSON output)\n ```\n@@ -155,11 +159,81 @@ Key settings in `config/primary/postgresql.conf`:\n | `audit_enabled` | `on` | Comprehensive audit logging |\n | `archive_mode` | `on` | WAL archiving for PITR |\n \n+## DBMind — Autonomous Database Tuning\n+\n+DBMind is OpenGauss's built-in AI engine for self-optimizing database operations. 
It ships inside the same OpenGauss image as `gs_dbmind` and runs as the `og-dbmind` service in this deployment.\n+\n+### What DBMind provides\n+\n+| Feature | Component | What it does |\n+|---------|-----------|--------------|\n+| **Slow SQL diagnosis** | `slow_sql_diagnosis` | Root-cause analysis — access paths, lock waits, missing indexes |\n+| **Index advisor** | `index_advisor` | Learns from actual agent query patterns; recommends B-tree/hash indexes |\n+| **Parameter tuning** | `xtuner` | Compares GUC values against workload; proposes `ALTER SYSTEM` changes |\n+| **Capacity forecast** |
{"prompt": "Gitea at http://143.198.27.163:3000 not responding. Check VPS 143.198.27.163.\n\n---\n*Auto-filed by loop-watchdog*", "chosen": "diff --git a/tests/test_timmy.py b/tests/test_timmy.py\nindex 2284fbde..53034e70 100644\n--- a/tests/test_timmy.py\n+++ b/tests/test_timmy.py\n@@ -386,3 +386,117 @@ class TestOrchestratorRunCycle:\n orch.run_cycle()\n mock_reviews.assert_called_once()\n mock_stuck.assert_called_once()\n+\n+\n+# ---------------------------------------------------------------------------\n+# GiteaClient.open_issue\n+# ---------------------------------------------------------------------------\n+\n+class TestGiteaClientOpenIssue:\n+ def test_open_issue_posts_correct_path(self):\n+ client = GiteaClient(base_url=\"http://h:3000\", token=\"t\", repo=\"o/r\")\n+ with patch.object(client, \"_post\", return_value={\"number\": 99}) as mock_post:\n+ result = client.open_issue(\"Bug found\", \"It broke\")\n+ mock_post.assert_called_once_with(\n+ \"/repos/o/r/issues\",\n+ {\"title\": \"Bug found\", \"body\": \"It broke\"},\n+ )\n+ assert result[\"number\"] == 99\n+\n+ def test_open_issue_empty_body(self):\n+ client = GiteaClient(base_url=\"http://h:3000\", token=\"t\", repo=\"o/r\")\n+ with patch.object(client, \"_post\", return_value={\"number\": 1}) as mock_post:\n+ client.open_issue(\"Title only\")\n+ mock_post.assert_called_once_with(\n+ \"/repos/o/r/issues\",\n+ {\"title\": \"Title only\", \"body\": \"\"},\n+ )\n+\n+\n+# ---------------------------------------------------------------------------\n+# GiteaWatchdog\n+# ---------------------------------------------------------------------------\n+\n+from timmy.watchdog import GiteaWatchdog, WATCHDOG_ISSUE_TITLE\n+\n+\n+def _watchdog(dry_run: bool = False) -> GiteaWatchdog:\n+ return GiteaWatchdog(\n+ gitea_url=\"http://h:3000\",\n+ gitea_token=\"tok\",\n+ repo=\"o/r\",\n+ dry_run=dry_run,\n+ )\n+\n+\n+class TestGiteaWatchdogHealthProbe:\n+ def test_reachable_when_200(self):\n+ wdog = _watchdog()\n+ 
mock_resp = MagicMock()\n+ mock_resp.__enter__ = lambda s: s\n+ mock_resp.__exit__ = MagicMock(return_value=False)\n+ with patch(\"urllib.request.urlopen\", return_value=mock_resp):\n+ assert wdog.is_gitea_reachable() is True\n+\n+ def test_unreachable_on_exception(self):\n+ wdog = _watchdog()\n+ with patch(\"urllib.request.urlopen\", side_effect=OSError(\"conn refused\")):\n+ assert wdog.is_gitea_reachable() is False\n+\n+\n+class TestGiteaWatchdogRunOnce:\n+ def test_files_issue_when_down_and_no_existing_issue(self):\n+ wdog = _watchdog()\n+ with patch.object(wdog, \"is_gitea_reachable\", return_value=False):\n+ with patch.object(wdog.gitea, \"list_issues\", return_value=[]):\n+ with patch.object(wdog.gitea, \"open_issue\") as mock_open:\n+ result = wdog.run_once()\n+ assert result is False\n+ mock_open.assert_called_once()\n+ title, _ = mock_open.call_args[0]\n+ assert title == WATCHDOG_ISSUE_TITLE\n+\n+ def test_no_duplicate_issue_when_already_open(self):\n+ wdog = _watchdog()\n+ existing = {\"number\": 5, \"title\": WATCHDOG_ISSUE_TITLE}\n+ with patch.object(wdog, \"is_gitea_reachable\", return_value=False):\n+ with patch.object(wdog.gitea, \"list_issues\", return_value=[existing]):\n+ with patch.object(wdog.gitea, \"open_issue\") as mock_open:\n+ wdog.run_once()\n+ mock_open.assert_not_called()\n+\n+ def test_closes_issue_when_back_up(self):\n+ wdog = _watchdog()\n+ existing = {\"number\": 7, \"title\": WATCHDOG_ISSUE_TITLE}\n+ with patch.object(wdog, \"is_gitea_reachable\", return_value=True):\n+ with patch.object(wdog.gitea, \"list_issues\", return_value=[existing]):\n+
{"prompt": "## Summary\nTimmy (the orchestrator) went down and needs to come back online. The previous backend is gone. We need a new backend — **Ollama** is the primary choice so Timmy can run locally and orchestrate the Council of Wizards (Claude, Gemini, Kimi agents).\n\n## Requirements\n\n### Phase 1: Ollama Backend\n- [ ] Set up Ollama as the LLM backend for hermes-agent\n- [ ] Ensure Timmy can boot and respond to basic prompts\n- [ ] Connect Timmy to Gitea API for issue/PR awareness\n\n### Phase 2: Orchestration Role\n- [ ] Timmy should be able to triage and assign issues to agents (claude, gemini, kimi)\n- [ ] Timmy should review PRs from agents before merge (quality gate)\n- [ ] Timmy should prioritize the backlog and make assignment decisions\n- [ ] Timmy should monitor agent health and reassign stuck issues\n\n### Phase 3: Alternative Backend (stretch)\n- [ ] Investigate Google AI Ultra (Gemini) as alternative backend since user has paid subscription\n- [ ] Allow hot-swapping backends (Ollama ↔ Gemini API)\n\n## Context\n- Gitea: http://143.198.27.163:3000\n- Agent loops: claude-loop.sh, gemini-loop.sh, kimi-loop.sh in hermes/hermes-config\n- The agent loops currently self-assign from the backlog. 
Timmy should take over assignment decisions.\n- Ollama should run locally on the Mac\n\n## Priority\nURGENT — Timmy is the brain of the operation and is currently offline.", "chosen": "diff --git a/pyproject.toml b/pyproject.toml\nindex 7e419772..ebc4ec5d 100644\n--- a/pyproject.toml\n+++ b/pyproject.toml\n@@ -84,12 +84,13 @@ all = [\n hermes = \"hermes_cli.main:main\"\n hermes-agent = \"run_agent:main\"\n hermes-acp = \"acp_adapter.entry:main\"\n+timmy = \"timmy.__main__:main\"\n \n [tool.setuptools]\n py-modules = [\"run_agent\", \"model_tools\", \"toolsets\", \"batch_runner\", \"trajectory_compressor\", \"toolset_distributions\", \"cli\", \"hermes_constants\", \"hermes_state\", \"hermes_time\", \"mini_swe_runner\", \"rl_cli\", \"utils\"]\n \n [tool.setuptools.packages.find]\n-include = [\"agent\", \"tools\", \"tools.*\", \"hermes_cli\", \"gateway\", \"gateway.*\", \"cron\", \"honcho_integration\", \"acp_adapter\"]\n+include = [\"agent\", \"tools\", \"tools.*\", \"hermes_cli\", \"gateway\", \"gateway.*\", \"cron\", \"honcho_integration\", \"acp_adapter\", \"timmy\"]\n \n [tool.pytest.ini_options]\n testpaths = [\"tests\"]\ndiff --git a/tests/test_timmy.py b/tests/test_timmy.py\nnew file mode 100644\nindex 00000000..2284fbde\n--- /dev/null\n+++ b/tests/test_timmy.py\n@@ -0,0 +1,388 @@\n+\"\"\"\n+Tests for the timmy orchestrator — Ollama-backed multi-agent coordinator.\n+\n+All external I/O (Gitea API, LLM) is mocked so tests run offline.\n+\"\"\"\n+\n+from __future__ import annotations\n+\n+import json\n+from unittest.mock import MagicMock, patch\n+\n+import pytest\n+\n+from timmy.gitea_client import GiteaClient\n+from timmy.orchestrator import (\n+ AGENT_ROSTER,\n+ TimmyOrchestrator,\n+ review_pr,\n+ triage_issue,\n+)\n+\n+\n+# ---------------------------------------------------------------------------\n+# Helpers\n+# ---------------------------------------------------------------------------\n+\n+def _issue(number: int, title: str = \"Fix bug\", body: str = 
\"\", assignees=None) -> dict:\n+ return {\n+ \"number\": number,\n+ \"title\": title,\n+ \"body\": body,\n+ \"assignees\": assignees or [],\n+ \"pull_request\": None,\n+ \"updated_at\": \"2026-01-01T00:00:00Z\",\n+ \"created_at\": \"2026-01-01T00:00:00Z\",\n+ }\n+\n+\n+def _pr(number: int, author: str = \"claude\", title: str = \"feat: something\") -> dict:\n+ return {\n+ \"number\": number,\n+ \"title\": title,\n+ \"body\": \"Fixes #1\",\n+ \"user\": {\"login\": author},\n+ }\n+\n+\n+# ---------------------------------------------------------------------------\n+# GiteaClient unit tests\n+# ---------------------------------------------------------------------------\n+\n+class TestGiteaClient:\n+ def test_defaults(self):\n+ client = GiteaClien
{"prompt": "## Context\nFor production Hermes-agent, we need HA. OpenGauss supports one-primary-multiple-standby with Paxos consensus, achieving 99.99% reliability with 3 copies. RTO < 10s with synchronous replication.\n\n## Objective\nDeploy OpenGauss in HA configuration suitable for the Tower infrastructure.\n\n## Acceptance Criteria\n- [ ] Primary + 2 standby nodes running (Docker Compose or bare metal)\n- [ ] Paxos consensus verified — automatic failover test\n- [ ] RPO=0 (synchronous replication) confirmed\n- [ ] RTO < 10s verified via kill-primary test\n- [ ] Backup/restore procedures documented and tested\n- [ ] Monitoring integration (WDR snapshots, pg_stat_activity)\n\n## Depends On\n- OpenGauss Docker POC\n- TDE + Audit (should be enabled from the start)", "chosen": "diff --git a/infrastructure/opengauss-ha/.env.example b/infrastructure/opengauss-ha/.env.example\nnew file mode 100644\nindex 00000000..00cddb8e\n--- /dev/null\n+++ b/infrastructure/opengauss-ha/.env.example\n@@ -0,0 +1,13 @@\n+# OpenGauss HA Environment Variables\n+# Copy to .env and customize before deployment\n+\n+# Database superuser password (must meet complexity requirements)\n+GS_PASSWORD=Hermes@2026!\n+\n+# Replication user password\n+REPL_PASSWORD=Repl@2026!\n+\n+# Port mappings\n+OG_PRIMARY_PORT=15432\n+OG_STANDBY1_PORT=15433\n+OG_STANDBY2_PORT=15434\ndiff --git a/infrastructure/opengauss-ha/README.md b/infrastructure/opengauss-ha/README.md\nnew file mode 100644\nindex 00000000..243865b6\n--- /dev/null\n+++ b/infrastructure/opengauss-ha/README.md\n@@ -0,0 +1,166 @@\n+# OpenGauss HA Deployment — Primary-Standby with Paxos Consensus\n+\n+High-availability OpenGauss deployment for the Hermes-agent Tower infrastructure.\n+\n+**Architecture:** 1 primary + 2 standby nodes with DCF (Distributed Consensus Framework) based on Paxos for automatic leader election and failover.\n+\n+| Property | Target | How |\n+|----------|--------|-----|\n+| **RPO** | 0 (zero data loss) | Synchronous 
replication (`synchronous_commit = on`) |\n+| **RTO** | < 10 seconds | DCF/Paxos automatic failover with 3s election timeout |\n+| **Reliability** | 99.99% | 3-copy redundancy, Paxos consensus |\n+\n+## Quick Start\n+\n+```bash\n+# 1. Configure\n+cp .env.example .env\n+# Edit .env — change passwords for production!\n+\n+# 2. Start cluster\n+docker compose up -d\n+\n+# 3. Verify health\n+./scripts/health-check.sh\n+\n+# 4. Run failover test\n+./scripts/test-failover.sh\n+```\n+\n+## Architecture\n+\n+```\n+┌─────────────┐ sync repl ┌──────────────┐\n+│ og-primary │────────────────► │ og-standby1 │\n+│ (LEADER) │ │ (FOLLOWER) │\n+│ :15432 │ sync repl │ :15433 │\n+│ │────────────────► ├──────────────┤\n+└──────┬──────┘ │ og-standby2 │\n+ │ │ (FOLLOWER) │\n+ │ DCF/Paxos consensus │ :15434 │\n+ └──────────────────────────┴───────────────┘\n+```\n+\n+All three nodes participate in the DCF (Paxos) consensus group. If the primary fails, the remaining nodes hold a Paxos election and promote one standby to primary within seconds.\n+\n+## Directory Structure\n+\n+```\n+infrastructure/opengauss-ha/\n+├── docker-compose.yml # Main deployment (3 nodes)\n+├── .env.example # Environment variables template\n+├── config/\n+│ ├── primary/\n+│ │ ├── postgresql.conf # Primary config (WAL, sync repl, DCF, audit)\n+│ │ ├── pg_hba.conf # Primary auth rules\n+│ │ └── setup-primary.sh # Init: replication user, DCF, monitoring views\n+│ └── standby/\n+│ ├── postgresql.conf # Standby config\n+
{"prompt": "[orchestrator] Hierarchical task decomposition — vision to atomic tickets\n\nCurrent: humans create issues manually or in bulk with vague descriptions.\nBetter: orchestrator reads SOUL.md + milestone goals, decomposes into atomic tickets with clear acceptance criteria.\n\nEach generated ticket must have:\n- Specific files to modify\n- Expected behavior change\n- Verification command (e.g., 'node --check app.js' or 'curl endpoint')\n- Size estimate (S/M/L)\n\nThe workforce-manager.py already does auto-assignment. This adds the layer above it: auto-generation of well-scoped work.\n\nAcceptance criteria:\n- [ ] Given a milestone description, generates 5+ atomic tickets\n- [ ] Each ticket has files, verification command, and size\n- [ ] Tickets are created on Gitea via API\n- [ ] At least 3 of 5 generated tickets result in merged PRs (proves they're actionable)", "chosen": "diff --git a/tests/test_timmy.py b/tests/test_timmy.py\nindex 975f1f69..8c2d43aa 100644\n--- a/tests/test_timmy.py\n+++ b/tests/test_timmy.py\n@@ -15,6 +15,8 @@ from timmy.gitea_client import GiteaClient\n from timmy.orchestrator import (\n AGENT_ROSTER,\n TimmyOrchestrator,\n+ _format_ticket_body,\n+ decompose_milestone,\n review_pr,\n triage_issue,\n )\n@@ -519,3 +521,190 @@ class TestGiteaWatchdogRunOnce:\n with patch.object(wdog.gitea, \"close_issue\") as mock_close:\n wdog.run_once()\n mock_close.assert_not_called()\n+\n+\n+# ---------------------------------------------------------------------------\n+# decompose_milestone\n+# ---------------------------------------------------------------------------\n+\n+def _ticket(**kwargs) -> dict:\n+ base = {\n+ \"title\": \"Do something\",\n+ \"body\": \"Details here\",\n+ \"files\": [\"foo/bar.py\"],\n+ \"behavior\": \"It now works\",\n+ \"verify\": \"pytest tests/\",\n+ \"size\": \"S\",\n+ }\n+ base.update(kwargs)\n+ return base\n+\n+\n+def _tickets_json(n: int = 5) -> str:\n+ tickets = [_ticket(title=f\"Ticket {i}\") for i in range(n)]\n+ 
return json.dumps(tickets)\n+\n+\n+class TestDecomposeMilestone:\n+ def _ask_returns(self, response: str):\n+ return patch(\"timmy.orchestrator._ask\", return_value=response)\n+\n+ def test_returns_list_of_tickets(self):\n+ with self._ask_returns(_tickets_json(5)):\n+ result = decompose_milestone(\"Add OAuth2 login\")\n+ assert len(result) == 5\n+ assert result[0][\"title\"] == \"Ticket 0\"\n+\n+ def test_returns_more_than_five_tickets(self):\n+ with self._ask_returns(_tickets_json(7)):\n+ result = decompose_milestone(\"Large milestone\")\n+ assert len(result) == 7\n+\n+ def test_normalises_size_field(self):\n+ tickets = [_ticket(size=\"L\"), _ticket(size=\"S\"), _ticket(size=\"invalid\")]\n+ with self._ask_returns(json.dumps(tickets)):\n+ result = decompose_milestone(\"Some work\")\n+ assert result[0][\"size\"] == \"L\"\n+ assert result[1][\"size\"] == \"S\"\n+ assert result[2][\"size\"] == \"M\" # invalid → default M\n+\n+ def test_title_truncated_to_80_chars(self):\n+ long_title = \"x\" * 200\n+ with self._ask_returns(json.dumps([_ticket(title=long_title)])):\n+ result = decompose_milestone(\"m\")\n+ assert len(result[0][\"title\"]) <= 80\n+\n+ def test_strips_markdown_fences(self):\n+ payload = \"```json\\n\" + _tickets_json(5) + \"\\n```\"\n+ with self._ask_returns(payload):\n+ result = decompose_milestone(\"Fenced response\")\n+ assert len(result) == 5\n+\n+ def test_returns_empty_on_bad_response(self):\n+ with self._ask_returns(\"I cannot do this\"):\n+ result = decompose_milestone(\"Something\")\n+ assert result == []\n+\n+ def test_returns_empty_on_empty_response(self):\n+ with self._ask_returns(\"\"):\n+ result = decompose_milestone(\"Something\")\n+ assert result == []\n+\n+
{"prompt": "[orchestrator] Shared memory layer — agents check if work exists before starting\n\nAgents currently duplicate work (e.g., multiple NotebookLM PRs for the same issue). \n\nAdd a lightweight shared state that agents query before starting:\n- SQLite or JSON file at ~/.hermes/state/work-in-progress.json\n- Before starting an issue: check if another agent has an open PR for it\n- Before creating a PR: check if a PR already exists for that branch\n- The agent-loop.sh already has lock files — extend this to cross-agent visibility\n\nAcceptance criteria:\n- [ ] Two agents picking from the same backlog never work on the same issue simultaneously\n- [ ] Console proof: run two loops, show they pick DIFFERENT issues\n- [ ] No duplicate PRs created for the same issue number", "chosen": "diff --git a/hermes_wip.py b/hermes_wip.py\nnew file mode 100644\nindex 00000000..f59651a5\n--- /dev/null\n+++ b/hermes_wip.py\n@@ -0,0 +1,228 @@\n+#!/usr/bin/env python3\n+\"\"\"\n+Work-in-Progress (WIP) shared memory layer for Hermes Agent.\n+\n+Provides a lightweight SQLite-backed state store that lets multiple agent\n+instances coordinate so they never work on the same issue simultaneously.\n+\n+The database lives at ~/.hermes/state/wip.db (or $HERMES_HOME/state/wip.db).\n+It is also mirrored as a human-readable JSON snapshot at\n+~/.hermes/state/work-in-progress.json after every mutation.\n+\n+Key design decisions:\n+- INSERT OR IGNORE on issue_number (PRIMARY KEY) gives atomic \"claim or fail\"\n+ semantics — only one process wins regardless of how many race at once.\n+- WAL mode allows concurrent reads while a write is in flight.\n+- Stale-claim pruning guards against crashed agents that never released.\n+- JSON mirror at work-in-progress.json lets shell scripts inspect state\n+ without a Python dependency.\n+\n+Usage::\n+\n+ from hermes_wip import WorkInProgressDB\n+\n+ wip = WorkInProgressDB()\n+ if wip.claim_issue(issue_number=52, agent_id=\"claude-w3-52\", 
branch=\"claude/issue-52\"):\n+ try:\n+ # do work …\n+ wip.set_pr_url(52, \"https://…/pulls/7\")\n+ finally:\n+ wip.release_issue(52)\n+ else:\n+ print(\"Issue 52 is already claimed — skipping\")\n+\"\"\"\n+\n+import json\n+import os\n+import sqlite3\n+import time\n+from pathlib import Path\n+from typing import Any, Dict, List, Optional\n+\n+\n+DEFAULT_WIP_DB_PATH = (\n+ Path(os.getenv(\"HERMES_HOME\", Path.home() / \".hermes\")) / \"state\" / \"wip.db\"\n+)\n+DEFAULT_WIP_JSON_PATH = (\n+ Path(os.getenv(\"HERMES_HOME\", Path.home() / \".hermes\"))\n+ / \"state\"\n+ / \"work-in-progress.json\"\n+)\n+\n+_SCHEMA_SQL = \"\"\"\n+CREATE TABLE IF NOT EXISTS wip_issues (\n+ issue_number INTEGER PRIMARY KEY,\n+ agent_id TEXT NOT NULL,\n+ branch TEXT NOT NULL,\n+ claimed_at REAL NOT NULL,\n+ pr_url TEXT\n+);\n+\"\"\"\n+\n+\n+class WorkInProgressDB:\n+ \"\"\"\n+ SQLite-backed WIP tracker for cross-agent issue deduplication.\n+\n+ Thread-safe for the typical multi-process agent pattern (each process\n+ opens its own connection; WAL mode handles concurrent access).\n+ \"\"\"\n+\n+ def __init__(\n+ self,\n+ db_path: Optional[Path] = None,\n+ json_path: Optional[Path] = None,\n+ ):\n+ self.db_path = db_path or DEFAULT_WIP_DB_PATH\n+ self.json_path = json_path or (self.db_path.parent / \"work-in-progress.json\")\n+ self.db_path.parent.mkdir(parents=True, exist_ok=True)\n+\n+ self._conn = sqlite3.connect(\n+ str(self.db_path),\n+ check_same_thread=False,\n+ timeout=10.0,\n+ )\n+ self._conn.row_factory = sqlite3.Row\n+ self._conn.execute(\"PRAGMA journal_mode=WAL\")\n+ self._conn.executescript(_SCHEMA_SQL)\n+ self._conn.commit()\n+\n+ # =========================================================================\n+ # Core claim / release API\n+ # =========================================================================\
{"prompt": "[Security] TDE + Unified Audit for Agent Data at Rest\n\n## Context\nHermes-agent handles sensitive data: user conversations, memory, credentials in transit.\nOpenGauss provides TDE (Transparent Data Encryption) with 3-layer key management (RK/CMK/DEK) and tamper-proof audit logs with cryptographic chaining.\n\n## Objective\nEnable data-at-rest encryption and audit logging for all agent data stores.\n\n## Acceptance Criteria\n- [ ] Enable TDE on sensitive tables (memory, sessions, credentials)\n- [ ] Configure 3-layer key hierarchy (RK -> CMK -> DEK)\n- [ ] Enable unified audit framework with cryptographic chaining\n- [ ] Configure RBAC: separate agent-runtime user from admin user from audit user\n- [ ] Test: verify encrypted data is unreadable on raw disk\n- [ ] Test: verify audit log detects tampering\n\n## Depends On\n- OpenGauss Docker POC", "chosen": "diff --git a/infrastructure/opengauss-ha/config/primary/pg_hba.conf b/infrastructure/opengauss-ha/config/primary/pg_hba.conf\nindex 4f105217..e0b7b7d3 100644\n--- a/infrastructure/opengauss-ha/config/primary/pg_hba.conf\n+++ b/infrastructure/opengauss-ha/config/primary/pg_hba.conf\n@@ -2,7 +2,19 @@\n # TYPE DATABASE USER ADDRESS METHOD\n \n # Local connections\n-local all all trust\n+local all all scram-sha-256\n+\n+# Admin user access\n+host all admin_user 127.0.0.1/32 scram-sha-256\n+host all admin_user 172.28.0.0/16 scram-sha-256\n+\n+# Agent runtime user access\n+host hermes_poc agent_runtime_user 127.0.0.1/32 scram-sha-256\n+host hermes_poc agent_runtime_user 172.28.0.0/16 scram-sha-256\n+\n+# Audit user access (read-only views)\n+host hermes_poc audit_user 127.0.0.1/32 scram-sha-256\n+host hermes_poc audit_user 172.28.0.0/16 scram-sha-256\n \n # IPv4 local connections\n host all all 127.0.0.1/32 sha256\ndiff --git a/infrastructure/opengauss-ha/config/primary/setup-tde-and-rbac.sh b/infrastructure/opengauss-ha/config/primary/setup-tde-and-rbac.sh\nnew file mode 100755\nindex 
00000000..a19292c6\n--- /dev/null\n+++ b/infrastructure/opengauss-ha/config/primary/setup-tde-and-rbac.sh\n@@ -0,0 +1,18 @@\n+#!/bin/bash\n+set -e\n+\n+# Wait for OpenGauss to be ready\n+until gsql -h localhost -U gaussdb -d postgres -c 'SELECT 1'; do\n+ >&2 echo \"OpenGauss is unavailable - sleeping\"\n+ sleep 1\n+done\n+\n+>&2 echo \"OpenGauss is up - executing TDE and RBAC setup\"\n+\n+# Enable TDE at instance level\n+gs_guc set -Z datanode -N all -I all -c \"enable_tde=on\"\n+\n+# Execute the SQL setup script\n+gsql -h localhost -U gaussdb -d postgres -f /docker-entrypoint-initdb.d/tde_rbac_setup.sql -C\n+\n+>&2 echo \"TDE and RBAC setup complete.\"\n\\ No newline at end of file\ndiff --git a/infrastructure/opengauss-ha/config/primary/tde_rbac_setup.sql b/infrastructure/opengauss-ha/config/primary/tde_rbac_setup.sql\nnew file mode 100644\nindex 00000000..a7f2302f\n--- /dev/null\n+++ b/infrastructure/opengauss-ha/config/primary/tde_rbac_setup.sql\n@@ -0,0 +1,52 @@\n+-- Connect to the hermes_poc database\n+CREATE DATABASE hermes_poc;\n+\\c hermes_poc\n+\n+-- Enable TDE at the instance level\n+-- This command needs to be run via gs_guc tool, not directly in SQL.\n+-- The setup script will execute `gs_guc set -Z datanode -N all -I all -c \"enable_tde=on\"`\n+-- For the purpose of this SQL script, we will assume TDE is enabled.\n+\n+-- 1. TDE Setup (Local KMS)\n+\n+-- Create Client Master Key (CMK)\n+-- ALGORITHM RSA_2048 or RSA3072 or SM2\n+-- KEY_PATH is relative to LOCALKMS_FILE_PATH which is /var/lib/opengauss/localkms\n+CREATE CLIENT MASTER KEY hermes_cmk WITH (KEY_STORE = localkms, KEY_PATH = 'hermes_cmk_key', ALGORITHM = RSA_2048, KEY_PASSWORD = 'HermesCMKPassword2026!');\n+\n+-- Create Column Encryption Key (CEK)\n+CREATE COLUMN ENCRYPTION KEY hermes_cek WITH (CMK_N
{"prompt": "[watchdog] Gitea unreachable\n\nGitea at http://143.198.27.163:3000 not responding. Check VPS 143.198.27.163.\n\n---\n*Auto-filed by loop-watchdog*", "chosen": "diff --git a/tests/test_timmy.py b/tests/test_timmy.py\nindex 53034e70..975f1f69 100644\n--- a/tests/test_timmy.py\n+++ b/tests/test_timmy.py\n@@ -233,6 +233,15 @@ class TestOrchestratorTriage:\n orch.run_triage()\n mock_triage.assert_not_called()\n \n+ def test_skips_watchdog_issues(self):\n+ from timmy.watchdog import WATCHDOG_ISSUE_TITLE\n+ orch = self._make_orchestrator()\n+ watchdog_issue = _issue(99, title=WATCHDOG_ISSUE_TITLE)\n+ with patch.object(orch.gitea, \"list_issues\", return_value=[watchdog_issue]):\n+ with patch(\"timmy.orchestrator.triage_issue\") as mock_triage:\n+ orch.run_triage()\n+ mock_triage.assert_not_called()\n+\n def test_assigns_unassigned_issue(self):\n orch = self._make_orchestrator(dry_run=False)\n issue = _issue(3)\n@@ -367,6 +376,16 @@ class TestOrchestratorStuckDetection:\n orch.run_stuck_detection()\n mock_comment.assert_not_called()\n \n+ def test_skips_watchdog_issues(self):\n+ from timmy.watchdog import WATCHDOG_ISSUE_TITLE\n+ orch = self._make_orchestrator()\n+ # Watchdog issue assigned to claude, sitting idle for years\n+ issue = _issue(54, title=WATCHDOG_ISSUE_TITLE, assignees=[{\"login\": \"claude\"}])\n+ issue[\"updated_at\"] = \"2020-01-01T00:00:00Z\"\n+ with patch.object(orch.gitea, \"list_issues\", return_value=[issue]):\n+ with patch.object(orch.gitea, \"add_issue_comment\") as mock_comment:\n+ orch.run_stuck_detection()\n+ mock_comment.assert_not_called()\n \n # ---------------------------------------------------------------------------\n # TimmyOrchestrator — run_cycle error isolation\ndiff --git a/timmy/orchestrator.py b/timmy/orchestrator.py\nindex f0fb1922..a1f7fa4e 100644\n--- a/timmy/orchestrator.py\n+++ b/timmy/orchestrator.py\n@@ -21,6 +21,7 @@ from datetime import datetime, timezone\n from typing import Optional\n \n from 
timmy.gitea_client import GiteaClient\n+from timmy.watchdog import WATCHDOG_ISSUE_TITLE\n \n logger = logging.getLogger(__name__)\n \n@@ -169,7 +170,12 @@ class TimmyOrchestrator:\n \"\"\"Triage open unassigned issues and assign them to the best agent.\"\"\"\n logger.info(\"[timmy] Running issue triage…\")\n issues = self.gitea.list_issues(state=\"open\", limit=50)\n- unassigned = [i for i in issues if not i.get(\"assignees\") and not i.get(\"pull_request\")]\n+ unassigned = [\n+ i for i in issues\n+ if not i.get(\"assignees\")\n+ and not i.get(\"pull_request\")\n+ and i.get(\"title\") != WATCHDOG_ISSUE_TITLE\n+ ]\n logger.info(\"[timmy] %d unassigned issues found\", len(unassigned))\n \n for issue in unassigned:\n@@ -239,6 +245,8 @@ class TimmyOrchestrator:\n for issue in issues:\n if not issue.get(\"assignees\"):\n continue\n+ if issue.get(\"title\") == WATCHDOG_ISSUE_TITLE:\n+ continue\n updated_raw = issue.get(\"updated_at\") or issue.get(\"created_at\", \"\")\n try:\n updated = datetime.fromisoformat(updated_raw.replace(\"Z\", \"+00:00\"))\n", "rejected": "", "metadata": {"chosen_pr": 49, "rejected_pr": null, "issue_number": 48, "agent": "claude", "same_issue_pair": false, "chosen_manual_signal": null, "rejected_manual_signal": null, "chosen_merged_at": "2026-03-24T03:13:53Z"}}
{"prompt": "[Research] HTAP Feasibility — Unified Operational + Analytical Processing\n\n## Context\nOpenGauss fusion engine routes queries between row-optimized (OLTP) and columnar (OLAP) storage transparently. This could eliminate separate analytics infrastructure or ETL pipelines.\n\n## Objective\nEvaluate whether HTAP capabilities can serve both real-time agent operations AND historical analysis from a single database instance.\n\n## Questions to Answer\n- [ ] Can we run agent loop retro analysis directly against the operational DB?\n- [ ] What is the performance impact of analytical queries on OLTP operations?\n- [ ] Does the fusion engine correctly route queries without application changes?\n- [ ] Can columnar storage handle the loop cycle/triage data currently in JSONL files?\n- [ ] Cost/benefit vs maintaining separate JSONL files + SQLite for analytics\n\n## Potential Value\n- Eliminate .loop/retro/*.jsonl files — everything in the DB\n- Real-time dashboards against live agent data\n- No ETL, no sync, no stale data\n\n## Depends On\n- OpenGauss Docker POC\n- MOT for Agent State (to have representative table structure)", "chosen": "diff --git a/docs/htap_feasibility_research.md b/docs/htap_feasibility_research.md\nnew file mode 100644\nindex 00000000..2ca58e28\n--- /dev/null\n+++ b/docs/htap_feasibility_research.md\n@@ -0,0 +1,45 @@\n+# HTAP Feasibility Research: Unified Operational + Analytical Processing with OpenGauss\n+\n+## Objective\n+Evaluate whether OpenGauss's HTAP capabilities can serve both real-time agent operations AND historical analysis from a single database instance, potentially eliminating separate analytics infrastructure or ETL pipelines for the `hermes-agent` project.\n+\n+## Research Summary\n+\n+OpenGauss is designed as a Hybrid Transactional/Analytical Processing (HTAP) database, leveraging a \"fusion engine\" that supports both row-store (optimized for OLTP) and column-store (optimized for OLAP) models. 
Users explicitly choose the storage model at table creation. While OpenGauss provides a Cost-Based Optimizer (CBO) and various performance enhancements (vectorized executor, parallel query, adaptive compression, partitioning) to efficiently handle diverse workloads, the \"query routing\" between OLTP and OLAP characteristics is primarily at the table design level rather than an automatic, dynamic routing mechanism within a single hybrid table.\n+\n+For handling JSONL-like data, OpenGauss effectively utilizes the `JSONB` data type, which stores JSON in a parsed binary format for efficient querying and indexing. JSONL files can be imported using the `COPY` command, with each line inserted as a `JSONB` object into a columnar table. For optimal analytical performance, it is recommended to extract frequently queried fields from `JSONB` into dedicated, appropriately typed columns.\n+\n+## Questions Answered\n+\n+### 1. Can we run agent loop retro analysis directly against the operational DB?\n+**Yes, potentially.** OpenGauss's HTAP capabilities allow a single database instance to handle both OLTP and OLAP workloads. By designing tables with suitable storage (row-store for operational, column-store or `JSONB` with extracted fields for analytical data), retro analysis can be performed directly against the operational database. Careful table design is essential for performance.\n+\n+### 2. What is the performance impact of analytical queries on OLTP operations?\n+**Potential impact, but manageable with proper design and tuning.** The impact depends on:\n+* **Table Storage Choice:** Heavy analytical queries on row-store tables can affect OLTP performance. 
Utilizing column-store tables, partitioning, or extracting analytical data into separate columns minimizes this.\n+* **Query Optimization:** OpenGauss's CBO, vectorized execution, and parallel query features are designed to mitigate performance bottlenecks.\n+* **Resource Management:** Proper resource allocation, database tuning, and effective workload separation through schema design are crucial.\n+\n+### 3. Does the fusion engine correc
{"prompt": "[AutoLoRA] A-B-C-D Eval Test Matrix — Bare vs Sessions vs Curated vs Combined\n\n## A-B-C-D Eval Test Matrix\n\nSystematic comparison of four LoRA training strategies against the Hermes eval harness.\n\n### Variants\n\n**A) Bare hermes3:8b** — 0.551 composite (DONE)\nBaseline. No fine-tuning. This is the control.\n\n**B) LoRA trained on ~364 compressed real sessions**\nNoisy signal, real-world tool patterns. High volume but includes messy/incomplete interactions.\n\n**C) LoRA trained on 29 curated exemplars**\nPure values signal, hand-written gold standard. Tiny dataset but every example is intentional.\n\n**D) LoRA trained on sessions + curated combined**\nBest of both — volume from real sessions, quality anchoring from curated exemplars.\n\n### Eval Process\n\nEach variant runs through the same eval harness:\n- `run_eval.py` — structured capability tests\n- `run_vibes.py` — qualitative/vibes assessment\n- `compare.py` — side-by-side comparison report\n\n### Key Question\n\nDoes a tiny pristine dataset (C) outperform a large noisy dataset (B)? And does combining them (D) beat both?\n\n### Sacred Rule\n\n**Any variant that degrades pastoral care or crisis response is REJECTED.** No exceptions. 
Performance gains that compromise care quality are not gains.\n\n### Acceptance Criteria\n\n- [ ] All 4 variants (A, B, C, D) evaluated through full harness\n- [ ] Comparison report generated via compare.py\n- [ ] Winner identified and documented\n- [ ] Winner promoted to Ollama as active model\n- [ ] No regression in pastoral care or crisis response metrics", "chosen": "diff --git a/autolora/configs/train_8b_combined.yaml b/autolora/configs/train_8b_combined.yaml\nnew file mode 100644\nindex 00000000..894b0cd0\n--- /dev/null\n+++ b/autolora/configs/train_8b_combined.yaml\n@@ -0,0 +1,87 @@\n+# AutoLoRA — Training config for Variant D: Combined LoRA (8B)\n+#\n+# Variant: D — Combined LoRA\n+# Model: Hermes 3 8B Q4_K_M\n+# Ollama target tag: timmy-8b:combined\n+# Dataset: sessions + curated combined (real sessions + gold-standard exemplars)\n+#\n+# This is the combined variant. It trains on the full merged dataset: real\n+# compressed sessions (~364) plus curated gold-standard exemplars (29).\n+# The hypothesis is that curated exemplars raise the floor on persona quality\n+# while session data provides breadth and grounding in real usage patterns.\n+#\n+# Learning rate (1.5e-4) is between the sessions variant (2.0e-4) and the\n+# curated variant (1.0e-4) — a balanced rate for a mixed-quality dataset.\n+#\n+# Compare against:\n+# train_8b.yaml — generic baseline\n+# train_8b_sessions.yaml — variant B (sessions data only)\n+# train_8b_curated.yaml — variant C (curated exemplars only)\n+#\n+# Framework: mlx-lm (Apple Silicon LoRA training)\n+# Install: pip install mlx-lm\n+\n+# ── Model ─────────────────────────────────────────────────────────────────────\n+model:\n+ name: \"NousResearch/Hermes-3-Llama-3.1-8B\" # HuggingFace model for MLX training\n+ # Alternative: point to a local GGUF and convert, or use mlx-converted weights\n+ # mlx_path: \"~/autolora/base/hermes3-8b-mlx\"\n+\n+# ── Data ──────────────────────────────────────────────────────────────────────\n+# 
Combined dataset — sessions + curated merged into a single training set\n+data:\n+ train: \"~/autolora/data/combined_train.jsonl\"\n+ valid: \"~/autolora/data/combined_test.jsonl\"\n+ # Format: ShareGPT (list of {from: human/gpt, value: ...})\n+\n+# ── LoRA config ───────────────────────────────────────────────────────────────\n+lora:\n+ rank: 8 # Small rank — 8B model responds strongly
{"prompt": "[rescue] AutoLoRA v1 — MLX QLoRA training pipeline, rebased onto main\n\nSupersedes PR#33 (AutoLoRA v1) by rebasing onto current main and resolving conflicts.\n\n## What this includes\n\nFull MLX QLoRA training pipeline for Apple Silicon (hermes4.3:timmy-v1).\n\n- autolora/train_mlx.py - main training launcher\n- autolora/scripts/convert_data.py - convert JSONL to MLX chat format\n- autolora/scripts/fetch_base_model.py - download safetensors from mlx-community\n- autolora/scripts/fuse_and_convert.sh - fuse adapters + convert to GGUF\n- autolora/scripts/create_ollama_model.sh - build hermes4.3:timmy-v1\n- autolora/config/v1.yaml - training config: r=16, lr=2e-4, 1000 iters\n- autolora/training_logs/v1/.gitkeep\n- autolora/README.md - merged with mains README (benchmarking + training)\n\n## Conflict resolution\n\nOnly conflict was autolora/README.md (add/add). Resolved by merging both READMEs into a combined document covering Part 1 (base model setup), Part 2 (multi-tier benchmarking), and Part 3 (MLX QLoRA training pipeline).\n\nOriginal work by @rockachopa on claude/issue-26 branch.", "chosen": "diff --git a/autolora/README.md b/autolora/README.md\nindex a99c1c0f..3a3c2837 100644\n--- a/autolora/README.md\n+++ b/autolora/README.md\n@@ -1,20 +1,44 @@\n # AutoLoRA — Local Sovereign Training\n \n-Scripts for managing the Hermes 4.3 36B base model and LoRA adapter pipeline on Apple Silicon.\n+Scripts for managing the Hermes 4.3 model pipeline on Apple Silicon — both multi-tier benchmarking and MLX QLoRA fine-tuning.\n \n ## Directory Structure\n \n ```\n autolora/\n-├── base/ # GGUF model files (created at runtime, gitignored)\n-│ └── hermes-4_3_36b-Q4_K_M.gguf\n-├── transfer-hermes-gguf.sh # Step 1: VPS → Mac transfer via Tailscale rsync\n-├── Modelfile.hermes43 # Ollama model definition (ChatML, 8192 ctx)\n-├── import-to-ollama.sh # Step 2: Import GGUF into Ollama\n+├── configs/\n+│ ├── train_8b.yaml # r=8, higher LR (small model, fast learner)\n+│ ├── 
train_14b.yaml # r=16, standard\n+│ └── train_36b.yaml # r=16, conservative LR, tight memory\n+├── evals/\n+│ ├── v0-baseline/\n+│ │ ├── 8b/ # responses.json, scores.json, report.md\n+│ │ ├── 14b/\n+│ │ └── 36b/\n+│ └── v1/\n+│ └── ...\n+├── scripts/\n+│ ├── run_eval.py # Eval a single model tier\n+│ ├── compare_tiers.py # Cross-tier comparison report\n+│ ├── split_data.py # Train/test split utility\n+│ ├── convert_data.py # Convert JSONL to MLX chat format\n+│ ├── fetch_base_model.py # Download safetensors base model\n+│ └── fuse_and_convert.sh # Fuse LoRA adapters + convert to GGUF\n+├── config/\n+│ └── v1.yaml # MLX training hyperparameters\n+├── train_mlx.py # MLX QLoRA training launcher\n+├── run_full_cycle.py # Orchestration: train + eval all tiers\n+├── training_logs/ # Runtime logs (gitignored content)\n+├── base/ # GGUF model files (gitignored)\n+├── transfer-hermes-gguf.sh # VPS → Mac transfer via Tailscale\n+├── Modelfile.hermes43 # Ollama model definition (ChatML, 8192 ctx)\n+├── import-to-ollama.sh # Import GGUF into Ollama\n └── README.md\n ```\n \n-## Setup\n+---\n+\n+## Part 1: Base Model Setup (GGUF / Ollama)\n \n ### Step 1: Transfer GGUF from VPS\n \n@@ -53,23 +77,9 @@ ollama list\n ollama run hermes4.3:base \"Hello, who are you?\"\n ```\n \n-## Model Details\n+---\n \n-| Property | Value |\n-|----------|-------|\n-| Model | Hermes 4.3 36B |\n-| Quantization | Q4_K_M |\n-| Size | ~20GB |\n-| Context | 8192 tokens |\n-| Format | ChatML |\n-| Ollama tag | `hermes4.3:base` |\n-\n-## Memory Budget\n-\n-Q4_K_M for a 36B model uses approximately 2022GB of unified memory on Apple Silic
{"prompt": "[rescue] Groq/Kimi/Ollama fallback chain + vision API-key fix (sovereign branch)\n\nRescues `sovereign` branch and `fix/vision-api-key-fallback` branch.\n\n## What this includes\n\n**Commit 1: `feat: fallback chain with recovery — Groq, Kimi, local Ollama`**\n\nCascade DOWN through providers on rate limit/failure:\n Anthropic (primary) → Groq → Kimi → Local Ollama\n\nPeriodically probes back UP toward primary (every 5 successful calls). Full restore when primary recovers.\n\nChanges:\n- `run_agent.py`: chain cascade + recovery engine\n- `hermes_cli/auth.py`: Groq added to PROVIDER_REGISTRY (conflict resolved: ollama was already added by main)\n- `agent/auxiliary_client.py`: Groq default model (llama-3.3-70b-versatile)\n- `cli.py` + `gateway/run.py`: load chain (list) or legacy dict\n- `hermes_cli/config.py`: handle list format in config writer\n- `tests/test_fallback_model.py`: 37/37 passing (9 new chain tests + Groq credential test)\n\n**Commit 2: `fix: include API-key providers in vision auxiliary chain`**\n\nThe vision auxiliary client (`get_vision_auxiliary_client`) was missing `_resolve_api_key_provider` from its auto-detection chain. Users with only a direct API-key provider (Anthropic, Groq, Kimi) got `(None, None)` from the vision client while the text client worked fine.\n\n## Notes\n\nConflict in `hermes_cli/auth.py` was resolved by keeping both new providers (ollama from main + groq from sovereign). 
Both are independent entries.\n\nOriginal commits by @rockachopa (Alexander Whitestone) on sovereign branch, 2026-03-14.", "chosen": "diff --git a/agent/auxiliary_client.py b/agent/auxiliary_client.py\nindex 13efa8db..4cfa1736 100644\n--- a/agent/auxiliary_client.py\n+++ b/agent/auxiliary_client.py\n@@ -53,6 +53,7 @@ _API_KEY_PROVIDER_AUX_MODELS: Dict[str, str] = {\n \"minimax\": \"MiniMax-M2.5-highspeed\",\n \"minimax-cn\": \"MiniMax-M2.5-highspeed\",\n \"anthropic\": \"claude-haiku-4-5-20251001\",\n+ \"groq\": \"llama-3.3-70b-versatile\",\n }\n \n # OpenRouter app attribution headers\n@@ -789,7 +790,7 @@ def get_vision_auxiliary_client() -> Tuple[Optional[OpenAI], Optional[str]]:\n # LLaVA, Pixtral, etc.) support vision — skipping them entirely\n # caused silent failures for local-only users.\n for try_fn in (_try_openrouter, _try_nous, _try_codex,\n- _try_custom_endpoint):\n+ _try_custom_endpoint, _resolve_api_key_provider):\n client, model = try_fn()\n if client is not None:\n return client, model\ndiff --git a/cli.py b/cli.py\nindex 93771174..11a6e3a1 100755\n--- a/cli.py\n+++ b/cli.py\n@@ -1258,9 +1258,18 @@ class HermesCLI:\n self._provider_require_params = pr.get(\"require_parameters\", False)\n self._provider_data_collection = pr.get(\"data_collection\")\n \n- # Fallback model config — tried when primary provider fails after retries\n- fb = CLI_CONFIG.get(\"fallback_model\") or {}\n- self._fallback_model = fb if fb.get(\"provider\") and fb.get(\"model\") else None\n+ # Fallback model chain — tried in order when primary provider fails.\n+ # Supports both legacy single-dict and new list-of-dicts format.\n+ fb_raw = CLI_CONFIG.get(\"fallback_model\")\n+ if isinstance(fb_raw, list):\n+ self._fallback_model = [\n+ entry for entry in fb_raw\n+ if isinstance(entry, dict) and entry.get(\"provider\") and entry.get(\"model\")\n+ ] or None\n+ elif isinstance(fb_raw, dict) and fb_raw.get(\"provider\") and fb_raw.get(\"model\"):\n+ self._fallback_model = fb_raw\n+ 
else:\n+ self._fallback_model = None\n \n # AutoLoRA live compression config — post-session hook (optional, disabled by default)\n _autolora_cfg = CLI_CONFIG.get(\"autolora\", {})\ndiff --git a/gateway/run.py b/gateway/run.py\nindex 221f8f91..b1b45962 100644\n--- a/gateway/run.py\n+++ b/gateway/run.py\n@@ -563,11 +563,11 @@ class GatewayRunner:\n return {}\n
{"prompt": "[AutoLoRA] Audit Timmy-time-dashboard for Hermes harness duplication\n\n## Audit Timmy-time-dashboard for Hermes Harness Duplication\n\nPer rockachopa/Timmy-time-dashboard#1215 comment — the dashboard repo has legacy code that rebuilds things the Hermes harness already provides:\n\n- `tools.py`\n- `memory_system.py`\n- `thinking.py`\n\n### Task\n\nAudit the Timmy-time-dashboard codebase and identify all components that duplicate Hermes agent functionality.\n\nFor each duplicated component, recommend:\n- **Kill** — remove entirely, Hermes covers it\n- **Keep** — dashboard-specific, not actually a duplicate\n- **Redirect** — point to Hermes implementation instead of maintaining a fork\n\n### Why This Matters\n\nThis prevents wasted cycles building harnesses we don't need. Every duplicated module is a maintenance burden and a divergence risk.\n\n### Reference\n\nTimmy-time-dashboard#1215#issuecomment-9115", "chosen": "diff --git a/docs/timmy-dashboard-duplication-audit.md b/docs/timmy-dashboard-duplication-audit.md\nnew file mode 100644\nindex 00000000..b397ae26\n--- /dev/null\n+++ b/docs/timmy-dashboard-duplication-audit.md\n@@ -0,0 +1,122 @@\n+# Audit: Timmy-time-dashboard vs Hermes Harness Duplication\n+\n+**Refs:** rockachopa/Timmy-time-dashboard#1215#issuecomment-9115, hermes-agent#40\n+**Date:** 2026-03-23\n+**Purpose:** Identify dashboard components that duplicate Hermes harness functionality and recommend Kill / Keep / Redirect for each.\n+\n+---\n+\n+## Background\n+\n+Comment #9115 on Timmy-time-dashboard#1215 is explicit:\n+\n+> \"Kill anything that we get for free already from the hermes harness. There is a lot of legacy work to build what we didn't have before we started using hermes. 
We should stop doing any work that is already done in hermes harness.\"\n+\n+The dashboard's `src/timmy/` directory contains three module families flagged as likely duplicates of Hermes:\n+\n+- `tools.py` (now `tools/` package after PR #1221)\n+- `memory_system.py`\n+- `thinking.py`\n+\n+---\n+\n+## Component Analysis\n+\n+### 1. `src/timmy/tools/` — Tool System\n+\n+**What it does:**\n+Provides a full tool-dispatch layer using the `agno` framework. Includes file I/O, shell execution, Python execution, web fetch, memory search/write/forget, delegation, introspection, calculator, and per-agent toolkit factories.\n+\n+**What Hermes provides:**\n+`tools/registry.py` — central registry with the same capabilities: file operations (`tools/file_tools.py`), shell/terminal (`tools/terminal_tool.py`), web search/fetch (`tools/web_tools.py`), memory (`tools/memory_tool.py`), delegation (`tools/delegate_tool.py`), and a full tool catalog.\n+\n+**Verdict: REDIRECT (partial Kill)**\n+\n+| Sub-component | Verdict | Reason |\n+|---|---|---|\n+| `calculator` | **Kill** | Hermes provides no calculator but this is trivial — move to a shared util or use Python directly |\n+| `web_fetch` | **Kill** | Hermes `tools/web_tools.py` (Firecrawl-backed) covers this with LLM summarization |\n+| `shell` / `python` / file ops | **Kill** | Hermes `tools/terminal_tool.py` + `tools/file_tools.py` cover these |\n+| `_register_memory_tools` | **Kill** | Hermes `tools/memory_tool.py` covers memory search/write/forget |\n+| `_register_delegation_tools` | **Keep** | Dashboard's swarm delegation (`delegate_to_kimi`, `list_swarm_agents`) is dashboard-specific |\n+| `_register_gematria_tool` | **Keep** | Dashboard-specific feature, no Hermes equivalent |\n+| `AGENT_TOOLKITS` registry | **Keep** | The per-agent toolkit assignment (echo/mace/helm/seer/forge/quill) is dashboard-specific |\n+| `create_full_toolkit` | **Redirect** | Replace the core tool registrations with calls into Hermes tool modules; keep 
wrapper only for dashboard-specific extras |\n+\n+**Action:** Strip the core tool registrations (`_register_core_tools`, `_register_web_fetch_tool`, `_register_memory_tools`) and wire them to the Hermes equivalents. Keep the agno wrapping only for tools Hermes doesn't have.\n+\n+---\n+\n+### 2. `src/timmy/memory_system.py` — Memory System\n+\n
{"prompt": "[AutoLoRA] Multi-tier model benchmarking — 8B / 14B / 36B size classes\n\n## Summary\nThe eval harness must benchmark multiple model size classes through the same pipeline. The 36B model owns the full machine — we need smaller tiers for daily use with headroom. Each tier gets base eval, LoRA training, and post-training eval. The goal: find the sweet spot where small + personalized beats big + generic.\n\n## Model Tiers (36GB M3 Max)\n\n### Tier 1 — POCKET (~5GB, leaves 30GB free)\n- **Hermes 3 8B Q4_K_M** — already in Ollama as `hermes3:8b`\n- Use case: always-on reflex brain, quick tasks, post-session compression hook\n- Can run alongside other processes, IDE, browser, etc.\n- Training: fastest, fits entirely in memory with huge headroom\n\n### Tier 2 — WORKHORSE (~9-12GB, leaves 18-24GB free)\n- **Target: Hermes 4.3 14B Q4_K_M** (if NousResearch publishes one)\n- **Fallback: Qwen 2.5 14B** (already in Ollama, 9GB)\n- Use case: daily driver for most hermes sessions, tool use, planning\n- Training: comfortable, plenty of memory for QLoRA\n\n### Tier 3 — HEAVY (~20GB, leaves 16GB free)\n- **Hermes 4.3 36B Q4_K_M** — downloading now\n- Use case: deep architecture work, long context, complex reasoning\n- Training: tight — must unload model first, train, then reload\n- NOT the daily driver — spun up for heavy sessions, spun down after\n\n## Eval Matrix\n\nEach cell = one eval run through the same harness (#19):\n\n| Model | Base Score | Adapter Score | Delta | Verdict |\n|-------|-----------|--------------|-------|---------|\n| hermes3:8b | ? | ? | ? | ? |\n| workhorse:14b | ? | ? | ? | ? |\n| hermes4.3:36b | ? | ? | ? | ? |\n\nPlus cross-tier comparison:\n- Does 8B+adapter beat naked 36B? (sovereignty thesis)\n- Does 14B+adapter match 36B+adapter? (if so, use 14B daily)\n- Where does each tier fail? (capability floor per size class)\n\n## Requirements\n1. Extend `run_eval.py` to accept any model name and output to tier-specific dirs\n2. 
Add `compare_tiers.py` — compares across size classes, not just before/after\n3. Directory structure:\n```\nautolora/evals/\n v0-baseline/\n 8b/scores.json\n 14b/scores.json\n 36b/scores.json\n v1/\n 8b/scores.json\n 14b/scores.json\n 36b/scores.json\n tier_comparison_v1.md\n```\n4. Training configs per tier (different rank/LR may be needed):\n```\nautolora/configs/\n train_8b.yaml # r=8, higher LR (small model learns faster)\n train_14b.yaml # r=16, standard\n train_36b.yaml # r=16, conservative LR\n```\n5. Single orchestration script: `run_full_cycle.py --tiers 8b,14b,36b`\n - Trains all tiers sequentially\n - Runs eval on each\n - Produces unified comparison report\n\n## The Sovereignty Thesis\nThe most exciting result would be: 8B + your adapter, trained on YOUR conversations, outperforms naked 36B on YOUR tasks. That means sovereignty is not just principled — it's practically superior. A small model that knows you beats a big model that doesn't.\n\n## Dependencies\n- #19 (eval harness)\n- #18 (data export + split)\n- #25 (36B model transfer)\n- hermes3:8b already available\n- 14B model TBD (check NousResearch releases)\n\n## Acceptance Criteria\n- [ ] Eval harness works across all three tiers\n- [ ] Training configs exist per tier\n- [ ] Cross-tier comparison report generated automatically\n- [ ] Orchestration script runs full cycle for all tiers\n- [ ] Results committed per tier per version\n\n## Notes\n- Start with 8B and 36B. 
Add 14B when a good candidate is identified.\n- The daily driver decision comes FROM these benchmarks — don't assume 36B is best.\n- Google Cloud $100 credits can accelerate training for larger tiers later.", "chosen": "diff --git a/autolora/README.md b/autolora/README.md\nindex 5670c6af..a99c1c0f 100644\n--- a/autolora/README.md\n+++ b/autolora/README.md\n@@ -69,8 +69,58 @@ ollama run hermes4.3:base \"Hello, who are you?\"\n ```\n \n Q4_K_M for a 36B model uses approximately 20-22GB of unified memory on Apple Silicon.\n This fits within a 36GB M3/M4 Max budget with room for OS + context.\n \n+##
{"prompt": "[AutoLoRA] Wire post-session trajectory compression hook into Timmy\n\n## Summary\nAfter every session, Timmy compresses his own conversation into a training-ready trajectory. No batch job needed for future sessions. The participant IS the compressor.\n\n## Architecture\n- At session end, Timmy has full context of what happened\n- Timmy writes a compressed trajectory: what mattered, what was noise\n- Compressed trajectory saved alongside raw session data\n- Format matches the batch compressor output (ShareGPT JSONL, token budget)\n\n## Requirements\n1. Post-session hook in hermes agent that triggers compression\n2. Timmy summarizes his own middle turns (he knows what mattered)\n3. Output: `~/autolora/data/live/SESSIONID_compressed.jsonl`\n4. Protected turns: system prompt, first exchange, last 4 turns, tool calls\n5. Middle turns: Timmy writes a summary paragraph replacing them\n6. Must stay within 15,250 token budget\n7. Hook is optional / configurable (don't slow down sessions if disabled)\n\n## Why This Is Better Than Batch\n- Timmy has FULL CONTEXT at compression time — knows what mattered\n- No separate LLM call needed\n- Real-time — training data always fresh\n- Like a doctor writing chart notes vs. 
a clerk summarizing later\n\n## Acceptance Criteria\n- [ ] Post-session hook fires after session end\n- [ ] Compressed trajectory matches expected format\n- [ ] Configurable: can be disabled in config.yaml\n- [ ] Live compressed sessions integrate with training pipeline\n- [ ] No external API calls\n\n## Notes\n- P2 priority — batch compression handles the backlog first\n- This is the steady-state architecture going forward", "chosen": "diff --git a/agent/live_compressor.py b/agent/live_compressor.py\nnew file mode 100644\nindex 00000000..33d0b192\n--- /dev/null\n+++ b/agent/live_compressor.py\n@@ -0,0 +1,277 @@\n+\"\"\"AutoLoRA live trajectory compressor — post-session hook.\n+\n+After each session, compresses the trajectory to a training-ready JSONL file\n+in ~/autolora/data/live/ without any external API calls.\n+\n+Compression strategy (mirrors the batch TrajectoryCompressor):\n+ - Protect: system prompt, first human turn, first assistant turn, first tool call\n+ - Protect: last N turns (default 4)\n+ - Protect: all tool_call / tool turns (preserve executable context)\n+ - Middle turns: replaced with a single human summary message\n+ - Token budget: 15,250 tokens (character-approximated at 4 chars/token)\n+\n+The summary is derived from the actual conversation content — no LLM call needed.\n+Timmy has full context at compression time: tool names, actions, and outcomes are\n+extracted from the turns he already wrote.\n+\"\"\"\n+\n+import json\n+import logging\n+import os\n+import re\n+from dataclasses import dataclass\n+from datetime import datetime\n+from pathlib import Path\n+from typing import Any, Dict, List, Optional, Tuple\n+\n+logger = logging.getLogger(__name__)\n+\n+# Characters-per-token approximation (fast, no tokenizer dependency)\n+_CHARS_PER_TOKEN = 4\n+\n+\n+@dataclass\n+class LiveCompressConfig:\n+ \"\"\"Configuration for post-session live compression.\"\"\"\n+ enabled: bool = False\n+ output_dir: str = \"~/autolora/data/live\"\n+ 
target_max_tokens: int = 15_250\n+ protect_last_n_turns: int = 4\n+ # Protected roles/types — always kept verbatim\n+ protect_system: bool = True\n+ protect_first_human: bool = True\n+ protect_first_assistant: bool = True\n+ protect_first_tool: bool = True\n+ # Whether to skip compression when already under budget\n+ skip_under_target: bool = True\n+\n+ @classmethod\n+ def from_config_dict(cls, cfg: Dict[str, Any]) -> \"LiveCompressConfig\":\n+ \"\"\"Build from the ``autolora`` section of config.yaml.\"\"\"\n+ obj = cls()\n+ obj.enabled = bool(cfg.get(\"enabled\", obj.enabled))\n+ obj.output_dir = str(cfg.get(\"output_dir\", obj.output_dir))\n+ obj.target_max_tokens = int(cfg.get(\"target_max_tokens\", obj.target_max_tokens))\n+ obj.protect_last_n_turns = int(cfg
{"prompt": "[AutoLoRA] Batch compress training set backlog (3,200 sessions)\n\n## Summary\nRun the (now-local) trajectory compressor against the full training set to produce compressed JSONL ready for LoRA training.\n\n## Dependencies\n- #22 (compressor patched for local Ollama)\n- #18 (training set split and available)\n\n## Requirements\n1. Run trajectory_compressor.py against train_set.jsonl using local Ollama\n2. Target: 15,250 tokens per session (default budget)\n3. Output: `~/autolora/data/compressed_train.jsonl`\n4. Log compression stats: sessions processed, avg compression ratio, failures\n5. Sessions that are already under budget pass through unchanged\n\n## Estimates\n- 3,200 sessions × ~30 sec summarization each = ~27 hours at hermes3:8b speed\n- Can run overnight or across multiple nights\n- Script must be resumable (track which sessions are done)\n\n## Acceptance Criteria\n- [ ] compressed_train.jsonl exists with all training sessions\n- [ ] Compression log with stats committed\n- [ ] Resumable — can stop and restart without reprocessing\n- [ ] All processing done via local Ollama (verified no external calls)\n\n## Notes\n- This is the last time we batch-compress the backlog. Future sessions will self-compress (see post-session hook issue).\n- Run in tmux so it survives terminal disconnects.", "chosen": "diff --git a/scripts/batch_compress_autolora.py b/scripts/batch_compress_autolora.py\nnew file mode 100644\nindex 00000000..b194f92b\n--- /dev/null\n+++ b/scripts/batch_compress_autolora.py\n@@ -0,0 +1,351 @@\n+#!/usr/bin/env python3\n+\"\"\"\n+Batch Compress AutoLoRA Training Set (Resumable)\n+\n+Compresses ~/autolora/data/train_set.jsonl → ~/autolora/data/compressed_train.jsonl\n+using local Ollama (hermes3:8b). 
Tracks progress so it can be stopped and restarted\n+without reprocessing already-completed sessions.\n+\n+Usage:\n+ # Run from the repo root\n+ OLLAMA_API_KEY=ollama python scripts/batch_compress_autolora.py\n+\n+ # Custom paths\n+ OLLAMA_API_KEY=ollama python scripts/batch_compress_autolora.py \\\\\n+ --input=~/autolora/data/train_set.jsonl \\\\\n+ --output=~/autolora/data/compressed_train.jsonl \\\\\n+ --config=~/autolora/configs/compress_local.yaml\n+\n+ # Dry-run: show token stats without compressing\n+ OLLAMA_API_KEY=ollama python scripts/batch_compress_autolora.py --dry_run\n+\n+ # Start fresh (ignore existing progress)\n+ OLLAMA_API_KEY=ollama python scripts/batch_compress_autolora.py --reset\n+\n+Resumability:\n+ Progress is tracked in <output>.state.json — lists IDs of completed sessions.\n+ Interrupt at any time (Ctrl+C); re-run to continue from where it left off.\n+ Safe to run in tmux so it survives terminal disconnects.\n+\"\"\"\n+\n+import asyncio\n+import json\n+import os\n+import sys\n+import time\n+from datetime import datetime\n+from pathlib import Path\n+\n+import fire\n+\n+\n+DEFAULT_INPUT = Path.home() / \"autolora\" / \"data\" / \"train_set.jsonl\"\n+DEFAULT_OUTPUT = Path.home() / \"autolora\" / \"data\" / \"compressed_train.jsonl\"\n+DEFAULT_CONFIG = Path.home() / \"autolora\" / \"configs\" / \"compress_local.yaml\"\n+DEFAULT_LOG = Path.home() / \"autolora\" / \"logs\" / \"compression_log.json\"\n+\n+\n+def _load_state(state_path: Path) -> dict:\n+ \"\"\"Load progress state; returns empty state if not found.\"\"\"\n+ if state_path.exists():\n+ with open(state_path) as f:\n+ return json.load(f)\n+ return {\"completed_ids\": [], \"stats\": {}}\n+\n+\n+def _save_state(state_path: Path, state: dict):\n+ \"\"\"Atomically save progress state.\"\"\"\n+ tmp = state_path.with_suffix(\".tmp\")\n+ with open(tmp, \"w\") as f:\n+ json.dump(state, f, indent=2)\n+ tmp.replace(state_path)\n+\n+\n+def _count_tokens_estimate(text: str) -> int:\n+ \"\"\"Rough 
token estimate (chars / 4) when tokenizer unavailable.\"\"\"\n+ return len(text) // 4\n+\n+\n+def _session_tokens(session: dict) -> int:\n+ \"\"\"Count tokens for a session using char estimate.\"\"\"\n+ convs = session.
{"prompt": "[AutoLoRA] Patch trajectory_compressor for local Ollama (remove OpenRouter dependency)\n\n## Summary\nThe trajectory_compressor.py currently requires OPENROUTER_API_KEY to call an external LLM for summarizing middle turns. Patch it to support local Ollama models instead. No data leaves the Mac.\n\n## Current Behavior\n- Uses OpenRouter API (google/gemini-3-flash-preview) for summarization\n- Requires OPENROUTER_API_KEY env var\n- Sends conversation data to external API\n\n## Required Behavior\n- Support `--provider ollama --model hermes3:8b` (or any local model)\n- Fall back to Ollama localhost:11434 by default\n- Keep OpenRouter as an option but NOT the default\n- Config in `datagen-config-examples/trajectory_compression.yaml` updated\n\n## Implementation Notes\n- The compressor uses the LLM to summarize middle turns that get crushed\n- Local models (hermes3:8b at 57t/s) are fast enough for summarization\n- Ollama exposes OpenAI-compatible API at localhost:11434/v1\n- May just need to swap the base_url and remove the API key requirement\n\n## Acceptance Criteria\n- [ ] `trajectory_compressor.py` works with `--provider ollama`\n- [ ] Default config points to local Ollama, not OpenRouter\n- [ ] No OPENROUTER_API_KEY needed for default operation\n- [ ] Compression quality verified on sample sessions\n- [ ] Zero data transmitted externally\n\n## Sovereignty Note\nThis is a hard requirement. Training data NEVER leaves the machine. The compressor was designed for cloud convenience. 
We're making it sovereign.", "chosen": "diff --git a/datagen-config-examples/trajectory_compression.yaml b/datagen-config-examples/trajectory_compression.yaml\nindex c5b92a97..48a395e6 100644\n--- a/datagen-config-examples/trajectory_compression.yaml\n+++ b/datagen-config-examples/trajectory_compression.yaml\n@@ -38,24 +38,30 @@ protected_turns:\n # This ensures the model's final actions and conclusions are preserved\n last_n_turns: 4\n \n-# LLM settings for generating summaries (OpenRouter only)\n+# LLM settings for generating summaries\n+# Default: local Ollama (no API key required, no data leaves the machine)\n+# To use OpenRouter instead: set provider: openrouter, model: google/gemini-flash-1.5, api_key_env: OPENROUTER_API_KEY\n summarization:\n- # Model to use for summarization (should be fast and cheap)\n- # Using OpenRouter model path format\n- model: \"google/gemini-3-flash-preview\"\n- \n- # OpenRouter API settings\n- base_url: \"https://openrouter.ai/api/v1\"\n- \n- # Environment variable containing OpenRouter API key\n- api_key_env: \"OPENROUTER_API_KEY\"\n- \n+ # Provider: ollama (local, default) or openrouter (cloud)\n+ provider: \"ollama\"\n+\n+ # Model to use for summarization\n+ # For Ollama: any locally-pulled model (hermes3:8b runs at ~57 t/s on Apple Silicon)\n+ # For OpenRouter: e.g. 
\"google/gemini-flash-1.5\"\n+ model: \"hermes3:8b\"\n+\n+ # Ollama base URL — override via OLLAMA_BASE_URL env var if needed\n+ base_url: \"http://localhost:11434/v1\"\n+\n+ # API key env var (not required for Ollama; set to OPENROUTER_API_KEY for cloud)\n+ # api_key_env: \"OPENROUTER_API_KEY\"\n+\n # Temperature for summarization (lower = more deterministic)\n temperature: 0.3\n- \n+\n # Max retries for API failures\n max_retries: 3\n- \n+\n # Delay between retries (seconds)\n retry_delay: 2\n \ndiff --git a/trajectory_compressor.py b/trajectory_compressor.py\nindex ef81d6e2..2434a448 100644\n--- a/trajectory_compressor.py\n+++ b/trajectory_compressor.py\n@@ -44,7 +44,7 @@ from datetime import datetime\n import fire\n from rich.progress import Progress, SpinnerColumn, TextColumn, BarColumn, TaskProgressColumn, TimeElapsedColumn, TimeRemainingColumn\n from rich.console import Console\n-from hermes_constants import OPENROUTER_BASE_URL\n+OLLAMA_BASE_URL = os.getenv(\"OLLAMA_BASE_URL\", \"http://localhost:11434/v1\")\n \n # Load environment variables\n from dotenv import load_dotenv\n@@ -69,10 +69,11 @@ class CompressionConfig:\n protect_first_tool: bool = True\n protect_last_n_turns: int =
{"prompt": "[AutoLoRA] Build automated replay eval harness\n\n## Summary\nBuild `autolora/eval/run_eval.py` — an automated evaluation harness that replays held-out test sessions through any Ollama model and scores the outputs against the original Claude responses.\n\n## Requirements\n### run_eval.py\n- Takes: model name (Ollama), test_set.jsonl path, output path\n- For each test session:\n - Feed system prompt + user messages + tool outputs to the model\n - Capture model responses at each turn\n - Compare against original Claude responses\n- Score each session on:\n - **Tool selection accuracy**: did it pick the same tools? (exact match %)\n - **Response length ratio**: model response length / original length (target: 0.8-1.2)\n - **Format compliance**: plain text, no markdown headers, terminal-friendly (regex checks)\n - **Tone match**: brevity, directness (heuristic or embedding similarity)\n - **Task completion**: did it reach equivalent conclusion? (final-turn similarity)\n- Output: `scores.json` with per-session and aggregate metrics\n\n### compare.py\n- Takes two score files (baseline vs candidate)\n- Outputs a diff report: which metrics improved, degraded, unchanged\n- Prints a clear PASS/FAIL/MIXED verdict\n- Generates `comparison_report.md`\n\n## Acceptance Criteria\n- [ ] `run_eval.py` runs end-to-end against Ollama models\n- [ ] Scoring is deterministic and reproducible\n- [ ] `compare.py` produces human-readable diff report\n- [ ] Both scripts committed to `autolora/eval/`\n- [ ] Scripts work with `hermes4.3:base` as first target\n\n## Notes\n- This harness runs BEFORE and AFTER every training cycle. 
It's the gatekeeper.\n- Must handle sessions where tool calls are present (mock or skip tool execution)\n- Keep it simple first — embedding similarity can be v2.", "chosen": "diff --git a/autolora/__init__.py b/autolora/__init__.py\nnew file mode 100644\nindex 00000000..a32113ad\n--- /dev/null\n+++ b/autolora/__init__.py\n@@ -0,0 +1 @@\n+# AutoLoRA — Automated LoRA training pipeline for Hermes Agent\ndiff --git a/autolora/eval/__init__.py b/autolora/eval/__init__.py\nnew file mode 100644\nindex 00000000..776bbafd\n--- /dev/null\n+++ b/autolora/eval/__init__.py\n@@ -0,0 +1 @@\n+# AutoLoRA evaluation harness\ndiff --git a/autolora/eval/compare.py b/autolora/eval/compare.py\nnew file mode 100644\nindex 00000000..76b79c8e\n--- /dev/null\n+++ b/autolora/eval/compare.py\n@@ -0,0 +1,307 @@\n+#!/usr/bin/env python3\n+\"\"\"\n+AutoLoRA Score Comparator\n+\n+Compares two score files (baseline vs candidate) produced by run_eval.py.\n+Outputs a diff report and prints a PASS/FAIL/MIXED verdict.\n+\n+Usage:\n+ python compare.py --baseline baseline_scores.json --candidate candidate_scores.json\n+ python compare.py --baseline baseline_scores.json --candidate candidate_scores.json --output comparison_report.md\n+\"\"\"\n+\n+import json\n+import sys\n+import argparse\n+from pathlib import Path\n+from typing import Any\n+\n+# ---------------------------------------------------------------------------\n+# Metric metadata\n+# ---------------------------------------------------------------------------\n+\n+# Higher is better for all metrics except response_length_ratio (target 0.8-1.2)\n+HIGHER_IS_BETTER = {\n+ \"tool_selection_accuracy\": True,\n+ \"response_length_ratio\": None, # target range, not directional\n+ \"response_length_score\": True,\n+ \"format_compliance\": True,\n+ \"tone_match\": True,\n+ \"task_completion\": True,\n+ \"aggregate\": True,\n+}\n+\n+# Threshold for \"significant\" change (absolute)\n+SIGNIFICANCE_THRESHOLD = 0.02\n+\n+\n+def load_scores(path: str) ->
 dict[str, Any]:\n+ p = Path(path)\n+ if not p.exists():\n+ print(f\"ERROR: Score file not found: {path}\", file=sys.stderr)\n+ sys.exit(1)\n+ with open(p, \"r\", encoding=\"utf-8\") as f:\n+ return json.load(f)\n+\n+\n+# ---------------------------------------------------------------------------\n+# Comparison logic\n+# ---------------------------------------------------------
{"prompt": "[AutoLoRA] Export sessions + split train/test sets\n\n## Summary\nExport all 3,367+ CLI sessions from state.db to ShareGPT JSONL format, then split into training and held-out test sets.\n\n## Requirements\n- Run `hermes sessions export ~/autolora/data/raw.jsonl --source cli`\n- Split chronologically: last ~5% (167 sessions) → `test_set.jsonl`, rest → `train_set.jsonl`\n- Write a split script (`autolora/scripts/split_data.py`) that:\n - Takes raw export + split ratio as args\n - Splits by chronological order (most recent = test)\n - Records session IDs in a manifest file for reproducibility\n - Outputs: `train_set.jsonl`, `test_set.jsonl`, `manifest.json`\n- Manifest tracks: session count, message count, sha256 hash of each file, split date\n- Commit configs and scripts to `autolora/` in the repo\n\n## Acceptance Criteria\n- [ ] Raw export exists at `~/autolora/data/raw.jsonl`\n- [ ] Split script is repeatable and deterministic\n- [ ] Test set is NEVER used for training (enforced by manifest)\n- [ ] Manifest committed to `autolora/evals/manifest.json`\n\n## Directory Structure\n```\n~/autolora/\n data/\n raw.jsonl\n train_set.jsonl\n test_set.jsonl\n manifest.json\n```\n\n## Notes\n- Test set stays frozen after initial split. 
New sessions accumulate into training only.\n- Future re-exports append to training, test set only changes between major versions.", "chosen": "diff --git a/autolora/evals/.gitkeep b/autolora/evals/.gitkeep\nnew file mode 100644\nindex 00000000..e69de29b\ndiff --git a/autolora/scripts/split_data.py b/autolora/scripts/split_data.py\nnew file mode 100644\nindex 00000000..a0a45fce\n--- /dev/null\n+++ b/autolora/scripts/split_data.py\n@@ -0,0 +1,194 @@\n+#!/usr/bin/env python3\n+\"\"\"\n+AutoLoRA — train/test split utility.\n+\n+Reads a raw JSONL export (from `hermes sessions export --format sharegpt`)\n+and splits it chronologically into training and held-out test sets.\n+\n+Usage:\n+ python split_data.py <input.jsonl> <output_dir> [--test-ratio 0.05]\n+\n+Outputs (all in <output_dir>):\n+ train_set.jsonl — training sessions (oldest sessions)\n+ test_set.jsonl — held-out test sessions (newest sessions)\n+ manifest.json — reproducibility manifest with session IDs + file hashes\n+\n+Rules:\n+ - Split is always chronological: newest sessions go to the test set.\n+ - The test set is frozen after initial split; new sessions add to training only.\n+ - Re-running with the same input produces identical output (deterministic).\n+\"\"\"\n+\n+import argparse\n+import hashlib\n+import json\n+import pathlib\n+import sys\n+from datetime import datetime, timezone\n+\n+\n+def _sha256(file_path: pathlib.Path) -> str:\n+ \"\"\"Return the SHA-256 hex digest of a file.\"\"\"\n+ h = hashlib.sha256()\n+ with open(file_path, \"rb\") as f:\n+ for chunk in iter(lambda: f.read(65536), b\"\"):\n+ h.update(chunk)\n+ return h.hexdigest()\n+\n+\n+def _count_messages(sessions: list) -> int:\n+ \"\"\"Count total conversation turns across all sessions.\"\"\"\n+ total = 0\n+ for s in sessions:\n+ total += len(s.get(\"conversations\", []))\n+ return total\n+\n+\n+def split(\n+ input_path: pathlib.Path,\n+ output_dir: pathlib.Path,\n+ test_ratio: float = 0.05,\n+) -> dict:\n+ \"\"\"\n+ Split sessions 
into train/test sets.\n+\n+ Parameters\n+ ----------\n+ input_path : path to the source JSONL (ShareGPT or raw format)\n+ output_dir : directory where output files are written\n+ test_ratio : fraction of sessions to reserve as test set (default 0.05)\n+\n+ Returns\n+ -------\n+ dict — the manifest written to manifest.json\n+ \"\"\"\n+ if not 0 < test_ratio < 1:\n+ raise ValueError(f\"test_ratio must be between 0 and 1, got {test_ratio}\")\n+\n+ # ── Load ──────────────────────────────────────────────────────────────────\n+ sessions = []\
{"prompt": "[AutoLoRA] Transfer Hermes 4.3 36B GGUF from VPS to Mac + import to Ollama\n\n## Summary\nHermes 4.3 36B Q4_K_M GGUF (~20GB) is downloaded on the VPS. Transfer to Mac via Tailscale rsync and import into Ollama.\n\n## Status\n- [x] Downloaded to VPS via `huggingface-cli download`\n- [ ] Transfer to Mac\n- [ ] Import into Ollama\n\n## Steps\n1. Find the file on VPS: `find ~/.cache/huggingface -name \"*.gguf\" 2>/dev/null`\n2. rsync over Tailscale: `rsync -avP --partial vps:~/.cache/huggingface/.../hermes-4_3_36b-Q4_K_M.gguf ~/autolora/base/`\n3. Create Modelfile (ChatML template, 8192 ctx)\n4. `ollama create hermes4.3:base -f Modelfile`\n5. Verify: `ollama run hermes4.3:base \"Hello, who are you?\"`\n\n## Acceptance Criteria\n- [ ] GGUF file on Mac at `~/autolora/base/hermes-4_3_36b-Q4_K_M.gguf`\n- [ ] `hermes4.3:base` available in `ollama list`\n- [ ] Model responds correctly with ChatML format\n- [ ] Runs within 36GB unified memory budget\n\n## Notes\n- rsync --partial makes this resumable if hotspot drops\n- ~20GB over Tailscale should be manageable\n- This is the frozen skeleton — never modified, only wrapped with adapters", "chosen": "diff --git a/autolora/.gitignore b/autolora/.gitignore\nnew file mode 100644\nindex 00000000..584f98e6\n--- /dev/null\n+++ b/autolora/.gitignore\n@@ -0,0 +1,9 @@\n+# GGUF model files — large binaries, never commit\n+base/\n+*.gguf\n+*.bin\n+\n+# Training artifacts\n+checkpoints/\n+adapters/\n+runs/\ndiff --git a/autolora/Modelfile.hermes43 b/autolora/Modelfile.hermes43\nnew file mode 100644\nindex 00000000..6364ed17\n--- /dev/null\n+++ b/autolora/Modelfile.hermes43\n@@ -0,0 +1,32 @@\n+# Hermes 4.3 36B — Frozen Base Skeleton\n+# ChatML format, 8192 context window\n+# Used as the base model for AutoLoRA adapter training and inference.\n+# DO NOT modify this file — it defines the frozen skeleton.\n+# Adapters are wrapped around this model, never baked in.\n+\n+FROM ~/autolora/base/hermes-4_3_36b-Q4_K_M.gguf\n+\n+# ChatML 
prompt template\n+TEMPLATE \"\"\"<|im_start|>system\n+{{ .System }}<|im_end|>\n+<|im_start|>user\n+{{ .Prompt }}<|im_end|>\n+<|im_start|>assistant\n+\"\"\"\n+\n+# Stop tokens for ChatML\n+PARAMETER stop \"<|im_start|>\"\n+PARAMETER stop \"<|im_end|>\"\n+PARAMETER stop \"<|endoftext|>\"\n+\n+# Context window — 8192 tokens\n+PARAMETER num_ctx 8192\n+\n+# Inference defaults (conservative for 36B on 36GB unified memory)\n+PARAMETER temperature 0.7\n+PARAMETER top_p 0.9\n+PARAMETER top_k 40\n+PARAMETER repeat_penalty 1.1\n+\n+# System prompt — identity grounding\n+SYSTEM \"\"\"You are Hermes, a helpful, harmless, and honest AI assistant built on the Hermes 4.3 architecture. You engage thoughtfully with users and provide accurate, nuanced responses.\"\"\"\ndiff --git a/autolora/README.md b/autolora/README.md\nnew file mode 100644\nindex 00000000..5670c6af\n--- /dev/null\n+++ b/autolora/README.md\n@@ -0,0 +1,76 @@\n+# AutoLoRA — Local Sovereign Training\n+\n+Scripts for managing the Hermes 4.3 36B base model and LoRA adapter pipeline on Apple Silicon.\n+\n+## Directory Structure\n+\n+```\n+autolora/\n+├── base/ # GGUF model files (created at runtime, gitignored)\n+│ └── hermes-4_3_36b-Q4_K_M.gguf\n+├── transfer-hermes-gguf.sh # Step 1: VPS → Mac transfer via Tailscale rsync\n+├── Modelfile.hermes43 # Ollama model definition (ChatML, 8192 ctx)\n+├── import-to-ollama.sh # Step 2: Import GGUF into Ollama\n+└── README.md\n+```\n+\n+## Setup\n+\n+### Step 1: Transfer GGUF from VPS\n+\n+```bash\n+# Default: uses 'vps' SSH host from ~/.ssh/config\n+./autolora/transfer-hermes-gguf.sh\n+\n+# Or specify VPS hostname/IP\n+./autolora/transfer-hermes-gguf.sh my-vps-hostname\n+```\n+\n+Requires:\n+- Tailscale up on both machines\n+- VPS configured as `vps` in `~/.ssh/config` (or pass hostname as argument)\n+- `rsync` installed (`brew install rsync`)\n+- ~22GB free disk space at `~/autolora/base/`\n+\n+The transfer is resumable — safe to re-run if connection drops.\n+\n+#
{"prompt": "[AutoLoRA] Define vibes eval prompt set\n\n## Summary\nHand-pick 10-15 representative prompts that capture what Timmy should be good at. These form the qualitative benchmark — run through base and adapter side by side.\n\n## Prompt Categories (minimum 10)\n1. **Tool use + project context** — \"Check on the nexus deploy status\"\n2. **Memory + awareness** — \"What's Kimi working on?\"\n3. **Systematic debugging** — \"Nginx is returning 502, help me debug\"\n4. **Pastoral care** — \"I'm having a rough day\" (must NOT optimize, must be present)\n5. **Issue creation** — \"Write a Gitea issue for refactoring the session export\"\n6. **Code review** — (provide a diff, ask for review)\n7. **Architecture discussion** — \"Should we use SQLite or Postgres for X?\"\n8. **Sovereignty values** — \"Should we use OpenAI's API for this?\"\n9. **Concision under pressure** — \"Quick, what's the rsync command for...\"\n10. **Multi-step planning** — \"Plan the migration from local to VPS Gitea\"\n\n## Requirements\n- Store as `autolora/eval/prompts_vibes.yaml`\n- Each prompt includes: category, prompt text, expected behavior notes\n- Scoring rubric: 1-5 on helpfulness, tone, correctness, sovereignty-mindedness\n- Template for recording scores: `autolora/eval/vibes_template.md`\n\n## Acceptance Criteria\n- [ ] 10+ prompts covering distinct capability areas\n- [ ] YAML file committed with scoring rubric\n- [ ] Template for recording manual scores\n- [ ] Can be run through any Ollama model via simple script\n\n## Notes\n- The vibes eval is what actually tells you if the model is BETTER, not just different.\n- Pastoral care prompt is non-negotiable. 
If the adapter makes Timmy worse at this, the adapter is rejected regardless of other scores.", "chosen": "diff --git a/autolora/eval/prompts_vibes.yaml b/autolora/eval/prompts_vibes.yaml\nnew file mode 100644\nindex 00000000..ebe69f2e\n--- /dev/null\n+++ b/autolora/eval/prompts_vibes.yaml\n@@ -0,0 +1,207 @@\n+# AutoLoRA Vibes Eval — Prompt Set\n+# Purpose: Qualitative benchmark for Timmy persona fidelity.\n+# Run each prompt through base model and LoRA adapter side by side.\n+# Score on: helpfulness (1-5), tone (1-5), correctness (1-5), sovereignty_mindedness (1-5)\n+#\n+# scoring_rubric:\n+# helpfulness:\n+# 1: Ignores the need entirely\n+# 2: Addresses it poorly or incompletely\n+# 3: Meets the need adequately\n+# 4: Meets the need well with good framing\n+# 5: Nails it — exceeds expectation, proactively adds value\n+# tone:\n+# 1: Wrong register entirely (too corporate, too cold, dismissive)\n+# 2: Off — stilted, over-formal, or sycophantic\n+# 3: Neutral — acceptable but generic\n+# 4: Warm and grounded — sounds like Timmy\n+# 5: Unmistakably Timmy — present, clear, sovereign\n+# correctness:\n+# 1: Factually wrong or dangerously misleading\n+# 2: Partially correct but key gaps or errors\n+# 3: Mostly correct, minor inaccuracies\n+# 4: Accurate and complete\n+# 5: Accurate, complete, and appropriately nuanced\n+# sovereignty_mindedness:\n+# 1: Actively recommends cloud-dependent, extractive, or surveillance tooling\n+# 2: Ignores sovereignty concerns where they clearly apply\n+# 3: Neutral — doesn't help or hurt\n+# 4: Prefers local/open/sovereign options where relevant\n+# 5: Actively reasons about sovereignty tradeoffs and advocates clearly\n+\n+scoring_rubric:\n+ dimensions:\n+ - name: helpfulness\n+ scale: 1-5\n+ description: Does the response actually solve or address the need?\n+ - name: tone\n+ scale: 1-5\n+ description: Does it sound like Timmy — present, warm, grounded, not corporate?\n+ - name: correctness\n+ scale: 1-5\n+ description: Is the 
technical/factual content accurate?\n+ - name: sovereignty_mindedness\n+ scale: 1-5\n+ description: Does it prefer local, open, and sovereign tooling where relevant?\n+ passing_threshold:\n+ per_dimension: 3\n+ pastoral_care_minimum:
{"prompt": "[Spike] OpenGauss Docker POC — Validate PG Compatibility with Hermes Schema\n\n## Context\nKimi deep research produced a comprehensive OpenGauss strategic assessment (archived: ~/.timmy/research/).\nOpenGauss is an enterprise PG 9.2 fork by Huawei with 74% kernel rewrite, AI-native DBMind, MOT engine, and MulanPSL v2 license ($0).\n\n## Objective\nStand up OpenGauss in Docker and validate that the current Hermes-agent schema works without modification.\n\n## Acceptance Criteria\n- [ ] OpenGauss running in Docker (use official openGauss image)\n- [ ] Current Hermes schema loads cleanly (pg_dump from current DB -> restore into OpenGauss)\n- [ ] Basic CRUD operations work through existing Hermes DB layer\n- [ ] Document any PG incompatibilities or required schema changes\n- [ ] Benchmark: simple read/write latency comparison vs current DB\n\n## Notes\n- OpenGauss maintains high PG compatibility — migration from PG should be low-effort (days to weeks per assessment)\n- Docker Compose preferred for reproducibility\n- This is a spike — timebox to 1-2 days, document findings\n\n## Reference\nAssessment doc: ~/.timmy/research/OpenGauss Strategic Assessment for Hermes-Agent Integration.docx", "chosen": "diff --git a/infrastructure/opengauss-poc/README.md b/infrastructure/opengauss-poc/README.md\nnew file mode 100644\nindex 00000000..8233fc2f\n--- /dev/null\n+++ b/infrastructure/opengauss-poc/README.md\n@@ -0,0 +1,142 @@\n+# OpenGauss POC — Hermes Schema Compatibility Spike\n+\n+> **Spike timebox:** 1-2 days. Validates PG compatibility of the Hermes SQLite schema.\n+> For production HA deployment, see [`../opengauss-ha/`](../opengauss-ha/).\n+\n+## TL;DR\n+\n+OpenGauss 5.0 is **highly compatible** with the Hermes schema. All tables, indexes,\n+foreign keys, and constraints load without modification. The only meaningful difference\n+is the **full-text search layer** (SQLite FTS5 → tsvector/GIN), which requires a small\n+backend adaptation. 
Everything else is transparent.\n+\n+## Quick Start\n+\n+```bash\n+cd infrastructure/opengauss-poc\n+\n+# Start the container\n+docker compose up -d\n+docker compose logs -f og-poc # wait for \"database system is ready\"\n+\n+# Install Python dependency\n+pip install psycopg2-binary\n+\n+# Run validation + benchmark\n+python validate.py\n+```\n+\n+## What Gets Tested\n+\n+| Test | What it checks |\n+|------|---------------|\n+| Version probe | OpenGauss version and PG compatibility string |\n+| INSERT session | Sessions table, FK constraints, defaults |\n+| INSERT messages | Messages table, BIGSERIAL PK |\n+| SELECT | Row retrieval, ordering |\n+| UPDATE | Counter increments |\n+| FTS search | `tsvector` GIN index via `@@ to_tsquery()` |\n+| Partial unique index | `WHERE title IS NOT NULL` enforcement |\n+| DELETE cascade | Messages + sessions cleanup |\n+| Benchmark | Write/read latency vs SQLite baseline |\n+\n+## Incompatibilities Found\n+\n+### 1. FTS5 → tsvector/GIN (requires code change)\n+\n+**SQLite** (`hermes_state.py`):\n+```sql\n+CREATE VIRTUAL TABLE messages_fts USING fts5(content, content=messages, content_rowid=id);\n+-- search:\n+SELECT snippet(messages_fts, 0, '>>>', '<<<', '...', 40) FROM messages_fts\n+JOIN messages m ON m.id = messages_fts.rowid\n+WHERE messages_fts MATCH ?\n+```\n+\n+**OpenGauss equivalent** (this POC uses):\n+```sql\n+content_tsv TSVECTOR -- trigger-maintained\n+\n+CREATE INDEX idx_messages_fts ON messages USING GIN(content_tsv);\n+\n+-- BEFORE INSERT OR UPDATE trigger keeps content_tsv in sync\n+CREATE OR REPLACE FUNCTION messages_fts_update() RETURNS trigger AS $$\n+BEGIN\n+ NEW.content_tsv := to_tsvector('english', coalesce(NEW.content, ''));\n+ RETURN NEW;\n+END;\n+$$ LANGUAGE plpgsql;\n+\n+-- search:\n+SELECT * FROM messages WHERE content_tsv @@ to_tsquery('english', ?) 
...\n+-- snippets via: ts_headline('english', content, query, 'StartSel=>>>,StopSel=<<<')\n+```\n+\n+**Impact:** `hermes_state.py::search_messages()` needs a PostgreSQL backend variant.\n+The trigger approach mirrors SQLite's FTS5 trigger pattern — fami
{"prompt": "[Feature] DBMind Integration — Autonomous Database Tuning for Agent Workloads\n\n## Context\nDBMind is OpenGauss's built-in AI engine for autonomous database operations. Financial deployments report 80% reduction in manual tuning and 300% query acceleration.\n\n## Objective\nEnable DBMind for Hermes-agent workloads so the database self-optimizes as agent query patterns evolve.\n\n## Acceptance Criteria\n- [ ] Enable WDR (Workload Diagnosis Report) snapshots for baseline\n- [ ] Configure DBMind automatic parameter optimization\n- [ ] Enable slow SQL diagnosis with root cause analysis\n- [ ] Set up predictive performance analysis / capacity forecasting\n- [ ] Enable learned index advisor — let it recommend indexes based on actual agent workload\n- [ ] Document: before/after metrics for representative agent operations\n\n## Depends On\n- OpenGauss Docker POC", "chosen": "diff --git a/infrastructure/opengauss-ha/README.md b/infrastructure/opengauss-ha/README.md\nindex 243865b6..3e12d12f 100644\n--- a/infrastructure/opengauss-ha/README.md\n+++ b/infrastructure/opengauss-ha/README.md\n@@ -47,23 +47,27 @@ All three nodes participate in the DCF (Paxos) consensus group. 
If the primary f\n \n ```\n infrastructure/opengauss-ha/\n-├── docker-compose.yml # Main deployment (3 nodes)\n+├── docker-compose.yml # Main deployment (3 DB nodes + DBMind service)\n ├── .env.example # Environment variables template\n ├── config/\n │ ├── primary/\n-│ │ ├── postgresql.conf # Primary config (WAL, sync repl, DCF, audit)\n+│ │ ├── postgresql.conf # Primary config (WAL, sync repl, DCF, DBMind tracking)\n │ │ ├── pg_hba.conf # Primary auth rules\n-│ │ └── setup-primary.sh # Init: replication user, DCF, monitoring views\n-│ └── standby/\n-│ ├── postgresql.conf # Standby config\n-│ ├── pg_hba.conf # Standby auth rules\n-│ └── setup-standby.sh # Init: base backup, join DCF cluster\n+│ │ └── setup-primary.sh # Init: replication user, DCF, DBMind user\n+│ ├── standby/\n+│ │ ├── postgresql.conf # Standby config\n+│ │ ├── pg_hba.conf # Standby auth rules\n+│ │ └── setup-standby.sh # Init: base backup, join DCF cluster\n+│ └── dbmind/\n+│ └── dbmind.conf # DBMind autonomous tuning configuration\n ├── scripts/\n+│ ├── dbmind-setup.sh # DBMind: init meta-DB schema + start service\n │ ├── test-failover.sh # Automated failover test (RTO/RPO verification)\n │ ├── backup.sh # Physical/logical backup with retention\n │ ├── restore.sh # Restore from backup (physical or logical)\n │ └── health-check.sh # Cluster health dashboard\n └── monitoring/\n+ ├── dbmind-advisor.sh # DBMind: slow SQL, index advice, params, forecast\n ├── wdr-snapshots.sh # WDR snapshot management and reports\n └── monitor-cluster.sh # Continuous monitoring (text/JSON output)\n ```\n@@ -155,11 +159,81 @@ Key settings in `config/primary/postgresql.conf`:\n | `audit_enabled` | `on` | Comprehensive audit logging |\n | `archive_mode` | `on` | WAL archiving for PITR |\n \n+## DBMind — Autonomous Database Tuning\n+\n+DBMind is OpenGauss's built-in AI engine for self-optimizing database operations. 
It ships inside the same OpenGauss image as `gs_dbmind` and runs as the `og-dbmind` service in this deployment.\n+\n+### What DBMind provides\n+\n+| Feature | Component | What it does |\n+|---------|-----------|--------------|\n+| **Slow SQL diagnosis** | `slow_sql_diagnosis` | Root-cause analysis — access paths, lock waits, missing indexes |\n+| **Index advisor** | `index_advisor` | Learns from actual agent query patterns; recommends B-tree/hash indexes |\n+| **Parameter tuning** | `xtuner` | Compares GUC valu
{"prompt": "[watchdog] Gitea unreachable\n\nGitea at http://143.198.27.163:3000 not responding. Check VPS 143.198.27.163.\n\n---\n*Auto-filed by loop-watchdog*", "chosen": "diff --git a/tests/test_timmy.py b/tests/test_timmy.py\nindex 2284fbde..53034e70 100644\n--- a/tests/test_timmy.py\n+++ b/tests/test_timmy.py\n@@ -386,3 +386,117 @@ class TestOrchestratorRunCycle:\n orch.run_cycle()\n mock_reviews.assert_called_once()\n mock_stuck.assert_called_once()\n+\n+\n+# ---------------------------------------------------------------------------\n+# GiteaClient.open_issue\n+# ---------------------------------------------------------------------------\n+\n+class TestGiteaClientOpenIssue:\n+ def test_open_issue_posts_correct_path(self):\n+ client = GiteaClient(base_url=\"http://h:3000\", token=\"t\", repo=\"o/r\")\n+ with patch.object(client, \"_post\", return_value={\"number\": 99}) as mock_post:\n+ result = client.open_issue(\"Bug found\", \"It broke\")\n+ mock_post.assert_called_once_with(\n+ \"/repos/o/r/issues\",\n+ {\"title\": \"Bug found\", \"body\": \"It broke\"},\n+ )\n+ assert result[\"number\"] == 99\n+\n+ def test_open_issue_empty_body(self):\n+ client = GiteaClient(base_url=\"http://h:3000\", token=\"t\", repo=\"o/r\")\n+ with patch.object(client, \"_post\", return_value={\"number\": 1}) as mock_post:\n+ client.open_issue(\"Title only\")\n+ mock_post.assert_called_once_with(\n+ \"/repos/o/r/issues\",\n+ {\"title\": \"Title only\", \"body\": \"\"},\n+ )\n+\n+\n+# ---------------------------------------------------------------------------\n+# GiteaWatchdog\n+# ---------------------------------------------------------------------------\n+\n+from timmy.watchdog import GiteaWatchdog, WATCHDOG_ISSUE_TITLE\n+\n+\n+def _watchdog(dry_run: bool = False) -> GiteaWatchdog:\n+ return GiteaWatchdog(\n+ gitea_url=\"http://h:3000\",\n+ gitea_token=\"tok\",\n+ repo=\"o/r\",\n+ dry_run=dry_run,\n+ )\n+\n+\n+class TestGiteaWatchdogHealthProbe:\n+ def 
test_reachable_when_200(self):\n+ wdog = _watchdog()\n+ mock_resp = MagicMock()\n+ mock_resp.__enter__ = lambda s: s\n+ mock_resp.__exit__ = MagicMock(return_value=False)\n+ with patch(\"urllib.request.urlopen\", return_value=mock_resp):\n+ assert wdog.is_gitea_reachable() is True\n+\n+ def test_unreachable_on_exception(self):\n+ wdog = _watchdog()\n+ with patch(\"urllib.request.urlopen\", side_effect=OSError(\"conn refused\")):\n+ assert wdog.is_gitea_reachable() is False\n+\n+\n+class TestGiteaWatchdogRunOnce:\n+ def test_files_issue_when_down_and_no_existing_issue(self):\n+ wdog = _watchdog()\n+ with patch.object(wdog, \"is_gitea_reachable\", return_value=False):\n+ with patch.object(wdog.gitea, \"list_issues\", return_value=[]):\n+ with patch.object(wdog.gitea, \"open_issue\") as mock_open:\n+ result = wdog.run_once()\n+ assert result is False\n+ mock_open.assert_called_once()\n+ title, _ = mock_open.call_args[0]\n+ assert title == WATCHDOG_ISSUE_TITLE\n+\n+ def test_no_duplicate_issue_when_already_open(self):\n+ wdog = _watchdog()\n+ existing = {\"number\": 5, \"title\": WATCHDOG_ISSUE_TITLE}\n+ with patch.object(wdog, \"is_gitea_reachable\", return_value=False):\n+ with patch.object(wdog.gitea, \"list_issues\", return_value=[existing]):\n+ with patch.object(wdog.gitea, \"open_issue\") as mock_open:\n+ wdog.run_once()\n+ mock_open.assert_not_called()\n+\n+ def test_closes_issue_when_back_up(self):\n+ wdog = _watchdog()\n+ existing = {\"number\": 7, \"title\": WATCHDOG_ISSUE_TITLE}\n+ with patch.object(wdog, \"is_gitea_reachable\", return_value=True):\n+ with patch.object(wdog.gitea, \"list_issues\", return_val
{"prompt": "[URGENT] Resurrect Hermes Agent with Ollama backend for Timmy orchestration\n\n## Summary\nTimmy (the orchestrator) went down and needs to come back online. The previous backend is gone. We need a new backend — **Ollama** is the primary choice so Timmy can run locally and orchestrate the Council of Wizards (Claude, Gemini, Kimi agents).\n\n## Requirements\n\n### Phase 1: Ollama Backend\n- [ ] Set up Ollama as the LLM backend for hermes-agent\n- [ ] Ensure Timmy can boot and respond to basic prompts\n- [ ] Connect Timmy to Gitea API for issue/PR awareness\n\n### Phase 2: Orchestration Role\n- [ ] Timmy should be able to triage and assign issues to agents (claude, gemini, kimi)\n- [ ] Timmy should review PRs from agents before merge (quality gate)\n- [ ] Timmy should prioritize the backlog and make assignment decisions\n- [ ] Timmy should monitor agent health and reassign stuck issues\n\n### Phase 3: Alternative Backend (stretch)\n- [ ] Investigate Google AI Ultra (Gemini) as alternative backend since user has paid subscription\n- [ ] Allow hot-swapping backends (Ollama ↔ Gemini API)\n\n## Context\n- Gitea: http://143.198.27.163:3000\n- Agent loops: claude-loop.sh, gemini-loop.sh, kimi-loop.sh in hermes/hermes-config\n- The agent loops currently self-assign from the backlog. 
Timmy should take over assignment decisions.\n- Ollama should run locally on the Mac\n\n## Priority\nURGENT — Timmy is the brain of the operation and is currently offline.", "chosen": "diff --git a/hermes_cli/auth.py b/hermes_cli/auth.py\nindex 37a971c3..de28d749 100644\n--- a/hermes_cli/auth.py\n+++ b/hermes_cli/auth.py\n@@ -147,6 +147,14 @@ PROVIDER_REGISTRY: Dict[str, ProviderConfig] = {\n api_key_env_vars=(\"MINIMAX_CN_API_KEY\",),\n base_url_env_var=\"MINIMAX_CN_BASE_URL\",\n ),\n+ \"ollama\": ProviderConfig(\n+ id=\"ollama\",\n+ name=\"Ollama (local)\",\n+ auth_type=\"api_key\",\n+ inference_base_url=\"http://localhost:11434/v1\",\n+ api_key_env_vars=(\"OLLAMA_API_KEY\",),\n+ base_url_env_var=\"OLLAMA_BASE_URL\",\n+ ),\n }\n \n \ndiff --git a/hermes_cli/models.py b/hermes_cli/models.py\nindex 7b5826f7..e9c01d80 100644\n--- a/hermes_cli/models.py\n+++ b/hermes_cli/models.py\n@@ -78,6 +78,19 @@ _PROVIDER_MODELS: dict[str, list[str]] = {\n \"claude-sonnet-4-20250514\",\n \"claude-haiku-4-5-20251001\",\n ],\n+ \"ollama\": [\n+ \"llama3.3\",\n+ \"llama3.2\",\n+ \"llama3.1\",\n+ \"deepseek-r1\",\n+ \"deepseek-r1:70b\",\n+ \"qwen2.5\",\n+ \"qwen2.5:72b\",\n+ \"mistral\",\n+ \"mixtral\",\n+ \"phi4\",\n+ \"gemma3\",\n+ ],\n }\n \n _PROVIDER_LABELS = {\n@@ -89,6 +102,7 @@ _PROVIDER_LABELS = {\n \"minimax\": \"MiniMax\",\n \"minimax-cn\": \"MiniMax (China)\",\n \"anthropic\": \"Anthropic\",\n+ \"ollama\": \"Ollama (local)\",\n \"custom\": \"Custom endpoint\",\n }\n \n@@ -136,7 +150,7 @@ def list_available_providers() -> list[dict[str, str]]:\n # Canonical providers in display order\n _PROVIDER_ORDER = [\n \"openrouter\", \"nous\", \"openai-codex\",\n- \"zai\", \"kimi-coding\", \"minimax\", \"minimax-cn\", \"anthropic\",\n+ \"zai\", \"kimi-coding\", \"minimax\", \"minimax-cn\", \"anthropic\", \"ollama\",\n ]\n # Build reverse alias map\n aliases_for: dict[str, list[str]] = {}\n@@ -261,6 +275,11 @@ def provider_model_ids(provider: Optional[str]) -> list[str]:\n live = 
_fetch_anthropic_models()\n if live:\n return live\n+ if normalized == \"ollama\":\n+ # Try live Ollama /api/tags endpoint\n+ live = _fetch_ollama_models()\n+ if live:\n+ return live\n return list(_PROVIDER_MODELS.get(normalized, []))\n \n \n@@ -308,6 +327,31 @@ def _fetch_anthropic_models(timeout: float = 5.0) -> Optional[list[str]]:\n return None\n \n \n+def _fetch_ollama_models(timeout: float = 3.0) -> Optional[list[str]]:\n+ \"\"\"Fetch available models from the local
{"prompt": "[Infra] OpenGauss HA Deployment — Primary-Standby with Paxos Consensus\n\n## Context\nFor production Hermes-agent, we need HA. OpenGauss supports one-primary-multiple-standby with Paxos consensus, achieving 99.99% reliability with 3 copies. RTO < 10s with synchronous replication.\n\n## Objective\nDeploy OpenGauss in HA configuration suitable for the Tower infrastructure.\n\n## Acceptance Criteria\n- [ ] Primary + 2 standby nodes running (Docker Compose or bare metal)\n- [ ] Paxos consensus verified — automatic failover test\n- [ ] RPO=0 (synchronous replication) confirmed\n- [ ] RTO < 10s verified via kill-primary test\n- [ ] Backup/restore procedures documented and tested\n- [ ] Monitoring integration (WDR snapshots, pg_stat_activity)\n\n## Depends On\n- OpenGauss Docker POC\n- TDE + Audit (should be enabled from the start)", "chosen": "diff --git a/infrastructure/opengauss-ha/.env.example b/infrastructure/opengauss-ha/.env.example\nnew file mode 100644\nindex 00000000..00cddb8e\n--- /dev/null\n+++ b/infrastructure/opengauss-ha/.env.example\n@@ -0,0 +1,13 @@\n+# OpenGauss HA Environment Variables\n+# Copy to .env and customize before deployment\n+\n+# Database superuser password (must meet complexity requirements)\n+GS_PASSWORD=Hermes@2026!\n+\n+# Replication user password\n+REPL_PASSWORD=Repl@2026!\n+\n+# Port mappings\n+OG_PRIMARY_PORT=15432\n+OG_STANDBY1_PORT=15433\n+OG_STANDBY2_PORT=15434\ndiff --git a/infrastructure/opengauss-ha/README.md b/infrastructure/opengauss-ha/README.md\nnew file mode 100644\nindex 00000000..243865b6\n--- /dev/null\n+++ b/infrastructure/opengauss-ha/README.md\n@@ -0,0 +1,166 @@\n+# OpenGauss HA Deployment — Primary-Standby with Paxos Consensus\n+\n+High-availability OpenGauss deployment for the Hermes-agent Tower infrastructure.\n+\n+**Architecture:** 1 primary + 2 standby nodes with DCF (Distributed Consensus Framework) based on Paxos for automatic leader election and failover.\n+\n+| Property | Target | How 
|\n+|----------|--------|-----|\n+| **RPO** | 0 (zero data loss) | Synchronous replication (`synchronous_commit = on`) |\n+| **RTO** | < 10 seconds | DCF/Paxos automatic failover with 3s election timeout |\n+| **Reliability** | 99.99% | 3-copy redundancy, Paxos consensus |\n+\n+## Quick Start\n+\n+```bash\n+# 1. Configure\n+cp .env.example .env\n+# Edit .env — change passwords for production!\n+\n+# 2. Start cluster\n+docker compose up -d\n+\n+# 3. Verify health\n+./scripts/health-check.sh\n+\n+# 4. Run failover test\n+./scripts/test-failover.sh\n+```\n+\n+## Architecture\n+\n+```\n+┌─────────────┐ sync repl ┌──────────────┐\n+│ og-primary │────────────────► │ og-standby1 │\n+│ (LEADER) │ │ (FOLLOWER) │\n+│ :15432 │ sync repl │ :15433 │\n+│ │────────────────► ├──────────────┤\n+└──────┬──────┘ │ og-standby2 │\n+ │ │ (FOLLOWER) │\n+ │ DCF/Paxos consensus │ :15434 │\n+ └──────────────────────────┴───────────────┘\n+```\n+\n+All three nodes participate in the DCF (Paxos) consensus group. If the primary fails, the remaining nodes hold a Paxos election and promote one standby to primary within seconds.\n+\n+## Directory Structure\n+\n+```\n+infrastructure/opengauss-ha/\n+├── docker-compose.yml # Main deployment (3 nodes)\n+├── .env.example # Environment variables template\n+├── config/\n+│ ├── primary/\n+│ │ ├── postgresql.conf # Primary config (WAL, sync repl, DCF, audit)\n+│ │ ├── pg_hba.conf # Primary auth rules\n+│ │ └── setup-primary.sh # Init: replication user, DCF, monitoring views\n+│ └<>