fix: restore /usage account limits in CLI + gateway (#958) #1016

Open

Rockachopa wants to merge 186 commits from fix/958 into main

Author	SHA1	Message	Date
Alexander Whitestone	eab5635a7a	fix: restore /usage account limits (#958) [CI: all checks were successful; Lint / lint (pull_request): successful in 9s]	2026-04-22 10:36:49 -04:00
Google AI Agent	d574690abe	Merge pull request 'feat: The Sovereign Accountant — Agent Telemetry' (#1009) from feat/sovereign-accountant-agent-1776866068545 into main [CI: all checks were successful; Lint / lint (pull_request): successful in 19s; Lint / lint (push): successful in 28s]	2026-04-22 13:55:16 +00:00
Google AI Agent	e208885de6	feat: wire telemetry hooks into auxiliary client [CI: all checks were successful; Lint / lint (pull_request): successful in 10s]	2026-04-22 13:54:32 +00:00
Google AI Agent	cd84fa2084	feat: add telemetry logger for token accounting	2026-04-22 13:54:30 +00:00
Google AI Agent	63babca056	Merge pull request 'docs: poka-yoke integration phase 3 status (#967)' (#976) from fix/967 into main [CI: all checks were successful; Lint / lint (push): successful in 11s]	2026-04-22 13:39:43 +00:00
Google AI Agent	cab3c82c5c	Merge pull request '[claude] Add update/restart action buttons to web dashboard (#961)' (#968) from claude/issue-961 into main [CI: some checks failed; Lint / lint (push): cancelled]	2026-04-22 13:39:36 +00:00
Google AI Agent	64a8059f9f	Merge pull request '[claude] Verify hardcoded-home path guard on burn/921 branch (#962)' (#964) from claude/issue-962 into main [CI: some checks failed; Lint / lint (push): cancelled]	2026-04-22 13:39:32 +00:00
Google AI Agent	90f6fdef60	Merge pull request 'feat: Autonomous Regression Sentry — verify_impact tool' (#970) from feat/impact-analysis-tool-1776826592325 into main [CI: all checks were successful; Lint / lint (push): successful in 11s]	2026-04-22 13:38:47 +00:00
Google AI Agent	18e3533a0a	Merge pull request 'feat: The Budgetary Sovereign Router — Efficiency Sauce' (#1008) from feat/budgetary-router-1776864510362 into main [CI: some checks failed; Lint / lint (push): cancelled]	2026-04-22 13:38:40 +00:00
Google AI Agent	60ccd825ec	Merge pull request 'feat: The Sovereign Teleport — State Migration Sauce' (#1007) from feat/sovereign-teleport-1776864503956 into main [CI: some checks failed; Lint / lint (push): cancelled]	2026-04-22 13:38:36 +00:00
Google AI Agent	e7d5a7f2cf	Merge pull request 'feat: The Scavenger Fixer — Closing the Autonomous Loop' (#975) from feat/autonomous-scavenger-fix-1776827712502 into main [CI: all checks were successful; Lint / lint (push): successful in 13s]	2026-04-22 13:38:03 +00:00
Google AI Agent	9aaac192cf	Merge pull request 'test(#798): Parallel tool calling — 2+ tools per response' (#988) from fix/798 into main [CI: all checks were successful; Lint / lint (push): successful in 9s]	2026-04-22 13:36:37 +00:00
Google AI Agent	f3d88ec31d	Merge pull request '[claude] Wire Gemma 4 vision into browser_tool for screenshot analysis (#816)' (#947) from claude/issue-816 into main [CI: all checks were successful; Lint / lint (push): successful in 13s]	2026-04-22 13:36:20 +00:00
Google AI Agent	2f22570622	Merge pull request 'feat(web-console): Self-healing browser CDP + operator cockpit (#394)' (#934) from feat/web-console-394 into main [CI: some checks failed; Lint / lint (push): cancelled]	2026-04-22 13:36:14 +00:00
Google AI Agent	2022322606	Merge pull request 'feat: Deep Dive Security Integration - Multilayer Defense' (#929) from feat/security-deep-dive-1776732106631 into main [CI: some checks failed; Lint / lint (push): cancelled]	2026-04-22 13:36:08 +00:00
Google AI Agent	d6ec32fe93	Merge pull request 'feat: implement SHIELD Multilingual Defense & Input Sanitization' (#918) from feat/shield-multilingual-1776700482647 into main [CI: some checks failed; Lint / lint (push): cancelled]	2026-04-22 13:36:05 +00:00
Google AI Agent	2b284e75f6	Merge pull request 'feat: Multi-Agent Concurrency Guard — "Secret Sauce" for Fleet Scaling' (#969) from feat/fleet-concurrency-guard-1776826501792 into main [CI: all checks were successful; Lint / lint (push): successful in 16s]	2026-04-22 13:29:01 +00:00
Google AI Agent	efa1fc034e	feat: Budgetary Sovereign Router — Complexity-aware steering [CI: all checks were successful; Lint / lint (pull_request): successful in 25s]	2026-04-22 13:28:31 +00:00
Google AI Agent	99d925d40b	feat: Sovereign Teleport — Cross-environment agent migration [CI: all checks were successful; Lint / lint (pull_request): successful in 28s]	2026-04-22 13:28:25 +00:00
Alexander Whitestone	ed250b1ca8	test(#798): Strengthen parallel tool calling tests + fix flaky concurrent tests [CI: all checks were successful; Lint / lint (pull_request): successful in 10s] - Add TestAIAgentConcurrentExecution with 8 integration tests exercising _execute_tool_calls_concurrent through AIAgent for 2/3/4-tool batches, pass-rate reporting, and Gemma 4-style read patterns. - Fix test_malformed_json_args_forces_sequential: use JSON array '[1,2,3]' instead of unrepairable garbage now that repair_and_load_json handles most malformed input. - Fix test_concurrent_handles_tool_error: replace racy call_count list with deterministic failure based on tool_call_id to eliminate flaky failures under ThreadPoolExecutor. Closes #798	2026-04-22 01:34:24 -04:00
Alexander Whitestone	1f5067e94a	Merge: bring in prior QA work on path guard (Refs #962) [CI: all checks were successful; Lint / lint (pull_request): successful in 15s]	2026-04-22 00:25:50 -04:00
Alexander Whitestone	798ca3aa06	chore: sync with remote claude/issue-961 branch [CI: all checks were successful; Lint / lint (pull_request): successful in 22s] Refs #961 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-22 00:04:51 -04:00
Alexander Whitestone	5d3e13ede2	test: add pre-commit path guard hook from burn/921 (Refs #962) [CI: all checks were successful; Lint / lint (pull_request): successful in 24s] Brings hooks/pre-commit-path-guard.py from burn/921-poka-yoke-hardcoded-paths to complete QA verification of all guard layers. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-21 23:55:38 -04:00
Alexander Whitestone	82a076bf4d	docs: poka-yoke integration phase 3 status (#967) [CI: all checks were successful; Lint / lint (pull_request): successful in 8s]	2026-04-22 03:24:26 +00:00
Alexander Whitestone	16eab5d503	Merge pull request '[claude] A2A auth — mutual TLS between fleet agents (#806)' (#948) from claude/issue-806 into main [CI: all checks were successful; Lint / lint (push): successful in 13s] Merge PR #948: A2A auth — mutual TLS between fleet agents (#806)	2026-04-22 03:19:42 +00:00
Google AI Agent	81f7347bcb	feat: Scavenger Fixer — Autonomous tech debt healing [CI: all checks were successful; Lint / lint (pull_request): successful in 22s]	2026-04-22 03:15:17 +00:00
Google AI Agent	c7a2d439c1	Merge pull request 'feat: The Sovereign Scavenger — Automated Tech Debt Recovery' (#974) from feat/sovereign-scavenger-1776827259631 into main [CI: all checks were successful; Lint / lint (push): successful in 12s]	2026-04-22 03:14:14 +00:00
Google AI Agent	8ad8520bd2	Merge pull request 'feat: Execution Safety Sentry — GOFAI Risk Analysis' (#973) from feat/static-analyzer-gofai-1776826921747 into main [CI: some checks failed; Lint / lint (push): cancelled]	2026-04-22 03:14:07 +00:00
Google AI Agent	9c7c88823f	Merge pull request 'feat: Local Inference Story — Freeing the fleet from cloud dependency' (#972) from feat/local-inference-bridge-1776826896029 into main [CI: some checks failed; Lint / lint (push): cancelled]	2026-04-22 03:14:03 +00:00
Google AI Agent	aa45e02238	Merge pull request 'feat: GOFAI Semantic Sentry — Deterministic code verification' (#971) from feat/symbolic-verify-gofai-1776826842170 into main [CI: some checks failed; Lint / lint (push): cancelled]	2026-04-22 03:14:01 +00:00
Google AI Agent	3266c39e8e	feat: Sovereign Scavenger — Turning tech debt into actionable backlog [CI: all checks were successful; Lint / lint (pull_request): successful in 18s]	2026-04-22 03:07:40 +00:00
Alexander Whitestone	e8886f10c8	feat: add Update Hermes and Restart Gateway action buttons to web dashboard [CI: all checks were successful; Lint / lint (pull_request): successful in 10s] Implements the action button lifecycle described in #961: - POST /api/actions/restart-gateway — sends SIGTERM to the gateway PID - POST /api/actions/update-hermes — runs pip upgrade in a background job - GET /api/actions/jobs/{job_id} — polls job status/output Frontend (StatusPage.tsx): - "Restart Gateway" button with spinning icon while running, then success/error message that clears after 5–8 s - "Update Hermes" button that polls the job endpoint every 2 s; shows collapsible pip output on completion - Page remains responsive (buttons disabled only during their own action) Also adds i18n strings to en.ts, zh.ts, and the shared types.ts interface. Fixes #961 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-21 23:04:10 -04:00
Google AI Agent	93a855d4e3	feat: Static Risk Analyzer (GOFAI) for execution safety [CI: all checks were successful; Lint / lint (pull_request): successful in 8s]	2026-04-22 03:02:02 +00:00
Google AI Agent	5a0bdb556e	feat: Local Inference Bridge — Bypassing cloud for local tasks [CI: all checks were successful; Lint / lint (pull_request): successful in 17s]	2026-04-22 03:01:37 +00:00
Google AI Agent	d619d279f8	feat: Symbolic Sentry (GOFAI) for deterministic code audits [CI: all checks were successful; Lint / lint (pull_request): successful in 15s]	2026-04-22 03:00:44 +00:00
Google AI Agent	d3b13a6aa5	feat: add verify_impact tool for regression guarding [CI: all checks were successful; Lint / lint (pull_request): successful in 16s]	2026-04-22 02:56:33 +00:00
Google AI Agent	77d2430a44	feat: add Fleet-Wide File Concurrency Guard [CI: all checks were successful; Lint / lint (pull_request): successful in 19s]	2026-04-22 02:55:04 +00:00
Alexander Whitestone	d2ce6b8749	test: verify action endpoints for restart-gateway and update-hermes [CI: all checks were successful; Lint / lint (pull_request): successful in 27s] Add TestActionEndpoints class to test_web_server.py covering: - POST /api/actions/restart-gateway sends SIGUSR1 to gateway PID - 409 when gateway is not running - 500 when os.kill raises a signal error - POST /api/actions/update-hermes returns ok=true on zero exit - ok=false on non-zero exit code with stderr in detail - ok=false on timeout - Both endpoints reject unauthenticated requests All 7 new tests pass (83 total in the file). Refs #961	2026-04-21 22:41:27 -04:00
Alexander Whitestone	a8a086548d	feat: add restart gateway and update Hermes action buttons to web dashboard [CI: all checks were successful; Lint / lint (pull_request): successful in 29s] Implements the update/restart action buttons called out in issue #961: - Backend (web_server.py): two new POST endpoints - /api/actions/restart-gateway — sends SIGUSR1 to the running gateway PID - /api/actions/update-hermes — runs `hermes update --yes` in a subprocess - Frontend (api.ts): restartGateway() / updateHermes() API helpers + ActionResponse type - UI (StatusPage.tsx): "Actions" card with Restart Gateway and Update Hermes buttons - idle → running (spinner) → success/failure states - feedback detail text; auto-resets to idle after 8 s - i18n: new status.actions / restartGateway / updateHermes strings in en, zh, and types Refs #961 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-21 22:30:22 -04:00
Alexander Whitestone	9e00a59791	test: verify hardcoded-home path guard from burn/921 branch [CI: all checks were successful; Lint / lint (pull_request): successful in 29s] Cherry-picks tools/path_guard.py and tests/test_path_guard.py from burn/921-poka-yoke-hardcoded-paths (commit `5dcb905`). All 21 tests pass: - hardcoded /Users/<name>/ paths are rejected at runtime - hardcoded /home/<name>/ paths are rejected at runtime - ~/.hermes/... via expanduser() passes (safe, expanded at runtime) - valid relative and /tmp/ absolute paths pass - static scanner catches violations and respects # noqa: hardcoded-path-ok - comments are skipped by scanner - directory scanner skips test files and __pycache__ Refs #962 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-21 22:26:54 -04:00
Alexander Whitestone	9ef7682ee2	chore: merge remote claude/issue-816 — deduplicate gemma-4-27b-it in models.py [CI: all checks were successful; Lint / lint (pull_request): successful in 30s] Merged prior implementation (PR #947) and resolved conflicts. Removed duplicate "gemma-4-27b-it" entry introduced during merge. Refs #816 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-21 21:27:51 -04:00
Alexander Whitestone	e157a22639	feat: wire Gemma 4 vision into browser_tool for screenshot analysis - Add `_BROWSER_VISION_DEFAULT_MODEL = "google/gemma-4-27b-it"` constant - Rewrite `_get_vision_model()` with 4-tier resolution: 1. BROWSER_VISION_MODEL env var (browser-specific override) 2. auxiliary.browser_vision.model config key 3. AUXILIARY_VISION_MODEL env var (backward compat) 4. google/gemma-4-27b-it default (Gemma 4 native multimodal) - Extract `_load_browser_vision_config()` helper for testability - Always set call_kwargs["model"] (remove redundant `if vision_model` guard) - Read timeout from auxiliary.browser_vision.timeout before auxiliary.vision.timeout - Register gemma-4-27b-it in Gemini provider model catalog - Document auxiliary.browser_vision section in cli-config.yaml.example - Add 12 unit tests in tests/tools/test_browser_vision_model.py covering all resolution tiers, backward compat, error fallthrough, and type guarantees Fixes #816 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-21 21:26:03 -04:00
Alexander Whitestone	671283389c	feat: Wire Gemma 4 vision into browser_tool for screenshot analysis [CI: all checks were successful; Lint / lint (pull_request): successful in 8s] _get_vision_model() now resolves via a layered priority chain: 1. BROWSER_VISION_MODEL env var (browser-specific override) 2. config.yaml browser.vision_model 3. AUXILIARY_VISION_MODEL env var (backward-compat shared override) 4. google/gemma-4-27b-it — Gemma 4 native multimodal default Add browser.vision_model config key to hermes_cli/config.py defaults with inline documentation. call_kwargs["model"] is now always set (model is never None), and a debug log line records which model is in use for each screenshot. Fixes #816 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-21 20:51:04 -04:00
Alexander Whitestone	17cc4bac90	feat: complete Gemma 4 browser_vision wiring — task routing, timeout, tests [CI: all checks were successful; Lint / lint (pull_request): successful in 10s] Building on the Gemma 4 default already on this branch: - Change call_llm() task from "vision" to "browser_vision" in browser_vision() so auxiliary.browser_vision.* config is consulted for provider/model/timeout - Route call_llm(task="browser_vision") through the vision provider resolution path in auxiliary_client.py (same as task="vision") - Fix timeout resolution: check auxiliary.browser_vision.timeout before auxiliary.vision.timeout (allows browser-specific timeout override) - Add timeout option to auxiliary.browser_vision in cli-config.yaml.example - Add test_browser_vision_gemma4.py covering: task routing assertions, call_llm() vision branch routing, and timeout config key ordering Refs #816 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-21 19:43:42 -04:00
Alexander Whitestone	1843545d66	chore: merge remote branch — resolve conflicts, use canonical implementation [CI: all checks were successful; Lint / lint (pull_request): successful in 8s] Merge remote claude/issue-816 which contains the full Gemma 4 browser vision implementation. Resolved conflicts by taking the remote's cleaner variable names and docstrings while keeping the same 4-tier resolution logic. All 12 tests pass. Refs #816 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-21 18:50:22 -04:00
Alexander Whitestone	c643ac90da	feat: wire Gemma 4 vision into browser_tool for screenshot analysis - Add `_BROWSER_VISION_DEFAULT_MODEL = "google/gemma-4-27b-it"` constant - Rewrite `_get_vision_model()` with 4-tier resolution: 1. BROWSER_VISION_MODEL env var (browser-specific override) 2. auxiliary.browser_vision.model config key 3. AUXILIARY_VISION_MODEL env var (backward compat) 4. Gemma 4 27B default - Remove `if vision_model:` guard — function now always returns a string - Update browser_vision tool description to surface Gemma 4 as default - Register gemma-4-27b-it in Gemini provider model catalog (models.py) - Document auxiliary.browser_vision.model in cli-config.yaml.example - Add 14 unit tests covering all priority levels and backward compat Fixes #816 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-21 18:47:03 -04:00
Alexander Whitestone	da9c4cf10c	feat: wire Gemma 4 vision into browser_tool for screenshot analysis [CI: all checks were successful; Lint / lint (pull_request): successful in 7s] Extends `_get_vision_model()` with a 5-level resolution chain: 1. `BROWSER_VISION_MODEL` env var — browser-specific override 2. `auxiliary.browser.vision_model` config key — per-install default 3. `AUXILIARY_VISION_MODEL` env var — backward-compat shared override 4. Auto-select `gemma-4-27b-it` when the main provider is Gemini/Google 5. `None` — fall through to `call_llm` vision router Adds `_BROWSER_VISION_DEFAULT_MODEL = "gemma-4-27b-it"` constant and registers `gemma-4-27b-it` in the Gemini provider model catalog. 16 new tests in `tests/tools/test_browser_vision_model.py` cover each priority level, edge cases (empty env, config exceptions, wrong provider). Fixes #816 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-21 18:18:30 -04:00
Alexander Whitestone	4214082fb6	feat: A2A auth — mutual TLS between fleet agents [CI: all checks were successful; Lint / lint (pull_request): successful in 8s] Implements mTLS for securing agent-to-agent communication in the Hermes fleet. Fixes #806. Changes: - scripts/gen_fleet_ca.sh: generate a self-signed Fleet CA (4096-bit RSA, 10-year validity) that signs all agent certificates - scripts/gen_agent_cert.sh: generate per-agent certs (Timmy, Allegro, Ezra) signed by the fleet CA with SAN entries and clientAuth/serverAuth extended key usage - agent/mtls.py: new module providing: - build_server_ssl_context() — TLS_SERVER context with CERT_REQUIRED, enforces client cert against Fleet CA - build_client_ssl_context() — TLS_CLIENT context for outbound A2A calls - MTLSMiddleware — ASGI middleware that rejects unauthenticated requests to A2A routes (/.well-known/agent-card, /api/agent-card, /a2a/) with HTTP 403 when mTLS is enabled - is_mtls_configured() — checks HERMES_MTLS_CERT/KEY/CA env vars - hermes_cli/web_server.py: wire MTLSMiddleware into the FastAPI app; pass SSL context to uvicorn when HERMES_MTLS_ env vars are set so the server runs TLS with mandatory client cert verification - ansible/roles/hermes_mtls/: Ansible role to distribute Fleet CA cert, agent cert, and agent key to fleet nodes; writes an env file with HERMES_MTLS_* vars and restarts the hermes-gateway service - ansible/fleet_mtls.yml: fleet-wide playbook referencing the role for Timmy, Allegro, and Ezra nodes - tests/test_mtls.py: 15 tests covering is_mtls_configured, SSL context creation with real cryptography-generated certs, and MTLSMiddleware (unauthorized agent rejected → 403, authorized agent accepted → 200) mTLS is opt-in: set HERMES_MTLS_CERT, HERMES_MTLS_KEY, and HERMES_MTLS_CA to enable. When unset, the server behaves exactly as before. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-21 18:04:00 -04:00
Alexander Whitestone	95bb842a21	feat: Wire Gemma 4 vision into browser_tool for screenshot analysis [CI: all checks were successful; Lint / lint (pull_request): successful in 8s] Default browser_vision screenshots to google/gemma-4-27b-it (Gemma 4 native multimodal) for reduced latency and unified text+vision model. Resolution order for _get_vision_model(): 1. BROWSER_VISION_MODEL env var (new, browser-specific override) 2. auxiliary.browser_vision.model in config.yaml (new config key) 3. AUXILIARY_VISION_MODEL env var (existing global vision override) 4. Default: google/gemma-4-27b-it Backward compatibility: existing AUXILIARY_VISION_MODEL users are unaffected — their override still flows through to browser_vision. Also documents the new auxiliary.browser_vision config section in cli-config.yaml.example and adds 14 unit tests covering the full priority chain. Fixes #816 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-21 17:14:32 -04:00
Alexander Whitestone	ac28444bf2	feat: add A2AMTLSServer routing API, A2AMTLSClient, and expand tests to 20 (#806) [CI: all checks were successful; Lint / lint (pull_request): successful in 9s] Builds on the existing A2AServer / build_*_ssl_context foundation: - agent/a2a_mtls.py: - Add A2AMTLSServer: routing-based HTTPS server with add_route() and context-manager (__enter__/__exit__) lifecycle support - Add A2AMTLSClient: fleet-cert-presenting HTTP client with .get() / .post() - Widen imports (json, Callable, Dict, urlopen) - tests/agent/test_a2a_mtls.py: - Fix datetime.utcnow() deprecation — use datetime.now(timezone.utc) - Add TestA2AMTLSServerAndClient (9 tests): routing GET/POST, 404, context-manager stop, rogue-cert rejection, A2AMTLSClient, concurrency - Total: 11 → 20 passing tests Refs #806	2026-04-21 15:21:10 -04:00
Alexander Whitestone	12b5d9a7fd	refactor: remove redundant vision_model guard in browser_vision [CI: all checks were successful; Lint / lint (pull_request): successful in 10s] _get_vision_model() now always returns a non-empty string (Gemma 4 default or configured override), so the `if vision_model:` conditional guard is unnecessary. Replace with unconditional assignment and add a debug log line showing which model was selected. Refs #816 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-21 14:09:40 -04:00
Alexander Whitestone	91faf6f956	feat: A2A auth — mutual TLS between fleet agents [CI: all checks were successful; Lint / lint (pull_request): successful in 10s] Implements mutual TLS for secure agent-to-agent communication (#806). - scripts/gen_fleet_ca.sh: generate fleet CA (4096-bit RSA, 10-year) - scripts/gen_agent_cert.sh: per-agent cert signed by fleet CA (timmy, allegro, ezra) - agent/a2a_mtls.py: A2AServer requiring client cert verification (CERT_REQUIRED), build_server_ssl_context / build_client_ssl_context helpers, server_from_env() - ansible/roles/fleet_mtls_certs/: distribute CA + per-agent certs to fleet nodes, write /etc/hermes/a2a.env, notify hermes-a2a service on change - ansible/fleet_mtls.yml + ansible/inventory/fleet.ini.example: playbook + example inventory - tests/agent/test_a2a_mtls.py: 11 tests — authorized agent accepted (200/202), self-signed cert rejected, no-cert rejected, lifecycle, env-var wiring Fixes #806 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-21 13:28:28 -04:00
Alexander Whitestone	b6398b8b0d	feat: wire Gemma 4 vision into browser_tool for screenshot analysis [CI: all checks were successful; Lint / lint (pull_request): successful in 19s] Default browser screenshot analysis now uses Gemma 4 27B (google/gemma-4-27b-it) instead of deferring to the auxiliary router's auto-detection. Gemma 4 is natively multimodal — the same model family already in use for text tasks — which avoids cold-start model-switching overhead and improves context continuity. Resolution order for _get_vision_model(): 1. BROWSER_VISION_MODEL env var (browser-specific override) 2. auxiliary.browser_vision.model in config.yaml 3. AUXILIARY_VISION_MODEL env var (shared/legacy override) 4. google/gemma-4-27b-it (new default) - Add _BROWSER_VISION_DEFAULT_MODEL constant to browser_tool.py - Document auxiliary.browser_vision config key in cli-config.yaml.example - Add 10 unit tests covering all resolution steps Fixes #816 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-21 12:49:46 -04:00
Claude (Opus 4.6)	a2a40429bd	Merge pull request '[claude] Poka-yoke: auto-revert incomplete skill edits (#923)' (#946) from claude/issue-923 into main [CI: all checks were successful; Lint / lint (push): successful in 10s]	2026-04-21 16:38:24 +00:00
Alexander Whitestone	ee61c5fa9d	Merge pull request 'feat: Add queue health check script' (#912) from feat/queue-health-check into main [CI: all checks were successful; Lint / lint (push): successful in 34s]	2026-04-21 15:37:59 +00:00
Alexander Whitestone	1fece10569	feat: poka-yoke auto-revert for incomplete skill edits (#923) [CI: all checks were successful; Lint / lint (pull_request): successful in 32s] Implement a transactional write-validate-commit-or-rollback pattern for all skill_manage write operations (edit, patch, write_file): - _backup_skill_file: timestamped .bak.{ts} snapshot before every write - _validate_written_file: re-reads from disk after write to catch truncation, encoding errors, and broken YAML frontmatter - _revert_from_backup: restores original content (or removes the corrupted file) on any validation failure - _cleanup_old_backups: prunes to MAX_BACKUPS_PER_FILE (3) after success; failed edits keep their .bak file as a debugging aid Also fixes pre-existing issue where _patch_skill error returns lacked a `suggestion` field expected by test_skill_manager_error_context.py tests. Adds 21 tests in test_skill_manager_autorevert.py covering every component and an end-to-end simulation of mid-write failure + auto-revert. Fixes #923 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-21 11:37:55 -04:00
Alexander Whitestone	46668505bc	Merge pull request 'feat: tool fixation detection — break repetitive loops (#886)' (#914) from fix/886 into main [CI: some checks failed; Lint / lint (push): cancelled]	2026-04-21 15:35:08 +00:00
Alexander Whitestone	cac0c8224e	Merge pull request 'fix: circuit breaker for error cascading (2.33x amplification)' (#927) from fix/885-circuit-breaker into main [CI: some checks failed; Lint / lint (push): cancelled]	2026-04-21 15:35:04 +00:00
Alexander Whitestone	f38a64455d	Merge pull request '[claude] Gateway config debt: add validation tests and API_SERVER_KEY warning (#892)' (#915) from claude/issue-892 into main [CI: some checks failed; Lint / lint (push): cancelled]	2026-04-21 15:33:19 +00:00
Alexander Whitestone	1b35a5a0d2	Merge pull request 'feat: Poka-yoke — hardcoded path guard (#921)' (#928) from fix/921-hardcoded-path-guard into main [CI: some checks failed; Lint / lint (push): cancelled]	2026-04-21 15:33:14 +00:00
Alexander Whitestone	9172131b25	Merge pull request 'docs: tool investigation report from awesome-ai-tools (#926)' (#931) from fix/926 into main [CI: some checks failed; Lint / lint (push): cancelled]	2026-04-21 15:33:12 +00:00
Alexander Whitestone	407eab3331	Merge pull request 'feat: session deterministic seeding & marathon limits' (#919) from feat/session-management-1776700585635 into main [CI: some checks failed; Lint / lint (push): cancelled]	2026-04-21 15:29:44 +00:00
Alexander Whitestone	cf090a966d	Merge pull request 'fix: Poka-yoke — detect and block tool hallucination before API calls (#922)' (#935) from fix/922 into main [CI: some checks failed; Lint / lint (push): cancelled]	2026-04-21 15:29:35 +00:00
Alexander Whitestone	b65be9b12c	Merge pull request '[claude] Add tool investigation report: top 5 awesome-ai-tools recommendations (#926)' (#936) from claude/issue-926 into main [CI: some checks failed; Lint / lint (push): cancelled]	2026-04-21 15:29:32 +00:00
Alexander Whitestone	3c1cff255e	Merge pull request 'ci: integrate hardcoded path linter into CI workflow' (#938) from fix/865-ci-path-linter into main [CI: some checks failed; Lint / lint (push): cancelled]	2026-04-21 15:29:30 +00:00
Alexander Whitestone	690d100afc	Merge pull request 'feat: Poka-yoke token budget — progressive context overflow guard (#925)' (#943) from burn/925-1776770102 into main [CI: some checks failed; Docker Build and Publish / build-and-push (push): skipped; Nix / nix (ubuntu-latest) (push): failing after 5s; Tests / e2e (push): successful in 5m8s; Tests / test (push): failing after 30m13s; Nix / nix (macos-latest) (push): cancelled]	2026-04-21 15:29:02 +00:00
Alexander Whitestone	c6f0831738	Merge pull request 'feat: Python syntax validation before execute_code (#913)' (#917) from fix/913-syntax-validation into main [CI: some checks failed; cancelled (push): Docker Build and Publish / build-and-push, Nix / nix (macos-latest), Nix / nix (ubuntu-latest), Tests / test, Tests / e2e]	2026-04-21 15:27:05 +00:00
Alexander Whitestone	30773ac1f9	Merge pull request 'fix: Path validation before read_file — poka-yoke (#887)' (#911) from fix/887-path-validation-read-file into main [CI: some checks failed; cancelled (push): Docker Build and Publish / build-and-push, Nix / nix (macos-latest), Nix / nix (ubuntu-latest), Tests / test, Tests / e2e]	2026-04-21 15:26:55 +00:00
Alexander Whitestone	feb24bd08c	Merge pull request 'feat: Block silent credential exposure in tool outputs (#839)' (#910) from fix/839-1776403070 into main [CI: some checks failed; cancelled (push): Docker Build and Publish / build-and-push, Nix / nix (macos-latest), Nix / nix (ubuntu-latest), Tests / test, Tests / e2e]	2026-04-21 15:26:47 +00:00
Alexander Whitestone	bc55f40505	Merge pull request 'feat: time-aware model routing for cron jobs (#889)' (#909) from fix/889 into main [CI: some checks failed; cancelled (push): Docker Build and Publish / build-and-push, Nix / nix (macos-latest), Nix / nix (ubuntu-latest), Tests / test, Tests / e2e]	2026-04-21 15:26:43 +00:00
Alexander Whitestone	2adc72335e	Merge pull request 'fix: profile session isolation — tag and filter by profile' (#907) from fix/891-profile-isolation into main [CI: some checks failed; cancelled (push): Docker Build and Publish / build-and-push, Nix / nix (macos-latest), Nix / nix (ubuntu-latest), Tests / test, Tests / e2e]	2026-04-21 15:26:39 +00:00
Alexander Whitestone	ab32670464	Merge pull request 'feat: Poka-yoke — detect and block tool hallucination before API calls (#922)' (#944) from burn/922-1776770102 into main [CI: some checks failed; cancelled (push): Docker Build and Publish / build-and-push, Nix / nix (macos-latest), Nix / nix (ubuntu-latest), Tests / test, Tests / e2e]	2026-04-21 15:23:56 +00:00
Alexander Whitestone	bfc0231297	Merge pull request 'docs: holographic + vector hybrid memory architecture (#879)' (#942) from fix/879 into main [CI: some checks failed; cancelled (push): Docker Build and Publish / build-and-push, Nix / nix (macos-latest), Nix / nix (ubuntu-latest), Tests / test, Tests / e2e]	2026-04-21 15:23:49 +00:00
Alexander Whitestone	cf2b09cf2f	Merge pull request 'docs: emotional presence patterns for crisis support (#880)' (#941) from fix/880 into main [CI: some checks failed; cancelled (push): Docker Build and Publish / build-and-push, Nix / nix (macos-latest), Nix / nix (ubuntu-latest), Tests / test, Tests / e2e]	2026-04-21 15:23:45 +00:00
Alexander Whitestone	719bb537c0	Merge pull request 'feat: provider preflight validation before session start (#924)' (#932) from fix/924 into main [CI: some checks failed; cancelled (push): Docker Build and Publish / build-and-push, Nix / nix (macos-latest), Nix / nix (ubuntu-latest), Tests / test, Tests / e2e]	2026-04-21 15:23:02 +00:00
Alexander Whitestone	0bcbcf19ac	Merge pull request 'feat: time-aware model routing for cron jobs #889' (#906) from fix/time-aware-routing-889 into main [CI: some checks failed; cancelled (push): Docker Build and Publish / build-and-push, Nix / nix (macos-latest), Nix / nix (ubuntu-latest), Tests / test, Tests / e2e]	2026-04-21 15:22:37 +00:00
Alexander Whitestone	27d2f2ca0e	Merge pull request 'feat: Prevent context window overflow via proactive token counting (#838)' (#905) from fix/838-1776402240 into main [CI: some checks failed; cancelled (push): Docker Build and Publish / build-and-push, Nix / nix (macos-latest), Nix / nix (ubuntu-latest), Tests / e2e, Tests / test]	2026-04-21 15:22:31 +00:00
Alexander Whitestone	7e7dcfa345	Merge pull request 'fix: Gateway config validation and fallback fixes (#892)' (#904) from fix/892 into main [CI: some checks failed; cancelled (push): Docker Build and Publish / build-and-push, Nix / nix (macos-latest), Nix / nix (ubuntu-latest), Tests / e2e, Tests / test]	2026-04-21 15:22:02 +00:00
Alexander Whitestone	ba0e614446	Merge pull request 'feat: integrate 988 Suicide & Crisis Lifeline — automatic crisis escalation (#673 )' (#903 ) from feat/673 into main Some checks failed Docker Build and Publish / build-and-push (push) Has been cancelled Details Nix / nix (macos-latest) (push) Has been cancelled Details Nix / nix (ubuntu-latest) (push) Has been cancelled Details Tests / test (push) Has been cancelled Details Tests / e2e (push) Has been cancelled Details	2026-04-21 15:22:17 +00:00
Alexander Whitestone	4f5e641c92	Merge pull request 'fix: kill 9 dead cron jobs — audit and cleanup script' (#902 ) from fix/890-dead-cron-jobs into main Some checks failed Docker Build and Publish / build-and-push (push) Has been cancelled Details Nix / nix (macos-latest) (push) Has been cancelled Details Nix / nix (ubuntu-latest) (push) Has been cancelled Details Tests / e2e (push) Has been cancelled Details Tests / test (push) Has been cancelled Details	2026-04-21 15:22:15 +00:00
Alexander Whitestone	d61bd141f9	feat: add poka-yoke validation to non-execute_code dispatch (#922) Some checks failed Docker Build and Publish / build-and-push (pull_request) Has been skipped Details Contributor Attribution Check / check-attribution (pull_request) Failing after 32s Details Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 32s Details Tests / e2e (pull_request) Successful in 3m5s Details Tests / test (pull_request) Failing after 36m26s Details	2026-04-21 12:01:57 +00:00
Alexander Whitestone	a4058af238	feat: wire poka-yoke validation into tool dispatch (#922)	2026-04-21 12:00:20 +00:00
Alexander Whitestone	08432a5618	test: poka-yoke validation tests (#922)	2026-04-21 11:59:26 +00:00
Alexander Whitestone	a875c6ed91	feat: poka-yoke tool call validation firewall (#922)	2026-04-21 11:59:25 +00:00
Alexander Whitestone	07c5b5b83d	test: add token budget poka-yoke tests (#925) Some checks failed Contributor Attribution Check / check-attribution (pull_request) Failing after 44s Details Docker Build and Publish / build-and-push (pull_request) Has been skipped Details Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 45s Details Tests / test (pull_request) Failing after 25m21s Details Tests / e2e (pull_request) Successful in 3m18s Details	2026-04-21 11:41:39 +00:00
Alexander Whitestone	ba56567631	docs: holographic + vector hybrid memory architecture (#879) Some checks failed Docker Build and Publish / build-and-push (pull_request) Has been skipped Details Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 45s Details Tests / test (pull_request) Failing after 14m3s Details Tests / e2e (pull_request) Successful in 1m53s Details	2026-04-21 11:41:31 +00:00
Alexander Whitestone	8ac26f54a5	feat: token budget with progressive poka-yoke thresholds (#925)	2026-04-21 11:40:39 +00:00
Alexander Whitestone	b807972d05	docs: emotional presence patterns for crisis support (#880) Some checks failed Docker Build and Publish / build-and-push (pull_request) Has been skipped Details Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 52s Details Tests / e2e (pull_request) Successful in 3m53s Details Tests / test (pull_request) Failing after 53m34s Details	2026-04-21 11:37:57 +00:00
Alexander Whitestone	6b5a6db668	ci: add Gitea Actions lint workflow All checks were successful Lint / lint (pull_request) Successful in 15s Details Part of #865. Runs hardcoded path linter on every push/PR.	2026-04-21 11:37:33 +00:00
Alexander Whitestone	b702249c12	ci: add hardcoded path linter to CI workflow Closes #865 Runs scripts/lint_hardcoded_paths.py as a CI check. Uses continue-on-error for now since the linter may have false positives.	2026-04-21 11:37:31 +00:00
Alexander Whitestone	8023c9b8f2	docs: add tool investigation report for top 5 awesome-ai-tools recommendations Some checks failed Docker Build and Publish / build-and-push (pull_request) Has been skipped Details Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 55s Details Tests / e2e (pull_request) Successful in 3m56s Details Tests / test (pull_request) Failing after 54m0s Details Persists the research report from issue #926 as a markdown file following the existing convention of research_*.md files in the repo. Documents the top 5 tool recommendations (LiteLLM, Mem0, RAGFlow, LiteRT-LM, Claude-Mem) with integration effort, impact scores, and phased implementation plan. Refs #926	2026-04-21 07:26:44 -04:00
TERRA	9edd5383e7	feat: add hermes web console cockpit and browser self-healing (#394) Some checks failed Contributor Attribution Check / check-attribution (pull_request) Failing after 36s Details Docker Build and Publish / build-and-push (pull_request) Has been skipped Details Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 31s Details Tests / e2e (pull_request) Successful in 3m37s Details Tests / test (pull_request) Failing after 38m26s Details	2026-04-21 02:00:41 -04:00
TERRA	f6c072f136	wip: add web console cockpit regression tests for #394	2026-04-21 02:00:41 -04:00
Alexander Whitestone	6eeee39c10	test(#922): Add tests for tool hallucination detection Some checks failed Contributor Attribution Check / check-attribution (pull_request) Failing after 1m15s Details Docker Build and Publish / build-and-push (pull_request) Has been skipped Details Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 1m8s Details Tests / e2e (pull_request) Successful in 3m44s Details Tests / test (pull_request) Failing after 1h9m15s Details Tests for validation firewall: - Unknown tool detection - Missing required params - Wrong type detection - Hallucination patterns - Rejection stats Refs #922	2026-04-21 05:38:54 +00:00
Alexander Whitestone	b2d2d2c650	fix(#922): Poka-yoke — detect and block tool hallucination Validation firewall between LLM tool-call output and execution: 1. Unknown tool names rejected 2. Malformed parameters caught 3. Missing required arguments detected 4. Hallucination patterns detected All rejections logged with model provenance. Agent receives rejection as tool result for self-correction. Resolves #922	2026-04-21 05:38:22 +00:00
Alexander Whitestone	5b62bb8d81	feat(#394): Hermes web UI operator cockpit Some checks failed Docker Build and Publish / build-and-push (pull_request) Has been skipped Details Contributor Attribution Check / check-attribution (pull_request) Failing after 43s Details Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 1m9s Details Tests / e2e (pull_request) Successful in 6m9s Details Tests / test (pull_request) Failing after 1h3m4s Details Minimal web interface for Hermes operation: - Chat interface with streaming - System status monitoring - Crisis detection display - Session management - Dark theme, responsive design Source-backed: Hermes Atlas pattern. Refs #394	2026-04-21 05:34:22 +00:00
Alexander Whitestone	10f9fd690a	feat(#394): Self-healing browser CDP layer (browser-harness) Source-backed browser automation: - CDP connection with auto-reconnect - Self-healing on disconnects - Screenshot, DOM inspection, JS evaluation - Click, type, navigate primitives - Session persistence Refs #394	2026-04-21 05:33:32 +00:00
Alexander Whitestone	bdd0f2709b	feat: provider preflight validation before session start (#924) Some checks failed Contributor Attribution Check / check-attribution (pull_request) Failing after 47s Details Docker Build and Publish / build-and-push (pull_request) Has been skipped Details Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 52s Details Tests / test (pull_request) Failing after 30m48s Details Tests / e2e (pull_request) Successful in 2m9s Details	2026-04-21 04:48:57 +00:00
Alexander Whitestone	a9cbf7d69f	docs: tool investigation report from awesome-ai-tools (#926) Some checks failed Docker Build and Publish / build-and-push (pull_request) Has been skipped Details Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 36s Details Tests / e2e (pull_request) Successful in 2m56s Details Tests / test (pull_request) Failing after 34m20s Details	2026-04-21 04:45:03 +00:00
Google AI Agent	b64f4d9632	feat: update run_agent.py for deep dive security Some checks failed Contributor Attribution Check / check-attribution (pull_request) Failing after 28s Details Docker Build and Publish / build-and-push (pull_request) Has been skipped Details Nix / nix (ubuntu-latest) (pull_request) Failing after 4s Details Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 35s Details Tests / test (pull_request) Failing after 1h0m5s Details Tests / e2e (pull_request) Failing after 6m56s Details Nix / nix (macos-latest) (pull_request) Has been cancelled Details	2026-04-21 00:41:55 +00:00
Google AI Agent	7caaf49a34	feat: deep dive integration of tests/test_shield_multilingual.py	2026-04-21 00:41:53 +00:00
Google AI Agent	e52f6d2cde	feat: deep dive integration of tools/shield/detector.py	2026-04-21 00:41:52 +00:00
Google AI Agent	000d64deed	feat: deep dive integration of agent/input_sanitizer.py	2026-04-21 00:41:50 +00:00
Google AI Agent	d527cb569b	feat: deep dive integration of agent/shield.py	2026-04-21 00:41:49 +00:00
Google AI Agent	44ada06fd4	feat: update agent/privacy_filter.py for deep dive security	2026-04-21 00:41:48 +00:00
Alexander Whitestone	4cdda8701d	feat: integrate hardcoded path guard into tool dispatch Some checks failed Docker Build and Publish / build-and-push (pull_request) Has been skipped Details Contributor Attribution Check / check-attribution (pull_request) Failing after 32s Details Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 32s Details Tests / e2e (pull_request) Successful in 2m56s Details Tests / test (pull_request) Failing after 1h1m7s Details	2026-04-21 00:31:01 +00:00
Alexander Whitestone	a80d30b342	feat: add pre-commit hook for hardcoded path detection	2026-04-21 00:29:33 +00:00
Alexander Whitestone	f098cf8c4a	feat: add hardcoded path guard module (#921) - Detects /Users/, /home/, ~/ in tool arguments - Source code scanner for CI/pre-commit - Runtime guard for tool dispatch - noqa: hardcoded-path-ok escape hatch Closes #921	2026-04-21 00:29:12 +00:00
Alexander Whitestone	30509b9c7c	test: circuit breaker tests Some checks failed Docker Build and Publish / build-and-push (pull_request) Has been skipped Details Contributor Attribution Check / check-attribution (pull_request) Failing after 38s Details Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 40s Details Tests / e2e (pull_request) Successful in 1m36s Details Tests / test (pull_request) Failing after 17m13s Details Part of #885	2026-04-21 00:28:15 +00:00
Alexander Whitestone	ccaa1cb021	feat: circuit breaker for error cascading Closes #885 2.33x error cascade factor detected. After 3 consecutive errors, circuit opens and agent must take corrective action. Recovery pattern: terminal is the safety net (2300 recoveries).	2026-04-21 00:28:14 +00:00
Alexander Whitestone	c6f2855745	fix: restore _format_error helper for test compatibility (#916) Some checks failed Docker Build and Publish / build-and-push (push) Has been skipped Details Nix / nix (ubuntu-latest) (push) Failing after 2s Details Tests / e2e (push) Successful in 2m47s Details Tests / test (push) Failing after 27m41s Details Build Skills Index / build-index (push) Has been skipped Details Build Skills Index / deploy-with-index (push) Has been skipped Details Nix / nix (macos-latest) (push) Has been cancelled Details fix: restore _format_error helper for test compatibility (#916)	2026-04-20 23:56:27 +00:00
Google AI Agent	9d180f31cc	feat: add session templates Some checks failed Contributor Attribution Check / check-attribution (pull_request) Failing after 43s Details Docker Build and Publish / build-and-push (pull_request) Has been skipped Details Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 45s Details Tests / test (pull_request) Failing after 45m24s Details Tests / e2e (pull_request) Failing after 7m35s Details	2026-04-20 15:56:26 +00:00
Google AI Agent	3d8cf5122a	feat: add agent/shield.py for SHIELD defense Some checks failed Contributor Attribution Check / check-attribution (pull_request) Failing after 31s Details Docker Build and Publish / build-and-push (pull_request) Has been skipped Details Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 40s Details Tests / e2e (pull_request) Successful in 2m2s Details Tests / test (pull_request) Failing after 52m0s Details	2026-04-20 15:54:48 +00:00
Google AI Agent	790b677978	feat: add tests/test_shield_multilingual.py for SHIELD defense	2026-04-20 15:54:46 +00:00
Google AI Agent	9a749d2854	feat: add agent/input_sanitizer.py for SHIELD defense	2026-04-20 15:54:45 +00:00
Google AI Agent	68534e78be	feat: add tools/shield/detector.py for SHIELD defense	2026-04-20 15:54:43 +00:00
Alexander Whitestone	c17f64fa2c	test: add syntax validation tests (#913) Some checks failed Docker Build and Publish / build-and-push (pull_request) Has been skipped Details Contributor Attribution Check / check-attribution (pull_request) Failing after 41s Details Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 29s Details Tests / e2e (pull_request) Successful in 2m2s Details Tests / test (pull_request) Failing after 1h14m43s Details	2026-04-20 15:47:35 +00:00
Alexander Whitestone	bc7ffc2166	feat: Python syntax validation before execute_code (#913)	2026-04-20 15:46:23 +00:00
Alexander Whitestone	c22cdcaa8e	fix: add _validate_gateway_config tests and API_SERVER_KEY network binding warning Some checks failed Docker Build and Publish / build-and-push (pull_request) Has been skipped Details Contributor Attribution Check / check-attribution (pull_request) Failing after 23s Details Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 27s Details Tests / e2e (pull_request) Successful in 1m51s Details Tests / test (pull_request) Failing after 37m0s Details Refs #892 - Gateway config debt: missing keys and broken fallbacks Changes: - Add `_is_network_accessible()` helper to gateway/config.py (avoids circular import with gateway.platforms.base which imports from gateway.config) - Add API_SERVER_KEY warning in `_validate_gateway_config`: when the API server is enabled on a network-accessible address (0.0.0.0, public IP, hostname) but no key is configured, log a warning at config-load time so operators see the issue before any adapter initialisation runs - Add `TestValidateGatewayConfig` in tests/gateway/test_config.py covering: - idle_minutes <= 0 and None are corrected to 1440 (default) - at_hour outside 0-23 is corrected to 4 (default) - Boundary hours 0 and 23 are accepted unchanged - Empty platform token triggers a warning log - Disabled platform with empty token produces no warning - API server on 0.0.0.0 without key logs a warning - API server on 127.0.0.1 without key is silent (loopback is allowed) - API server with a key set logs no warning regardless of bind address Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-17 02:18:02 -04:00
Alexander Whitestone	ab968e910c	feat: tool fixation detection — break repetitive loops (#886) Some checks failed Contributor Attribution Check / check-attribution (pull_request) Failing after 37s Details Docker Build and Publish / build-and-push (pull_request) Has been skipped Details Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 43s Details Tests / e2e (pull_request) Successful in 1m57s Details Tests / test (pull_request) Failing after 18m57s Details Marathon sessions show tool fixation: agent latches onto one tool and calls it repeatedly. Observed streaks of 8-25 identical calls. New agent/tool_fixation_detector.py: - ToolFixationDetector: tracks consecutive tool calls - record(tool_name): returns nudge prompt when threshold reached - Default threshold: 5 consecutive calls (configurable via TOOL_FIXATION_THRESHOLD env var) - Nudge prompt explains the fixation and suggests alternatives: 1. Read error carefully 2. Try different tool 3. Ask user for clarification 4. Check if task is complete - get_streak_info(): current streak state - format_report(): human-readable fixation events - Singleton via get_fixation_detector() Config: - TOOL_FIXATION_THRESHOLD (default: 5) - TOOL_FIXATION_WINDOW (default: 10) Tests: tests/test_tool_fixation_detector.py (9 tests) Closes #886	2026-04-17 01:57:37 -04:00
Alexander Whitestone	73984ca72f	feat: Add queue health check script Some checks failed Docker Build and Publish / build-and-push (pull_request) Has been skipped Details Contributor Attribution Check / check-attribution (pull_request) Failing after 29s Details Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 31s Details Tests / e2e (pull_request) Successful in 2m13s Details Tests / test (pull_request) Failing after 28m10s Details	2026-04-17 01:26:07 -04:00
Alexander Whitestone	436c800def	fix: add path validation before read_file (#887) Some checks failed Contributor Attribution Check / check-attribution (pull_request) Failing after 35s Details Docker Build and Publish / build-and-push (pull_request) Has been skipped Details Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 38s Details Tests / e2e (pull_request) Successful in 1m58s Details Tests / test (pull_request) Failing after 42m6s Details - Check if file exists before attempting read - Return clear error with suggestions for similar files - Suggest using search_files to find correct path - Eliminates 83.7% of read_file errors (file not found) Closes #887	2026-04-17 05:24:52 +00:00
Alexander Whitestone	cb331da4f1	test: Add credential redaction tests (#839) Some checks failed Docker Build and Publish / build-and-push (pull_request) Has been skipped Details Contributor Attribution Check / check-attribution (pull_request) Failing after 49s Details Tests / e2e (pull_request) Successful in 2m50s Details Tests / test (pull_request) Failing after 11m50s Details Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 47s Details	2026-04-17 05:23:48 +00:00
Alexander Whitestone	fa892bfcb9	feat: Add credential redaction for tool outputs (#839)	2026-04-17 05:21:25 +00:00
Alexander Whitestone	0b72884750	feat: time-aware model routing for cron jobs (#889) Some checks failed Tests / test (pull_request) Failing after 25m4s Details Tests / e2e (pull_request) Successful in 3m19s Details Contributor Attribution Check / check-attribution (pull_request) Failing after 14s Details Docker Build and Publish / build-and-push (pull_request) Has been skipped Details Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 14s Details Error rate peaks at 18:00 (9.4%) during evening cron batches vs 4.0% at 09:00 during interactive work. Route cron tasks to stronger models during off-hours when user is not present to correct errors. New agent/time_aware_routing.py: - resolve_time_aware_model(): routes based on hour, error rate, task type - Interactive sessions: always use base model (user corrects errors) - Cron during business hours: use base model (low error rate) - Cron during off-hours with high error rate (>6%): upgrade to strong model - get_hour_error_rate(): error rates by hour from empirical audit - is_off_hours(): 18:00-05:59 = off-hours - RoutingDecision: model, provider, reason, hour, error_rate - get_routing_report(): 24h forecast of routing decisions Config via env vars: - CRON_STRONG_MODEL (default: xiaomi/mimo-v2-pro) - CRON_CHEAP_MODEL (default: qwen2.5:7b) - CRON_ERROR_THRESHOLD (default: 6.0%) Tests: tests/test_time_aware_routing.py (9 tests) Closes #889	2026-04-17 01:15:09 -04:00
Alexander Whitestone	a0ed1e6ff2	test: profile isolation tests Some checks failed Contributor Attribution Check / check-attribution (pull_request) Failing after 15s Details Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 15s Details Docker Build and Publish / build-and-push (pull_request) Has been skipped Details Tests / test (pull_request) Failing after 18m33s Details Tests / e2e (pull_request) Successful in 1m17s Details Part of #891	2026-04-17 05:13:03 +00:00
Alexander Whitestone	b5ba272efe	feat: profile session isolation Closes #891 Tags sessions with originating profile and provides filtered access so profiles cannot see each other's data.	2026-04-17 05:13:01 +00:00
Alexander Whitestone	2e0dfe27df	feat: time-aware model routing for cron jobs #889 Some checks failed Contributor Attribution Check / check-attribution (pull_request) Failing after 15s Details Docker Build and Publish / build-and-push (pull_request) Has been skipped Details Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 14s Details Tests / test (pull_request) Failing after 18m19s Details Tests / e2e (pull_request) Successful in 1m17s Details	2026-04-17 05:10:34 +00:00
Alexander Whitestone	d4cdfdc604	test: Add context budget tracker tests (#838) Some checks failed Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 19s Details Contributor Attribution Check / check-attribution (pull_request) Failing after 16s Details Docker Build and Publish / build-and-push (pull_request) Has been skipped Details Tests / test (pull_request) Failing after 18m30s Details Tests / e2e (pull_request) Successful in 1m16s Details	2026-04-17 05:06:54 +00:00
Alexander Whitestone	e3436e36c3	feat: Add context budget tracker for overflow prevention (#838)	2026-04-17 05:06:08 +00:00
Timmy Time	34e7de6a4c	feat: 988 Lifeline tests (#673) Some checks failed Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 18s Details Contributor Attribution Check / check-attribution (pull_request) Failing after 17s Details Docker Build and Publish / build-and-push (pull_request) Has been skipped Details Tests / test (pull_request) Failing after 18m18s Details Tests / e2e (pull_request) Successful in 1m13s Details	2026-04-17 05:04:50 +00:00
Timmy Time	dbabe0e6ae	feat: 988 Suicide & Crisis Lifeline integration (#673) agent/crisis_resources.py provides all 988 Lifeline contact methods: phone (988), text (HOME to 988), chat, Spanish line. Also Crisis Text Line (741741) and 911. Closes #673	2026-04-17 05:04:48 +00:00
Alexander Whitestone	517e2c571e	fix(#892): Gateway config validation and fallback fixes Some checks failed Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 16s Details Tests / test (pull_request) Failing after 18m29s Details Tests / e2e (pull_request) Successful in 1m20s Details Contributor Attribution Check / check-attribution (pull_request) Failing after 16s Details Docker Build and Publish / build-and-push (pull_request) Has been skipped Details Config validator and fallback fixes: - Validate required keys (OPENROUTER_API_KEY, API_SERVER_KEY) - Fix idle_minutes validation (>0 required) - Fix Discord skill limit (reduce to 95 max) - Validate provider configs - Apply sensible defaults Resolves #892	2026-04-17 05:04:11 +00:00
Alexander Whitestone	0b019327a3	docs: cron audit documentation Some checks failed Tests / e2e (pull_request) Successful in 49s Details Docker Build and Publish / build-and-push (pull_request) Has been skipped Details Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 15s Details Contributor Attribution Check / check-attribution (pull_request) Failing after 14s Details Tests / test (pull_request) Failing after 18m24s Details Part of #890	2026-04-17 05:00:09 +00:00
Alexander Whitestone	6b0fca6944	feat: cron job audit and cleanup script Closes #890 Finds dead cron jobs (zero completions, stale) and provides --disable and --delete actions to clean them up.	2026-04-17 05:00:06 +00:00
Alexander Whitestone	05f8c2d188	Merge PR #899 Merged PR #899: feat: Allegro worker deliverables	2026-04-17 01:52:11 +00:00
Hermes Agent	ff2ce95ade	feat(research): Allegro worker deliverables — fleet research reports + skill manager test Some checks failed Tests / e2e (pull_request) Successful in 1m39s Details Tests / test (pull_request) Failing after 1h7m45s Details Docker Build and Publish / build-and-push (pull_request) Has been skipped Details Contributor Attribution Check / check-attribution (pull_request) Successful in 24s Details Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 28s Details Research reports: - Vector DB research - Workflow orchestration research - Fleet knowledge graph SOTA research - LLM inference optimization - Local model crisis quality - Memory systems SOTA - Multi-agent coordination - R5 vs E2E gap analysis - Text-to-music-video Test: - test_skill_manager_error_context.py [Allegro] Forge workers — 2026-04-16	2026-04-16 15:04:28 +00:00
Hermes Merge Bot	aedebfdf58	Merge PR #848	2026-04-16 02:12:13 -04:00
Hermes Merge Bot	adf49b1809	Merge PR #849	2026-04-16 02:11:21 -04:00
Hermes Merge Bot	52ea3a8935	Merge PR #850	2026-04-16 02:09:00 -04:00
Hermes Merge Bot	43246d6cb4	Merge PR #852	2026-04-16 02:08:06 -04:00
Hermes Merge Bot	20c5e237a7	Merge PR #861	2026-04-16 02:06:36 -04:00
Hermes Merge Bot	a0f4d10a7f	Merge PR #855	2026-04-16 02:06:17 -04:00
Hermes Merge Bot	bc5d1cf6ff	Merge PR #863	2026-04-16 02:05:44 -04:00
Hermes Merge Bot	dff451081d	Merge PR #856	2026-04-16 02:05:42 -04:00
Hermes Merge Bot	5509b157c5	Merge PR #864	2026-04-16 02:05:05 -04:00
Hermes Merge Bot	fcc322fb81	Merge PR #867	2026-04-16 02:03:23 -04:00
Hermes Merge Bot	9bba9ecc40	Merge PR #866	2026-04-16 02:02:43 -04:00
Hermes Merge Bot	05086e58ea	Merge PR #871	2026-04-16 02:00:55 -04:00
Hermes Merge Bot	7af6889767	Merge PR #869	2026-04-16 02:00:49 -04:00
Alexander Whitestone	5022db9d7b	Merge pull request 'feat: self-modifying agent that improves its own prompts (#813)' (#897) from fix/813 into main	2026-04-16 05:29:11 +00:00
Alexander Whitestone	0f61474b74	Merge pull request 'feat: MCP server — expose hermes tools to fleet peers (#803)' (#896) from fix/803 into main Auto-merged PR #896: feat: MCP server — expose hermes tools to fleet peers (#803)	2026-04-16 05:24:27 +00:00
Alexander Whitestone	a528bd5b1b	fix: use .get() for env_vars key in _show_tool_availability_warnings Some checks failed Docker Build and Publish / build-and-push (pull_request) Has been skipped Details Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 24s Details Tests / test (pull_request) Failing after 1h2m1s Details Tests / e2e (pull_request) Successful in 1m38s Details Fixes KeyError: 'missing_vars' crash on CLI startup when toolsets are unavailable. registry.py returns dicts with 'env_vars' key, but _show_tool_availability_warnings() was accessing 'missing_vars' directly. Now uses .get("env_vars") or .get("missing_vars") to handle both key names, consistent with how doctor.py already handles this. Fixes #834 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 01:23:48 -04:00
Alexander Whitestone	e63cdaf16f	feat: self-modifying agent that improves its own prompts (#813) Some checks failed Docker Build and Publish / build-and-push (pull_request) Has been cancelled Details Contributor Attribution Check / check-attribution (pull_request) Has been cancelled Details Supply Chain Audit / Scan PR for supply chain risks (pull_request) Has been cancelled Details Tests / test (pull_request) Has been cancelled Details Tests / e2e (pull_request) Has been cancelled Details Resolves #813. Agent analyzes session transcripts for failure patterns and generates prompt patches to prevent future failures. agent/self_modify.py (PromptLearner class): - analyze_session(): detects 5 failure types from transcripts: retry_loop, timeout, hallucination, context_loss, tool_failure - generate_patches(): converts patterns to prompt patches with confidence scoring (frequency-based) - apply_patches(): appends learned rules to system prompt with backup and rollback support - learn_from_session(): full cycle analyze → patch → apply Failures → patterns → patches → improved prompts → fewer failures. Safety: patches only ADD rules (append-only), never remove. Rollback: restores from timestamped backup.	2026-04-16 01:23:48 -04:00
Alexander Whitestone	2b7b12baf9	feat: MCP server — expose hermes tools to fleet peers (#803) Some checks failed Contributor Attribution Check / check-attribution (pull_request) Successful in 44s Details Docker Build and Publish / build-and-push (pull_request) Has been skipped Details Tests / test (pull_request) Has been cancelled Details Tests / e2e (pull_request) Has been cancelled Details Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 19m48s Details Resolves #803. Standalone MCP server that exposes safe hermes tools to other fleet agents. scripts/mcp_server.py: - Exposes: terminal, file_read, file_search, web_search, session_search - Blocks: approval, delegate, memory, config, cron, send_message - Terminal uses approval.py dangerous command detection - Auth via Bearer token (MCP_AUTH_KEY) - HTTP endpoints: GET /mcp/tools, POST /mcp/tools/call, GET /health Usage: python scripts/mcp_server.py --port 8081 --auth-key SECRET curl -H "Authorization: Bearer SECRET" http://localhost:8081/mcp/tools curl -X POST http://localhost:8081/mcp/tools/call -H "Authorization: Bearer SECRET" -H "Content-Type: application/json" -d '{"name":"file_read","arguments":{"path":"README.md"}}'	2026-04-16 01:10:00 -04:00
Alexander Whitestone	6b40c5db7a	fix: use env_vars key in _show_tool_availability_warnings to prevent KeyError registry.py:check_tool_availability() returns unavailable dicts with key "env_vars", but _show_tool_availability_warnings() in cli.py was accessing u["missing_vars"], causing a KeyError that crashed CLI startup whenever any toolset was disabled. The fix matches how doctor.py already handles the same data. Fixes #834 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 00:42:03 -04:00
Google AI Agent	5a24894f78	fix: update hermes_cli/web_server.py for agent card discovery	2026-04-16 03:45:04 +00:00
Google AI Agent	a474eb8459	fix: add agent/agent_card.py for agent card discovery	2026-04-16 03:45:01 +00:00
Alexander Whitestone	3238cf4eb1	feat: Tool investigation report + Mem0 local provider (#842) ## Investigation Report - docs/tool-investigation-2026-04-15.md: Full report analyzing 414 tools from awesome-ai-tools. Top 5 recommendations with integration paths. - docs/plans/awesome-ai-tools-integration.md: Implementation tracking plan. ## Mem0 Local Provider (P1) - plugins/memory/mem0_local/: New ChromaDB-backed memory provider. No API key required - fully sovereign. Compatible tool schemas with cloud Mem0 (mem0_profile, mem0_search, mem0_conclude). - Pattern-based fact extraction from conversations. - Deterministic dedup via content hashing. - Circuit breaker for resilience. - tests/plugins/memory/test_mem0_local.py: Full test coverage. ## Issues Filed - #857: LightRAG integration (P2) - #858: n8n workflow orchestration (P3) - #859: RAGFlow document understanding (P4) - #860: tensorzero LLMOps evaluation (P3) Closes #842	2026-04-15 23:04:41 -04:00
Timmy	eed87e454e	test: Benchmark Gemma 4 vision accuracy vs current approach (#817) Vision benchmark suite comparing Gemma 4 (google/gemma-4-27b-it) vs current Gemini 3 Flash Preview (google/gemini-3-flash-preview). Metrics: - OCR accuracy (character + word overlap) - Description completeness (keyword coverage) - Structural quality (length, sentences, numbers) - Latency (ms per image) - Token usage - Consistency across runs Features: - 24 diverse test images (screenshots, diagrams, photos, charts) - Category-specific evaluation prompts - Automated verdict with composite scoring - JSON + markdown report output - 28 unit tests passing Usage: python benchmarks/vision_benchmark.py --images benchmarks/test_images.json python benchmarks/vision_benchmark.py --url https://example.com/img.png python benchmarks/vision_benchmark.py --generate-dataset Closes #817.	2026-04-15 23:02:02 -04:00
Alexander Whitestone	f03709aa29	test: crisis hook integration tests with agent loop (#707) 10 integration tests verifying crisis detection works correctly when called from the agent conversation flow: - scan_user_message detects CRITICAL/HIGH/MEDIUM/LOW levels - Safe messages pass through without triggering - Tool handler returns valid JSON - Compassion injection includes 988 lifeline for CRITICAL/HIGH - Case insensitive detection - Empty/None text handled gracefully - False positive resistance on common non-crisis phrases - Config check returns bool - Callable from agent context (not just isolation tests)	2026-04-15 23:00:12 -04:00
Alexander Whitestone	4d8e004b5f	fix: extend JSON repair to remaining json.loads sites in run_agent.py Adds `repair_and_load_json()` to utils.py using the `json_repair` library as a fallback when `json.loads()` fails. Replaces 8 non-hot-path json.loads sites identified in issue #809: - L2250: trajectory/sanitization message content parsing - L2500: tool_call dict reconstruction in trajectory conversion - L2535: tool_content parsing (JSON-like strings in tool responses) - L2888: session log file loading (with warning on unrecoverable parse) - L3119: todo content parsing in message processing - L5963: vision result_json parsing - L6761: memory flush tool call argument parsing - L8300: cache serialization tool call args normalization Each site uses an appropriate default ({} for tool args, None/continue for content parsing) and a context label for debug tracing. Fixes #809	2026-04-15 22:56:39 -04:00
PRIMA	85a654348a	feat: poka-yoke — prevent hardcoded ~/.hermes paths (closes #835) scripts/lint_hardcoded_paths.py (new): - Scans Python files for hardcoded home-directory paths - Detects: Path.home()/.hermes without env fallback, /Users/<name>/, /home/<name>/ - Excludes: comments, docstrings, test files, skills, plugins, docs - Excludes correct patterns: profiles_parent, current_default, native_home - Supports --staged (git pre-commit), --fix (suggestions), --json output scripts/pre-commit-hardcoded-paths.sh (new): - Pre-commit hook that runs lint_hardcoded_paths.py --staged - Blocks commits containing hardcoded path violations tools/confirmation_daemon.py (fixed): - Replaced Path.home() / '.hermes' / 'approval_whitelist.json' with get_hermes_home() / 'approval_whitelist.json' - Added import of get_hermes_home from hermes_constants tests/test_hardcoded_paths.py (new): - 11 tests: detection, exclusion, fallback patterns, clean files	2026-04-15 22:56:32 -04:00
Alexander Whitestone	fc0d8fe5e9	fix: extend JSON repair to ALL remaining json.loads sites (#809)	2026-04-16 02:53:41 +00:00
Alexander Whitestone	13ef670c05	feat: session compaction with fact extraction (#748) Before compressing conversation context, extract durable facts (user preferences, corrections, project details) and save to fact store so they survive compression. New agent/session_compactor.py: - extract_facts_from_messages(): scans user messages for preferences, corrections, project/infra facts using regex - 3 pattern categories: user_pref (5 patterns), correction (3 patterns), project (4 patterns) - ExtractedFact: category, entity, content, confidence, source_turn - save_facts_to_store(): saves to fact store (callback or auto-detect) - extract_and_save_facts(): one-call extraction + persistence - Deduplication by category+content - Skips tool results, short messages, system messages - format_facts_summary(): human-readable summary Tests: tests/test_session_compactor.py (9 tests) Closes #748	2026-04-15 22:41:54 -04:00
Alexander Whitestone	4752a0085e	fix: extend JSON repair to remaining json.loads sites in run_agent.py (#809)	2026-04-16 02:40:51 +00:00
Alexander Whitestone	b26a6ec23b	feat: add repair_and_load_json() to utils.py (#809)	2026-04-16 02:38:01 +00:00
Alexander Whitestone	9f0c410481	feat: batch tool execution with parallel safety checks (#749) Centralized safety classification for tool call batches: tools/batch_executor.py (new): - classify_tool_calls() — classifies batch into parallel_safe, path_scoped, sequential, never_parallel tiers - BatchExecutionPlan — structured plan with parallel and sequential batches - Path conflict detection — write_file + patch on same file go sequential - Destructive command detection — rm, mv, sed -i, redirects - execute_parallel_batch() — ThreadPoolExecutor for concurrent execution tools/registry.py (enhanced): - ToolEntry.parallel_safe field — tools can declare parallel safety - registry.register() accepts parallel_safe=True parameter - registry.get_parallel_safe_tools() — query registry-declared safe tools Safety tiers: - parallel_safe: read_file, web_search, search_files, etc. - path_scoped: write_file, patch (concurrent when paths don't overlap) - sequential: terminal, delegate_task, unknown tools - never_parallel: clarify (requires user interaction) 19 tests passing.	2026-04-15 22:17:16 -04:00
Alexander Whitestone	b34b5b293d	test: add tests for tool hallucination prevention (#836)	2026-04-16 02:15:59 +00:00
Alexander Whitestone	05f9d2b009	feat: integrate poka-yoke validation into tool dispatch (#836) - Added import for tool_pokayoke module - Added validation before orchestrator.dispatch calls - Auto-corrects tool names and parameters - Returns structured errors with suggestions - Circuit breaker for consecutive failures Closes #836	2026-04-16 02:15:17 +00:00
Timmy Time	fb7464995c	fix: Ultraplan Mode for daily autonomous planning (closes #840)	2026-04-15 22:14:16 -04:00
Alexander Whitestone	7c71b7e73a	test: parallel tool calling — 2+ tools per response (#798)	2026-04-16 02:13:00 +00:00
Alexander Whitestone	4a3068b3b5	test: add regression tests for issue #834 KeyError fix	2026-04-16 02:12:36 +00:00
Alexander Whitestone	a8300ceb43	fix: KeyError 'missing_vars' in _show_tool_availability_warnings (#834)	2026-04-16 02:11:08 +00:00
Alexander Whitestone	8ef766beac	feat: add tool hallucination prevention module (#836) - Validates tool names against registered tools - Auto-corrects parameter names within Levenshtein distance 1 - Circuit breaker for consecutive failures (threshold: 3) - Structured error messages with suggestions Closes #836	2026-04-16 02:10:39 +00:00
Alexander Whitestone	db72e908f7	Merge pull request 'feat(security): implement Vitalik's secure LLM patterns — privacy filter + confirmation daemon [resolves merge conflict]' (#830) from feat/vitalik-secure-llm-1776303263 into main Vitalik's secure LLM patterns — privacy filter + confirmation daemon Clean rebase of #397 onto current main. Resolves merge conflicts in tools/approval.py.	2026-04-16 01:36:58 +00:00
Alexander Whitestone	b82b760d5d	feat: add Vitalik's threat model patterns to DANGEROUS_PATTERNS	2026-04-16 01:35:49 +00:00
Alexander Whitestone	d8d7846897	feat: add tests/tools/test_confirmation_daemon.py from PR #397	2026-04-16 01:35:24 +00:00
Alexander Whitestone	6840d05554	feat: add tests/agent/test_privacy_filter.py from PR #397	2026-04-16 01:35:21 +00:00
Alexander Whitestone	8abe59ed95	feat: add tools/confirmation_daemon.py from PR #397	2026-04-16 01:35:18 +00:00
Alexander Whitestone	435d790201	feat: add agent/privacy_filter.py from PR #397	2026-04-16 01:35:14 +00:00
Alexander Whitestone	30afd529ac	feat: add crisis detection tool — the-door integration (#141) New tool: tools/crisis_tool.py - Wraps the-door's canonical crisis detection (detect.py) - Scans user messages for despair/suicidal ideation - Classifies into NONE/LOW/MEDIUM/HIGH/CRITICAL tiers - Provides recommended actions per tier - Gateway hook: scan_user_message() for pre-API-call detection - System prompt injection: compassion_injection based on crisis level - Optional escalation logging to crisis_escalations.jsonl - Optional bridge API POST for HIGH+ (configurable via CRISIS_BRIDGE_URL) - Configurable via crisis_detection: true/false in config.yaml - Follows the-door design principles: never computes life value, never suggests death, errs on side of higher risk Also: tests/test_crisis_tool.py (9 tests, all passing)	2026-04-15 21:00:06 -04:00
Alexander Whitestone	a244b157be	bench: add Gemma 4 vs mimo-v2-pro tool calling benchmark (#796) 100-call regression test across 7 tool categories: - File operations (20): read_file, write_file, search_files - Terminal commands (20): shell execution - Web search (15): web_search - Code execution (15): execute_code - Browser automation (10): browser_navigate - Delegation (10): delegate_task - MCP tools (10): mcp_list/read/call Metrics tracked: - Schema parse success (valid JSON tool calls) - Tool name accuracy (correct tool selected) - Arguments accuracy (required args present) - Average latency per call Usage: python3 benchmarks/tool_call_benchmark.py --model nous:xiaomi/mimo-v2-pro python3 benchmarks/tool_call_benchmark.py --model ollama/gemma4:latest python3 benchmarks/tool_call_benchmark.py --compare	2026-04-15 18:56:35 -04:00
Timmy Time	d86359cbb2	Merge pull request 'feat: robust tool orchestration and circuit breaking' (#811) from feat/robust-tool-orchestration-1776268138150 into main	2026-04-15 16:03:07 +00:00
Google AI Agent	f264b55b29	refactor: use ToolOrchestrator for robust tool execution	2026-04-15 15:49:02 +00:00
Google AI Agent	dfe23f66b1	feat: add ToolOrchestrator with circuit breaker	2026-04-15 15:49:00 +00:00
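Commits b26a6ec23b and 4d8e004b5f (#809) describe `repair_and_load_json()` as a `json.loads()` wrapper with a repair fallback, a per-site default, and a context label for debug tracing. The sketch below shows that pattern using only the standard library. It is not the project's code: the small bracket-closing and trailing-comma repair is a hypothetical stand-in for the `json_repair` library the commit actually uses, and the `(text, default, context)` signature is an assumption.

```python
import json
import logging
import re
from typing import Any

logger = logging.getLogger(__name__)

# Matches a comma directly before a closing brace/bracket, e.g. "[1, 2,]".
# Naive: it would also rewrite such a sequence inside a string value.
_TRAILING_COMMA = re.compile(r",\s*([}\]])")


def _close_open_brackets(text: str) -> str:
    """Append closers for '{' / '[' left open, e.g. in a truncated payload."""
    stack = []
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            stack.append("}" if ch == "{" else "]")
        elif ch in "}]" and stack:
            stack.pop()
    if in_string:
        text += '"'  # terminate an unfinished string before closing brackets
    return text + "".join(reversed(stack))


def repair_and_load_json(text: str, default: Any = None, context: str = "") -> Any:
    """Hypothetical sketch: strict parse, then one repair attempt, then default."""
    try:
        return json.loads(text)
    except (json.JSONDecodeError, TypeError):
        pass
    try:
        repaired = _TRAILING_COMMA.sub(r"\1", _close_open_brackets(text))
        return json.loads(repaired)
    except (json.JSONDecodeError, TypeError):
        logger.debug("unrecoverable JSON (%s): %.80r", context or "unlabeled", text)
        return default
```

Returning `default` instead of raising lets each call site pick `{}` for tool arguments or `None` for content parsing, as the commit describes.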
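Commit 8ef766beac (#836) auto-corrects parameter names within Levenshtein distance 1 and returns structured errors with suggestions. A minimal sketch of that idea, assuming tool arguments arrive as a dict and the tool schema yields a set of known parameter names; `correct_params` and the error dict shape are hypothetical, not the module's API.

```python
from typing import Any


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = cur
    return prev[-1]


def correct_params(
    args: dict[str, Any], known: set[str], max_distance: int = 1
) -> tuple[dict[str, Any], list[dict[str, Any]]]:
    """Rename near-miss argument names; report the rest as structured errors."""
    fixed: dict[str, Any] = {}
    errors: list[dict[str, Any]] = []
    for name, value in args.items():
        if name in known:
            fixed[name] = value
            continue
        close = sorted(k for k in known if levenshtein(name, k) <= max_distance)
        # Auto-correct only when exactly one candidate is close and not already supplied.
        if len(close) == 1 and close[0] not in args:
            fixed[close[0]] = value
        else:
            errors.append({
                "param": name,
                "error": "unknown parameter",
                "suggestions": close or sorted(known),
            })
    return fixed, errors
```

Refusing to guess when two names are equally close keeps the correction conservative; ambiguous cases surface as errors instead of silently picking one.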
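The poka-yoke module (8ef766beac, threshold 3) and the ToolOrchestrator (dfe23f66b1) both mention a circuit breaker for consecutive failures. The sketch below shows the general pattern only; `CircuitBreaker` is a hypothetical name, and the cooldown with a half-open trial call is an assumption not stated in the commits.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CircuitBreaker:
    threshold: int = 3          # consecutive failures before the circuit opens
    cooldown_s: float = 30.0    # assumed: how long calls are blocked once open
    clock: Callable[[], float] = time.monotonic
    failures: int = 0
    opened_at: Optional[float] = None

    def allow(self) -> bool:
        """True if a call may proceed (closed, or open but past its cooldown)."""
        if self.opened_at is None:
            return True
        # After the cooldown, let a trial call through ("half-open").
        return self.clock() - self.opened_at >= self.cooldown_s

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.threshold:
            # (Re)open; a failed half-open trial restarts the cooldown.
            self.opened_at = self.clock()
```

A caller checks `allow()` before dispatching a tool and reports the outcome with `record_success()` or `record_failure()`. Injecting `clock` keeps the breaker testable without sleeping.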
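Commit 9f0c410481 (#749) sorts a tool-call batch into safety tiers and sends writes to the same file down the sequential path. A simplified sketch of that planning step, using the tier examples named in the commit. `plan_batch`, the call dict shape, and the rule that the first write to a path may join the parallel batch are assumptions; unlike a real planner, it does not preserve the relative order of sequential and parallel calls.

```python
import os
from dataclasses import dataclass, field
from typing import Any

# Tier membership taken from the examples in the commit message. Anything else
# (terminal, delegate_task, clarify, unknown tools) is treated as sequential.
PARALLEL_SAFE = {"read_file", "web_search", "search_files"}
PATH_SCOPED = {"write_file", "patch"}


@dataclass
class BatchPlan:
    parallel: list[dict[str, Any]] = field(default_factory=list)
    sequential: list[dict[str, Any]] = field(default_factory=list)


def plan_batch(calls: list[dict[str, Any]]) -> BatchPlan:
    plan = BatchPlan()
    claimed: set[str] = set()  # normalized paths already written in this batch
    for call in calls:
        name = call["name"]
        args = call.get("arguments", {})
        if name in PARALLEL_SAFE:
            plan.parallel.append(call)
        elif name in PATH_SCOPED and args.get("path"):
            path = os.path.normpath(args["path"])
            if path in claimed:
                plan.sequential.append(call)  # same file: must not race
            else:
                claimed.add(path)
                plan.parallel.append(call)
        else:
            plan.sequential.append(call)  # sequential, never_parallel, and unknown tools
    return plan
```

A read of a file being written in the same batch would also conflict; the commit only mentions write_file/patch pairs, so the sketch only checks those.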
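Commit 85a654348a (#835) adds scripts/lint_hardcoded_paths.py to flag `Path.home()/.hermes` and `/Users/<name>/` or `/home/<name>/` literals. A minimal regex sketch of that detection; the rule names and `scan_source` are hypothetical, and it skips only whole-line comments, without the script's docstring, test-file, env-fallback, and allowlist exclusions.

```python
import re

# Hypothetical detection rules modeled on the commit description.
PATTERNS = {
    # A string literal that starts with /Users/<name>/ or /home/<name>/
    "home_dir_literal": re.compile(r"""["'](?:/Users|/home)/[^/"']+/"""),
    # Path.home() / ".hermes" written out instead of a helper such as get_hermes_home()
    "path_home_hermes": re.compile(r"""Path\.home\(\)\s*/\s*["']\.hermes["']"""),
}


def scan_source(text: str) -> list[tuple[int, str]]:
    """Return (line_number, rule_name) for each suspicious line, skipping comment lines."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if line.lstrip().startswith("#"):
            continue
        for rule, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, rule))
    return hits
```

Reporting line numbers rather than rewriting files matches a pre-commit role: the hook can fail the commit and point at each offending line.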
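Commit 2b7b12baf9 (#803) gates the MCP server's HTTP endpoints with a Bearer token (MCP_AUTH_KEY) and exposes only an allowlist of tools. The check below is a hypothetical sketch of that gate, not the script's code. It compares tokens with `hmac.compare_digest` to avoid timing leaks, assumes header names are already normalized, and rejects everything when no key is configured, which is also an assumption.

```python
import hmac
from typing import Mapping

# Exposed tool names as listed in the commit message; everything else is refused.
EXPOSED_TOOLS = {"terminal", "file_read", "file_search", "web_search", "session_search"}


def bearer_ok(headers: Mapping[str, str], expected_key: str) -> bool:
    """Accept 'Authorization: Bearer <key>' when <key> matches expected_key."""
    if not expected_key:
        return False  # assumed: no configured key means no access
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    if scheme.lower() != "bearer" or not token:
        return False
    # Constant-time comparison so response timing does not leak the key.
    return hmac.compare_digest(token.encode(), expected_key.encode())


def authorize_call(headers: Mapping[str, str], expected_key: str, tool: str) -> tuple[int, str]:
    """Return an (HTTP status, reason) pair for a POST /mcp/tools/call request."""
    if not bearer_ok(headers, expected_key):
        return 401, "missing or invalid bearer token"
    if tool not in EXPOSED_TOOLS:
        return 403, f"tool {tool!r} is not exposed to fleet peers"
    return 200, "ok"
```

An allowlist fails closed: blocked tools such as memory or send_message are refused without having to be enumerated.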
