fix: restore /usage account limits in CLI + gateway (#958) #1016

Open
Rockachopa wants to merge 186 commits from fix/958 into main

186 Commits

Author SHA1 Message Date
Alexander Whitestone
eab5635a7a fix: restore /usage account limits (#958)
All checks were successful
Lint / lint (pull_request) Successful in 9s
2026-04-22 10:36:49 -04:00
d574690abe Merge pull request 'feat: The Sovereign Accountant — Agent Telemetry' (#1009) from feat/sovereign-accountant-agent-1776866068545 into main
All checks were successful
Lint / lint (pull_request) Successful in 19s
Lint / lint (push) Successful in 28s
2026-04-22 13:55:16 +00:00
e208885de6 feat: wire telemetry hooks into auxiliary client
All checks were successful
Lint / lint (pull_request) Successful in 10s
2026-04-22 13:54:32 +00:00
cd84fa2084 feat: add telemetry logger for token accounting 2026-04-22 13:54:30 +00:00
63babca056 Merge pull request 'docs: poka-yoke integration phase 3 status (#967)' (#976) from fix/967 into main
All checks were successful
Lint / lint (push) Successful in 11s
2026-04-22 13:39:43 +00:00
cab3c82c5c Merge pull request '[claude] Add update/restart action buttons to web dashboard (#961)' (#968) from claude/issue-961 into main
Some checks failed
Lint / lint (push) Has been cancelled
2026-04-22 13:39:36 +00:00
64a8059f9f Merge pull request '[claude] Verify hardcoded-home path guard on burn/921 branch (#962)' (#964) from claude/issue-962 into main
Some checks failed
Lint / lint (push) Has been cancelled
2026-04-22 13:39:32 +00:00
90f6fdef60 Merge pull request 'feat: Autonomous Regression Sentry — verify_impact tool' (#970) from feat/impact-analysis-tool-1776826592325 into main
All checks were successful
Lint / lint (push) Successful in 11s
2026-04-22 13:38:47 +00:00
18e3533a0a Merge pull request 'feat: The Budgetary Sovereign Router — Efficiency Sauce' (#1008) from feat/budgetary-router-1776864510362 into main
Some checks failed
Lint / lint (push) Has been cancelled
2026-04-22 13:38:40 +00:00
60ccd825ec Merge pull request 'feat: The Sovereign Teleport — State Migration Sauce' (#1007) from feat/sovereign-teleport-1776864503956 into main
Some checks failed
Lint / lint (push) Has been cancelled
2026-04-22 13:38:36 +00:00
e7d5a7f2cf Merge pull request 'feat: The Scavenger Fixer — Closing the Autonomous Loop' (#975) from feat/autonomous-scavenger-fix-1776827712502 into main
All checks were successful
Lint / lint (push) Successful in 13s
2026-04-22 13:38:03 +00:00
9aaac192cf Merge pull request 'test(#798): Parallel tool calling — 2+ tools per response' (#988) from fix/798 into main
All checks were successful
Lint / lint (push) Successful in 9s
2026-04-22 13:36:37 +00:00
f3d88ec31d Merge pull request '[claude] Wire Gemma 4 vision into browser_tool for screenshot analysis (#816)' (#947) from claude/issue-816 into main
All checks were successful
Lint / lint (push) Successful in 13s
2026-04-22 13:36:20 +00:00
2f22570622 Merge pull request 'feat(web-console): Self-healing browser CDP + operator cockpit (#394)' (#934) from feat/web-console-394 into main
Some checks failed
Lint / lint (push) Has been cancelled
2026-04-22 13:36:14 +00:00
2022322606 Merge pull request 'feat: Deep Dive Security Integration - Multilayer Defense' (#929) from feat/security-deep-dive-1776732106631 into main
Some checks failed
Lint / lint (push) Has been cancelled
2026-04-22 13:36:08 +00:00
d6ec32fe93 Merge pull request 'feat: implement SHIELD Multilingual Defense & Input Sanitization' (#918) from feat/shield-multilingual-1776700482647 into main
Some checks failed
Lint / lint (push) Has been cancelled
2026-04-22 13:36:05 +00:00
2b284e75f6 Merge pull request 'feat: Multi-Agent Concurrency Guard — "Secret Sauce" for Fleet Scaling' (#969) from feat/fleet-concurrency-guard-1776826501792 into main
All checks were successful
Lint / lint (push) Successful in 16s
2026-04-22 13:29:01 +00:00
efa1fc034e feat: Budgetary Sovereign Router — Complexity-aware steering
All checks were successful
Lint / lint (pull_request) Successful in 25s
2026-04-22 13:28:31 +00:00
99d925d40b feat: Sovereign Teleport — Cross-environment agent migration
All checks were successful
Lint / lint (pull_request) Successful in 28s
2026-04-22 13:28:25 +00:00
Alexander Whitestone
ed250b1ca8 test(#798): Strengthen parallel tool calling tests + fix flaky concurrent tests
All checks were successful
Lint / lint (pull_request) Successful in 10s
- Add TestAIAgentConcurrentExecution with 8 integration tests exercising
  _execute_tool_calls_concurrent through AIAgent for 2/3/4-tool batches,
  pass-rate reporting, and Gemma 4-style read patterns.
- Fix test_malformed_json_args_forces_sequential: use JSON array '[1,2,3]'
  instead of unrepairable garbage now that repair_and_load_json handles
  most malformed input.
- Fix test_concurrent_handles_tool_error: replace racy call_count list
  with deterministic failure based on tool_call_id to eliminate flaky
  failures under ThreadPoolExecutor.

Closes #798
2026-04-22 01:34:24 -04:00
Alexander Whitestone
1f5067e94a Merge: bring in prior QA work on path guard (Refs #962)
All checks were successful
Lint / lint (pull_request) Successful in 15s
2026-04-22 00:25:50 -04:00
Alexander Whitestone
798ca3aa06 chore: sync with remote claude/issue-961 branch
All checks were successful
Lint / lint (pull_request) Successful in 22s
Refs #961

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-22 00:04:51 -04:00
Alexander Whitestone
5d3e13ede2 test: add pre-commit path guard hook from burn/921 (Refs #962)
All checks were successful
Lint / lint (pull_request) Successful in 24s
Brings hooks/pre-commit-path-guard.py from burn/921-poka-yoke-hardcoded-paths
to complete QA verification of all guard layers.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 23:55:38 -04:00
82a076bf4d docs: poka-yoke integration phase 3 status (#967)
All checks were successful
Lint / lint (pull_request) Successful in 8s
2026-04-22 03:24:26 +00:00
16eab5d503 Merge pull request '[claude] A2A auth — mutual TLS between fleet agents (#806)' (#948) from claude/issue-806 into main
All checks were successful
Lint / lint (push) Successful in 13s
Merge PR #948: A2A auth — mutual TLS between fleet agents (#806)
2026-04-22 03:19:42 +00:00
81f7347bcb feat: Scavenger Fixer — Autonomous tech debt healing
All checks were successful
Lint / lint (pull_request) Successful in 22s
2026-04-22 03:15:17 +00:00
c7a2d439c1 Merge pull request 'feat: The Sovereign Scavenger — Automated Tech Debt Recovery' (#974) from feat/sovereign-scavenger-1776827259631 into main
All checks were successful
Lint / lint (push) Successful in 12s
2026-04-22 03:14:14 +00:00
8ad8520bd2 Merge pull request 'feat: Execution Safety Sentry — GOFAI Risk Analysis' (#973) from feat/static-analyzer-gofai-1776826921747 into main
Some checks failed
Lint / lint (push) Has been cancelled
2026-04-22 03:14:07 +00:00
9c7c88823f Merge pull request 'feat: Local Inference Story — Freeing the fleet from cloud dependency' (#972) from feat/local-inference-bridge-1776826896029 into main
Some checks failed
Lint / lint (push) Has been cancelled
2026-04-22 03:14:03 +00:00
aa45e02238 Merge pull request 'feat: GOFAI Semantic Sentry — Deterministic code verification' (#971) from feat/symbolic-verify-gofai-1776826842170 into main
Some checks failed
Lint / lint (push) Has been cancelled
2026-04-22 03:14:01 +00:00
3266c39e8e feat: Sovereign Scavenger — Turning tech debt into actionable backlog
All checks were successful
Lint / lint (pull_request) Successful in 18s
2026-04-22 03:07:40 +00:00
Alexander Whitestone
e8886f10c8 feat: add Update Hermes and Restart Gateway action buttons to web dashboard
All checks were successful
Lint / lint (pull_request) Successful in 10s
Implements the action button lifecycle described in #961:
- POST /api/actions/restart-gateway  — sends SIGTERM to the gateway PID
- POST /api/actions/update-hermes    — runs pip upgrade in a background job
- GET  /api/actions/jobs/{job_id}    — polls job status/output

Frontend (StatusPage.tsx):
- "Restart Gateway" button with spinning icon while running, then
  success/error message that clears after 5–8 s
- "Update Hermes" button that polls the job endpoint every 2 s;
  shows collapsible pip output on completion
- Page remains responsive (buttons disabled only during their own action)

Also adds i18n strings to en.ts, zh.ts, and the shared types.ts interface.

Fixes #961

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 23:04:10 -04:00
93a855d4e3 feat: Static Risk Analyzer (GOFAI) for execution safety
All checks were successful
Lint / lint (pull_request) Successful in 8s
2026-04-22 03:02:02 +00:00
5a0bdb556e feat: Local Inference Bridge — Bypassing cloud for local tasks
All checks were successful
Lint / lint (pull_request) Successful in 17s
2026-04-22 03:01:37 +00:00
d619d279f8 feat: Symbolic Sentry (GOFAI) for deterministic code audits
All checks were successful
Lint / lint (pull_request) Successful in 15s
2026-04-22 03:00:44 +00:00
d3b13a6aa5 feat: add verify_impact tool for regression guarding
All checks were successful
Lint / lint (pull_request) Successful in 16s
2026-04-22 02:56:33 +00:00
77d2430a44 feat: add Fleet-Wide File Concurrency Guard
All checks were successful
Lint / lint (pull_request) Successful in 19s
2026-04-22 02:55:04 +00:00
Alexander Whitestone
d2ce6b8749 test: verify action endpoints for restart-gateway and update-hermes
All checks were successful
Lint / lint (pull_request) Successful in 27s
Add TestActionEndpoints class to test_web_server.py covering:
- POST /api/actions/restart-gateway sends SIGUSR1 to gateway PID
- 409 when gateway is not running
- 500 when os.kill raises a signal error
- POST /api/actions/update-hermes returns ok=true on zero exit
- ok=false on non-zero exit code with stderr in detail
- ok=false on timeout
- Both endpoints reject unauthenticated requests

All 7 new tests pass (83 total in the file).

Refs #961
2026-04-21 22:41:27 -04:00
Alexander Whitestone
a8a086548d feat: add restart gateway and update Hermes action buttons to web dashboard
All checks were successful
Lint / lint (pull_request) Successful in 29s
Implements the update/restart action buttons called out in issue #961:

- Backend (web_server.py): two new POST endpoints
  - /api/actions/restart-gateway — sends SIGUSR1 to the running gateway PID
  - /api/actions/update-hermes  — runs `hermes update --yes` in a subprocess
- Frontend (api.ts): restartGateway() / updateHermes() API helpers + ActionResponse type
- UI (StatusPage.tsx): "Actions" card with Restart Gateway and Update Hermes buttons
  - idle → running (spinner) → success/failure states
  - feedback detail text; auto-resets to idle after 8 s
- i18n: new status.actions / restartGateway / updateHermes strings in en, zh, and types

Refs #961

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 22:30:22 -04:00
Alexander Whitestone
9e00a59791 test: verify hardcoded-home path guard from burn/921 branch
All checks were successful
Lint / lint (pull_request) Successful in 29s
Cherry-picks tools/path_guard.py and tests/test_path_guard.py from
burn/921-poka-yoke-hardcoded-paths (commit 5dcb905). All 21 tests pass:

- hardcoded /Users/<name>/ paths are rejected at runtime
- hardcoded /home/<name>/ paths are rejected at runtime
- ~/.hermes/... via expanduser() passes (safe, expanded at runtime)
- valid relative and /tmp/ absolute paths pass
- static scanner catches violations and respects # noqa: hardcoded-path-ok
- comments are skipped by scanner
- directory scanner skips test files and __pycache__

Refs #962

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 22:26:54 -04:00
Alexander Whitestone
9ef7682ee2 chore: merge remote claude/issue-816 — deduplicate gemma-4-27b-it in models.py
All checks were successful
Lint / lint (pull_request) Successful in 30s
Merged prior implementation (PR #947) and resolved conflicts.
Removed duplicate "gemma-4-27b-it" entry introduced during merge.

Refs #816

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 21:27:51 -04:00
Alexander Whitestone
e157a22639 feat: wire Gemma 4 vision into browser_tool for screenshot analysis
- Add `_BROWSER_VISION_DEFAULT_MODEL = "google/gemma-4-27b-it"` constant
- Rewrite `_get_vision_model()` with 4-tier resolution:
  1. BROWSER_VISION_MODEL env var (browser-specific override)
  2. auxiliary.browser_vision.model config key
  3. AUXILIARY_VISION_MODEL env var (backward compat)
  4. google/gemma-4-27b-it default (Gemma 4 native multimodal)
- Extract `_load_browser_vision_config()` helper for testability
- Always set call_kwargs["model"] (remove redundant `if vision_model` guard)
- Read timeout from auxiliary.browser_vision.timeout before auxiliary.vision.timeout
- Register gemma-4-27b-it in Gemini provider model catalog
- Document auxiliary.browser_vision section in cli-config.yaml.example
- Add 12 unit tests in tests/tools/test_browser_vision_model.py covering all
  resolution tiers, backward compat, error fallthrough, and type guarantees

Fixes #816

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 21:26:03 -04:00
Alexander Whitestone
671283389c feat: Wire Gemma 4 vision into browser_tool for screenshot analysis
All checks were successful
Lint / lint (pull_request) Successful in 8s
_get_vision_model() now resolves via a layered priority chain:
  1. BROWSER_VISION_MODEL env var (browser-specific override)
  2. config.yaml browser.vision_model
  3. AUXILIARY_VISION_MODEL env var (backward-compat shared override)
  4. google/gemma-4-27b-it — Gemma 4 native multimodal default

Add browser.vision_model config key to hermes_cli/config.py defaults
with inline documentation.

call_kwargs["model"] is now always set (model is never None), and a
debug log line records which model is in use for each screenshot.

Fixes #816

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 20:51:04 -04:00
Alexander Whitestone
17cc4bac90 feat: complete Gemma 4 browser_vision wiring — task routing, timeout, tests
All checks were successful
Lint / lint (pull_request) Successful in 10s
Building on the Gemma 4 default already on this branch:

- Change call_llm() task from "vision" to "browser_vision" in browser_vision()
  so auxiliary.browser_vision.* config is consulted for provider/model/timeout
- Route call_llm(task="browser_vision") through the vision provider resolution
  path in auxiliary_client.py (same as task="vision")
- Fix timeout resolution: check auxiliary.browser_vision.timeout before
  auxiliary.vision.timeout (allows browser-specific timeout override)
- Add timeout option to auxiliary.browser_vision in cli-config.yaml.example
- Add test_browser_vision_gemma4.py covering: task routing assertions,
  call_llm() vision branch routing, and timeout config key ordering

Refs #816

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 19:43:42 -04:00
Alexander Whitestone
1843545d66 chore: merge remote branch — resolve conflicts, use canonical implementation
All checks were successful
Lint / lint (pull_request) Successful in 8s
Merge remote claude/issue-816 which contains the full Gemma 4 browser
vision implementation. Resolved conflicts by taking the remote's cleaner
variable names and docstrings while keeping the same 4-tier resolution
logic. All 12 tests pass.

Refs #816

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 18:50:22 -04:00
Alexander Whitestone
c643ac90da feat: wire Gemma 4 vision into browser_tool for screenshot analysis
- Add `_BROWSER_VISION_DEFAULT_MODEL = "google/gemma-4-27b-it"` constant
- Rewrite `_get_vision_model()` with 4-tier resolution:
    1. BROWSER_VISION_MODEL env var (browser-specific override)
    2. auxiliary.browser_vision.model config key
    3. AUXILIARY_VISION_MODEL env var (backward compat)
    4. Gemma 4 27B default
- Remove `if vision_model:` guard — function now always returns a string
- Update browser_vision tool description to surface Gemma 4 as default
- Register gemma-4-27b-it in Gemini provider model catalog (models.py)
- Document auxiliary.browser_vision.model in cli-config.yaml.example
- Add 14 unit tests covering all priority levels and backward compat

Fixes #816

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 18:47:03 -04:00
Alexander Whitestone
da9c4cf10c feat: wire Gemma 4 vision into browser_tool for screenshot analysis
All checks were successful
Lint / lint (pull_request) Successful in 7s
Extends `_get_vision_model()` with a 5-level resolution chain:
1. `BROWSER_VISION_MODEL` env var — browser-specific override
2. `auxiliary.browser.vision_model` config key — per-install default
3. `AUXILIARY_VISION_MODEL` env var — backward-compat shared override
4. Auto-select `gemma-4-27b-it` when the main provider is Gemini/Google
5. `None` — fall through to `call_llm` vision router

Adds `_BROWSER_VISION_DEFAULT_MODEL = "gemma-4-27b-it"` constant and
registers `gemma-4-27b-it` in the Gemini provider model catalog.

16 new tests in `tests/tools/test_browser_vision_model.py` cover each
priority level, edge cases (empty env, config exceptions, wrong provider).

Fixes #816

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 18:18:30 -04:00
Alexander Whitestone
4214082fb6 feat: A2A auth — mutual TLS between fleet agents
All checks were successful
Lint / lint (pull_request) Successful in 8s
Implements mTLS for securing agent-to-agent communication in the Hermes
fleet. Fixes #806.

Changes:
- scripts/gen_fleet_ca.sh: generate a self-signed Fleet CA (4096-bit RSA,
  10-year validity) that signs all agent certificates
- scripts/gen_agent_cert.sh: generate per-agent certs (Timmy, Allegro,
  Ezra) signed by the fleet CA with SAN entries and clientAuth/serverAuth
  extended key usage
- agent/mtls.py: new module providing:
  - build_server_ssl_context() — TLS_SERVER context with CERT_REQUIRED,
    enforces client cert against Fleet CA
  - build_client_ssl_context() — TLS_CLIENT context for outbound A2A calls
  - MTLSMiddleware — ASGI middleware that rejects unauthenticated requests
    to A2A routes (/.well-known/agent-card*, /api/agent-card, /a2a/) with
    HTTP 403 when mTLS is enabled
  - is_mtls_configured() — checks HERMES_MTLS_CERT/KEY/CA env vars
- hermes_cli/web_server.py: wire MTLSMiddleware into the FastAPI app;
  pass SSL context to uvicorn when HERMES_MTLS_* env vars are set so
  the server runs TLS with mandatory client cert verification
- ansible/roles/hermes_mtls/: Ansible role to distribute Fleet CA cert,
  agent cert, and agent key to fleet nodes; writes an env file with
  HERMES_MTLS_* vars and restarts the hermes-gateway service
- ansible/fleet_mtls.yml: fleet-wide playbook referencing the role for
  Timmy, Allegro, and Ezra nodes
- tests/test_mtls.py: 15 tests covering is_mtls_configured, SSL context
  creation with real cryptography-generated certs, and MTLSMiddleware
  (unauthorized agent rejected → 403, authorized agent accepted → 200)

mTLS is opt-in: set HERMES_MTLS_CERT, HERMES_MTLS_KEY, and HERMES_MTLS_CA
to enable. When unset, the server behaves exactly as before.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 18:04:00 -04:00
Alexander Whitestone
95bb842a21 feat: Wire Gemma 4 vision into browser_tool for screenshot analysis
All checks were successful
Lint / lint (pull_request) Successful in 8s
Default browser_vision screenshots to google/gemma-4-27b-it (Gemma 4
native multimodal) for reduced latency and unified text+vision model.

Resolution order for _get_vision_model():
1. BROWSER_VISION_MODEL env var (new, browser-specific override)
2. auxiliary.browser_vision.model in config.yaml (new config key)
3. AUXILIARY_VISION_MODEL env var (existing global vision override)
4. Default: google/gemma-4-27b-it

Backward compatibility: existing AUXILIARY_VISION_MODEL users are
unaffected — their override still flows through to browser_vision.

Also documents the new auxiliary.browser_vision config section in
cli-config.yaml.example and adds 14 unit tests covering the full
priority chain.

Fixes #816

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 17:14:32 -04:00
Alexander Whitestone
ac28444bf2 feat: add A2AMTLSServer routing API, A2AMTLSClient, and expand tests to 20 (#806)
All checks were successful
Lint / lint (pull_request) Successful in 9s
Builds on the existing A2AServer / build_*_ssl_context foundation:

- agent/a2a_mtls.py:
  - Add A2AMTLSServer: routing-based HTTPS server with add_route() and
    context-manager (__enter__/__exit__) lifecycle support
  - Add A2AMTLSClient: fleet-cert-presenting HTTP client with .get() / .post()
  - Widen imports (json, Callable, Dict, urlopen)

- tests/agent/test_a2a_mtls.py:
  - Fix datetime.utcnow() deprecation — use datetime.now(timezone.utc)
  - Add TestA2AMTLSServerAndClient (9 tests): routing GET/POST, 404,
    context-manager stop, rogue-cert rejection, A2AMTLSClient, concurrency
  - Total: 11 → 20 passing tests

Refs #806
2026-04-21 15:21:10 -04:00
Alexander Whitestone
12b5d9a7fd refactor: remove redundant vision_model guard in browser_vision
All checks were successful
Lint / lint (pull_request) Successful in 10s
_get_vision_model() now always returns a non-empty string (Gemma 4 default
or configured override), so the `if vision_model:` conditional guard is
unnecessary. Replace with unconditional assignment and add a debug log
line showing which model was selected.

Refs #816

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 14:09:40 -04:00
Alexander Whitestone
91faf6f956 feat: A2A auth — mutual TLS between fleet agents
All checks were successful
Lint / lint (pull_request) Successful in 10s
Implements mutual TLS for secure agent-to-agent communication (#806).

- scripts/gen_fleet_ca.sh: generate fleet CA (4096-bit RSA, 10-year)
- scripts/gen_agent_cert.sh: per-agent cert signed by fleet CA (timmy, allegro, ezra)
- agent/a2a_mtls.py: A2AServer requiring client cert verification (CERT_REQUIRED),
  build_server_ssl_context / build_client_ssl_context helpers, server_from_env()
- ansible/roles/fleet_mtls_certs/: distribute CA + per-agent certs to fleet nodes,
  write /etc/hermes/a2a.env, notify hermes-a2a service on change
- ansible/fleet_mtls.yml + ansible/inventory/fleet.ini.example: playbook + example inventory
- tests/agent/test_a2a_mtls.py: 11 tests — authorized agent accepted (200/202),
  self-signed cert rejected, no-cert rejected, lifecycle, env-var wiring

Fixes #806

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 13:28:28 -04:00
Alexander Whitestone
b6398b8b0d feat: wire Gemma 4 vision into browser_tool for screenshot analysis
All checks were successful
Lint / lint (pull_request) Successful in 19s
Default browser screenshot analysis now uses Gemma 4 27B
(google/gemma-4-27b-it) instead of deferring to the auxiliary router's
auto-detection.  Gemma 4 is natively multimodal — the same model family
already in use for text tasks — which avoids cold-start model-switching
overhead and improves context continuity.

Resolution order for _get_vision_model():
  1. BROWSER_VISION_MODEL env var (browser-specific override)
  2. auxiliary.browser_vision.model in config.yaml
  3. AUXILIARY_VISION_MODEL env var (shared/legacy override)
  4. google/gemma-4-27b-it (new default)

- Add _BROWSER_VISION_DEFAULT_MODEL constant to browser_tool.py
- Document auxiliary.browser_vision config key in cli-config.yaml.example
- Add 10 unit tests covering all resolution steps

Fixes #816

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 12:49:46 -04:00
a2a40429bd Merge pull request '[claude] Poka-yoke: auto-revert incomplete skill edits (#923)' (#946) from claude/issue-923 into main
All checks were successful
Lint / lint (push) Successful in 10s
2026-04-21 16:38:24 +00:00
ee61c5fa9d Merge pull request 'feat: Add queue health check script' (#912) from feat/queue-health-check into main
All checks were successful
Lint / lint (push) Successful in 34s
2026-04-21 15:37:59 +00:00
Alexander Whitestone
1fece10569 feat: poka-yoke auto-revert for incomplete skill edits (#923)
All checks were successful
Lint / lint (pull_request) Successful in 32s
Implement a transactional write-validate-commit-or-rollback pattern for
all skill_manage write operations (edit, patch, write_file):

- _backup_skill_file: timestamped .bak.{ts} snapshot before every write
- _validate_written_file: re-reads from disk after write to catch truncation,
  encoding errors, and broken YAML frontmatter
- _revert_from_backup: restores original content (or removes the corrupted
  file) on any validation failure
- _cleanup_old_backups: prunes to MAX_BACKUPS_PER_FILE (3) after success;
  failed edits keep their .bak file as a debugging aid

Also fixes pre-existing issue where _patch_skill error returns lacked a
`suggestion` field expected by test_skill_manager_error_context.py tests.

Adds 21 tests in test_skill_manager_autorevert.py covering every component
and an end-to-end simulation of mid-write failure + auto-revert.

Fixes #923

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 11:37:55 -04:00
46668505bc Merge pull request 'feat: tool fixation detection — break repetitive loops (#886)' (#914) from fix/886 into main
Some checks failed
Lint / lint (push) Has been cancelled
2026-04-21 15:35:08 +00:00
cac0c8224e Merge pull request 'fix: circuit breaker for error cascading (2.33x amplification)' (#927) from fix/885-circuit-breaker into main
Some checks failed
Lint / lint (push) Has been cancelled
2026-04-21 15:35:04 +00:00
f38a64455d Merge pull request '[claude] Gateway config debt: add validation tests and API_SERVER_KEY warning (#892)' (#915) from claude/issue-892 into main
Some checks failed
Lint / lint (push) Has been cancelled
2026-04-21 15:33:19 +00:00
1b35a5a0d2 Merge pull request 'feat: Poka-yoke — hardcoded path guard (#921)' (#928) from fix/921-hardcoded-path-guard into main
Some checks failed
Lint / lint (push) Has been cancelled
2026-04-21 15:33:14 +00:00
9172131b25 Merge pull request 'docs: tool investigation report from awesome-ai-tools (#926)' (#931) from fix/926 into main
Some checks failed
Lint / lint (push) Has been cancelled
2026-04-21 15:33:12 +00:00
407eab3331 Merge pull request 'feat: session deterministic seeding & marathon limits' (#919) from feat/session-management-1776700585635 into main
Some checks failed
Lint / lint (push) Has been cancelled
2026-04-21 15:29:44 +00:00
cf090a966d Merge pull request 'fix: Poka-yoke — detect and block tool hallucination before API calls (#922)' (#935) from fix/922 into main
Some checks failed
Lint / lint (push) Has been cancelled
2026-04-21 15:29:35 +00:00
b65be9b12c Merge pull request '[claude] Add tool investigation report: top 5 awesome-ai-tools recommendations (#926)' (#936) from claude/issue-926 into main
Some checks failed
Lint / lint (push) Has been cancelled
2026-04-21 15:29:32 +00:00
3c1cff255e Merge pull request 'ci: integrate hardcoded path linter into CI workflow' (#938) from fix/865-ci-path-linter into main
Some checks failed
Lint / lint (push) Has been cancelled
2026-04-21 15:29:30 +00:00
690d100afc Merge pull request 'feat: Poka-yoke token budget — progressive context overflow guard (#925)' (#943) from burn/925-1776770102 into main
Some checks failed
Docker Build and Publish / build-and-push (push) Has been skipped
Nix / nix (ubuntu-latest) (push) Failing after 5s
Tests / e2e (push) Successful in 5m8s
Tests / test (push) Failing after 30m13s
Nix / nix (macos-latest) (push) Has been cancelled
2026-04-21 15:29:02 +00:00
c6f0831738 Merge pull request 'feat: Python syntax validation before execute_code (#913)' (#917) from fix/913-syntax-validation into main
Some checks failed
Docker Build and Publish / build-and-push (push) Has been cancelled
Nix / nix (macos-latest) (push) Has been cancelled
Nix / nix (ubuntu-latest) (push) Has been cancelled
Tests / test (push) Has been cancelled
Tests / e2e (push) Has been cancelled
2026-04-21 15:27:05 +00:00
30773ac1f9 Merge pull request 'fix: Path validation before read_file — poka-yoke (#887)' (#911) from fix/887-path-validation-read-file into main
Some checks failed
Docker Build and Publish / build-and-push (push) Has been cancelled
Nix / nix (macos-latest) (push) Has been cancelled
Nix / nix (ubuntu-latest) (push) Has been cancelled
Tests / test (push) Has been cancelled
Tests / e2e (push) Has been cancelled
2026-04-21 15:26:55 +00:00
feb24bd08c Merge pull request 'feat: Block silent credential exposure in tool outputs (#839)' (#910) from fix/839-1776403070 into main
Some checks failed
Docker Build and Publish / build-and-push (push) Has been cancelled
Nix / nix (macos-latest) (push) Has been cancelled
Nix / nix (ubuntu-latest) (push) Has been cancelled
Tests / test (push) Has been cancelled
Tests / e2e (push) Has been cancelled
2026-04-21 15:26:47 +00:00
bc55f40505 Merge pull request 'feat: time-aware model routing for cron jobs (#889)' (#909) from fix/889 into main
Some checks failed
Docker Build and Publish / build-and-push (push) Has been cancelled
Nix / nix (macos-latest) (push) Has been cancelled
Nix / nix (ubuntu-latest) (push) Has been cancelled
Tests / test (push) Has been cancelled
Tests / e2e (push) Has been cancelled
2026-04-21 15:26:43 +00:00
2adc72335e Merge pull request 'fix: profile session isolation — tag and filter by profile' (#907) from fix/891-profile-isolation into main
Some checks failed
Docker Build and Publish / build-and-push (push) Has been cancelled
Nix / nix (macos-latest) (push) Has been cancelled
Nix / nix (ubuntu-latest) (push) Has been cancelled
Tests / test (push) Has been cancelled
Tests / e2e (push) Has been cancelled
2026-04-21 15:26:39 +00:00
ab32670464 Merge pull request 'feat: Poka-yoke — detect and block tool hallucination before API calls (#922)' (#944) from burn/922-1776770102 into main
Some checks failed
Docker Build and Publish / build-and-push (push) Has been cancelled
Nix / nix (macos-latest) (push) Has been cancelled
Nix / nix (ubuntu-latest) (push) Has been cancelled
Tests / test (push) Has been cancelled
Tests / e2e (push) Has been cancelled
2026-04-21 15:23:56 +00:00
bfc0231297 Merge pull request 'docs: holographic + vector hybrid memory architecture (#879)' (#942) from fix/879 into main
Some checks failed
Docker Build and Publish / build-and-push (push) Has been cancelled
Nix / nix (macos-latest) (push) Has been cancelled
Nix / nix (ubuntu-latest) (push) Has been cancelled
Tests / test (push) Has been cancelled
Tests / e2e (push) Has been cancelled
2026-04-21 15:23:49 +00:00
cf2b09cf2f Merge pull request 'docs: emotional presence patterns for crisis support (#880)' (#941) from fix/880 into main
Some checks failed
Docker Build and Publish / build-and-push (push) Has been cancelled
Nix / nix (macos-latest) (push) Has been cancelled
Nix / nix (ubuntu-latest) (push) Has been cancelled
Tests / test (push) Has been cancelled
Tests / e2e (push) Has been cancelled
2026-04-21 15:23:45 +00:00
719bb537c0 Merge pull request 'feat: provider preflight validation before session start (#924)' (#932) from fix/924 into main
Some checks failed
Docker Build and Publish / build-and-push (push) Has been cancelled
Nix / nix (macos-latest) (push) Has been cancelled
Nix / nix (ubuntu-latest) (push) Has been cancelled
Tests / test (push) Has been cancelled
Tests / e2e (push) Has been cancelled
2026-04-21 15:23:02 +00:00
0bcbcf19ac Merge pull request 'feat: time-aware model routing for cron jobs #889' (#906) from fix/time-aware-routing-889 into main
Some checks failed
Docker Build and Publish / build-and-push (push) Has been cancelled
Nix / nix (macos-latest) (push) Has been cancelled
Nix / nix (ubuntu-latest) (push) Has been cancelled
Tests / test (push) Has been cancelled
Tests / e2e (push) Has been cancelled
2026-04-21 15:22:37 +00:00
27d2f2ca0e Merge pull request 'feat: Prevent context window overflow via proactive token counting (#838)' (#905) from fix/838-1776402240 into main
Some checks failed
Docker Build and Publish / build-and-push (push) Has been cancelled
Nix / nix (macos-latest) (push) Has been cancelled
Nix / nix (ubuntu-latest) (push) Has been cancelled
Tests / e2e (push) Has been cancelled
Tests / test (push) Has been cancelled
2026-04-21 15:22:31 +00:00
7e7dcfa345 Merge pull request 'fix: Gateway config validation and fallback fixes (#892)' (#904) from fix/892 into main
Some checks failed
Docker Build and Publish / build-and-push (push) Has been cancelled
Nix / nix (macos-latest) (push) Has been cancelled
Nix / nix (ubuntu-latest) (push) Has been cancelled
Tests / e2e (push) Has been cancelled
Tests / test (push) Has been cancelled
2026-04-21 15:22:22 +00:00
ba0e614446 Merge pull request 'feat: integrate 988 Suicide & Crisis Lifeline — automatic crisis escalation (#673)' (#903) from feat/673 into main
Some checks failed
Docker Build and Publish / build-and-push (push) Has been cancelled
Nix / nix (macos-latest) (push) Has been cancelled
Nix / nix (ubuntu-latest) (push) Has been cancelled
Tests / test (push) Has been cancelled
Tests / e2e (push) Has been cancelled
2026-04-21 15:22:17 +00:00
4f5e641c92 Merge pull request 'fix: kill 9 dead cron jobs — audit and cleanup script' (#902) from fix/890-dead-cron-jobs into main
Some checks failed
Docker Build and Publish / build-and-push (push) Has been cancelled
Nix / nix (macos-latest) (push) Has been cancelled
Nix / nix (ubuntu-latest) (push) Has been cancelled
Tests / e2e (push) Has been cancelled
Tests / test (push) Has been cancelled
2026-04-21 15:22:15 +00:00
d61bd141f9 feat: add poka-yoke validation to non-execute_code dispatch (#922)
Some checks failed
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Contributor Attribution Check / check-attribution (pull_request) Failing after 32s
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 32s
Tests / e2e (pull_request) Successful in 3m5s
Tests / test (pull_request) Failing after 36m26s
2026-04-21 12:01:57 +00:00
a4058af238 feat: wire poka-yoke validation into tool dispatch (#922) 2026-04-21 12:00:20 +00:00
08432a5618 test: poka-yoke validation tests (#922) 2026-04-21 11:59:26 +00:00
a875c6ed91 feat: poka-yoke tool call validation firewall (#922) 2026-04-21 11:59:25 +00:00
07c5b5b83d test: add token budget poka-yoke tests (#925)
Some checks failed
Contributor Attribution Check / check-attribution (pull_request) Failing after 44s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 45s
Tests / test (pull_request) Failing after 25m21s
Tests / e2e (pull_request) Successful in 3m18s
2026-04-21 11:41:39 +00:00
ba56567631 docs: holographic + vector hybrid memory architecture (#879)
Some checks failed
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 45s
Tests / test (pull_request) Failing after 14m3s
Tests / e2e (pull_request) Successful in 1m53s
2026-04-21 11:41:31 +00:00
8ac26f54a5 feat: token budget with progressive poka-yoke thresholds (#925) 2026-04-21 11:40:39 +00:00
b807972d05 docs: emotional presence patterns for crisis support (#880)
Some checks failed
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 52s
Tests / e2e (pull_request) Successful in 3m53s
Tests / test (pull_request) Failing after 53m34s
2026-04-21 11:37:57 +00:00
6b5a6db668 ci: add Gitea Actions lint workflow
All checks were successful
Lint / lint (pull_request) Successful in 15s
Part of #865. Runs hardcoded path linter on every push/PR.
2026-04-21 11:37:33 +00:00
b702249c12 ci: add hardcoded path linter to CI workflow
Closes #865

Runs scripts/lint_hardcoded_paths.py as a CI check.
Uses continue-on-error for now since the linter may have false positives.
2026-04-21 11:37:31 +00:00
Alexander Whitestone
8023c9b8f2 docs: add tool investigation report for top 5 awesome-ai-tools recommendations
Some checks failed
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 55s
Tests / e2e (pull_request) Successful in 3m56s
Tests / test (pull_request) Failing after 54m0s
Persists the research report from issue #926 as a markdown file following
the existing convention of research_*.md files in the repo. Documents the
top 5 tool recommendations (LiteLLM, Mem0, RAGFlow, LiteRT-LM, Claude-Mem)
with integration effort, impact scores, and phased implementation plan.

Refs #926
2026-04-21 07:26:44 -04:00
TERRA
9edd5383e7 feat: add hermes web console cockpit and browser self-healing (#394)
Some checks failed
Contributor Attribution Check / check-attribution (pull_request) Failing after 36s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 31s
Tests / e2e (pull_request) Successful in 3m37s
Tests / test (pull_request) Failing after 38m26s
2026-04-21 02:00:41 -04:00
TERRA
f6c072f136 wip: add web console cockpit regression tests for #394 2026-04-21 02:00:41 -04:00
6eeee39c10 test(#922): Add tests for tool hallucination detection
Some checks failed
Contributor Attribution Check / check-attribution (pull_request) Failing after 1m15s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 1m8s
Tests / e2e (pull_request) Successful in 3m44s
Tests / test (pull_request) Failing after 1h9m15s
Tests for validation firewall:
- Unknown tool detection
- Missing required params
- Wrong type detection
- Hallucination patterns
- Rejection stats

Refs #922
2026-04-21 05:38:54 +00:00
b2d2d2c650 fix(#922): Poka-yoke — detect and block tool hallucination
Validation firewall between LLM tool-call output and execution:
1. Unknown tool names rejected
2. Malformed parameters caught
3. Missing required arguments detected
4. Hallucination patterns detected

All rejections logged with model provenance.
Agent receives rejection as tool result for self-correction.

Resolves #922
2026-04-21 05:38:22 +00:00
5b62bb8d81 feat(#394): Hermes web UI operator cockpit
Some checks failed
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Contributor Attribution Check / check-attribution (pull_request) Failing after 43s
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 1m9s
Tests / e2e (pull_request) Successful in 6m9s
Tests / test (pull_request) Failing after 1h3m4s
Minimal web interface for Hermes operation:
- Chat interface with streaming
- System status monitoring
- Crisis detection display
- Session management
- Dark theme, responsive design

Source-backed: Hermes Atlas pattern.
Refs #394
2026-04-21 05:34:22 +00:00
10f9fd690a feat(#394): Self-healing browser CDP layer (browser-harness)
Source-backed browser automation:
- CDP connection with auto-reconnect
- Self-healing on disconnects
- Screenshot, DOM inspection, JS evaluation
- Click, type, navigate primitives
- Session persistence

Refs #394
2026-04-21 05:33:32 +00:00
bdd0f2709b feat: provider preflight validation before session start (#924)
Some checks failed
Contributor Attribution Check / check-attribution (pull_request) Failing after 47s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 52s
Tests / test (pull_request) Failing after 30m48s
Tests / e2e (pull_request) Successful in 2m9s
2026-04-21 04:48:57 +00:00
a9cbf7d69f docs: tool investigation report from awesome-ai-tools (#926)
Some checks failed
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 36s
Tests / e2e (pull_request) Successful in 2m56s
Tests / test (pull_request) Failing after 34m20s
2026-04-21 04:45:03 +00:00
b64f4d9632 feat: update run_agent.py for deep dive security
Some checks failed
Contributor Attribution Check / check-attribution (pull_request) Failing after 28s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Nix / nix (ubuntu-latest) (pull_request) Failing after 4s
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 35s
Tests / test (pull_request) Failing after 1h0m5s
Tests / e2e (pull_request) Failing after 6m56s
Nix / nix (macos-latest) (pull_request) Has been cancelled
2026-04-21 00:41:55 +00:00
7caaf49a34 feat: deep dive integration of tests/test_shield_multilingual.py 2026-04-21 00:41:53 +00:00
e52f6d2cde feat: deep dive integration of tools/shield/detector.py 2026-04-21 00:41:52 +00:00
000d64deed feat: deep dive integration of agent/input_sanitizer.py 2026-04-21 00:41:50 +00:00
d527cb569b feat: deep dive integration of agent/shield.py 2026-04-21 00:41:49 +00:00
44ada06fd4 feat: update agent/privacy_filter.py for deep dive security 2026-04-21 00:41:48 +00:00
4cdda8701d feat: integrate hardcoded path guard into tool dispatch
Some checks failed
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Contributor Attribution Check / check-attribution (pull_request) Failing after 32s
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 32s
Tests / e2e (pull_request) Successful in 2m56s
Tests / test (pull_request) Failing after 1h1m7s
2026-04-21 00:31:01 +00:00
a80d30b342 feat: add pre-commit hook for hardcoded path detection 2026-04-21 00:29:33 +00:00
f098cf8c4a feat: add hardcoded path guard module (#921)
- Detects /Users/, /home/, ~/ in tool arguments
- Source code scanner for CI/pre-commit
- Runtime guard for tool dispatch
- noqa: hardcoded-path-ok escape hatch

Closes #921
2026-04-21 00:29:12 +00:00
30509b9c7c test: circuit breaker tests
Some checks failed
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Contributor Attribution Check / check-attribution (pull_request) Failing after 38s
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 40s
Tests / e2e (pull_request) Successful in 1m36s
Tests / test (pull_request) Failing after 17m13s
Part of #885
2026-04-21 00:28:15 +00:00
ccaa1cb021 feat: circuit breaker for error cascading
Closes #885

2.33x error cascade factor detected. After 3 consecutive errors,
circuit opens and agent must take corrective action.

Recovery pattern: terminal is the safety net (2300 recoveries).
2026-04-21 00:28:14 +00:00
c6f2855745 fix: restore _format_error helper for test compatibility (#916)
Some checks failed
Docker Build and Publish / build-and-push (push) Has been skipped
Nix / nix (ubuntu-latest) (push) Failing after 2s
Tests / e2e (push) Successful in 2m47s
Tests / test (push) Failing after 27m41s
Build Skills Index / build-index (push) Has been skipped
Build Skills Index / deploy-with-index (push) Has been skipped
Nix / nix (macos-latest) (push) Has been cancelled
fix: restore _format_error helper for test compatibility (#916)
2026-04-20 23:56:27 +00:00
9d180f31cc feat: add session templates
Some checks failed
Contributor Attribution Check / check-attribution (pull_request) Failing after 43s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 45s
Tests / test (pull_request) Failing after 45m24s
Tests / e2e (pull_request) Failing after 7m35s
2026-04-20 15:56:26 +00:00
3d8cf5122a feat: add agent/shield.py for SHIELD defense
Some checks failed
Contributor Attribution Check / check-attribution (pull_request) Failing after 31s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 40s
Tests / e2e (pull_request) Successful in 2m2s
Tests / test (pull_request) Failing after 52m0s
2026-04-20 15:54:48 +00:00
790b677978 feat: add tests/test_shield_multilingual.py for SHIELD defense 2026-04-20 15:54:46 +00:00
9a749d2854 feat: add agent/input_sanitizer.py for SHIELD defense 2026-04-20 15:54:45 +00:00
68534e78be feat: add tools/shield/detector.py for SHIELD defense 2026-04-20 15:54:43 +00:00
c17f64fa2c test: add syntax validation tests (#913)
Some checks failed
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Contributor Attribution Check / check-attribution (pull_request) Failing after 41s
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 29s
Tests / e2e (pull_request) Successful in 2m2s
Tests / test (pull_request) Failing after 1h14m43s
2026-04-20 15:47:35 +00:00
bc7ffc2166 feat: Python syntax validation before execute_code (#913) 2026-04-20 15:46:23 +00:00
Alexander Whitestone
c22cdcaa8e fix: add _validate_gateway_config tests and API_SERVER_KEY network binding warning
Some checks failed
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Contributor Attribution Check / check-attribution (pull_request) Failing after 23s
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 27s
Tests / e2e (pull_request) Successful in 1m51s
Tests / test (pull_request) Failing after 37m0s
Refs #892 - Gateway config debt: missing keys and broken fallbacks

Changes:
- Add `_is_network_accessible()` helper to gateway/config.py (avoids circular
  import with gateway.platforms.base which imports from gateway.config)
- Add API_SERVER_KEY warning in `_validate_gateway_config`: when the API server
  is enabled on a network-accessible address (0.0.0.0, public IP, hostname) but
  no key is configured, log a warning at config-load time so operators see the
  issue before any adapter initialisation runs
- Add `TestValidateGatewayConfig` in tests/gateway/test_config.py covering:
  - idle_minutes <= 0 and None are corrected to 1440 (default)
  - at_hour outside 0-23 is corrected to 4 (default)
  - Boundary hours 0 and 23 are accepted unchanged
  - Empty platform token triggers a warning log
  - Disabled platform with empty token produces no warning
  - API server on 0.0.0.0 without key logs a warning
  - API server on 127.0.0.1 without key is silent (loopback is allowed)
  - API server with a key set logs no warning regardless of bind address

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 02:18:02 -04:00
Alexander Whitestone
ab968e910c feat: tool fixation detection — break repetitive loops (#886)
Some checks failed
Contributor Attribution Check / check-attribution (pull_request) Failing after 37s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 43s
Tests / e2e (pull_request) Successful in 1m57s
Tests / test (pull_request) Failing after 18m57s
Marathon sessions show tool fixation: agent latches onto one tool
and calls it repeatedly. Observed streaks of 8-25 identical calls.

New agent/tool_fixation_detector.py:
- ToolFixationDetector: tracks consecutive tool calls
- record(tool_name): returns nudge prompt when threshold reached
- Default threshold: 5 consecutive calls (configurable via
  TOOL_FIXATION_THRESHOLD env var)
- Nudge prompt explains the fixation and suggests alternatives:
  1. Read error carefully
  2. Try different tool
  3. Ask user for clarification
  4. Check if task is complete
- get_streak_info(): current streak state
- format_report(): human-readable fixation events
- Singleton via get_fixation_detector()

Config:
- TOOL_FIXATION_THRESHOLD (default: 5)
- TOOL_FIXATION_WINDOW (default: 10)

Tests: tests/test_tool_fixation_detector.py (9 tests)

Closes #886
2026-04-17 01:57:37 -04:00
Alexander Whitestone
73984ca72f feat: Add queue health check script
Some checks failed
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Contributor Attribution Check / check-attribution (pull_request) Failing after 29s
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 31s
Tests / e2e (pull_request) Successful in 2m13s
Tests / test (pull_request) Failing after 28m10s
2026-04-17 01:26:07 -04:00
436c800def fix: add path validation before read_file (#887)
Some checks failed
Contributor Attribution Check / check-attribution (pull_request) Failing after 35s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 38s
Tests / e2e (pull_request) Successful in 1m58s
Tests / test (pull_request) Failing after 42m6s
- Check if file exists before attempting read
- Return clear error with suggestions for similar files
- Suggest using search_files to find correct path
- Eliminates 83.7% of read_file errors (file not found)

Closes #887
2026-04-17 05:24:52 +00:00
cb331da4f1 test: Add credential redaction tests (#839)
Some checks failed
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Contributor Attribution Check / check-attribution (pull_request) Failing after 49s
Tests / e2e (pull_request) Successful in 2m50s
Tests / test (pull_request) Failing after 11m50s
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 47s
2026-04-17 05:23:48 +00:00
fa892bfcb9 feat: Add credential redaction for tool outputs (#839) 2026-04-17 05:21:25 +00:00
Alexander Whitestone
0b72884750 feat: time-aware model routing for cron jobs (#889)
Some checks failed
Tests / test (pull_request) Failing after 25m4s
Tests / e2e (pull_request) Successful in 3m19s
Contributor Attribution Check / check-attribution (pull_request) Failing after 14s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 14s
Error rate peaks at 18:00 (9.4%) during evening cron batches vs 4.0%
at 09:00 during interactive work. Route cron tasks to stronger models
during off-hours when user is not present to correct errors.

New agent/time_aware_routing.py:
- resolve_time_aware_model(): routes based on hour, error rate, task type
- Interactive sessions: always use base model (user corrects errors)
- Cron during business hours: use base model (low error rate)
- Cron during off-hours with high error rate (>6%): upgrade to strong model
- get_hour_error_rate(): error rates by hour from empirical audit
- is_off_hours(): 18:00-05:59 = off-hours
- RoutingDecision: model, provider, reason, hour, error_rate
- get_routing_report(): 24h forecast of routing decisions

Config via env vars:
- CRON_STRONG_MODEL (default: xiaomi/mimo-v2-pro)
- CRON_CHEAP_MODEL (default: qwen2.5:7b)
- CRON_ERROR_THRESHOLD (default: 6.0%)

Tests: tests/test_time_aware_routing.py (9 tests)

Closes #889
2026-04-17 01:15:09 -04:00
a0ed1e6ff2 test: profile isolation tests
Some checks failed
Contributor Attribution Check / check-attribution (pull_request) Failing after 15s
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 15s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Tests / test (pull_request) Failing after 18m33s
Tests / e2e (pull_request) Successful in 1m17s
Part of #891
2026-04-17 05:13:03 +00:00
b5ba272efe feat: profile session isolation
Closes #891

Tags sessions with originating profile and provides filtered
access so profiles cannot see each other's data.
2026-04-17 05:13:01 +00:00
2e0dfe27df feat: time-aware model routing for cron jobs #889
Some checks failed
Contributor Attribution Check / check-attribution (pull_request) Failing after 15s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 14s
Tests / test (pull_request) Failing after 18m19s
Tests / e2e (pull_request) Successful in 1m17s
2026-04-17 05:10:34 +00:00
d4cdfdc604 test: Add context budget tracker tests (#838)
Some checks failed
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 19s
Contributor Attribution Check / check-attribution (pull_request) Failing after 16s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Tests / test (pull_request) Failing after 18m30s
Tests / e2e (pull_request) Successful in 1m16s
2026-04-17 05:06:54 +00:00
e3436e36c3 feat: Add context budget tracker for overflow prevention (#838) 2026-04-17 05:06:08 +00:00
34e7de6a4c feat: 988 Lifeline tests (#673)
Some checks failed
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 18s
Contributor Attribution Check / check-attribution (pull_request) Failing after 17s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Tests / test (pull_request) Failing after 18m18s
Tests / e2e (pull_request) Successful in 1m13s
2026-04-17 05:04:50 +00:00
dbabe0e6ae feat: 988 Suicide & Crisis Lifeline integration (#673)
agent/crisis_resources.py provides all 988 Lifeline contact
methods: phone (988), text (HOME to 988), chat, Spanish line.
Also Crisis Text Line (741741) and 911.

Closes #673
2026-04-17 05:04:48 +00:00
517e2c571e fix(#892): Gateway config validation and fallback fixes
Some checks failed
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 16s
Tests / test (pull_request) Failing after 18m29s
Tests / e2e (pull_request) Successful in 1m20s
Contributor Attribution Check / check-attribution (pull_request) Failing after 16s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Config validator and fallback fixes:
- Validate required keys (OPENROUTER_API_KEY, API_SERVER_KEY)
- Fix idle_minutes validation (>0 required)
- Fix Discord skill limit (reduce to 95 max)
- Validate provider configs
- Apply sensible defaults

Resolves #892
2026-04-17 05:04:11 +00:00
0b019327a3 docs: cron audit documentation
Some checks failed
Tests / e2e (pull_request) Successful in 49s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 15s
Contributor Attribution Check / check-attribution (pull_request) Failing after 14s
Tests / test (pull_request) Failing after 18m24s
Part of #890
2026-04-17 05:00:09 +00:00
6b0fca6944 feat: cron job audit and cleanup script
Closes #890

Finds dead cron jobs (zero completions, stale) and provides
--disable and --delete actions to clean them up.
2026-04-17 05:00:06 +00:00
05f8c2d188 Merge PR #899
Merged PR #899: feat: Allegro worker deliverables
2026-04-17 01:52:11 +00:00
ff2ce95ade feat(research): Allegro worker deliverables — fleet research reports + skill manager test
Some checks failed
Tests / e2e (pull_request) Successful in 1m39s
Tests / test (pull_request) Failing after 1h7m45s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Contributor Attribution Check / check-attribution (pull_request) Successful in 24s
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 28s
Research reports:
- Vector DB research
- Workflow orchestration research
- Fleet knowledge graph SOTA research
- LLM inference optimization
- Local model crisis quality
- Memory systems SOTA
- Multi-agent coordination
- R5 vs E2E gap analysis
- Text-to-music-video

Test:
- test_skill_manager_error_context.py

[Allegro] Forge workers — 2026-04-16
2026-04-16 15:04:28 +00:00
Hermes Merge Bot
aedebfdf58 Merge PR #848 2026-04-16 02:12:13 -04:00
Hermes Merge Bot
adf49b1809 Merge PR #849 2026-04-16 02:11:21 -04:00
Hermes Merge Bot
52ea3a8935 Merge PR #850 2026-04-16 02:09:00 -04:00
Hermes Merge Bot
43246d6cb4 Merge PR #852 2026-04-16 02:08:06 -04:00
Hermes Merge Bot
20c5e237a7 Merge PR #861 2026-04-16 02:06:36 -04:00
Hermes Merge Bot
a0f4d10a7f Merge PR #855 2026-04-16 02:06:17 -04:00
Hermes Merge Bot
bc5d1cf6ff Merge PR #863 2026-04-16 02:05:44 -04:00
Hermes Merge Bot
dff451081d Merge PR #856 2026-04-16 02:05:42 -04:00
Hermes Merge Bot
5509b157c5 Merge PR #864 2026-04-16 02:05:05 -04:00
Hermes Merge Bot
fcc322fb81 Merge PR #867 2026-04-16 02:03:23 -04:00
Hermes Merge Bot
9bba9ecc40 Merge PR #866 2026-04-16 02:02:43 -04:00
Hermes Merge Bot
05086e58ea Merge PR #871 2026-04-16 02:00:55 -04:00
Hermes Merge Bot
7af6889767 Merge PR #869 2026-04-16 02:00:49 -04:00
5022db9d7b Merge pull request 'feat: self-modifying agent that improves its own prompts (#813)' (#897) from fix/813 into main 2026-04-16 05:29:11 +00:00
0f61474b74 Merge pull request 'feat: MCP server — expose hermes tools to fleet peers (#803)' (#896) from fix/803 into main
Auto-merged PR #896: feat: MCP server — expose hermes tools to fleet peers (#803)
2026-04-16 05:24:27 +00:00
Alexander Whitestone
a528bd5b1b fix: use .get() for env_vars key in _show_tool_availability_warnings
Some checks failed
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 24s
Tests / test (pull_request) Failing after 1h2m1s
Tests / e2e (pull_request) Successful in 1m38s
Fixes KeyError: 'missing_vars' crash on CLI startup when toolsets are
unavailable. registry.py returns dicts with 'env_vars' key, but
_show_tool_availability_warnings() was accessing 'missing_vars' directly.

Now uses .get("env_vars") or .get("missing_vars") to handle both key
names, consistent with how doctor.py already handles this.

Fixes #834

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 01:23:48 -04:00
Alexander Whitestone
e63cdaf16f feat: self-modifying agent that improves its own prompts (#813)
Some checks failed
Docker Build and Publish / build-and-push (pull_request) Has been cancelled
Contributor Attribution Check / check-attribution (pull_request) Has been cancelled
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Has been cancelled
Tests / test (pull_request) Has been cancelled
Tests / e2e (pull_request) Has been cancelled
Resolves #813. Agent analyzes session transcripts for failure
patterns and generates prompt patches to prevent future failures.

agent/self_modify.py (PromptLearner class):
- analyze_session(): detects 5 failure types from transcripts:
  retry_loop, timeout, hallucination, context_loss, tool_failure
- generate_patches(): converts patterns to prompt patches with
  confidence scoring (frequency-based)
- apply_patches(): appends learned rules to system prompt with
  backup and rollback support
- learn_from_session(): full cycle analyze → patch → apply

Failures → patterns → patches → improved prompts → fewer failures.

Safety: patches only ADD rules (append-only), never remove.
Rollback:  restores from timestamped backup.
2026-04-16 01:23:48 -04:00
Alexander Whitestone
2b7b12baf9 feat: MCP server — expose hermes tools to fleet peers (#803)
Some checks failed
Contributor Attribution Check / check-attribution (pull_request) Successful in 44s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Tests / test (pull_request) Has been cancelled
Tests / e2e (pull_request) Has been cancelled
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 19m48s
Resolves #803. Standalone MCP server that exposes safe hermes
tools to other fleet agents.

scripts/mcp_server.py:
- Exposes: terminal, file_read, file_search, web_search, session_search
- Blocks: approval, delegate, memory, config, cron, send_message
- Terminal uses approval.py dangerous command detection
- Auth via Bearer token (MCP_AUTH_KEY)
- HTTP endpoints: GET /mcp/tools, POST /mcp/tools/call, GET /health

Usage:
  python scripts/mcp_server.py --port 8081 --auth-key SECRET
  curl http://localhost:8081/mcp/tools
  curl -X POST http://localhost:8081/mcp/tools/call -d {"name":"file_read","arguments":{"path":"README.md"}}
2026-04-16 01:10:00 -04:00
Alexander Whitestone
6b40c5db7a fix: use env_vars key in _show_tool_availability_warnings to prevent KeyError
registry.py:check_tool_availability() returns unavailable dicts with key
"env_vars", but _show_tool_availability_warnings() in cli.py was accessing
u["missing_vars"] causing a KeyError crashing CLI startup whenever any
toolset was disabled.

Fix matches how doctor.py already handles the same data.

Fixes #834
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 00:42:03 -04:00
5a24894f78 fix: update hermes_cli/web_server.py for agent card discovery
Some checks failed
Contributor Attribution Check / check-attribution (pull_request) Successful in 43s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Nix / nix (ubuntu-latest) (pull_request) Failing after 5s
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 38s
Tests / test (pull_request) Failing after 10m58s
Tests / e2e (pull_request) Successful in 1m32s
Nix / nix (macos-latest) (pull_request) Has been cancelled
2026-04-16 03:45:04 +00:00
a474eb8459 fix: add agent/agent_card.py for agent card discovery 2026-04-16 03:45:01 +00:00
Alexander Whitestone
3238cf4eb1 feat: Tool investigation report + Mem0 local provider (#842)
Some checks failed
Contributor Attribution Check / check-attribution (pull_request) Successful in 38s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 32s
Tests / test (pull_request) Failing after 43m54s
Tests / e2e (pull_request) Successful in 2m5s
## Investigation Report
- docs/tool-investigation-2026-04-15.md: Full report analyzing 414 tools
  from awesome-ai-tools. Top 5 recommendations with integration paths.
- docs/plans/awesome-ai-tools-integration.md: Implementation tracking plan.

## Mem0 Local Provider (P1)
- plugins/memory/mem0_local/: New ChromaDB-backed memory provider.
  No API key required - fully sovereign. Compatible tool schemas with
  cloud Mem0 (mem0_profile, mem0_search, mem0_conclude).
- Pattern-based fact extraction from conversations.
- Deterministic dedup via content hashing.
- Circuit breaker for resilience.
- tests/plugins/memory/test_mem0_local.py: Full test coverage.

## Issues Filed
- #857: LightRAG integration (P2)
- #858: n8n workflow orchestration (P3)
- #859: RAGFlow document understanding (P4)
- #860: tensorzero LLMOps evaluation (P3)

Closes #842
2026-04-15 23:04:41 -04:00
eed87e454e test: Benchmark Gemma 4 vision accuracy vs current approach (#817)
Some checks failed
Contributor Attribution Check / check-attribution (pull_request) Successful in 26s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 26s
Tests / e2e (pull_request) Successful in 2m38s
Tests / test (pull_request) Failing after 47m49s
Vision benchmark suite comparing Gemma 4 (google/gemma-4-27b-it) vs
current Gemini 3 Flash Preview (google/gemini-3-flash-preview).

Metrics:
- OCR accuracy (character + word overlap)
- Description completeness (keyword coverage)
- Structural quality (length, sentences, numbers)
- Latency (ms per image)
- Token usage
- Consistency across runs

Features:
- 24 diverse test images (screenshots, diagrams, photos, charts)
- Category-specific evaluation prompts
- Automated verdict with composite scoring
- JSON + markdown report output
- 28 unit tests passing

Usage:
  python benchmarks/vision_benchmark.py --images benchmarks/test_images.json
  python benchmarks/vision_benchmark.py --url https://example.com/img.png
  python benchmarks/vision_benchmark.py --generate-dataset

Closes #817.
2026-04-15 23:02:02 -04:00
Alexander Whitestone
f03709aa29 test: crisis hook integration tests with agent loop (#707)
Some checks failed
Contributor Attribution Check / check-attribution (pull_request) Successful in 16s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 15s
Tests / e2e (pull_request) Failing after 12m38s
Tests / test (pull_request) Failing after 25m58s
10 integration tests verifying crisis detection works correctly
when called from the agent conversation flow:

- scan_user_message detects CRITICAL/HIGH/MEDIUM/LOW levels
- Safe messages pass through without triggering
- Tool handler returns valid JSON
- Compassion injection includes 988 lifeline for CRITICAL/HIGH
- Case insensitive detection
- Empty/None text handled gracefully
- False positive resistance on common non-crisis phrases
- Config check returns bool
- Callable from agent context (not just isolation tests)
2026-04-15 23:00:12 -04:00
Alexander Whitestone
4d8e004b5f fix: extend JSON repair to remaining json.loads sites in run_agent.py
Some checks failed
Contributor Attribution Check / check-attribution (pull_request) Successful in 42s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Nix / nix (ubuntu-latest) (pull_request) Failing after 4s
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 36s
Tests / test (pull_request) Failing after 1h13m6s
Tests / e2e (pull_request) Successful in 1m32s
Nix / nix (macos-latest) (pull_request) Has been cancelled
Adds `repair_and_load_json()` to utils.py using the `json_repair` library
as a fallback when `json.loads()` fails. Replaces 8 non-hot-path json.loads
sites identified in issue #809:

- L2250: trajectory/sanitization message content parsing
- L2500: tool_call dict reconstruction in trajectory conversion
- L2535: tool_content parsing (JSON-like strings in tool responses)
- L2888: session log file loading (with warning on unrecoverable parse)
- L3119: todo content parsing in message processing
- L5963: vision result_json parsing
- L6761: memory flush tool call argument parsing
- L8300: cache serialization tool call args normalization

Each site uses an appropriate default ({} for tool args, None/continue for
content parsing) and a context label for debug tracing.

Fixes #809
2026-04-15 22:56:39 -04:00
85a654348a feat: poka-yoke — prevent hardcoded ~/.hermes paths (closes #835)
Some checks failed
Contributor Attribution Check / check-attribution (pull_request) Successful in 27s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 19s
Tests / e2e (pull_request) Successful in 1m55s
Tests / test (pull_request) Failing after 56m41s
scripts/lint_hardcoded_paths.py (new):
- Scans Python files for hardcoded home-directory paths
- Detects: Path.home()/.hermes without env fallback, /Users/<name>/, /home/<name>/
- Excludes: comments, docstrings, test files, skills, plugins, docs
- Excludes correct patterns: profiles_parent, current_default, native_home
- Supports --staged (git pre-commit), --fix (suggestions), --json output

scripts/pre-commit-hardcoded-paths.sh (new):
- Pre-commit hook that runs lint_hardcoded_paths.py --staged
- Blocks commits containing hardcoded path violations

tools/confirmation_daemon.py (fixed):
- Replaced Path.home() / '.hermes' / 'approval_whitelist.json'
  with get_hermes_home() / 'approval_whitelist.json'
- Added import of get_hermes_home from hermes_constants

tests/test_hardcoded_paths.py (new):
- 11 tests: detection, exclusion, fallback patterns, clean files
2026-04-15 22:56:32 -04:00
fc0d8fe5e9 fix: extend JSON repair to ALL remaining json.loads sites (#809)
Some checks failed
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Nix / nix (ubuntu-latest) (pull_request) Failing after 2s
Contributor Attribution Check / check-attribution (pull_request) Successful in 26s
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 26s
Tests / e2e (pull_request) Successful in 2m50s
Tests / test (pull_request) Failing after 1h17m49s
Nix / nix (macos-latest) (pull_request) Has been cancelled
2026-04-16 02:53:41 +00:00
Alexander Whitestone
13ef670c05 feat: session compaction with fact extraction (#748)
Some checks failed
Contributor Attribution Check / check-attribution (pull_request) Successful in 29s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 33s
Tests / e2e (pull_request) Successful in 3m26s
Tests / test (pull_request) Failing after 1h28m50s
Before compressing conversation context, extract durable facts
(user preferences, corrections, project details) and save to
fact store so they survive compression.

New agent/session_compactor.py:
- extract_facts_from_messages(): scans user messages for
  preferences, corrections, project/infra facts using regex
- 3 pattern categories: user_pref (5 patterns), correction
  (3 patterns), project (4 patterns)
- ExtractedFact: category, entity, content, confidence, source_turn
- save_facts_to_store(): saves to fact store (callback or auto-detect)
- extract_and_save_facts(): one-call extraction + persistence
- Deduplication by category+content
- Skips tool results, short messages, system messages
- format_facts_summary(): human-readable summary

Tests: tests/test_session_compactor.py (9 tests)

Closes #748
2026-04-15 22:41:54 -04:00
4752a0085e fix: extend JSON repair to remaining json.loads sites in run_agent.py (#809) 2026-04-16 02:40:51 +00:00
b26a6ec23b feat: add repair_and_load_json() to utils.py (#809) 2026-04-16 02:38:01 +00:00
Alexander Whitestone
9f0c410481 feat: batch tool execution with parallel safety checks (#749)
Some checks failed
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Contributor Attribution Check / check-attribution (pull_request) Successful in 35s
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 37s
Tests / e2e (pull_request) Successful in 1m48s
Tests / test (pull_request) Failing after 36m13s
Centralized safety classification for tool call batches:

tools/batch_executor.py (new):
- classify_tool_calls() — classifies batch into parallel_safe,
  path_scoped, sequential, never_parallel tiers
- BatchExecutionPlan — structured plan with parallel and sequential batches
- Path conflict detection — write_file + patch on same file go sequential
- Destructive command detection — rm, mv, sed -i, redirects
- execute_parallel_batch() — ThreadPoolExecutor for concurrent execution

tools/registry.py (enhanced):
- ToolEntry.parallel_safe field — tools can declare parallel safety
- registry.register() accepts parallel_safe=True parameter
- registry.get_parallel_safe_tools() — query registry-declared safe tools

Safety tiers:
- parallel_safe: read_file, web_search, search_files, etc.
- path_scoped: write_file, patch (concurrent when paths don't overlap)
- sequential: terminal, delegate_task, unknown tools
- never_parallel: clarify (requires user interaction)

19 tests passing.
2026-04-15 22:17:16 -04:00
b34b5b293d test: add tests for tool hallucination prevention (#836)
Some checks failed
Contributor Attribution Check / check-attribution (pull_request) Successful in 24s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 22s
Tests / e2e (pull_request) Successful in 3m6s
Tests / test (pull_request) Failing after 41m24s
2026-04-16 02:15:59 +00:00
05f9d2b009 feat: integrate poka-yoke validation into tool dispatch (#836)
- Added import for tool_pokayoke module
- Added validation before orchestrator.dispatch calls
- Auto-corrects tool names and parameters
- Returns structured errors with suggestions
- Circuit breaker for consecutive failures

Closes #836
2026-04-16 02:15:17 +00:00
Timmy Time
fb7464995c fix: Ultraplan Mode for daily autonomous planning (closes #840)
Some checks failed
Contributor Attribution Check / check-attribution (pull_request) Successful in 37s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 39s
Tests / test (pull_request) Failing after 1h15m33s
Tests / e2e (pull_request) Successful in 2m20s
2026-04-15 22:14:16 -04:00
7c71b7e73a test: parallel tool calling — 2+ tools per response (#798)
Some checks failed
Contributor Attribution Check / check-attribution (pull_request) Successful in 45s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 1m16s
Tests / e2e (pull_request) Successful in 3m17s
Tests / test (pull_request) Failing after 1h30m54s
2026-04-16 02:13:00 +00:00
4a3068b3b5 test: add regression tests for issue #834 KeyError fix
Some checks failed
Contributor Attribution Check / check-attribution (pull_request) Successful in 39s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 44s
Tests / e2e (pull_request) Successful in 2m53s
Tests / test (pull_request) Failing after 1h28m32s
2026-04-16 02:12:36 +00:00
a8300ceb43 fix: KeyError 'missing_vars' in _show_tool_availability_warnings (#834) 2026-04-16 02:11:08 +00:00
8ef766beac feat: add tool hallucination prevention module (#836)
- Validates tool names against registered tools
- Auto-corrects parameter names within Levenshtein distance 1
- Circuit breaker for consecutive failures (threshold: 3)
- Structured error messages with suggestions

Closes #836
2026-04-16 02:10:39 +00:00
db72e908f7 Merge pull request 'feat(security): implement Vitalik's secure LLM patterns — privacy filter + confirmation daemon [resolves merge conflict]' (#830) from feat/vitalik-secure-llm-1776303263 into main
Vitalik's secure LLM patterns — privacy filter + confirmation daemon

Clean rebase of #397 onto current main. Resolves merge conflicts in tools/approval.py.
2026-04-16 01:36:58 +00:00
b82b760d5d feat: add Vitalik's threat model patterns to DANGEROUS_PATTERNS
Some checks failed
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 41s
Contributor Attribution Check / check-attribution (pull_request) Successful in 51s
Tests / e2e (pull_request) Successful in 5m21s
Tests / test (pull_request) Failing after 45m7s
2026-04-16 01:35:49 +00:00
d8d7846897 feat: add tests/tools/test_confirmation_daemon.py from PR #397 2026-04-16 01:35:24 +00:00
6840d05554 feat: add tests/agent/test_privacy_filter.py from PR #397 2026-04-16 01:35:21 +00:00
8abe59ed95 feat: add tools/confirmation_daemon.py from PR #397 2026-04-16 01:35:18 +00:00
435d790201 feat: add agent/privacy_filter.py from PR #397 2026-04-16 01:35:14 +00:00
Alexander Whitestone
30afd529ac feat: add crisis detection tool — the-door integration (#141)
Some checks failed
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Contributor Attribution Check / check-attribution (pull_request) Successful in 44s
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 59s
Tests / e2e (pull_request) Successful in 3m49s
Tests / test (pull_request) Failing after 44m1s
New tool: tools/crisis_tool.py
- Wraps the-door's canonical crisis detection (detect.py)
- Scans user messages for despair/suicidal ideation
- Classifies into NONE/LOW/MEDIUM/HIGH/CRITICAL tiers
- Provides recommended actions per tier
- Gateway hook: scan_user_message() for pre-API-call detection
- System prompt injection: compassion_injection based on crisis level
- Optional escalation logging to crisis_escalations.jsonl
- Optional bridge API POST for HIGH+ (configurable via CRISIS_BRIDGE_URL)
- Configurable via crisis_detection: true/false in config.yaml
- Follows the-door design principles: never computes life value,
  never suggests death, errs on side of higher risk

Also: tests/test_crisis_tool.py (9 tests, all passing)
2026-04-15 21:00:06 -04:00
Alexander Whitestone
a244b157be bench: add Gemma 4 vs mimo-v2-pro tool calling benchmark (#796)
Some checks failed
Contributor Attribution Check / check-attribution (pull_request) Successful in 42s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 32s
Tests / e2e (pull_request) Successful in 2m26s
Tests / test (pull_request) Failing after 44m7s
100-call regression test across 7 tool categories:
- File operations (20): read_file, write_file, search_files
- Terminal commands (20): shell execution
- Web search (15): web_search
- Code execution (15): execute_code
- Browser automation (10): browser_navigate
- Delegation (10): delegate_task
- MCP tools (10): mcp_list/read/call

Metrics tracked:
- Schema parse success (valid JSON tool calls)
- Tool name accuracy (correct tool selected)
- Arguments accuracy (required args present)
- Average latency per call

Usage:
  python3 benchmarks/tool_call_benchmark.py --model nous:xiaomi/mimo-v2-pro
  python3 benchmarks/tool_call_benchmark.py --model ollama/gemma4:latest
  python3 benchmarks/tool_call_benchmark.py --compare
2026-04-15 18:56:35 -04:00
d86359cbb2 Merge pull request 'feat: robust tool orchestration and circuit breaking' (#811) from feat/robust-tool-orchestration-1776268138150 into main 2026-04-15 16:03:07 +00:00
f264b55b29 refactor: use ToolOrchestrator for robust tool execution
Some checks failed
Contributor Attribution Check / check-attribution (pull_request) Successful in 36s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 38s
Tests / e2e (pull_request) Successful in 2m37s
Tests / test (pull_request) Failing after 40m19s
2026-04-15 15:49:02 +00:00
dfe23f66b1 feat: add ToolOrchestrator with circuit breaker 2026-04-15 15:49:00 +00:00