Alexander Whitestone
eab5635a7a
fix: restore /usage account limits (#958)
Lint / lint (pull_request) Successful in 9s
2026-04-22 10:36:49 -04:00
d574690abe
Merge pull request 'feat: The Sovereign Accountant — Agent Telemetry' (#1009) from feat/sovereign-accountant-agent-1776866068545 into main
Lint / lint (pull_request) Successful in 19s
Lint / lint (push) Successful in 28s
2026-04-22 13:55:16 +00:00
e208885de6
feat: wire telemetry hooks into auxiliary client
Lint / lint (pull_request) Successful in 10s
2026-04-22 13:54:32 +00:00
cd84fa2084
feat: add telemetry logger for token accounting
2026-04-22 13:54:30 +00:00
63babca056
Merge pull request 'docs: poka-yoke integration phase 3 status (#967)' (#976) from fix/967 into main
Lint / lint (push) Successful in 11s
2026-04-22 13:39:43 +00:00
cab3c82c5c
Merge pull request '[claude] Add update/restart action buttons to web dashboard (#961)' (#968) from claude/issue-961 into main
Lint / lint (push) Has been cancelled
2026-04-22 13:39:36 +00:00
64a8059f9f
Merge pull request '[claude] Verify hardcoded-home path guard on burn/921 branch (#962)' (#964) from claude/issue-962 into main
Lint / lint (push) Has been cancelled
2026-04-22 13:39:32 +00:00
90f6fdef60
Merge pull request 'feat: Autonomous Regression Sentry — verify_impact tool' (#970) from feat/impact-analysis-tool-1776826592325 into main
Lint / lint (push) Successful in 11s
2026-04-22 13:38:47 +00:00
18e3533a0a
Merge pull request 'feat: The Budgetary Sovereign Router — Efficiency Sauce' (#1008) from feat/budgetary-router-1776864510362 into main
Lint / lint (push) Has been cancelled
2026-04-22 13:38:40 +00:00
60ccd825ec
Merge pull request 'feat: The Sovereign Teleport — State Migration Sauce' (#1007) from feat/sovereign-teleport-1776864503956 into main
Lint / lint (push) Has been cancelled
2026-04-22 13:38:36 +00:00
e7d5a7f2cf
Merge pull request 'feat: The Scavenger Fixer — Closing the Autonomous Loop' (#975) from feat/autonomous-scavenger-fix-1776827712502 into main
Lint / lint (push) Successful in 13s
2026-04-22 13:38:03 +00:00
9aaac192cf
Merge pull request 'test(#798): Parallel tool calling — 2+ tools per response' (#988) from fix/798 into main
Lint / lint (push) Successful in 9s
2026-04-22 13:36:37 +00:00
f3d88ec31d
Merge pull request '[claude] Wire Gemma 4 vision into browser_tool for screenshot analysis (#816)' (#947) from claude/issue-816 into main
Lint / lint (push) Successful in 13s
2026-04-22 13:36:20 +00:00
2f22570622
Merge pull request 'feat(web-console): Self-healing browser CDP + operator cockpit (#394)' (#934) from feat/web-console-394 into main
Lint / lint (push) Has been cancelled
2026-04-22 13:36:14 +00:00
2022322606
Merge pull request 'feat: Deep Dive Security Integration - Multilayer Defense' (#929) from feat/security-deep-dive-1776732106631 into main
Lint / lint (push) Has been cancelled
2026-04-22 13:36:08 +00:00
d6ec32fe93
Merge pull request 'feat: implement SHIELD Multilingual Defense & Input Sanitization' (#918) from feat/shield-multilingual-1776700482647 into main
Lint / lint (push) Has been cancelled
2026-04-22 13:36:05 +00:00
2b284e75f6
Merge pull request 'feat: Multi-Agent Concurrency Guard — "Secret Sauce" for Fleet Scaling' (#969) from feat/fleet-concurrency-guard-1776826501792 into main
Lint / lint (push) Successful in 16s
2026-04-22 13:29:01 +00:00
efa1fc034e
feat: Budgetary Sovereign Router — Complexity-aware steering
Lint / lint (pull_request) Successful in 25s
2026-04-22 13:28:31 +00:00
99d925d40b
feat: Sovereign Teleport — Cross-environment agent migration
Lint / lint (pull_request) Successful in 28s
2026-04-22 13:28:25 +00:00
Alexander Whitestone
ed250b1ca8
test(#798): Strengthen parallel tool calling tests + fix flaky concurrent tests
Lint / lint (pull_request) Successful in 10s
- Add TestAIAgentConcurrentExecution with 8 integration tests exercising
  _execute_tool_calls_concurrent through AIAgent for 2/3/4-tool batches,
  pass-rate reporting, and Gemma 4-style read patterns.
- Fix test_malformed_json_args_forces_sequential: use JSON array '[1,2,3]'
  instead of unrepairable garbage now that repair_and_load_json handles
  most malformed input.
- Fix test_concurrent_handles_tool_error: replace racy call_count list
  with deterministic failure based on tool_call_id to eliminate flaky
  failures under ThreadPoolExecutor.
Closes #798
2026-04-22 01:34:24 -04:00
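Illustration for ed250b1ca8: a minimal sketch of the flaky-test fix described above, not the repository's test code. The names `fake_tool`, `run_batch`, and `FAILING_CALL_ID` are hypothetical. Instead of choosing the failing call from a shared counter, whose value depends on thread scheduling, the stub fails based on the call's own ID, so the outcome is identical on every run.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical ID of the call that should fail; not taken from the repository.
FAILING_CALL_ID = "call_2"


def fake_tool(tool_call_id: str) -> str:
    """Stub tool: fails for one specific ID, regardless of execution order."""
    if tool_call_id == FAILING_CALL_ID:
        raise RuntimeError(f"simulated failure for {tool_call_id}")
    return f"ok:{tool_call_id}"


def run_batch(call_ids: list[str]) -> dict[str, str]:
    """Run all calls concurrently and record a result or an error per ID."""
    results: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = {cid: pool.submit(fake_tool, cid) for cid in call_ids}
        for cid, future in futures.items():
            try:
                results[cid] = future.result()
            except RuntimeError as exc:
                results[cid] = f"error:{exc}"
    return results
```

Because the failure is keyed on the ID, assertions can target a specific call without depending on which worker thread ran first.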
Alexander Whitestone
1f5067e94a
Merge: bring in prior QA work on path guard (Refs #962)
Lint / lint (pull_request) Successful in 15s
2026-04-22 00:25:50 -04:00
Alexander Whitestone
798ca3aa06
chore: sync with remote claude/issue-961 branch
Lint / lint (pull_request) Successful in 22s
Refs #961
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-22 00:04:51 -04:00
Alexander Whitestone
5d3e13ede2
test: add pre-commit path guard hook from burn/921 (Refs #962)
Lint / lint (pull_request) Successful in 24s
Brings hooks/pre-commit-path-guard.py from burn/921-poka-yoke-hardcoded-paths
to complete QA verification of all guard layers.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 23:55:38 -04:00
82a076bf4d
docs: poka-yoke integration phase 3 status (#967)
Lint / lint (pull_request) Successful in 8s
2026-04-22 03:24:26 +00:00
16eab5d503
Merge pull request '[claude] A2A auth — mutual TLS between fleet agents (#806)' (#948) from claude/issue-806 into main
Lint / lint (push) Successful in 13s
Merge PR #948: A2A auth — mutual TLS between fleet agents (#806)
2026-04-22 03:19:42 +00:00
81f7347bcb
feat: Scavenger Fixer — Autonomous tech debt healing
Lint / lint (pull_request) Successful in 22s
2026-04-22 03:15:17 +00:00
c7a2d439c1
Merge pull request 'feat: The Sovereign Scavenger — Automated Tech Debt Recovery' (#974) from feat/sovereign-scavenger-1776827259631 into main
Lint / lint (push) Successful in 12s
2026-04-22 03:14:14 +00:00
8ad8520bd2
Merge pull request 'feat: Execution Safety Sentry — GOFAI Risk Analysis' (#973) from feat/static-analyzer-gofai-1776826921747 into main
Lint / lint (push) Has been cancelled
2026-04-22 03:14:07 +00:00
9c7c88823f
Merge pull request 'feat: Local Inference Story — Freeing the fleet from cloud dependency' (#972) from feat/local-inference-bridge-1776826896029 into main
Lint / lint (push) Has been cancelled
2026-04-22 03:14:03 +00:00
aa45e02238
Merge pull request 'feat: GOFAI Semantic Sentry — Deterministic code verification' (#971) from feat/symbolic-verify-gofai-1776826842170 into main
Lint / lint (push) Has been cancelled
2026-04-22 03:14:01 +00:00
3266c39e8e
feat: Sovereign Scavenger — Turning tech debt into actionable backlog
Lint / lint (pull_request) Successful in 18s
2026-04-22 03:07:40 +00:00
Alexander Whitestone
e8886f10c8
feat: add Update Hermes and Restart Gateway action buttons to web dashboard
Lint / lint (pull_request) Successful in 10s
Implements the action button lifecycle described in #961:
- POST /api/actions/restart-gateway — sends SIGTERM to the gateway PID
- POST /api/actions/update-hermes — runs pip upgrade in a background job
- GET /api/actions/jobs/{job_id} — polls job status/output
Frontend (StatusPage.tsx):
- "Restart Gateway" button with spinning icon while running, then
  success/error message that clears after 5–8 s
- "Update Hermes" button that polls the job endpoint every 2 s;
  shows collapsible pip output on completion
- Page remains responsive (buttons disabled only during their own action)
Also adds i18n strings to en.ts, zh.ts, and the shared types.ts interface.
Fixes #961
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 23:04:10 -04:00
93a855d4e3
feat: Static Risk Analyzer (GOFAI) for execution safety
Lint / lint (pull_request) Successful in 8s
2026-04-22 03:02:02 +00:00
5a0bdb556e
feat: Local Inference Bridge — Bypassing cloud for local tasks
Lint / lint (pull_request) Successful in 17s
2026-04-22 03:01:37 +00:00
d619d279f8
feat: Symbolic Sentry (GOFAI) for deterministic code audits
Lint / lint (pull_request) Successful in 15s
2026-04-22 03:00:44 +00:00
d3b13a6aa5
feat: add verify_impact tool for regression guarding
Lint / lint (pull_request) Successful in 16s
2026-04-22 02:56:33 +00:00
77d2430a44
feat: add Fleet-Wide File Concurrency Guard
Lint / lint (pull_request) Successful in 19s
2026-04-22 02:55:04 +00:00
Alexander Whitestone
d2ce6b8749
test: verify action endpoints for restart-gateway and update-hermes
Lint / lint (pull_request) Successful in 27s
Add TestActionEndpoints class to test_web_server.py covering:
- POST /api/actions/restart-gateway sends SIGUSR1 to gateway PID
- 409 when gateway is not running
- 500 when os.kill raises a signal error
- POST /api/actions/update-hermes returns ok=true on zero exit
- ok=false on non-zero exit code with stderr in detail
- ok=false on timeout
- Both endpoints reject unauthenticated requests
All 7 new tests pass (83 total in the file).
Refs #961
2026-04-21 22:41:27 -04:00
Alexander Whitestone
a8a086548d
feat: add restart gateway and update Hermes action buttons to web dashboard
Lint / lint (pull_request) Successful in 29s
Implements the update/restart action buttons called out in issue #961:
- Backend (web_server.py): two new POST endpoints
  - /api/actions/restart-gateway — sends SIGUSR1 to the running gateway PID
  - /api/actions/update-hermes — runs `hermes update --yes` in a subprocess
- Frontend (api.ts): restartGateway() / updateHermes() API helpers + ActionResponse type
- UI (StatusPage.tsx): "Actions" card with Restart Gateway and Update Hermes buttons
  - idle → running (spinner) → success/failure states
  - feedback detail text; auto-resets to idle after 8 s
- i18n: new status.actions / restartGateway / updateHermes strings in en, zh, and types
Refs #961
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 22:30:22 -04:00
Alexander Whitestone
9e00a59791
test: verify hardcoded-home path guard from burn/921 branch
Lint / lint (pull_request) Successful in 29s
Cherry-picks tools/path_guard.py and tests/test_path_guard.py from
burn/921-poka-yoke-hardcoded-paths (commit 5dcb905). All 21 tests pass:
- hardcoded /Users/<name>/ paths are rejected at runtime
- hardcoded /home/<name>/ paths are rejected at runtime
- ~/.hermes/... via expanduser() passes (safe, expanded at runtime)
- valid relative and /tmp/ absolute paths pass
- static scanner catches violations and respects # noqa: hardcoded-path-ok
- comments are skipped by scanner
- directory scanner skips test files and __pycache__
Refs #962
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 22:26:54 -04:00
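Illustration for 9e00a59791: a minimal sketch of the checks listed above, assuming a simple regular-expression approach. It is not the contents of tools/path_guard.py; the function names `is_hardcoded_home_path` and `scan_source` are hypothetical. Only the suppression marker `# noqa: hardcoded-path-ok` is taken from the commit message.

```python
import re

# A path that embeds a specific user's home directory: /Users/<name>/ or /home/<name>/.
_HARDCODED_HOME = re.compile(r"/(?:Users|home)/[^/\s]+/")
# Suppression marker quoted from the commit message.
_NOQA = "# noqa: hardcoded-path-ok"


def is_hardcoded_home_path(path: str) -> bool:
    """Return True if the path starts with /Users/<name>/ or /home/<name>/."""
    return _HARDCODED_HOME.match(path) is not None


def scan_source(text: str) -> list[int]:
    """Return 1-based line numbers that contain a hardcoded home path."""
    violations = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if line.strip().startswith("#"):
            continue  # whole-line comments are skipped
        if _NOQA in line:
            continue  # explicit opt-out on this line
        if _HARDCODED_HOME.search(line):
            violations.append(lineno)
    return violations
```

A path such as `~/.hermes/...` passes this check because the user name only appears after `os.path.expanduser()` runs, not in the source text.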
Alexander Whitestone
9ef7682ee2
chore: merge remote claude/issue-816 — deduplicate gemma-4-27b-it in models.py
Lint / lint (pull_request) Successful in 30s
Merged prior implementation (PR #947) and resolved conflicts.
Removed duplicate "gemma-4-27b-it" entry introduced during merge.
Refs #816
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 21:27:51 -04:00
Alexander Whitestone
e157a22639
feat: wire Gemma 4 vision into browser_tool for screenshot analysis
- Add `_BROWSER_VISION_DEFAULT_MODEL = "google/gemma-4-27b-it"` constant
- Rewrite `_get_vision_model()` with 4-tier resolution:
  1. BROWSER_VISION_MODEL env var (browser-specific override)
  2. auxiliary.browser_vision.model config key
  3. AUXILIARY_VISION_MODEL env var (backward compat)
  4. google/gemma-4-27b-it default (Gemma 4 native multimodal)
- Extract `_load_browser_vision_config()` helper for testability
- Always set call_kwargs["model"] (remove redundant `if vision_model` guard)
- Read timeout from auxiliary.browser_vision.timeout before auxiliary.vision.timeout
- Register gemma-4-27b-it in Gemini provider model catalog
- Document auxiliary.browser_vision section in cli-config.yaml.example
- Add 12 unit tests in tests/tools/test_browser_vision_model.py covering all
  resolution tiers, backward compat, error fallthrough, and type guarantees
Fixes #816
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 21:26:03 -04:00
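Illustration for e157a22639: a minimal sketch of the 4-tier resolution order listed above. It is not the body of `_get_vision_model()`. The function name `resolve_vision_model` and its explicit `env` and `config` arguments are assumptions chosen so that the example runs without the Hermes config loader. The environment variable names, config key, and default are taken from the commit message.

```python
from typing import Any, Mapping

# Default quoted from the commit message.
_BROWSER_VISION_DEFAULT_MODEL = "google/gemma-4-27b-it"


def resolve_vision_model(env: Mapping[str, str], config: Mapping[str, Any]) -> str:
    """Return the first non-empty model name in priority order."""
    # 1. Browser-specific environment override.
    model = env.get("BROWSER_VISION_MODEL", "").strip()
    if model:
        return model
    # 2. auxiliary.browser_vision.model config key (missing sections fall through).
    try:
        cfg_model = config["auxiliary"]["browser_vision"]["model"]
    except (KeyError, TypeError):
        cfg_model = None
    if isinstance(cfg_model, str) and cfg_model.strip():
        return cfg_model.strip()
    # 3. Shared vision override, kept for backward compatibility.
    model = env.get("AUXILIARY_VISION_MODEL", "").strip()
    if model:
        return model
    # 4. Default, so the function always returns a non-empty string.
    return _BROWSER_VISION_DEFAULT_MODEL
```

Since the last tier is a constant, callers can assign the result to `call_kwargs["model"]` unconditionally, which is why the `if vision_model` guard becomes redundant.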
Alexander Whitestone
671283389c
feat: Wire Gemma 4 vision into browser_tool for screenshot analysis
Lint / lint (pull_request) Successful in 8s
_get_vision_model() now resolves via a layered priority chain:
1. BROWSER_VISION_MODEL env var (browser-specific override)
2. config.yaml browser.vision_model
3. AUXILIARY_VISION_MODEL env var (backward-compat shared override)
4. google/gemma-4-27b-it — Gemma 4 native multimodal default
Add browser.vision_model config key to hermes_cli/config.py defaults
with inline documentation.
call_kwargs["model"] is now always set (model is never None), and a
debug log line records which model is in use for each screenshot.
Fixes #816
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 20:51:04 -04:00
Alexander Whitestone
17cc4bac90
feat: complete Gemma 4 browser_vision wiring — task routing, timeout, tests
Lint / lint (pull_request) Successful in 10s
Building on the Gemma 4 default already on this branch:
- Change call_llm() task from "vision" to "browser_vision" in browser_vision()
  so auxiliary.browser_vision.* config is consulted for provider/model/timeout
- Route call_llm(task="browser_vision") through the vision provider resolution
  path in auxiliary_client.py (same as task="vision")
- Fix timeout resolution: check auxiliary.browser_vision.timeout before
  auxiliary.vision.timeout (allows browser-specific timeout override)
- Add timeout option to auxiliary.browser_vision in cli-config.yaml.example
- Add test_browser_vision_gemma4.py covering: task routing assertions,
  call_llm() vision branch routing, and timeout config key ordering
Refs #816
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 19:43:42 -04:00
Alexander Whitestone
1843545d66
chore: merge remote branch — resolve conflicts, use canonical implementation
Lint / lint (pull_request) Successful in 8s
Merge remote claude/issue-816 which contains the full Gemma 4 browser
vision implementation. Resolved conflicts by taking the remote's cleaner
variable names and docstrings while keeping the same 4-tier resolution
logic. All 12 tests pass.
Refs #816
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 18:50:22 -04:00
Alexander Whitestone
c643ac90da
feat: wire Gemma 4 vision into browser_tool for screenshot analysis
- Add `_BROWSER_VISION_DEFAULT_MODEL = "google/gemma-4-27b-it"` constant
- Rewrite `_get_vision_model()` with 4-tier resolution:
  1. BROWSER_VISION_MODEL env var (browser-specific override)
  2. auxiliary.browser_vision.model config key
  3. AUXILIARY_VISION_MODEL env var (backward compat)
  4. Gemma 4 27B default
- Remove `if vision_model:` guard — function now always returns a string
- Update browser_vision tool description to surface Gemma 4 as default
- Register gemma-4-27b-it in Gemini provider model catalog (models.py)
- Document auxiliary.browser_vision.model in cli-config.yaml.example
- Add 14 unit tests covering all priority levels and backward compat
Fixes #816
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 18:47:03 -04:00
Alexander Whitestone
da9c4cf10c
feat: wire Gemma 4 vision into browser_tool for screenshot analysis
Lint / lint (pull_request) Successful in 7s
Extends `_get_vision_model()` with a 5-level resolution chain:
1. `BROWSER_VISION_MODEL` env var — browser-specific override
2. `auxiliary.browser.vision_model` config key — per-install default
3. `AUXILIARY_VISION_MODEL` env var — backward-compat shared override
4. Auto-select `gemma-4-27b-it` when the main provider is Gemini/Google
5. `None` — fall through to `call_llm` vision router
Adds `_BROWSER_VISION_DEFAULT_MODEL = "gemma-4-27b-it"` constant and
registers `gemma-4-27b-it` in the Gemini provider model catalog.
16 new tests in `tests/tools/test_browser_vision_model.py` cover each
priority level, edge cases (empty env, config exceptions, wrong provider).
Fixes #816
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 18:18:30 -04:00
Alexander Whitestone
4214082fb6
feat: A2A auth — mutual TLS between fleet agents
Lint / lint (pull_request) Successful in 8s
Implements mTLS for securing agent-to-agent communication in the Hermes
fleet. Fixes #806.
Changes:
- scripts/gen_fleet_ca.sh: generate a self-signed Fleet CA (4096-bit RSA,
  10-year validity) that signs all agent certificates
- scripts/gen_agent_cert.sh: generate per-agent certs (Timmy, Allegro,
  Ezra) signed by the fleet CA with SAN entries and clientAuth/serverAuth
  extended key usage
- agent/mtls.py: new module providing:
  - build_server_ssl_context() — TLS_SERVER context with CERT_REQUIRED,
    enforces client cert against Fleet CA
  - build_client_ssl_context() — TLS_CLIENT context for outbound A2A calls
  - MTLSMiddleware — ASGI middleware that rejects unauthenticated requests
    to A2A routes (/.well-known/agent-card*, /api/agent-card, /a2a/) with
    HTTP 403 when mTLS is enabled
  - is_mtls_configured() — checks HERMES_MTLS_CERT/KEY/CA env vars
- hermes_cli/web_server.py: wire MTLSMiddleware into the FastAPI app;
  pass SSL context to uvicorn when HERMES_MTLS_* env vars are set so
  the server runs TLS with mandatory client cert verification
- ansible/roles/hermes_mtls/: Ansible role to distribute Fleet CA cert,
  agent cert, and agent key to fleet nodes; writes an env file with
  HERMES_MTLS_* vars and restarts the hermes-gateway service
- ansible/fleet_mtls.yml: fleet-wide playbook referencing the role for
  Timmy, Allegro, and Ezra nodes
- tests/test_mtls.py: 15 tests covering is_mtls_configured, SSL context
  creation with real cryptography-generated certs, and MTLSMiddleware
  (unauthorized agent rejected → 403, authorized agent accepted → 200)
mTLS is opt-in: set HERMES_MTLS_CERT, HERMES_MTLS_KEY, and HERMES_MTLS_CA
to enable. When unset, the server behaves exactly as before.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 18:04:00 -04:00
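Illustration for 4214082fb6: a minimal sketch of a server-side context that requires client certificates, built with Python's standard `ssl` module. The function names `build_server_ssl_context` and `is_mtls_configured` and the `HERMES_MTLS_*` variable names come from the commit message, but the bodies, signatures, and the `new_server_context` helper here are assumptions, not the contents of agent/mtls.py.

```python
import os
import ssl
from typing import Mapping

_MTLS_ENV_VARS = ("HERMES_MTLS_CERT", "HERMES_MTLS_KEY", "HERMES_MTLS_CA")


def is_mtls_configured(env: Mapping[str, str] = os.environ) -> bool:
    """mTLS is opt-in: enabled only when all three variables are non-empty."""
    return all(env.get(name) for name in _MTLS_ENV_VARS)


def new_server_context() -> ssl.SSLContext:
    """Create a server-side TLS context that demands a client certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.verify_mode = ssl.CERT_REQUIRED  # handshake fails without a valid client cert
    return ctx


def build_server_ssl_context(cert_path: str, key_path: str, ca_path: str) -> ssl.SSLContext:
    """Load this agent's cert and key, and trust only the fleet CA for client certs."""
    ctx = new_server_context()
    ctx.load_cert_chain(certfile=cert_path, keyfile=key_path)
    ctx.load_verify_locations(cafile=ca_path)
    return ctx
```

With `CERT_REQUIRED` on a server context, a peer that presents no certificate, or one not signed by the loaded CA, is rejected during the TLS handshake, before any request reaches the application.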
Alexander Whitestone
95bb842a21
feat: Wire Gemma 4 vision into browser_tool for screenshot analysis
Lint / lint (pull_request) Successful in 8s
Default browser_vision screenshots to google/gemma-4-27b-it (Gemma 4
native multimodal) for reduced latency and unified text+vision model.
Resolution order for _get_vision_model():
1. BROWSER_VISION_MODEL env var (new, browser-specific override)
2. auxiliary.browser_vision.model in config.yaml (new config key)
3. AUXILIARY_VISION_MODEL env var (existing global vision override)
4. Default: google/gemma-4-27b-it
Backward compatibility: existing AUXILIARY_VISION_MODEL users are
unaffected — their override still flows through to browser_vision.
Also documents the new auxiliary.browser_vision config section in
cli-config.yaml.example and adds 14 unit tests covering the full
priority chain.
Fixes #816
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 17:14:32 -04:00
Alexander Whitestone
ac28444bf2
feat: add A2AMTLSServer routing API, A2AMTLSClient, and expand tests to 20 (#806)
Lint / lint (pull_request) Successful in 9s
Builds on the existing A2AServer / build_*_ssl_context foundation:
- agent/a2a_mtls.py:
  - Add A2AMTLSServer: routing-based HTTPS server with add_route() and
    context-manager (__enter__/__exit__) lifecycle support
  - Add A2AMTLSClient: fleet-cert-presenting HTTP client with .get() / .post()
  - Widen imports (json, Callable, Dict, urlopen)
- tests/agent/test_a2a_mtls.py:
  - Fix datetime.utcnow() deprecation — use datetime.now(timezone.utc)
  - Add TestA2AMTLSServerAndClient (9 tests): routing GET/POST, 404,
    context-manager stop, rogue-cert rejection, A2AMTLSClient, concurrency
  - Total: 11 → 20 passing tests
Refs #806
2026-04-21 15:21:10 -04:00
Alexander Whitestone
12b5d9a7fd
refactor: remove redundant vision_model guard in browser_vision
Lint / lint (pull_request) Successful in 10s
_get_vision_model() now always returns a non-empty string (Gemma 4 default
or configured override), so the `if vision_model:` conditional guard is
unnecessary. Replace with unconditional assignment and add a debug log
line showing which model was selected.
Refs #816
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 14:09:40 -04:00
Alexander Whitestone
91faf6f956
feat: A2A auth — mutual TLS between fleet agents
Lint / lint (pull_request) Successful in 10s
Implements mutual TLS for secure agent-to-agent communication (#806).
- scripts/gen_fleet_ca.sh: generate fleet CA (4096-bit RSA, 10-year)
- scripts/gen_agent_cert.sh: per-agent cert signed by fleet CA (timmy, allegro, ezra)
- agent/a2a_mtls.py: A2AServer requiring client cert verification (CERT_REQUIRED),
  build_server_ssl_context / build_client_ssl_context helpers, server_from_env()
- ansible/roles/fleet_mtls_certs/: distribute CA + per-agent certs to fleet nodes,
  write /etc/hermes/a2a.env, notify hermes-a2a service on change
- ansible/fleet_mtls.yml + ansible/inventory/fleet.ini.example: playbook + example inventory
- tests/agent/test_a2a_mtls.py: 11 tests — authorized agent accepted (200/202),
  self-signed cert rejected, no-cert rejected, lifecycle, env-var wiring
Fixes #806
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 13:28:28 -04:00
Alexander Whitestone
b6398b8b0d
feat: wire Gemma 4 vision into browser_tool for screenshot analysis
Lint / lint (pull_request) Successful in 19s
Default browser screenshot analysis now uses Gemma 4 27B
(google/gemma-4-27b-it) instead of deferring to the auxiliary router's
auto-detection. Gemma 4 is natively multimodal — the same model family
already in use for text tasks — which avoids cold-start model-switching
overhead and improves context continuity.
Resolution order for _get_vision_model():
1. BROWSER_VISION_MODEL env var (browser-specific override)
2. auxiliary.browser_vision.model in config.yaml
3. AUXILIARY_VISION_MODEL env var (shared/legacy override)
4. google/gemma-4-27b-it (new default)
- Add _BROWSER_VISION_DEFAULT_MODEL constant to browser_tool.py
- Document auxiliary.browser_vision config key in cli-config.yaml.example
- Add 10 unit tests covering all resolution steps
Fixes #816
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 12:49:46 -04:00
a2a40429bd
Merge pull request '[claude] Poka-yoke: auto-revert incomplete skill edits (#923)' (#946) from claude/issue-923 into main
Lint / lint (push) Successful in 10s
2026-04-21 16:38:24 +00:00
ee61c5fa9d
Merge pull request 'feat: Add queue health check script' (#912) from feat/queue-health-check into main
Lint / lint (push) Successful in 34s
2026-04-21 15:37:59 +00:00
Alexander Whitestone
1fece10569
feat: poka-yoke auto-revert for incomplete skill edits (#923)
Lint / lint (pull_request) Successful in 32s
Implement a transactional write-validate-commit-or-rollback pattern for
all skill_manage write operations (edit, patch, write_file):
- _backup_skill_file: timestamped .bak.{ts} snapshot before every write
- _validate_written_file: re-reads from disk after write to catch truncation,
  encoding errors, and broken YAML frontmatter
- _revert_from_backup: restores original content (or removes the corrupted
  file) on any validation failure
- _cleanup_old_backups: prunes to MAX_BACKUPS_PER_FILE (3) after success;
  failed edits keep their .bak file as a debugging aid
Also fixes pre-existing issue where _patch_skill error returns lacked a
`suggestion` field expected by test_skill_manager_error_context.py tests.
Adds 21 tests in test_skill_manager_autorevert.py covering every component
and an end-to-end simulation of mid-write failure + auto-revert.
Fixes #923
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 11:37:55 -04:00
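Illustration for 1fece10569: a minimal sketch of the write-validate-commit-or-rollback pattern described above. It is not the skill_manage implementation. The function `safe_write` and its `validate` callback are hypothetical, and backup pruning (`_cleanup_old_backups` in the commit) is left out for brevity.

```python
import shutil
import time
from pathlib import Path
from typing import Callable


def safe_write(path: Path, new_text: str, validate: Callable[[str], bool]) -> bool:
    """Write new_text to path; restore the previous content if validation fails."""
    backup = None
    if path.exists():
        # Timestamped snapshot taken before the file is touched.
        backup = path.with_name(f"{path.name}.bak.{int(time.time())}")
        shutil.copy2(path, backup)
    path.write_text(new_text, encoding="utf-8")
    try:
        # Re-read from disk so truncation or encoding damage is caught.
        ok = validate(path.read_text(encoding="utf-8"))
    except (OSError, UnicodeDecodeError):
        ok = False
    if ok:
        return True
    if backup is not None:
        shutil.copy2(backup, path)  # roll back to the snapshot
    else:
        path.unlink(missing_ok=True)  # no original existed: remove the bad file
    return False
```

Validating what was re-read from disk, rather than the in-memory string, is the step that catches a write that was interrupted or mangled on the way to storage.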
46668505bc
Merge pull request 'feat: tool fixation detection — break repetitive loops (#886)' (#914) from fix/886 into main
Lint / lint (push) Has been cancelled
2026-04-21 15:35:08 +00:00
cac0c8224e
Merge pull request 'fix: circuit breaker for error cascading (2.33x amplification)' (#927) from fix/885-circuit-breaker into main
Lint / lint (push) Has been cancelled
2026-04-21 15:35:04 +00:00
f38a64455d
Merge pull request '[claude] Gateway config debt: add validation tests and API_SERVER_KEY warning (#892)' (#915) from claude/issue-892 into main
Lint / lint (push) Has been cancelled
2026-04-21 15:33:19 +00:00
1b35a5a0d2
Merge pull request 'feat: Poka-yoke — hardcoded path guard (#921)' (#928) from fix/921-hardcoded-path-guard into main
Lint / lint (push) Has been cancelled
2026-04-21 15:33:14 +00:00
9172131b25
Merge pull request 'docs: tool investigation report from awesome-ai-tools (#926)' (#931) from fix/926 into main
Lint / lint (push) Has been cancelled
2026-04-21 15:33:12 +00:00
407eab3331
Merge pull request 'feat: session deterministic seeding & marathon limits' (#919) from feat/session-management-1776700585635 into main
Lint / lint (push) Has been cancelled
2026-04-21 15:29:44 +00:00
cf090a966d
Merge pull request 'fix: Poka-yoke — detect and block tool hallucination before API calls (#922)' (#935) from fix/922 into main
Lint / lint (push) Has been cancelled
2026-04-21 15:29:35 +00:00
b65be9b12c
Merge pull request '[claude] Add tool investigation report: top 5 awesome-ai-tools recommendations (#926)' (#936) from claude/issue-926 into main
Lint / lint (push) Has been cancelled
2026-04-21 15:29:32 +00:00
3c1cff255e
Merge pull request 'ci: integrate hardcoded path linter into CI workflow' (#938) from fix/865-ci-path-linter into main
Lint / lint (push) Has been cancelled
2026-04-21 15:29:30 +00:00
690d100afc
Merge pull request 'feat: Poka-yoke token budget — progressive context overflow guard (#925)' (#943) from burn/925-1776770102 into main
Docker Build and Publish / build-and-push (push) Has been skipped
Nix / nix (ubuntu-latest) (push) Failing after 5s
Tests / e2e (push) Successful in 5m8s
Tests / test (push) Failing after 30m13s
Nix / nix (macos-latest) (push) Has been cancelled
2026-04-21 15:29:02 +00:00
c6f0831738
Merge pull request 'feat: Python syntax validation before execute_code (#913)' (#917) from fix/913-syntax-validation into main
Docker Build and Publish / build-and-push (push) Has been cancelled
Nix / nix (macos-latest) (push) Has been cancelled
Nix / nix (ubuntu-latest) (push) Has been cancelled
Tests / test (push) Has been cancelled
Tests / e2e (push) Has been cancelled
2026-04-21 15:27:05 +00:00
30773ac1f9
Merge pull request 'fix: Path validation before read_file — poka-yoke (#887)' (#911) from fix/887-path-validation-read-file into main
Docker Build and Publish / build-and-push (push) Has been cancelled
Nix / nix (macos-latest) (push) Has been cancelled
Nix / nix (ubuntu-latest) (push) Has been cancelled
Tests / test (push) Has been cancelled
Tests / e2e (push) Has been cancelled
2026-04-21 15:26:55 +00:00
feb24bd08c
Merge pull request 'feat: Block silent credential exposure in tool outputs (#839)' (#910) from fix/839-1776403070 into main
Docker Build and Publish / build-and-push (push) Has been cancelled
Nix / nix (macos-latest) (push) Has been cancelled
Nix / nix (ubuntu-latest) (push) Has been cancelled
Tests / test (push) Has been cancelled
Tests / e2e (push) Has been cancelled
2026-04-21 15:26:47 +00:00
bc55f40505
Merge pull request 'feat: time-aware model routing for cron jobs (#889)' (#909) from fix/889 into main
Docker Build and Publish / build-and-push (push) Has been cancelled
Nix / nix (macos-latest) (push) Has been cancelled
Nix / nix (ubuntu-latest) (push) Has been cancelled
Tests / test (push) Has been cancelled
Tests / e2e (push) Has been cancelled
2026-04-21 15:26:43 +00:00
2adc72335e
Merge pull request 'fix: profile session isolation — tag and filter by profile' (#907) from fix/891-profile-isolation into main
Docker Build and Publish / build-and-push (push) Has been cancelled
Nix / nix (macos-latest) (push) Has been cancelled
Nix / nix (ubuntu-latest) (push) Has been cancelled
Tests / test (push) Has been cancelled
Tests / e2e (push) Has been cancelled
2026-04-21 15:26:39 +00:00
ab32670464
Merge pull request 'feat: Poka-yoke — detect and block tool hallucination before API calls (#922)' (#944) from burn/922-1776770102 into main
Docker Build and Publish / build-and-push (push) Has been cancelled
Nix / nix (macos-latest) (push) Has been cancelled
Nix / nix (ubuntu-latest) (push) Has been cancelled
Tests / test (push) Has been cancelled
Tests / e2e (push) Has been cancelled
2026-04-21 15:23:56 +00:00
bfc0231297
Merge pull request 'docs: holographic + vector hybrid memory architecture (#879)' (#942) from fix/879 into main
Docker Build and Publish / build-and-push (push) Has been cancelled
Nix / nix (macos-latest) (push) Has been cancelled
Nix / nix (ubuntu-latest) (push) Has been cancelled
Tests / test (push) Has been cancelled
Tests / e2e (push) Has been cancelled
2026-04-21 15:23:49 +00:00
cf2b09cf2f
Merge pull request 'docs: emotional presence patterns for crisis support (#880)' (#941) from fix/880 into main
Docker Build and Publish / build-and-push (push) Has been cancelled
Nix / nix (macos-latest) (push) Has been cancelled
Nix / nix (ubuntu-latest) (push) Has been cancelled
Tests / test (push) Has been cancelled
Tests / e2e (push) Has been cancelled
2026-04-21 15:23:45 +00:00
719bb537c0
Merge pull request 'feat: provider preflight validation before session start (#924)' (#932) from fix/924 into main
Docker Build and Publish / build-and-push (push) Has been cancelled
Nix / nix (macos-latest) (push) Has been cancelled
Nix / nix (ubuntu-latest) (push) Has been cancelled
Tests / test (push) Has been cancelled
Tests / e2e (push) Has been cancelled
2026-04-21 15:23:02 +00:00
0bcbcf19ac
Merge pull request 'feat: time-aware model routing for cron jobs #889' (#906) from fix/time-aware-routing-889 into main
Docker Build and Publish / build-and-push (push) Has been cancelled
Nix / nix (macos-latest) (push) Has been cancelled
Nix / nix (ubuntu-latest) (push) Has been cancelled
Tests / test (push) Has been cancelled
Tests / e2e (push) Has been cancelled
2026-04-21 15:22:37 +00:00
27d2f2ca0e
Merge pull request 'feat: Prevent context window overflow via proactive token counting (#838)' (#905) from fix/838-1776402240 into main
Docker Build and Publish / build-and-push (push) Has been cancelled
Nix / nix (macos-latest) (push) Has been cancelled
Nix / nix (ubuntu-latest) (push) Has been cancelled
Tests / e2e (push) Has been cancelled
Tests / test (push) Has been cancelled
2026-04-21 15:22:31 +00:00
7e7dcfa345
Merge pull request 'fix: Gateway config validation and fallback fixes (#892)' (#904) from fix/892 into main
Docker Build and Publish / build-and-push (push) Has been cancelled
Nix / nix (macos-latest) (push) Has been cancelled
Nix / nix (ubuntu-latest) (push) Has been cancelled
Tests / e2e (push) Has been cancelled
Tests / test (push) Has been cancelled
2026-04-21 15:22:22 +00:00
ba0e614446
Merge pull request 'feat: integrate 988 Suicide & Crisis Lifeline — automatic crisis escalation (#673)' (#903) from feat/673 into main
Docker Build and Publish / build-and-push (push) Has been cancelled
Nix / nix (macos-latest) (push) Has been cancelled
Nix / nix (ubuntu-latest) (push) Has been cancelled
Tests / test (push) Has been cancelled
Tests / e2e (push) Has been cancelled
2026-04-21 15:22:17 +00:00
4f5e641c92
Merge pull request 'fix: kill 9 dead cron jobs — audit and cleanup script' (#902) from fix/890-dead-cron-jobs into main
Docker Build and Publish / build-and-push (push) Has been cancelled
Nix / nix (macos-latest) (push) Has been cancelled
Nix / nix (ubuntu-latest) (push) Has been cancelled
Tests / e2e (push) Has been cancelled
Tests / test (push) Has been cancelled
2026-04-21 15:22:15 +00:00
d61bd141f9
feat: add poka-yoke validation to non-execute_code dispatch (#922)
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Contributor Attribution Check / check-attribution (pull_request) Failing after 32s
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 32s
Tests / e2e (pull_request) Successful in 3m5s
Tests / test (pull_request) Failing after 36m26s
2026-04-21 12:01:57 +00:00
a4058af238
feat: wire poka-yoke validation into tool dispatch ( #922 )
2026-04-21 12:00:20 +00:00
08432a5618
test: poka-yoke validation tests ( #922 )
2026-04-21 11:59:26 +00:00
a875c6ed91
feat: poka-yoke tool call validation firewall ( #922 )
2026-04-21 11:59:25 +00:00
07c5b5b83d
test: add token budget poka-yoke tests ( #925 )
Contributor Attribution Check / check-attribution (pull_request) Failing after 44s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 45s
Tests / test (pull_request) Failing after 25m21s
Tests / e2e (pull_request) Successful in 3m18s
2026-04-21 11:41:39 +00:00
ba56567631
docs: holographic + vector hybrid memory architecture ( #879 )
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 45s
Tests / test (pull_request) Failing after 14m3s
Tests / e2e (pull_request) Successful in 1m53s
2026-04-21 11:41:31 +00:00
8ac26f54a5
feat: token budget with progressive poka-yoke thresholds ( #925 )
2026-04-21 11:40:39 +00:00
b807972d05
docs: emotional presence patterns for crisis support ( #880 )
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 52s
Tests / e2e (pull_request) Successful in 3m53s
Tests / test (pull_request) Failing after 53m34s
2026-04-21 11:37:57 +00:00
6b5a6db668
ci: add Gitea Actions lint workflow
Lint / lint (pull_request) Successful in 15s
Part of #865. Runs hardcoded path linter on every push/PR.
2026-04-21 11:37:33 +00:00
b702249c12
ci: add hardcoded path linter to CI workflow
Closes #865
Runs scripts/lint_hardcoded_paths.py as a CI check.
Uses continue-on-error for now since the linter may have false positives.
2026-04-21 11:37:31 +00:00
Alexander Whitestone
8023c9b8f2
docs: add tool investigation report for top 5 awesome-ai-tools recommendations
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 55s
Tests / e2e (pull_request) Successful in 3m56s
Tests / test (pull_request) Failing after 54m0s
Persists the research report from issue #926 as a markdown file following
the existing convention of research_*.md files in the repo. Documents the
top 5 tool recommendations (LiteLLM, Mem0, RAGFlow, LiteRT-LM, Claude-Mem)
with integration effort, impact scores, and phased implementation plan.
Refs #926
2026-04-21 07:26:44 -04:00
TERRA
9edd5383e7
feat: add hermes web console cockpit and browser self-healing ( #394 )
Contributor Attribution Check / check-attribution (pull_request) Failing after 36s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 31s
Tests / e2e (pull_request) Successful in 3m37s
Tests / test (pull_request) Failing after 38m26s
2026-04-21 02:00:41 -04:00
TERRA
f6c072f136
wip: add web console cockpit regression tests for #394
2026-04-21 02:00:41 -04:00
6eeee39c10
test( #922 ): Add tests for tool hallucination detection
Contributor Attribution Check / check-attribution (pull_request) Failing after 1m15s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 1m8s
Tests / e2e (pull_request) Successful in 3m44s
Tests / test (pull_request) Failing after 1h9m15s
Tests for validation firewall:
- Unknown tool detection
- Missing required params
- Wrong type detection
- Hallucination patterns
- Rejection stats
Refs #922
2026-04-21 05:38:54 +00:00
b2d2d2c650
fix( #922 ): Poka-yoke — detect and block tool hallucination
Validation firewall between LLM tool-call output and execution:
1. Unknown tool names rejected
2. Malformed parameters caught
3. Missing required arguments detected
4. Hallucination patterns detected
All rejections logged with model provenance.
Agent receives rejection as tool result for self-correction.
Resolves #922
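A minimal sketch of the kind of check this commit describes, assuming a hypothetical `TOOL_SCHEMAS` registry and a hypothetical `validate_tool_call` helper (neither name comes from the repository). It covers the first three rejection classes listed above; hallucination-pattern detection and provenance logging are left out.

```python
from typing import Any, List

# Hypothetical registry: tool name -> {parameter: (expected type, required?)}
TOOL_SCHEMAS = {
    "read_file": {"path": (str, True), "limit": (int, False)},
}


def validate_tool_call(name: str, args: Any) -> List[str]:
    """Return rejection reasons; an empty list means the call may run."""
    schema = TOOL_SCHEMAS.get(name)
    if schema is None:
        return [f"unknown tool: {name}"]
    if not isinstance(args, dict):
        return ["arguments must be a JSON object"]
    errors = []
    for param, (expected, required) in schema.items():
        if param not in args:
            if required:
                errors.append(f"missing required argument: {param}")
        elif not isinstance(args[param], expected):
            errors.append(f"wrong type for {param}: expected {expected.__name__}")
    for param in args:
        if param not in schema:
            errors.append(f"unexpected argument: {param}")
    return errors
```

In this sketch the caller would hand a non-empty list back to the agent as the tool result, so the model can correct the call itself.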
2026-04-21 05:38:22 +00:00
5b62bb8d81
feat( #394 ): Hermes web UI operator cockpit
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Contributor Attribution Check / check-attribution (pull_request) Failing after 43s
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 1m9s
Tests / e2e (pull_request) Successful in 6m9s
Tests / test (pull_request) Failing after 1h3m4s
Minimal web interface for Hermes operation:
- Chat interface with streaming
- System status monitoring
- Crisis detection display
- Session management
- Dark theme, responsive design
Source-backed: Hermes Atlas pattern.
Refs #394
2026-04-21 05:34:22 +00:00
10f9fd690a
feat( #394 ): Self-healing browser CDP layer (browser-harness)
Source-backed browser automation:
- CDP connection with auto-reconnect
- Self-healing on disconnects
- Screenshot, DOM inspection, JS evaluation
- Click, type, navigate primitives
- Session persistence
Refs #394
2026-04-21 05:33:32 +00:00
bdd0f2709b
feat: provider preflight validation before session start ( #924 )
Contributor Attribution Check / check-attribution (pull_request) Failing after 47s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 52s
Tests / test (pull_request) Failing after 30m48s
Tests / e2e (pull_request) Successful in 2m9s
2026-04-21 04:48:57 +00:00
a9cbf7d69f
docs: tool investigation report from awesome-ai-tools ( #926 )
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 36s
Tests / e2e (pull_request) Successful in 2m56s
Tests / test (pull_request) Failing after 34m20s
2026-04-21 04:45:03 +00:00
b64f4d9632
feat: update run_agent.py for deep dive security
Contributor Attribution Check / check-attribution (pull_request) Failing after 28s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Nix / nix (ubuntu-latest) (pull_request) Failing after 4s
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 35s
Tests / test (pull_request) Failing after 1h0m5s
Tests / e2e (pull_request) Failing after 6m56s
Nix / nix (macos-latest) (pull_request) Has been cancelled
2026-04-21 00:41:55 +00:00
7caaf49a34
feat: deep dive integration of tests/test_shield_multilingual.py
2026-04-21 00:41:53 +00:00
e52f6d2cde
feat: deep dive integration of tools/shield/detector.py
2026-04-21 00:41:52 +00:00
000d64deed
feat: deep dive integration of agent/input_sanitizer.py
2026-04-21 00:41:50 +00:00
d527cb569b
feat: deep dive integration of agent/shield.py
2026-04-21 00:41:49 +00:00
44ada06fd4
feat: update agent/privacy_filter.py for deep dive security
2026-04-21 00:41:48 +00:00
4cdda8701d
feat: integrate hardcoded path guard into tool dispatch
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Contributor Attribution Check / check-attribution (pull_request) Failing after 32s
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 32s
Tests / e2e (pull_request) Successful in 2m56s
Tests / test (pull_request) Failing after 1h1m7s
2026-04-21 00:31:01 +00:00
a80d30b342
feat: add pre-commit hook for hardcoded path detection
2026-04-21 00:29:33 +00:00
f098cf8c4a
feat: add hardcoded path guard module ( #921 )
- Detects /Users/, /home/, ~/ in tool arguments
- Source code scanner for CI/pre-commit
- Runtime guard for tool dispatch
- noqa: hardcoded-path-ok escape hatch
Closes #921
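The scanner side of this guard could look roughly like the following. This is an illustrative sketch, not the module's actual code: the `find_hardcoded_paths` name and the regular expression are assumptions, and only the three prefixes and the escape marker come from the commit message.

```python
import re
from typing import List, Tuple

# Prefixes named in the commit message: /Users/, /home/, ~/
HARDCODED_PATH = re.compile(r"(/Users/|/home/|~/)")
ESCAPE_MARKER = "noqa: hardcoded-path-ok"


def find_hardcoded_paths(source: str) -> List[Tuple[int, str]]:
    """Return (line number, stripped line) for each offending line."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if ESCAPE_MARKER in line:
            continue  # explicit opt-out on this line
        if HARDCODED_PATH.search(line):
            hits.append((lineno, line.strip()))
    return hits
```

A plain substring search like this will also flag paths inside comments and docstrings; a real linter would probably need to exclude those.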
2026-04-21 00:29:12 +00:00
30509b9c7c
test: circuit breaker tests
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Contributor Attribution Check / check-attribution (pull_request) Failing after 38s
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 40s
Tests / e2e (pull_request) Successful in 1m36s
Tests / test (pull_request) Failing after 17m13s
Part of #885
2026-04-21 00:28:15 +00:00
ccaa1cb021
feat: circuit breaker for error cascading
Closes #885
2.33x error cascade factor detected. After 3 consecutive errors,
circuit opens and agent must take corrective action.
Recovery pattern: terminal is the safety net (2300 recoveries).
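The "three consecutive errors opens the circuit" rule can be sketched as below. The class and method names are hypothetical, and what "corrective action" means once the circuit is open is left to the caller.

```python
class CircuitBreaker:
    """Opens after `threshold` consecutive errors; any success resets it."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.consecutive_errors = 0

    def record(self, success: bool) -> None:
        # A success clears the streak; a failure extends it.
        self.consecutive_errors = 0 if success else self.consecutive_errors + 1

    @property
    def is_open(self) -> bool:
        return self.consecutive_errors >= self.threshold
```

Counting only consecutive errors (rather than a rate over a window) matches the wording of the commit message; a windowed variant would be a different design.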
2026-04-21 00:28:14 +00:00
c6f2855745
fix: restore _format_error helper for test compatibility ( #916 )
Docker Build and Publish / build-and-push (push) Has been skipped
Nix / nix (ubuntu-latest) (push) Failing after 2s
Tests / e2e (push) Successful in 2m47s
Tests / test (push) Failing after 27m41s
Build Skills Index / build-index (push) Has been skipped
Build Skills Index / deploy-with-index (push) Has been skipped
Nix / nix (macos-latest) (push) Has been cancelled
2026-04-20 23:56:27 +00:00
9d180f31cc
feat: add session templates
Contributor Attribution Check / check-attribution (pull_request) Failing after 43s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 45s
Tests / test (pull_request) Failing after 45m24s
Tests / e2e (pull_request) Failing after 7m35s
2026-04-20 15:56:26 +00:00
3d8cf5122a
feat: add agent/shield.py for SHIELD defense
Contributor Attribution Check / check-attribution (pull_request) Failing after 31s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 40s
Tests / e2e (pull_request) Successful in 2m2s
Tests / test (pull_request) Failing after 52m0s
2026-04-20 15:54:48 +00:00
790b677978
feat: add tests/test_shield_multilingual.py for SHIELD defense
2026-04-20 15:54:46 +00:00
9a749d2854
feat: add agent/input_sanitizer.py for SHIELD defense
2026-04-20 15:54:45 +00:00
68534e78be
feat: add tools/shield/detector.py for SHIELD defense
2026-04-20 15:54:43 +00:00
c17f64fa2c
test: add syntax validation tests ( #913 )
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Contributor Attribution Check / check-attribution (pull_request) Failing after 41s
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 29s
Tests / e2e (pull_request) Successful in 2m2s
Tests / test (pull_request) Failing after 1h14m43s
2026-04-20 15:47:35 +00:00
bc7ffc2166
feat: Python syntax validation before execute_code ( #913 )
2026-04-20 15:46:23 +00:00
Alexander Whitestone
c22cdcaa8e
fix: add _validate_gateway_config tests and API_SERVER_KEY network binding warning
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Contributor Attribution Check / check-attribution (pull_request) Failing after 23s
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 27s
Tests / e2e (pull_request) Successful in 1m51s
Tests / test (pull_request) Failing after 37m0s
Refs #892 - Gateway config debt: missing keys and broken fallbacks
Changes:
- Add `_is_network_accessible()` helper to gateway/config.py (avoids circular
  import with gateway.platforms.base which imports from gateway.config)
- Add API_SERVER_KEY warning in `_validate_gateway_config`: when the API server
  is enabled on a network-accessible address (0.0.0.0, public IP, hostname) but
  no key is configured, log a warning at config-load time so operators see the
  issue before any adapter initialisation runs
- Add `TestValidateGatewayConfig` in tests/gateway/test_config.py covering:
  - idle_minutes <= 0 and None are corrected to 1440 (default)
  - at_hour outside 0-23 is corrected to 4 (default)
  - Boundary hours 0 and 23 are accepted unchanged
  - Empty platform token triggers a warning log
  - Disabled platform with empty token produces no warning
  - API server on 0.0.0.0 without key logs a warning
  - API server on 127.0.0.1 without key is silent (loopback is allowed)
  - API server with a key set logs no warning regardless of bind address
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
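The behaviours listed in this commit can be illustrated with a small sketch. The function names, the config dictionary layout (`api_server`, `enabled`, `host`, `key`) and the handling of `localhost` are assumptions made for illustration; only the defaults (1440, 4) and the loopback rule come from the message above.

```python
import ipaddress
from typing import List

DEFAULT_IDLE_MINUTES = 1440
DEFAULT_AT_HOUR = 4


def is_network_accessible(host: str) -> bool:
    """True unless the bind address is a loopback address."""
    try:
        return not ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal. Hostnames count as reachable; treating
        # "localhost" as loopback is an assumption of this sketch.
        return host.lower() != "localhost"


def validate_gateway_config(config: dict) -> List[str]:
    """Correct out-of-range values in place and return warning strings."""
    warnings = []
    idle = config.get("idle_minutes")
    if idle is None or idle <= 0:
        config["idle_minutes"] = DEFAULT_IDLE_MINUTES
    hour = config.get("at_hour", DEFAULT_AT_HOUR)
    if not 0 <= hour <= 23:
        hour = DEFAULT_AT_HOUR
    config["at_hour"] = hour
    api = config.get("api_server", {})
    host = api.get("host", "127.0.0.1")
    if api.get("enabled") and not api.get("key") and is_network_accessible(host):
        warnings.append(f"API server bound to {host} without API_SERVER_KEY")
    return warnings
```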
2026-04-17 02:18:02 -04:00
Alexander Whitestone
ab968e910c
feat: tool fixation detection — break repetitive loops ( #886 )
Contributor Attribution Check / check-attribution (pull_request) Failing after 37s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 43s
Tests / e2e (pull_request) Successful in 1m57s
Tests / test (pull_request) Failing after 18m57s
Marathon sessions show tool fixation: agent latches onto one tool
and calls it repeatedly. Observed streaks of 8-25 identical calls.
New agent/tool_fixation_detector.py:
- ToolFixationDetector: tracks consecutive tool calls
- record(tool_name): returns nudge prompt when threshold reached
- Default threshold: 5 consecutive calls (configurable via
  TOOL_FIXATION_THRESHOLD env var)
- Nudge prompt explains the fixation and suggests alternatives:
  1. Read error carefully
  2. Try different tool
  3. Ask user for clarification
  4. Check if task is complete
- get_streak_info(): current streak state
- format_report(): human-readable fixation events
- Singleton via get_fixation_detector()
Config:
- TOOL_FIXATION_THRESHOLD (default: 5)
- TOOL_FIXATION_WINDOW (default: 10)
Tests: tests/test_tool_fixation_detector.py (9 tests)
Closes #886
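A rough sketch of the streak tracking described in this commit. It reuses the `ToolFixationDetector` and `record` names from the message, but the body, the nudge wording, and the choice to keep nudging on every call once the threshold is reached are assumptions; the window setting is not modelled.

```python
import os
from typing import Optional


class ToolFixationDetector:
    """Tracks consecutive calls to the same tool."""

    def __init__(self, threshold: Optional[int] = None):
        if threshold is None:
            threshold = int(os.environ.get("TOOL_FIXATION_THRESHOLD", "5"))
        self.threshold = threshold
        self.last_tool: Optional[str] = None
        self.streak = 0

    def record(self, tool_name: str) -> Optional[str]:
        """Return a nudge prompt once the streak reaches the threshold."""
        if tool_name == self.last_tool:
            self.streak += 1
        else:
            self.last_tool, self.streak = tool_name, 1
        if self.streak >= self.threshold:
            return (
                f"You have called {tool_name} {self.streak} times in a row. "
                "Read the error carefully, try a different tool, ask the user "
                "for clarification, or check whether the task is complete."
            )
        return None
```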
2026-04-17 01:57:37 -04:00
Alexander Whitestone
73984ca72f
feat: Add queue health check script
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Contributor Attribution Check / check-attribution (pull_request) Failing after 29s
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 31s
Tests / e2e (pull_request) Successful in 2m13s
Tests / test (pull_request) Failing after 28m10s
2026-04-17 01:26:07 -04:00
436c800def
fix: add path validation before read_file ( #887 )
Contributor Attribution Check / check-attribution (pull_request) Failing after 35s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 38s
Tests / e2e (pull_request) Successful in 1m58s
Tests / test (pull_request) Failing after 42m6s
- Check if file exists before attempting read
- Return clear error with suggestions for similar files
- Suggest using search_files to find correct path
- Eliminates 83.7% of read_file errors (file not found)
Closes #887
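One way to implement the existence check with suggestions is shown below. The `check_readable` helper and its return shape are hypothetical, and using `difflib` for "similar files" is a swapped-in technique for illustration; the commit does not say how similarity is computed.

```python
import difflib
from pathlib import Path


def check_readable(path: str) -> dict:
    """Check that `path` is an existing file; otherwise suggest neighbours."""
    target = Path(path)
    if target.is_file():
        return {"ok": True}
    parent = target.parent
    siblings = [p.name for p in parent.iterdir()] if parent.is_dir() else []
    similar = difflib.get_close_matches(target.name, siblings, n=3)
    return {
        "ok": False,
        "error": f"File not found: {path}",
        "suggestions": [str(parent / name) for name in similar],
        "hint": "Use search_files to locate the correct path.",
    }
```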
2026-04-17 05:24:52 +00:00
cb331da4f1
test: Add credential redaction tests ( #839 )
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Contributor Attribution Check / check-attribution (pull_request) Failing after 49s
Tests / e2e (pull_request) Successful in 2m50s
Tests / test (pull_request) Failing after 11m50s
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 47s
2026-04-17 05:23:48 +00:00
fa892bfcb9
feat: Add credential redaction for tool outputs ( #839 )
2026-04-17 05:21:25 +00:00
Alexander Whitestone
0b72884750
feat: time-aware model routing for cron jobs ( #889 )
Tests / test (pull_request) Failing after 25m4s
Tests / e2e (pull_request) Successful in 3m19s
Contributor Attribution Check / check-attribution (pull_request) Failing after 14s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 14s
Error rate peaks at 18:00 (9.4%) during evening cron batches vs 4.0%
at 09:00 during interactive work. Route cron tasks to stronger models
during off-hours when user is not present to correct errors.
New agent/time_aware_routing.py:
- resolve_time_aware_model(): routes based on hour, error rate, task type
  - Interactive sessions: always use base model (user corrects errors)
  - Cron during business hours: use base model (low error rate)
  - Cron during off-hours with high error rate (>6%): upgrade to strong model
- get_hour_error_rate(): error rates by hour from empirical audit
- is_off_hours(): 18:00-05:59 = off-hours
- RoutingDecision: model, provider, reason, hour, error_rate
- get_routing_report(): 24h forecast of routing decisions
Config via env vars:
- CRON_STRONG_MODEL (default: xiaomi/mimo-v2-pro)
- CRON_CHEAP_MODEL (default: qwen2.5:7b)
- CRON_ERROR_THRESHOLD (default: 6.0%)
Tests: tests/test_time_aware_routing.py (9 tests)
Closes #889
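The routing rules in this commit reduce to a small decision function. The sketch below uses a hypothetical `resolve_model` signature and takes the error rate as a parameter instead of looking it up per hour; the off-hours range, the 6.0% threshold and the default strong model come from the message above.

```python
def is_off_hours(hour: int) -> bool:
    """Off-hours run from 18:00 through 05:59."""
    return hour >= 18 or hour < 6


def resolve_model(
    base_model: str,
    hour: int,
    error_rate: float,
    is_cron: bool,
    strong_model: str = "xiaomi/mimo-v2-pro",
    threshold: float = 6.0,
) -> str:
    """Pick a model for one task, given the hour and that hour's error rate."""
    if not is_cron:
        return base_model  # interactive: the user is present to correct errors
    if is_off_hours(hour) and error_rate > threshold:
        return strong_model  # unattended and error-prone: upgrade
    return base_model
```

With the figures quoted in the commit, an 18:00 cron task at 9.4% is upgraded, while a 09:00 cron task at 4.0% stays on the base model.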
2026-04-17 01:15:09 -04:00
a0ed1e6ff2
test: profile isolation tests
Contributor Attribution Check / check-attribution (pull_request) Failing after 15s
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 15s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Tests / test (pull_request) Failing after 18m33s
Tests / e2e (pull_request) Successful in 1m17s
Part of #891
2026-04-17 05:13:03 +00:00
b5ba272efe
feat: profile session isolation
Closes #891
Tags sessions with originating profile and provides filtered
access so profiles cannot see each other's data.
2026-04-17 05:13:01 +00:00
2e0dfe27df
feat: time-aware model routing for cron jobs #889
Contributor Attribution Check / check-attribution (pull_request) Failing after 15s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 14s
Tests / test (pull_request) Failing after 18m19s
Tests / e2e (pull_request) Successful in 1m17s
2026-04-17 05:10:34 +00:00
d4cdfdc604
test: Add context budget tracker tests ( #838 )
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 19s
Contributor Attribution Check / check-attribution (pull_request) Failing after 16s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Tests / test (pull_request) Failing after 18m30s
Tests / e2e (pull_request) Successful in 1m16s
2026-04-17 05:06:54 +00:00
e3436e36c3
feat: Add context budget tracker for overflow prevention ( #838 )
2026-04-17 05:06:08 +00:00
34e7de6a4c
feat: 988 Lifeline tests ( #673 )
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 18s
Contributor Attribution Check / check-attribution (pull_request) Failing after 17s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Tests / test (pull_request) Failing after 18m18s
Tests / e2e (pull_request) Successful in 1m13s
2026-04-17 05:04:50 +00:00
dbabe0e6ae
feat: 988 Suicide & Crisis Lifeline integration ( #673 )
agent/crisis_resources.py provides all 988 Lifeline contact
methods: phone (988), text (HOME to 988), chat, Spanish line.
Also Crisis Text Line (741741) and 911.
Closes #673
2026-04-17 05:04:48 +00:00
517e2c571e
fix( #892 ): Gateway config validation and fallback fixes
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 16s
Tests / test (pull_request) Failing after 18m29s
Tests / e2e (pull_request) Successful in 1m20s
Contributor Attribution Check / check-attribution (pull_request) Failing after 16s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Config validator and fallback fixes:
- Validate required keys (OPENROUTER_API_KEY, API_SERVER_KEY)
- Fix idle_minutes validation (>0 required)
- Fix Discord skill limit (reduce to 95 max)
- Validate provider configs
- Apply sensible defaults
Resolves #892
2026-04-17 05:04:11 +00:00
0b019327a3
docs: cron audit documentation
Tests / e2e (pull_request) Successful in 49s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 15s
Contributor Attribution Check / check-attribution (pull_request) Failing after 14s
Tests / test (pull_request) Failing after 18m24s
Part of #890
2026-04-17 05:00:09 +00:00
6b0fca6944
feat: cron job audit and cleanup script
Closes #890
Finds dead cron jobs (zero completions, stale) and provides
--disable and --delete actions to clean them up.
2026-04-17 05:00:06 +00:00
05f8c2d188
Merge PR #899
Merged PR #899: feat: Allegro worker deliverables
2026-04-17 01:52:11 +00:00
ff2ce95ade
feat(research): Allegro worker deliverables — fleet research reports + skill manager test
Tests / e2e (pull_request) Successful in 1m39s
Tests / test (pull_request) Failing after 1h7m45s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Contributor Attribution Check / check-attribution (pull_request) Successful in 24s
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 28s
Research reports:
- Vector DB research
- Workflow orchestration research
- Fleet knowledge graph SOTA research
- LLM inference optimization
- Local model crisis quality
- Memory systems SOTA
- Multi-agent coordination
- R5 vs E2E gap analysis
- Text-to-music-video
Test:
- test_skill_manager_error_context.py
[Allegro] Forge workers — 2026-04-16
2026-04-16 15:04:28 +00:00
Hermes Merge Bot
aedebfdf58
Merge PR #848
2026-04-16 02:12:13 -04:00
Hermes Merge Bot
adf49b1809
Merge PR #849
2026-04-16 02:11:21 -04:00
Hermes Merge Bot
52ea3a8935
Merge PR #850
2026-04-16 02:09:00 -04:00
Hermes Merge Bot
43246d6cb4
Merge PR #852
2026-04-16 02:08:06 -04:00
Hermes Merge Bot
20c5e237a7
Merge PR #861
2026-04-16 02:06:36 -04:00
Hermes Merge Bot
a0f4d10a7f
Merge PR #855
2026-04-16 02:06:17 -04:00
Hermes Merge Bot
bc5d1cf6ff
Merge PR #863
2026-04-16 02:05:44 -04:00
Hermes Merge Bot
dff451081d
Merge PR #856
2026-04-16 02:05:42 -04:00
Hermes Merge Bot
5509b157c5
Merge PR #864
2026-04-16 02:05:05 -04:00
Hermes Merge Bot
fcc322fb81
Merge PR #867
2026-04-16 02:03:23 -04:00
Hermes Merge Bot
9bba9ecc40
Merge PR #866
2026-04-16 02:02:43 -04:00
Hermes Merge Bot
05086e58ea
Merge PR #871
2026-04-16 02:00:55 -04:00
Hermes Merge Bot
7af6889767
Merge PR #869
2026-04-16 02:00:49 -04:00
5022db9d7b
Merge pull request 'feat: self-modifying agent that improves its own prompts ( #813 )' ( #897 ) from fix/813 into main
2026-04-16 05:29:11 +00:00
0f61474b74
Merge pull request 'feat: MCP server — expose hermes tools to fleet peers ( #803 )' ( #896 ) from fix/803 into main
Auto-merged PR #896: feat: MCP server — expose hermes tools to fleet peers (#803)
2026-04-16 05:24:27 +00:00
Alexander Whitestone
a528bd5b1b
fix: use .get() for env_vars key in _show_tool_availability_warnings
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 24s
Tests / test (pull_request) Failing after 1h2m1s
Tests / e2e (pull_request) Successful in 1m38s
Fixes KeyError: 'missing_vars' crash on CLI startup when toolsets are
unavailable. registry.py returns dicts with 'env_vars' key, but
_show_tool_availability_warnings() was accessing 'missing_vars' directly.
Now uses .get("env_vars") or .get("missing_vars") to handle both key
names, consistent with how doctor.py already handles this.
Fixes #834
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
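The dual-key lookup described in this fix amounts to the pattern below. The `missing_env_vars` helper is a hypothetical name used only to isolate the pattern; the final `or []` fallback is an addition of this sketch so that callers always get a list.

```python
from typing import List


def missing_env_vars(entry: dict) -> List[str]:
    """Read the variable list under either key name without raising KeyError."""
    return entry.get("env_vars") or entry.get("missing_vars") or []
```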
2026-04-16 01:23:48 -04:00
Alexander Whitestone
e63cdaf16f
feat: self-modifying agent that improves its own prompts ( #813 )
Docker Build and Publish / build-and-push (pull_request) Has been cancelled
Contributor Attribution Check / check-attribution (pull_request) Has been cancelled
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Has been cancelled
Tests / test (pull_request) Has been cancelled
Tests / e2e (pull_request) Has been cancelled
Resolves #813. Agent analyzes session transcripts for failure
patterns and generates prompt patches to prevent future failures.
agent/self_modify.py (PromptLearner class):
- analyze_session(): detects 5 failure types from transcripts:
  retry_loop, timeout, hallucination, context_loss, tool_failure
- generate_patches(): converts patterns to prompt patches with
  confidence scoring (frequency-based)
- apply_patches(): appends learned rules to system prompt with
  backup and rollback support
- learn_from_session(): full cycle analyze → patch → apply
Failures → patterns → patches → improved prompts → fewer failures.
Safety: patches only ADD rules (append-only), never remove.
Rollback: restores from timestamped backup.
2026-04-16 01:23:48 -04:00
Alexander Whitestone
2b7b12baf9
feat: MCP server — expose hermes tools to fleet peers ( #803 )
Contributor Attribution Check / check-attribution (pull_request) Successful in 44s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Tests / test (pull_request) Has been cancelled
Tests / e2e (pull_request) Has been cancelled
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 19m48s
Resolves #803. Standalone MCP server that exposes safe hermes
tools to other fleet agents.
scripts/mcp_server.py:
- Exposes: terminal, file_read, file_search, web_search, session_search
- Blocks: approval, delegate, memory, config, cron, send_message
- Terminal uses approval.py dangerous command detection
- Auth via Bearer token (MCP_AUTH_KEY)
- HTTP endpoints: GET /mcp/tools, POST /mcp/tools/call, GET /health
Usage:
python scripts/mcp_server.py --port 8081 --auth-key SECRET
curl http://localhost:8081/mcp/tools
curl -X POST http://localhost:8081/mcp/tools/call -d '{"name":"file_read","arguments":{"path":"README.md"}}'
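The exposure list and Bearer check described above could be gated along these lines. The tool names are taken from the commit message; the function names and the constant-time comparison are assumptions of this sketch, not a description of scripts/mcp_server.py.

```python
import hmac
from typing import Optional

EXPOSED_TOOLS = {"terminal", "file_read", "file_search", "web_search", "session_search"}
BLOCKED_TOOLS = {"approval", "delegate", "memory", "config", "cron", "send_message"}


def is_tool_callable(name: str) -> bool:
    """Allow only tools that are explicitly exposed and not blocked."""
    return name in EXPOSED_TOOLS and name not in BLOCKED_TOOLS


def is_authorized(auth_header: Optional[str], expected_key: str) -> bool:
    """Check an `Authorization: Bearer <token>` header value."""
    scheme, _, token = (auth_header or "").partition(" ")
    if scheme != "Bearer":
        return False
    # Constant-time comparison on bytes avoids leaking the key via timing.
    return hmac.compare_digest(token.encode(), expected_key.encode())
```

An allowlist plus a blocklist is redundant by construction here; the explicit blocklist only guards against a blocked name being added to the exposed set by mistake.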
2026-04-16 01:10:00 -04:00
Alexander Whitestone
6b40c5db7a
fix: use env_vars key in _show_tool_availability_warnings to prevent KeyError
registry.py:check_tool_availability() returns unavailable dicts with key
"env_vars", but _show_tool_availability_warnings() in cli.py was accessing
u["missing_vars"] causing a KeyError crashing CLI startup whenever any
toolset was disabled.
Fix matches how doctor.py already handles the same data.
Fixes #834
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 00:42:03 -04:00
5a24894f78
fix: update hermes_cli/web_server.py for agent card discovery
Contributor Attribution Check / check-attribution (pull_request) Successful in 43s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Nix / nix (ubuntu-latest) (pull_request) Failing after 5s
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 38s
Tests / test (pull_request) Failing after 10m58s
Tests / e2e (pull_request) Successful in 1m32s
Nix / nix (macos-latest) (pull_request) Has been cancelled
2026-04-16 03:45:04 +00:00
a474eb8459
fix: add agent/agent_card.py for agent card discovery
2026-04-16 03:45:01 +00:00
Alexander Whitestone
3238cf4eb1
feat: Tool investigation report + Mem0 local provider ( #842 )
Contributor Attribution Check / check-attribution (pull_request) Successful in 38s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 32s
Tests / test (pull_request) Failing after 43m54s
Tests / e2e (pull_request) Successful in 2m5s
## Investigation Report
- docs/tool-investigation-2026-04-15.md: Full report analyzing 414 tools
  from awesome-ai-tools. Top 5 recommendations with integration paths.
- docs/plans/awesome-ai-tools-integration.md: Implementation tracking plan.
## Mem0 Local Provider (P1)
- plugins/memory/mem0_local/: New ChromaDB-backed memory provider.
  No API key required - fully sovereign. Compatible tool schemas with
  cloud Mem0 (mem0_profile, mem0_search, mem0_conclude).
- Pattern-based fact extraction from conversations.
- Deterministic dedup via content hashing.
- Circuit breaker for resilience.
- tests/plugins/memory/test_mem0_local.py: Full test coverage.
## Issues Filed
- #857: LightRAG integration (P2)
- #858: n8n workflow orchestration (P3)
- #859: RAGFlow document understanding (P4)
- #860: tensorzero LLMOps evaluation (P3)
Closes #842
2026-04-15 23:04:41 -04:00
eed87e454e
test: Benchmark Gemma 4 vision accuracy vs current approach ( #817 )
Contributor Attribution Check / check-attribution (pull_request) Successful in 26s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 26s
Tests / e2e (pull_request) Successful in 2m38s
Tests / test (pull_request) Failing after 47m49s
Vision benchmark suite comparing Gemma 4 (google/gemma-4-27b-it) vs
current Gemini 3 Flash Preview (google/gemini-3-flash-preview).
Metrics:
- OCR accuracy (character + word overlap)
- Description completeness (keyword coverage)
- Structural quality (length, sentences, numbers)
- Latency (ms per image)
- Token usage
- Consistency across runs
Features:
- 24 diverse test images (screenshots, diagrams, photos, charts)
- Category-specific evaluation prompts
- Automated verdict with composite scoring
- JSON + markdown report output
- 28 unit tests passing
Usage:
python benchmarks/vision_benchmark.py --images benchmarks/test_images.json
python benchmarks/vision_benchmark.py --url https://example.com/img.png
python benchmarks/vision_benchmark.py --generate-dataset
Closes #817.
2026-04-15 23:02:02 -04:00
Alexander Whitestone
f03709aa29
test: crisis hook integration tests with agent loop ( #707 )
Contributor Attribution Check / check-attribution (pull_request) Successful in 16s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 15s
Tests / e2e (pull_request) Failing after 12m38s
Tests / test (pull_request) Failing after 25m58s
10 integration tests verifying crisis detection works correctly
when called from the agent conversation flow:
- scan_user_message detects CRITICAL/HIGH/MEDIUM/LOW levels
- Safe messages pass through without triggering
- Tool handler returns valid JSON
- Compassion injection includes 988 lifeline for CRITICAL/HIGH
- Case insensitive detection
- Empty/None text handled gracefully
- False positive resistance on common non-crisis phrases
- Config check returns bool
- Callable from agent context (not just isolation tests)
2026-04-15 23:00:12 -04:00
Alexander Whitestone
4d8e004b5f
fix: extend JSON repair to remaining json.loads sites in run_agent.py
Contributor Attribution Check / check-attribution (pull_request) Successful in 42s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Nix / nix (ubuntu-latest) (pull_request) Failing after 4s
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 36s
Tests / test (pull_request) Failing after 1h13m6s
Tests / e2e (pull_request) Successful in 1m32s
Nix / nix (macos-latest) (pull_request) Has been cancelled
Adds `repair_and_load_json()` to utils.py using the `json_repair` library
as a fallback when `json.loads()` fails. Replaces 8 non-hot-path json.loads
sites identified in issue #809:
- L2250: trajectory/sanitization message content parsing
- L2500: tool_call dict reconstruction in trajectory conversion
- L2535: tool_content parsing (JSON-like strings in tool responses)
- L2888: session log file loading (with warning on unrecoverable parse)
- L3119: todo content parsing in message processing
- L5963: vision result_json parsing
- L6761: memory flush tool call argument parsing
- L8300: cache serialization tool call args normalization
Each site uses an appropriate default ({} for tool args, None/continue for
content parsing) and a context label for debug tracing.
Fixes #809
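A sketch of the fallback pattern this commit describes. The function name matches the message, but the signature (`default`, `context`) and the body are assumptions; `json_repair.loads` is used as the repair step, and the sketch degrades to returning the default when the library is not installed.

```python
import json
import logging
from typing import Any

try:
    import json_repair  # third-party; optional in this sketch
except ImportError:
    json_repair = None

logger = logging.getLogger(__name__)


def repair_and_load_json(text: Any, default: Any = None, context: str = "") -> Any:
    """Parse JSON strictly first, then try a repair pass, then give up."""
    if not isinstance(text, str):
        return default
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        logger.debug("strict JSON parse failed (%s); attempting repair", context)
    if json_repair is None:
        return default
    try:
        return json_repair.loads(text)
    except Exception:
        # Any failure in the repair pass falls back to the caller's default.
        return default
```

A call site for tool arguments might pass `default={}`, while content parsing might pass `default=None` and skip the message, in line with the per-site defaults mentioned above.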
2026-04-15 22:56:39 -04:00
85a654348a
feat: poka-yoke — prevent hardcoded ~/.hermes paths ( closes #835 )
Contributor Attribution Check / check-attribution (pull_request) Successful in 27s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 19s
Tests / e2e (pull_request) Successful in 1m55s
Tests / test (pull_request) Failing after 56m41s
scripts/lint_hardcoded_paths.py (new):
- Scans Python files for hardcoded home-directory paths
- Detects: Path.home()/.hermes without env fallback, /Users/<name>/, /home/<name>/
- Excludes: comments, docstrings, test files, skills, plugins, docs
- Excludes correct patterns: profiles_parent, current_default, native_home
- Supports --staged (git pre-commit), --fix (suggestions), --json output
scripts/pre-commit-hardcoded-paths.sh (new):
- Pre-commit hook that runs lint_hardcoded_paths.py --staged
- Blocks commits containing hardcoded path violations
tools/confirmation_daemon.py (fixed):
- Replaced Path.home() / '.hermes' / 'approval_whitelist.json'
  with get_hermes_home() / 'approval_whitelist.json'
- Added import of get_hermes_home from hermes_constants
tests/test_hardcoded_paths.py (new):
- 11 tests: detection, exclusion, fallback patterns, clean files
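As a rough illustration of the detection step listed above, the following sketch scans source text line by line for the three patterns. The regexes, the function name `find_hardcoded_paths`, and the crude "environment fallback" heuristic are all assumptions made for the example; the real scripts/lint_hardcoded_paths.py may work differently (for instance, this sketch does not handle docstrings or file-level exclusions).

```python
import re

# Hypothetical patterns approximating the three cases named in the commit.
HARDCODED_PATTERNS = [
    ("path_home_hermes", re.compile(r"Path\.home\(\)\s*/\s*['\"]\.hermes['\"]")),
    ("macos_home", re.compile(r"/Users/[A-Za-z0-9._-]+/")),
    ("linux_home", re.compile(r"/home/[A-Za-z0-9._-]+/")),
]


def find_hardcoded_paths(source):
    """Return (line_number, pattern_name) pairs for suspicious lines."""
    violations = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if line.strip().startswith("#"):
            continue  # skip comment-only lines
        if "os.environ" in line or "getenv" in line:
            continue  # crude stand-in for "has an env fallback"
        for name, pattern in HARDCODED_PATTERNS:
            if pattern.search(line):
                violations.append((lineno, name))
                break  # report each line once
    return violations
```

A pre-commit wrapper could run a function like this over staged files and exit non-zero when the returned list is non-empty.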
2026-04-15 22:56:32 -04:00
fc0d8fe5e9
fix: extend JSON repair to ALL remaining json.loads sites ( #809 )
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Nix / nix (ubuntu-latest) (pull_request) Failing after 2s
Contributor Attribution Check / check-attribution (pull_request) Successful in 26s
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 26s
Tests / e2e (pull_request) Successful in 2m50s
Tests / test (pull_request) Failing after 1h17m49s
Nix / nix (macos-latest) (pull_request) Has been cancelled
2026-04-16 02:53:41 +00:00
Alexander Whitestone
13ef670c05
feat: session compaction with fact extraction ( #748 )
Contributor Attribution Check / check-attribution (pull_request) Successful in 29s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 33s
Tests / e2e (pull_request) Successful in 3m26s
Tests / test (pull_request) Failing after 1h28m50s
Before compressing conversation context, extract durable facts
(user preferences, corrections, project details) and save them to
the fact store so they survive compression.
New agent/session_compactor.py:
- extract_facts_from_messages(): scans user messages for
  preferences, corrections, project/infra facts using regex
- 3 pattern categories: user_pref (5 patterns), correction
  (3 patterns), project (4 patterns)
- ExtractedFact: category, entity, content, confidence, source_turn
- save_facts_to_store(): saves to fact store (callback or auto-detect)
- extract_and_save_facts(): one-call extraction + persistence
- Deduplication by category+content
- Skips tool results, short messages, system messages
- format_facts_summary(): human-readable summary
Tests: tests/test_session_compactor.py (9 tests)
Closes #748
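The extraction pass described above might look something like the sketch below. The `ExtractedFact` fields follow the commit message, but the regexes (one per category rather than the 5/3/4 mentioned), the fixed `confidence` value, the `"user"` entity placeholder, and the `min_length` cutoff are illustrative assumptions, not the contents of agent/session_compactor.py.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractedFact:
    category: str
    entity: str
    content: str
    confidence: float
    source_turn: int


# One hypothetical pattern per category, for illustration only.
PATTERNS = {
    "user_pref": [
        re.compile(r"\bI (?:prefer|always use|like to use) (?P<content>[^.!?\n]+)", re.I),
    ],
    "correction": [
        re.compile(r"\b(?:no|actually),? (?:it'?s|it is|use) (?P<content>[^.!?\n]+)", re.I),
    ],
    "project": [
        re.compile(r"\b(?:the|our) (?:repo|project|server) (?:is|lives at|runs on) (?P<content>[^.!?\n]+)", re.I),
    ],
}


def extract_facts_from_messages(messages, min_length=10):
    """Scan user messages only, deduplicating by (category, content)."""
    facts, seen = [], set()
    for turn, msg in enumerate(messages):
        if msg.get("role") != "user":
            continue  # skips tool results and system messages
        text = msg.get("content") or ""
        if not isinstance(text, str) or len(text) < min_length:
            continue  # skips short messages
        for category, patterns in PATTERNS.items():
            for pattern in patterns:
                for match in pattern.finditer(text):
                    content = match.group("content").strip()
                    key = (category, content.lower())
                    if key in seen:
                        continue
                    seen.add(key)
                    facts.append(ExtractedFact(category, "user", content, 0.7, turn))
    return facts
```

Running this before compression and handing the result to a persistence callback would approximate the one-call `extract_and_save_facts()` flow named in the commit.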
2026-04-15 22:41:54 -04:00
4752a0085e
fix: extend JSON repair to remaining json.loads sites in run_agent.py ( #809 )
2026-04-16 02:40:51 +00:00
b26a6ec23b
feat: add repair_and_load_json() to utils.py ( #809 )
2026-04-16 02:38:01 +00:00
Alexander Whitestone
9f0c410481
feat: batch tool execution with parallel safety checks ( #749 )
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Contributor Attribution Check / check-attribution (pull_request) Successful in 35s
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 37s
Tests / e2e (pull_request) Successful in 1m48s
Tests / test (pull_request) Failing after 36m13s
Centralized safety classification for tool call batches:
tools/batch_executor.py (new):
- classify_tool_calls() — classifies batch into parallel_safe,
  path_scoped, sequential, never_parallel tiers
- BatchExecutionPlan — structured plan with parallel and sequential batches
- Path conflict detection — write_file + patch on same file go sequential
- Destructive command detection — rm, mv, sed -i, redirects
- execute_parallel_batch() — ThreadPoolExecutor for concurrent execution
tools/registry.py (enhanced):
- ToolEntry.parallel_safe field — tools can declare parallel safety
- registry.register() accepts parallel_safe=True parameter
- registry.get_parallel_safe_tools() — query registry-declared safe tools
Safety tiers:
- parallel_safe: read_file, web_search, search_files, etc.
- path_scoped: write_file, patch (concurrent when paths don't overlap)
- sequential: terminal, delegate_task, unknown tools
- never_parallel: clarify (requires user interaction)
19 tests passing.
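The tiering and path-conflict rule above can be approximated with a short sketch. The tool sets mirror the tiers listed in the commit, but `plan_batch`, the call shape (`{"name": ..., "args": {"path": ...}}`), and the two-list `BatchExecutionPlan` are simplifying assumptions; the sketch also ignores read/write conflicts on the same file and destructive-command detection.

```python
from collections import Counter
from dataclasses import dataclass, field

PARALLEL_SAFE = {"read_file", "web_search", "search_files"}
PATH_SCOPED = {"write_file", "patch"}
NEVER_PARALLEL = {"clarify"}


@dataclass
class BatchExecutionPlan:
    parallel: list = field(default_factory=list)
    sequential: list = field(default_factory=list)


def classify_tool_call(name):
    """Map a tool name to a safety tier; unknown tools are sequential."""
    if name in NEVER_PARALLEL:
        return "never_parallel"
    if name in PARALLEL_SAFE:
        return "parallel_safe"
    if name in PATH_SCOPED:
        return "path_scoped"
    return "sequential"


def plan_batch(calls):
    """Split calls into a parallel group and a sequential group."""
    # Count how many path-scoped calls target each path.
    writes_per_path = Counter(
        call.get("args", {}).get("path")
        for call in calls
        if classify_tool_call(call["name"]) == "path_scoped"
    )
    plan = BatchExecutionPlan()
    for call in calls:
        tier = classify_tool_call(call["name"])
        path = call.get("args", {}).get("path")
        if tier == "parallel_safe":
            plan.parallel.append(call)
        elif tier == "path_scoped" and path is not None and writes_per_path[path] == 1:
            plan.parallel.append(call)  # sole writer for this path
        else:
            plan.sequential.append(call)  # conflicts, terminal, clarify, unknown
    return plan
```

The parallel group could then be submitted to a `concurrent.futures.ThreadPoolExecutor`, with the sequential group run in order afterwards.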
2026-04-15 22:17:16 -04:00
b34b5b293d
test: add tests for tool hallucination prevention ( #836 )
Contributor Attribution Check / check-attribution (pull_request) Successful in 24s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 22s
Tests / e2e (pull_request) Successful in 3m6s
Tests / test (pull_request) Failing after 41m24s
2026-04-16 02:15:59 +00:00
05f9d2b009
feat: integrate poka-yoke validation into tool dispatch ( #836 )
- Added import for tool_pokayoke module
- Added validation before orchestrator.dispatch calls
- Auto-corrects tool names and parameters
- Returns structured errors with suggestions
- Circuit breaker for consecutive failures
Closes #836
2026-04-16 02:15:17 +00:00
Timmy Time
fb7464995c
fix: Ultraplan Mode for daily autonomous planning ( closes #840 )
Contributor Attribution Check / check-attribution (pull_request) Successful in 37s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 39s
Tests / test (pull_request) Failing after 1h15m33s
Tests / e2e (pull_request) Successful in 2m20s
2026-04-15 22:14:16 -04:00
7c71b7e73a
test: parallel tool calling — 2+ tools per response ( #798 )
Contributor Attribution Check / check-attribution (pull_request) Successful in 45s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 1m16s
Tests / e2e (pull_request) Successful in 3m17s
Tests / test (pull_request) Failing after 1h30m54s
2026-04-16 02:13:00 +00:00
4a3068b3b5
test: add regression tests for issue #834 KeyError fix
Contributor Attribution Check / check-attribution (pull_request) Successful in 39s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 44s
Tests / e2e (pull_request) Successful in 2m53s
Tests / test (pull_request) Failing after 1h28m32s
2026-04-16 02:12:36 +00:00
a8300ceb43
fix: KeyError 'missing_vars' in _show_tool_availability_warnings ( #834 )
2026-04-16 02:11:08 +00:00
8ef766beac
feat: add tool hallucination prevention module ( #836 )
- Validates tool names against registered tools
- Auto-corrects parameter names within Levenshtein distance 1
- Circuit breaker for consecutive failures (threshold: 3)
- Structured error messages with suggestions
Closes #836
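To make the distance-1 correction and the failure threshold concrete, here is a minimal sketch. The function and class names (`correct_params`, `CircuitBreaker`) and the error dictionary shape are assumptions for illustration and are not taken from the tool_pokayoke module.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def correct_params(args, allowed):
    """Rename keys that are exactly one unambiguous edit away from an allowed name."""
    corrected, errors = {}, []
    for key, value in args.items():
        if key in allowed:
            corrected[key] = value
            continue
        matches = [name for name in allowed if levenshtein(key, name) <= 1]
        if len(matches) == 1:
            corrected[matches[0]] = value
        else:
            errors.append({"param": key, "suggestions": sorted(matches)})
    return corrected, errors


class CircuitBreaker:
    """Opens after `threshold` consecutive failures; any success resets it."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = 0

    def record(self, ok):
        self.failures = 0 if ok else self.failures + 1

    @property
    def is_open(self):
        return self.failures >= self.threshold
```

Refusing to auto-correct when more than one allowed name is within distance 1 keeps the correction conservative; such cases fall through to a structured error with suggestions.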
2026-04-16 02:10:39 +00:00
db72e908f7
Merge pull request 'feat(security): implement Vitalik's secure LLM patterns — privacy filter + confirmation daemon [resolves merge conflict]' ( #830 ) from feat/vitalik-secure-llm-1776303263 into main
Vitalik's secure LLM patterns — privacy filter + confirmation daemon
Clean rebase of #397 onto current main. Resolves merge conflicts in tools/approval.py.
2026-04-16 01:36:58 +00:00
b82b760d5d
feat: add Vitalik's threat model patterns to DANGEROUS_PATTERNS
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 41s
Contributor Attribution Check / check-attribution (pull_request) Successful in 51s
Tests / e2e (pull_request) Successful in 5m21s
Tests / test (pull_request) Failing after 45m7s
2026-04-16 01:35:49 +00:00
d8d7846897
feat: add tests/tools/test_confirmation_daemon.py from PR #397
2026-04-16 01:35:24 +00:00
6840d05554
feat: add tests/agent/test_privacy_filter.py from PR #397
2026-04-16 01:35:21 +00:00
8abe59ed95
feat: add tools/confirmation_daemon.py from PR #397
2026-04-16 01:35:18 +00:00
435d790201
feat: add agent/privacy_filter.py from PR #397
2026-04-16 01:35:14 +00:00
Alexander Whitestone
30afd529ac
feat: add crisis detection tool — the-door integration ( #141 )
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Contributor Attribution Check / check-attribution (pull_request) Successful in 44s
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 59s
Tests / e2e (pull_request) Successful in 3m49s
Tests / test (pull_request) Failing after 44m1s
New tool: tools/crisis_tool.py
- Wraps the-door's canonical crisis detection (detect.py)
- Scans user messages for despair/suicidal ideation
- Classifies into NONE/LOW/MEDIUM/HIGH/CRITICAL tiers
- Provides recommended actions per tier
- Gateway hook: scan_user_message() for pre-API-call detection
- System prompt injection: compassion_injection based on crisis level
- Optional escalation logging to crisis_escalations.jsonl
- Optional bridge API POST for HIGH+ (configurable via CRISIS_BRIDGE_URL)
- Configurable via crisis_detection: true/false in config.yaml
- Follows the-door design principles: never computes life value,
  never suggests death, errs on the side of higher risk
Also: tests/test_crisis_tool.py (9 tests, all passing)
2026-04-15 21:00:06 -04:00
Alexander Whitestone
a244b157be
bench: add Gemma 4 vs mimo-v2-pro tool calling benchmark ( #796 )
Contributor Attribution Check / check-attribution (pull_request) Successful in 42s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 32s
Tests / e2e (pull_request) Successful in 2m26s
Tests / test (pull_request) Failing after 44m7s
100-call regression test across 7 tool categories:
- File operations (20): read_file, write_file, search_files
- Terminal commands (20): shell execution
- Web search (15): web_search
- Code execution (15): execute_code
- Browser automation (10): browser_navigate
- Delegation (10): delegate_task
- MCP tools (10): mcp_list/read/call
Metrics tracked:
- Schema parse success (valid JSON tool calls)
- Tool name accuracy (correct tool selected)
- Arguments accuracy (required args present)
- Average latency per call
Usage:
python3 benchmarks/tool_call_benchmark.py --model nous:xiaomi/mimo-v2-pro
python3 benchmarks/tool_call_benchmark.py --model ollama/gemma4:latest
python3 benchmarks/tool_call_benchmark.py --compare
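The four metrics listed above could be computed along these lines. This is a sketch under assumed data shapes: `CallResult`, the `"name"`/`"arguments"` keys in the raw tool call, and the choice to score argument presence independently of tool-name correctness are all assumptions, not details of benchmarks/tool_call_benchmark.py.

```python
import json
from dataclasses import dataclass


@dataclass
class CallResult:
    raw_tool_call: str      # model output, expected to be a JSON object
    expected_tool: str      # tool the test case wants
    required_args: tuple    # argument names that must be present
    latency_s: float        # wall-clock seconds for the call


def score(results):
    """Aggregate parse rate, name accuracy, args accuracy, and mean latency."""
    n = len(results)
    if n == 0:
        return {"schema_parse_rate": 0.0, "tool_name_accuracy": 0.0,
                "args_accuracy": 0.0, "avg_latency_s": 0.0}
    parsed = name_ok = args_ok = 0
    for r in results:
        try:
            call = json.loads(r.raw_tool_call)
        except ValueError:
            continue  # invalid JSON counts against every accuracy metric
        if not isinstance(call, dict):
            continue
        parsed += 1
        if call.get("name") == r.expected_tool:
            name_ok += 1
        args = call.get("arguments")
        if isinstance(args, dict) and all(k in args for k in r.required_args):
            args_ok += 1
    return {
        "schema_parse_rate": parsed / n,
        "tool_name_accuracy": name_ok / n,
        "args_accuracy": args_ok / n,
        "avg_latency_s": sum(r.latency_s for r in results) / n,
    }
```

Scoring each model's results with the same function and comparing the dictionaries side by side would give a simple form of the `--compare` view.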
2026-04-15 18:56:35 -04:00
d86359cbb2
Merge pull request 'feat: robust tool orchestration and circuit breaking' ( #811 ) from feat/robust-tool-orchestration-1776268138150 into main
2026-04-15 16:03:07 +00:00
f264b55b29
refactor: use ToolOrchestrator for robust tool execution
Contributor Attribution Check / check-attribution (pull_request) Successful in 36s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 38s
Tests / e2e (pull_request) Successful in 2m37s
Tests / test (pull_request) Failing after 40m19s
2026-04-15 15:49:02 +00:00
dfe23f66b1
feat: add ToolOrchestrator with circuit breaker
2026-04-15 15:49:00 +00:00