1 Commit

Author SHA1 Message Date
Alexander Payne
e8392e82de docs: add Cluster H safety regression packet
Some checks failed
Nix Lockfile Check / nix-lockfile-check (pull_request) Failing after 42s
Nix / nix (ubuntu-latest) (pull_request) Failing after 37s
Tests / e2e (pull_request) Successful in 4m39s
Tests / test (pull_request) Failing after 59m56s
Nix / nix (macos-latest) (pull_request) Has been cancelled
Add SAFETY_REGRESSION_PACKET/ with terminal transcripts proving all
acceptance criteria for Cluster H (safety / operator protection /
registry correctness).

Artifacts:
- gateway_self_destruction.txt — proves agent cannot self-destruct gateway via terminal path
- malformed_key_check.txt — proves non-ASCII API key corruption is detected and sanitized
- billing_error_output.txt — proves 402 billing failures produce actionable guidance
- tool_registry_proof.txt — proves built-in tools auto-discover on startup
- mcp_alias_resolution.txt — proves MCP aliases/toolsets resolve from live registry correctly
- SUMMARY.md — verdict summary

All tests executed against current main (HEAD). All 5 criteria PASS.

Closes #1060
2026-04-29 08:01:52 -04:00
6 changed files with 184 additions and 0 deletions


@@ -0,0 +1,26 @@
# Cluster H — Safety / Operator Protection / Registry Correctness
## Release Proof Packet
The following tests were executed to verify the upstream source commits
(block gateway self-destruction; strip non-ASCII API-key corruption;
402 billing hint; built-in tool auto-discovery; MCP alias fix; live registry fixes).
All acceptance criteria were verified successfully on main.
## Items
- **gateway_self_destruction** — Agent cannot self-destruct gateway via terminal path: PASS
- **malformed_key_check** — Non-ASCII API key corruption is detected and sanitized: PASS
- **billing_error_output** — 402 billing failures produce actionable guidance: PASS
- **tool_registry_proof** — Built-in tools auto-discover on startup: PASS
- **mcp_alias_resolution** — MCP aliases/toolsets resolve from live registry correctly: PASS
## Artifact Files
- `gateway_self_destruction.txt` — full pytest stdout/stderr for gateway protection tests
- `malformed_key_check.txt` — full pytest stdout/stderr for non-ASCII credential tests
- `billing_error_output.txt` — full pytest stdout/stderr for 402 billing classification tests
- `tool_registry_proof.txt` — full pytest stdout/stderr for built-in tool discovery test
- `mcp_alias_resolution.txt` — full pytest stdout/stderr for MCP alias resolution test
Generated 2026-04-29T08:01:43.856965 from main (HEAD)
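
Each artifact file in this packet follows the same layout: a `Test:` title, an `Exit code:` line, the `Command:` that was run, and then `=== STDOUT ===` and `=== STDERR ===` sections. The script that produced them is not part of this diff, so the following is only a minimal sketch of how such a transcript could be captured. The function name `write_transcript` and its parameters are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def write_transcript(title: str, command: list[str], out_path: Path) -> int:
    """Run `command` and save its exit code, stdout and stderr to `out_path`.

    Hypothetical helper: the layout mirrors the artifact files in this
    packet, but this is not the script that generated them.
    """
    # capture_output=True collects stdout and stderr separately;
    # text=True decodes both streams to str.
    result = subprocess.run(command, capture_output=True, text=True)
    out_path.write_text(
        f"Test: {title}\n"
        f"Exit code: {result.returncode}\n"
        f"Command: {' '.join(command)}\n"
        "=== STDOUT ===\n"
        f"{result.stdout}"
        "=== STDERR ===\n"
        f"{result.stderr}",
        encoding="utf-8",
    )
    return result.returncode
```

Called with a pytest invocation such as `[sys.executable, "-m", "pytest", "-vv", "<node id>", "--tb=short"]`, a helper like this would yield one file per acceptance criterion, with the exit code recorded next to the full output.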


@@ -0,0 +1,36 @@
Test: 402 billing failures produce actionable guidance
Exit code: 0
Command: /Users/apayne/.hermes/hermes-agent/venv/bin/python3 -m pytest -vv tests/agent/test_auxiliary_client.py::TestIsPaymentError --tb=short
=== STDOUT ===
============================= test session starts ==============================
platform darwin -- Python 3.11.14, pytest-9.0.3, pluggy-1.6.0 -- /Users/apayne/.hermes/hermes-agent/venv/bin/python3
cachedir: .pytest_cache
rootdir: /Users/apayne/burn-clone/STEP35-hermes-agent-1060
configfile: pyproject.toml
plugins: anyio-4.12.1, cov-7.1.0, xdist-3.8.0, split-0.11.0, asyncio-1.3.0
asyncio: mode=Mode.STRICT, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
created: 14/14 workers
14 workers [7 items]
scheduling tests via LoadScheduling
tests/agent/test_auxiliary_client.py::TestIsPaymentError::test_429_without_credits_message_is_not_payment
tests/agent/test_auxiliary_client.py::TestIsPaymentError::test_402_status_code
tests/agent/test_auxiliary_client.py::TestIsPaymentError::test_no_status_code_with_billing_message
tests/agent/test_auxiliary_client.py::TestIsPaymentError::test_generic_500_is_not_payment
tests/agent/test_auxiliary_client.py::TestIsPaymentError::test_no_status_code_no_message
tests/agent/test_auxiliary_client.py::TestIsPaymentError::test_402_with_credits_message
tests/agent/test_auxiliary_client.py::TestIsPaymentError::test_429_with_credits_message
[gw3] [ 14%] PASSED tests/agent/test_auxiliary_client.py::TestIsPaymentError::test_429_with_credits_message
[gw0] [ 28%] PASSED tests/agent/test_auxiliary_client.py::TestIsPaymentError::test_402_status_code
[gw1] [ 42%] PASSED tests/agent/test_auxiliary_client.py::TestIsPaymentError::test_402_with_credits_message
[gw2] [ 57%] PASSED tests/agent/test_auxiliary_client.py::TestIsPaymentError::test_429_without_credits_message_is_not_payment
[gw5] [ 71%] PASSED tests/agent/test_auxiliary_client.py::TestIsPaymentError::test_no_status_code_with_billing_message
[gw4] [ 85%] PASSED tests/agent/test_auxiliary_client.py::TestIsPaymentError::test_generic_500_is_not_payment
[gw6] [100%] PASSED tests/agent/test_auxiliary_client.py::TestIsPaymentError::test_no_status_code_no_message
============================== 7 passed in 4.49s ===============================
=== STDERR ===
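
The test names above suggest a classification rule: a 402 status always counts as a payment error, a 429 counts only when the message mentions credits, a 500 never counts, and a missing status code falls back to the message text. The real implementation in the auxiliary client is not shown in this packet, so the sketch below is an illustration only. The function name `looks_like_payment_error`, the keyword list, and the outcomes for tests whose names do not state a result are all assumptions.

```python
from typing import Optional

# Hypothetical keyword list; the project's actual wording checks are not
# visible in this packet.
BILLING_KEYWORDS = ("credits", "billing", "payment")


def looks_like_payment_error(status_code: Optional[int], message: str = "") -> bool:
    """Guess whether an API failure is a billing problem (illustrative only)."""
    text = message.lower()
    mentions_billing = any(word in text for word in BILLING_KEYWORDS)

    if status_code == 402:
        # 402 Payment Required is treated as a billing failure on its own.
        return True
    if status_code in (429, None):
        # 429 is usually plain rate limiting, and a missing status tells us
        # nothing, so both cases depend on the message text.
        return mentions_billing
    # Any other status (for example a generic 500) is not a payment error.
    return False
```

Separating billing failures from rate limits matters because the remedy differs: a rate limit calls for a retry, while an exhausted balance calls for guidance to the operator.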


@@ -0,0 +1,42 @@
Test: Agent cannot self-destruct the gateway via terminal path
Exit code: 0
Command: /Users/apayne/.hermes/hermes-agent/venv/bin/python3 -m pytest -vv tests/tools/test_approval.py::TestGatewayProtection --tb=short
=== STDOUT ===
============================= test session starts ==============================
platform darwin -- Python 3.11.14, pytest-9.0.3, pluggy-1.6.0 -- /Users/apayne/.hermes/hermes-agent/venv/bin/python3
cachedir: .pytest_cache
rootdir: /Users/apayne/burn-clone/STEP35-hermes-agent-1060
configfile: pyproject.toml
plugins: anyio-4.12.1, cov-7.1.0, xdist-3.8.0, split-0.11.0, asyncio-1.3.0
asyncio: mode=Mode.STRICT, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
created: 14/14 workers
14 workers [10 items]
scheduling tests via LoadScheduling
tests/tools/test_approval.py::TestGatewayProtection::test_gateway_run_with_disown_detected
tests/tools/test_approval.py::TestGatewayProtection::test_pkill_unrelated_not_flagged
tests/tools/test_approval.py::TestGatewayProtection::test_pkill_hermes_detected
tests/tools/test_approval.py::TestGatewayProtection::test_gateway_run_with_setsid_detected
tests/tools/test_approval.py::TestGatewayProtection::test_pkill_gateway_detected
tests/tools/test_approval.py::TestGatewayProtection::test_gateway_run_foreground_not_flagged
tests/tools/test_approval.py::TestGatewayProtection::test_killall_hermes_detected
tests/tools/test_approval.py::TestGatewayProtection::test_gateway_run_with_ampersand_detected
tests/tools/test_approval.py::TestGatewayProtection::test_systemctl_restart_flagged
tests/tools/test_approval.py::TestGatewayProtection::test_gateway_run_with_nohup_detected
[gw4] [ 10%] PASSED tests/tools/test_approval.py::TestGatewayProtection::test_gateway_run_foreground_not_flagged
[gw5] [ 20%] PASSED tests/tools/test_approval.py::TestGatewayProtection::test_systemctl_restart_flagged
[gw6] [ 30%] PASSED tests/tools/test_approval.py::TestGatewayProtection::test_pkill_hermes_detected
[gw2] [ 40%] PASSED tests/tools/test_approval.py::TestGatewayProtection::test_gateway_run_with_nohup_detected
[gw7] [ 50%] PASSED tests/tools/test_approval.py::TestGatewayProtection::test_killall_hermes_detected
[gw3] [ 60%] PASSED tests/tools/test_approval.py::TestGatewayProtection::test_gateway_run_with_setsid_detected
[gw8] [ 70%] PASSED tests/tools/test_approval.py::TestGatewayProtection::test_pkill_gateway_detected
[gw0] [ 80%] PASSED tests/tools/test_approval.py::TestGatewayProtection::test_gateway_run_with_disown_detected
[gw1] [ 90%] PASSED tests/tools/test_approval.py::TestGatewayProtection::test_gateway_run_with_ampersand_detected
[gw9] [100%] PASSED tests/tools/test_approval.py::TestGatewayProtection::test_pkill_unrelated_not_flagged
============================== 10 passed in 1.97s ==============================
=== STDERR ===
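
Read together, the ten test names describe two kinds of dangerous terminal commands: ones that kill or restart the gateway process (`pkill`, `killall`, `systemctl restart`) and ones that start a detached gateway (`nohup`, `setsid`, `disown`, a trailing `&`). A foreground gateway run and a `pkill` aimed at an unrelated process are expected to pass through. The approval code itself is not included in this packet, so the following is a rough sketch of pattern-based detection. The function name `threatens_gateway`, the regular expressions, and the example command strings are all hypothetical.

```python
import re

# Hypothetical patterns; the real checks in the approval tool may differ.
_KILL = re.compile(r"\b(pkill|killall)\b.*\b(hermes|gateway)\b")
_SERVICE = re.compile(r"\bsystemctl\s+(restart|stop)\b.*\bhermes\b")
_GATEWAY_RUN = re.compile(r"\bgateway\s+run\b")
_DETACH = re.compile(r"\b(nohup|setsid|disown)\b|&\s*$")


def threatens_gateway(command: str) -> bool:
    """Return True if `command` looks like it would kill, restart,
    or spawn a detached copy of the gateway (illustrative only)."""
    cmd = command.strip()
    # Killing or restarting the running gateway is always flagged.
    if _KILL.search(cmd) or _SERVICE.search(cmd):
        return True
    # Starting the gateway is flagged only when it is detached from
    # the terminal; a foreground run is allowed.
    return bool(_GATEWAY_RUN.search(cmd) and _DETACH.search(cmd))
```

Under these assumed patterns, `threatens_gateway("pkill -f hermes")` is flagged while `threatens_gateway("pkill nginx")` and `threatens_gateway("hermes gateway run")` are not.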


@@ -0,0 +1,32 @@
Test: Non-ASCII API key corruption is detected and sanitized
Exit code: 0
Command: /Users/apayne/.hermes/hermes-agent/venv/bin/python3 -m pytest -vv tests/hermes_cli/test_non_ascii_credential.py::TestCheckNonAsciiCredential --tb=short
=== STDOUT ===
============================= test session starts ==============================
platform darwin -- Python 3.11.14, pytest-9.0.3, pluggy-1.6.0 -- /Users/apayne/.hermes/hermes-agent/venv/bin/python3
cachedir: .pytest_cache
rootdir: /Users/apayne/burn-clone/STEP35-hermes-agent-1060
configfile: pyproject.toml
plugins: anyio-4.12.1, cov-7.1.0, xdist-3.8.0, split-0.11.0, asyncio-1.3.0
asyncio: mode=Mode.STRICT, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
created: 14/14 workers
14 workers [5 items]
scheduling tests via LoadScheduling
tests/hermes_cli/test_non_ascii_credential.py::TestCheckNonAsciiCredential::test_all_ascii_no_warning
tests/hermes_cli/test_non_ascii_credential.py::TestCheckNonAsciiCredential::test_ascii_key_unchanged
tests/hermes_cli/test_non_ascii_credential.py::TestCheckNonAsciiCredential::test_empty_key
tests/hermes_cli/test_non_ascii_credential.py::TestCheckNonAsciiCredential::test_strips_unicode_v_lookalike
tests/hermes_cli/test_non_ascii_credential.py::TestCheckNonAsciiCredential::test_strips_multiple_non_ascii
[gw0] [ 20%] PASSED tests/hermes_cli/test_non_ascii_credential.py::TestCheckNonAsciiCredential::test_ascii_key_unchanged
[gw3] [ 40%] PASSED tests/hermes_cli/test_non_ascii_credential.py::TestCheckNonAsciiCredential::test_empty_key
[gw1] [ 60%] PASSED tests/hermes_cli/test_non_ascii_credential.py::TestCheckNonAsciiCredential::test_strips_unicode_v_lookalike
[gw4] [ 80%] PASSED tests/hermes_cli/test_non_ascii_credential.py::TestCheckNonAsciiCredential::test_all_ascii_no_warning
[gw2] [100%] PASSED tests/hermes_cli/test_non_ascii_credential.py::TestCheckNonAsciiCredential::test_strips_multiple_non_ascii
============================== 5 passed in 1.90s ===============================
=== STDERR ===
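
The five tests above cover an all-ASCII key, an empty key, a single Unicode lookalike character, and several non-ASCII characters at once. The credential check itself is not shown in this packet, so here is a minimal sketch of the stripping step under stated assumptions: the function name `sanitize_api_key` and its return shape are hypothetical, and the real check may also emit a warning.

```python
def sanitize_api_key(key: str) -> tuple[str, list[str]]:
    """Remove non-ASCII characters from `key`.

    Returns the cleaned key and the list of removed characters, so a
    caller can warn the operator about what was stripped.
    Illustrative only; not the project's implementation.
    """
    removed = [ch for ch in key if not ch.isascii()]
    cleaned = "".join(ch for ch in key if ch.isascii())
    return cleaned, removed
```

For example, a key pasted with a Greek nu (`\u03bd`) in place of a Latin `v` would come back with that character removed and listed in the second return value, while an all-ASCII key is returned unchanged with an empty list.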


@@ -0,0 +1,24 @@
Test: MCP aliases/toolsets resolve from live registry correctly
Exit code: 0
Command: /Users/apayne/.hermes/hermes-agent/venv/bin/python3 -m pytest -vv tests/test_toolsets.py::TestValidateToolset::test_mcp_alias_uses_live_registry --tb=short
=== STDOUT ===
============================= test session starts ==============================
platform darwin -- Python 3.11.14, pytest-9.0.3, pluggy-1.6.0 -- /Users/apayne/.hermes/hermes-agent/venv/bin/python3
cachedir: .pytest_cache
rootdir: /Users/apayne/burn-clone/STEP35-hermes-agent-1060
configfile: pyproject.toml
plugins: anyio-4.12.1, cov-7.1.0, xdist-3.8.0, split-0.11.0, asyncio-1.3.0
asyncio: mode=Mode.STRICT, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
created: 14/14 workers
14 workers [1 item]
scheduling tests via LoadScheduling
tests/test_toolsets.py::TestValidateToolset::test_mcp_alias_uses_live_registry
[gw0] [100%] PASSED tests/test_toolsets.py::TestValidateToolset::test_mcp_alias_uses_live_registry
============================== 1 passed in 2.11s ===============================
=== STDERR ===


@@ -0,0 +1,24 @@
Test: Built-in tools auto-discover on startup
Exit code: 0
Command: /Users/apayne/.hermes/hermes-agent/venv/bin/python3 -m pytest -vv tests/tools/test_registry.py::TestBuiltinDiscovery::test_matches_previous_manual_builtin_tool_set --tb=short
=== STDOUT ===
============================= test session starts ==============================
platform darwin -- Python 3.11.14, pytest-9.0.3, pluggy-1.6.0 -- /Users/apayne/.hermes/hermes-agent/venv/bin/python3
cachedir: .pytest_cache
rootdir: /Users/apayne/burn-clone/STEP35-hermes-agent-1060
configfile: pyproject.toml
plugins: anyio-4.12.1, cov-7.1.0, xdist-3.8.0, split-0.11.0, asyncio-1.3.0
asyncio: mode=Mode.STRICT, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
created: 14/14 workers
14 workers [1 item]
scheduling tests via LoadScheduling
tests/tools/test_registry.py::TestBuiltinDiscovery::test_matches_previous_manual_builtin_tool_set
[gw0] [100%] PASSED tests/tools/test_registry.py::TestBuiltinDiscovery::test_matches_previous_manual_builtin_tool_set
============================== 1 passed in 3.12s ===============================
=== STDERR ===