Compare commits
1 Commits
step35/749...step35/106
| Author | SHA1 | Date | |
|---|---|---|---|
| | e8392e82de | | |
26
SAFETY_REGRESSION_PACKET/SUMMARY.md
Normal file
@@ -0,0 +1,26 @@
# Cluster H — Safety / Operator Protection / Registry Correctness
## Release Proof Packet

The following tests were executed to verify the upstream source commits
(block gateway self-destruction; strip non-ASCII API-key corruption;
402 billing hint; built-in tool auto-discovery; MCP alias fix; live registry fixes).

All acceptance criteria verified successfully on main.

## Items

- **gateway_self_destruction** — Agent cannot self-destruct gateway via terminal path: PASS
- **malformed_key_check** — Non-ASCII API key corruption is detected and sanitized: PASS
- **billing_error_output** — 402 billing failures produce actionable guidance: PASS
- **tool_registry_proof** — Built-in tools auto-discover on startup: PASS
- **mcp_alias_resolution** — MCP aliases/toolsets resolve from live registry correctly: PASS

## Artifact Files

- `gateway_self_destruction.txt` — full pytest stdout/stderr for gateway protection tests
- `malformed_key_check.txt` — full pytest stdout/stderr for non-ASCII credential tests
- `billing_error_output.txt` — full pytest stdout/stderr for 402 billing classification tests
- `tool_registry_proof.txt` — full pytest stdout/stderr for built-in tool discovery test
- `mcp_alias_resolution.txt` — full pytest stdout/stderr for MCP alias resolution test

Generated 2026-04-29T08:01:43.856965 from main (HEAD)

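The artifact files in this packet share one layout: a `Test:` title, an `Exit code:`, the `Command:` that was run, then `=== STDOUT ===` and `=== STDERR ===` sections. The following is a minimal sketch of how a file in that layout could be produced. The helper names `format_artifact` and `run_and_record` are hypothetical and are not part of this commit; the actual generator script is not shown in the diff.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def format_artifact(title: str, command: List[str], exit_code: int,
                    stdout: str, stderr: str) -> str:
    """Render one artifact in the layout used by the packet's .txt files.

    Hypothetical helper: the layout is inferred from the files in this diff.
    """
    return (
        f"Test: {title}\n"
        f"Exit code: {exit_code}\n"
        f"Command: {' '.join(command)}\n"
        "\n"
        "=== STDOUT ===\n"
        f"{stdout}\n"
        "=== STDERR ===\n"
        f"{stderr}\n"
    )


def run_and_record(title: str, command: List[str], out_path: Path) -> int:
    """Run a command, capture both streams, and write the artifact file."""
    result = subprocess.run(command, capture_output=True, text=True)
    out_path.write_text(
        format_artifact(title, command, result.returncode,
                        result.stdout, result.stderr),
        encoding="utf-8",
    )
    return result.returncode
```

Under these assumptions, calling `run_and_record` with a title, a command list such as `[sys.executable, "-m", "pytest", "-vv", "<node id>", "--tb=short"]`, and an output path would write one artifact and return the command's exit code.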
36
SAFETY_REGRESSION_PACKET/billing_error_output.txt
Normal file
@@ -0,0 +1,36 @@
Test: 402 billing failures produce actionable guidance
Exit code: 0
Command: /Users/apayne/.hermes/hermes-agent/venv/bin/python3 -m pytest -vv tests/agent/test_auxiliary_client.py::TestIsPaymentError --tb=short

=== STDOUT ===
============================= test session starts ==============================
platform darwin -- Python 3.11.14, pytest-9.0.3, pluggy-1.6.0 -- /Users/apayne/.hermes/hermes-agent/venv/bin/python3
cachedir: .pytest_cache
rootdir: /Users/apayne/burn-clone/STEP35-hermes-agent-1060
configfile: pyproject.toml
plugins: anyio-4.12.1, cov-7.1.0, xdist-3.8.0, split-0.11.0, asyncio-1.3.0
asyncio: mode=Mode.STRICT, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
created: 14/14 workers
14 workers [7 items]

scheduling tests via LoadScheduling

tests/agent/test_auxiliary_client.py::TestIsPaymentError::test_429_without_credits_message_is_not_payment
tests/agent/test_auxiliary_client.py::TestIsPaymentError::test_402_status_code
tests/agent/test_auxiliary_client.py::TestIsPaymentError::test_no_status_code_with_billing_message
tests/agent/test_auxiliary_client.py::TestIsPaymentError::test_generic_500_is_not_payment
tests/agent/test_auxiliary_client.py::TestIsPaymentError::test_no_status_code_no_message
tests/agent/test_auxiliary_client.py::TestIsPaymentError::test_402_with_credits_message
tests/agent/test_auxiliary_client.py::TestIsPaymentError::test_429_with_credits_message
[gw3] [ 14%] PASSED tests/agent/test_auxiliary_client.py::TestIsPaymentError::test_429_with_credits_message
[gw0] [ 28%] PASSED tests/agent/test_auxiliary_client.py::TestIsPaymentError::test_402_status_code
[gw1] [ 42%] PASSED tests/agent/test_auxiliary_client.py::TestIsPaymentError::test_402_with_credits_message
[gw2] [ 57%] PASSED tests/agent/test_auxiliary_client.py::TestIsPaymentError::test_429_without_credits_message_is_not_payment
[gw5] [ 71%] PASSED tests/agent/test_auxiliary_client.py::TestIsPaymentError::test_no_status_code_with_billing_message
[gw4] [ 85%] PASSED tests/agent/test_auxiliary_client.py::TestIsPaymentError::test_generic_500_is_not_payment
[gw6] [100%] PASSED tests/agent/test_auxiliary_client.py::TestIsPaymentError::test_no_status_code_no_message

============================== 7 passed in 4.49s ===============================

=== STDERR ===

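The seven test names in `TestIsPaymentError` suggest a classification rule: a 402 is always a payment error, a 429 or a missing status code counts only when the message mentions billing or credits, and a generic 500 does not count. A minimal sketch of a classifier consistent with those names follows. The function name `looks_like_payment_error` and the keyword list are assumptions for illustration; the project's actual implementation is not part of this diff and may differ.

```python
from typing import Optional

# Assumed keyword list; the real implementation's wording is not shown here.
BILLING_KEYWORDS = ("credits", "billing", "payment")


def looks_like_payment_error(status_code: Optional[int], message: str = "") -> bool:
    """Hypothetical classifier mirroring the behaviour the test names imply."""
    # HTTP 402 Payment Required is treated as a billing failure outright.
    if status_code == 402:
        return True

    mentions_billing = any(word in message.lower() for word in BILLING_KEYWORDS)

    # A 429, or an error with no status code, counts only if the message
    # itself points at billing or credits.
    if status_code == 429 or status_code is None:
        return mentions_billing

    # Anything else (for example a generic 500) is not a payment error.
    return False
```

Treating every other status code as non-billing, even when the message mentions credits, is a design choice of this sketch rather than something the test names confirm.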
42
SAFETY_REGRESSION_PACKET/gateway_self_destruction.txt
Normal file
@@ -0,0 +1,42 @@
Test: Agent cannot self-destruct the gateway via terminal path
Exit code: 0
Command: /Users/apayne/.hermes/hermes-agent/venv/bin/python3 -m pytest -vv tests/tools/test_approval.py::TestGatewayProtection --tb=short

=== STDOUT ===
============================= test session starts ==============================
platform darwin -- Python 3.11.14, pytest-9.0.3, pluggy-1.6.0 -- /Users/apayne/.hermes/hermes-agent/venv/bin/python3
cachedir: .pytest_cache
rootdir: /Users/apayne/burn-clone/STEP35-hermes-agent-1060
configfile: pyproject.toml
plugins: anyio-4.12.1, cov-7.1.0, xdist-3.8.0, split-0.11.0, asyncio-1.3.0
asyncio: mode=Mode.STRICT, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
created: 14/14 workers
14 workers [10 items]

scheduling tests via LoadScheduling

tests/tools/test_approval.py::TestGatewayProtection::test_gateway_run_with_disown_detected
tests/tools/test_approval.py::TestGatewayProtection::test_pkill_unrelated_not_flagged
tests/tools/test_approval.py::TestGatewayProtection::test_pkill_hermes_detected
tests/tools/test_approval.py::TestGatewayProtection::test_gateway_run_with_setsid_detected
tests/tools/test_approval.py::TestGatewayProtection::test_pkill_gateway_detected
tests/tools/test_approval.py::TestGatewayProtection::test_gateway_run_foreground_not_flagged
tests/tools/test_approval.py::TestGatewayProtection::test_killall_hermes_detected
tests/tools/test_approval.py::TestGatewayProtection::test_gateway_run_with_ampersand_detected
tests/tools/test_approval.py::TestGatewayProtection::test_systemctl_restart_flagged
tests/tools/test_approval.py::TestGatewayProtection::test_gateway_run_with_nohup_detected
[gw4] [ 10%] PASSED tests/tools/test_approval.py::TestGatewayProtection::test_gateway_run_foreground_not_flagged
[gw5] [ 20%] PASSED tests/tools/test_approval.py::TestGatewayProtection::test_systemctl_restart_flagged
[gw6] [ 30%] PASSED tests/tools/test_approval.py::TestGatewayProtection::test_pkill_hermes_detected
[gw2] [ 40%] PASSED tests/tools/test_approval.py::TestGatewayProtection::test_gateway_run_with_nohup_detected
[gw7] [ 50%] PASSED tests/tools/test_approval.py::TestGatewayProtection::test_killall_hermes_detected
[gw3] [ 60%] PASSED tests/tools/test_approval.py::TestGatewayProtection::test_gateway_run_with_setsid_detected
[gw8] [ 70%] PASSED tests/tools/test_approval.py::TestGatewayProtection::test_pkill_gateway_detected
[gw0] [ 80%] PASSED tests/tools/test_approval.py::TestGatewayProtection::test_gateway_run_with_disown_detected
[gw1] [ 90%] PASSED tests/tools/test_approval.py::TestGatewayProtection::test_gateway_run_with_ampersand_detected
[gw9] [100%] PASSED tests/tools/test_approval.py::TestGatewayProtection::test_pkill_unrelated_not_flagged

============================== 10 passed in 1.97s ==============================

=== STDERR ===

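The `TestGatewayProtection` names describe three families of flagged commands: killing the hermes or gateway process (`pkill`, `killall`), launching `gateway run` detached (`&`, `nohup`, `setsid`, `disown`), and restarting the service through `systemctl`. A foreground `gateway run` and a `pkill` of an unrelated process are not flagged. The following pattern-based sketch is consistent with those names. The function `threatens_gateway`, the regular expressions, and the example command strings are all assumptions for illustration; the real approval logic and the exact commands used by the tests are not shown in this diff.

```python
import re

# Hypothetical patterns inferred from the test names only.
_KILL = re.compile(r"\b(?:pkill|killall)\b.*\b(?:hermes|gateway)\b")
_DETACHED_RUN = re.compile(
    r"\b(?:nohup|setsid)\b.*\bgateway\s+run\b"        # detached via wrapper
    r"|\bgateway\s+run\b.*(?:&\s*$|\bdisown\b)"        # backgrounded or disowned
)
_SERVICE = re.compile(r"\bsystemctl\s+restart\b.*\b(?:hermes|gateway)\b")


def threatens_gateway(command: str) -> bool:
    """Return True if the shell command matches one of the assumed patterns."""
    cmd = command.strip()
    return any(p.search(cmd) for p in (_KILL, _DETACHED_RUN, _SERVICE))
```

A regex check like this is easy to audit but also easy to evade with quoting or indirection, so it is best read as an illustration of the categories the tests cover rather than as a complete guard.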
32
SAFETY_REGRESSION_PACKET/malformed_key_check.txt
Normal file
@@ -0,0 +1,32 @@
Test: Non-ASCII API key corruption is detected and sanitized
Exit code: 0
Command: /Users/apayne/.hermes/hermes-agent/venv/bin/python3 -m pytest -vv tests/hermes_cli/test_non_ascii_credential.py::TestCheckNonAsciiCredential --tb=short

=== STDOUT ===
============================= test session starts ==============================
platform darwin -- Python 3.11.14, pytest-9.0.3, pluggy-1.6.0 -- /Users/apayne/.hermes/hermes-agent/venv/bin/python3
cachedir: .pytest_cache
rootdir: /Users/apayne/burn-clone/STEP35-hermes-agent-1060
configfile: pyproject.toml
plugins: anyio-4.12.1, cov-7.1.0, xdist-3.8.0, split-0.11.0, asyncio-1.3.0
asyncio: mode=Mode.STRICT, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
created: 14/14 workers
14 workers [5 items]

scheduling tests via LoadScheduling

tests/hermes_cli/test_non_ascii_credential.py::TestCheckNonAsciiCredential::test_all_ascii_no_warning
tests/hermes_cli/test_non_ascii_credential.py::TestCheckNonAsciiCredential::test_ascii_key_unchanged
tests/hermes_cli/test_non_ascii_credential.py::TestCheckNonAsciiCredential::test_empty_key
tests/hermes_cli/test_non_ascii_credential.py::TestCheckNonAsciiCredential::test_strips_unicode_v_lookalike
tests/hermes_cli/test_non_ascii_credential.py::TestCheckNonAsciiCredential::test_strips_multiple_non_ascii
[gw0] [ 20%] PASSED tests/hermes_cli/test_non_ascii_credential.py::TestCheckNonAsciiCredential::test_ascii_key_unchanged
[gw3] [ 40%] PASSED tests/hermes_cli/test_non_ascii_credential.py::TestCheckNonAsciiCredential::test_empty_key
[gw1] [ 60%] PASSED tests/hermes_cli/test_non_ascii_credential.py::TestCheckNonAsciiCredential::test_strips_unicode_v_lookalike
[gw4] [ 80%] PASSED tests/hermes_cli/test_non_ascii_credential.py::TestCheckNonAsciiCredential::test_all_ascii_no_warning
[gw2] [100%] PASSED tests/hermes_cli/test_non_ascii_credential.py::TestCheckNonAsciiCredential::test_strips_multiple_non_ascii

============================== 5 passed in 1.90s ===============================

=== STDERR ===

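The `TestCheckNonAsciiCredential` names indicate that an all-ASCII key passes through unchanged with no warning, an empty key is handled, and non-ASCII characters (such as a Unicode lookalike of the letter `v`) are stripped. The following is a minimal sketch of that behaviour. The function name `strip_non_ascii` and its return shape are hypothetical; the project's own check is not included in this diff.

```python
from typing import List, Tuple


def strip_non_ascii(key: str) -> Tuple[str, List[str]]:
    """Return the key with non-ASCII characters removed, plus what was removed.

    Hypothetical helper: an empty `removed` list means no warning is needed.
    """
    removed = [ch for ch in key if ord(ch) > 127]
    cleaned = "".join(ch for ch in key if ord(ch) <= 127)
    return cleaned, removed
```

Returning the removed characters alongside the cleaned key lets a caller decide whether to warn the operator, which matches the "detected and sanitized" wording of the test title.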
24
SAFETY_REGRESSION_PACKET/mcp_alias_resolution.txt
Normal file
@@ -0,0 +1,24 @@
Test: MCP aliases/toolsets resolve from live registry correctly
Exit code: 0
Command: /Users/apayne/.hermes/hermes-agent/venv/bin/python3 -m pytest -vv tests/test_toolsets.py::TestValidateToolset::test_mcp_alias_uses_live_registry --tb=short

=== STDOUT ===
============================= test session starts ==============================
platform darwin -- Python 3.11.14, pytest-9.0.3, pluggy-1.6.0 -- /Users/apayne/.hermes/hermes-agent/venv/bin/python3
cachedir: .pytest_cache
rootdir: /Users/apayne/burn-clone/STEP35-hermes-agent-1060
configfile: pyproject.toml
plugins: anyio-4.12.1, cov-7.1.0, xdist-3.8.0, split-0.11.0, asyncio-1.3.0
asyncio: mode=Mode.STRICT, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
created: 14/14 workers
14 workers [1 item]

scheduling tests via LoadScheduling

tests/test_toolsets.py::TestValidateToolset::test_mcp_alias_uses_live_registry
[gw0] [100%] PASSED tests/test_toolsets.py::TestValidateToolset::test_mcp_alias_uses_live_registry

============================== 1 passed in 2.11s ===============================

=== STDERR ===

24
SAFETY_REGRESSION_PACKET/tool_registry_proof.txt
Normal file
@@ -0,0 +1,24 @@
Test: Built-in tools auto-discover on startup
Exit code: 0
Command: /Users/apayne/.hermes/hermes-agent/venv/bin/python3 -m pytest -vv tests/tools/test_registry.py::TestBuiltinDiscovery::test_matches_previous_manual_builtin_tool_set --tb=short

=== STDOUT ===
============================= test session starts ==============================
platform darwin -- Python 3.11.14, pytest-9.0.3, pluggy-1.6.0 -- /Users/apayne/.hermes/hermes-agent/venv/bin/python3
cachedir: .pytest_cache
rootdir: /Users/apayne/burn-clone/STEP35-hermes-agent-1060
configfile: pyproject.toml
plugins: anyio-4.12.1, cov-7.1.0, xdist-3.8.0, split-0.11.0, asyncio-1.3.0
asyncio: mode=Mode.STRICT, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
created: 14/14 workers
14 workers [1 item]

scheduling tests via LoadScheduling

tests/tools/test_registry.py::TestBuiltinDiscovery::test_matches_previous_manual_builtin_tool_set
[gw0] [100%] PASSED tests/tools/test_registry.py::TestBuiltinDiscovery::test_matches_previous_manual_builtin_tool_set

============================== 1 passed in 3.12s ===============================

=== STDERR ===
