- Rename per-LLM-call hooks from pre_llm_request/post_llm_request for clarity vs pre_llm_call
- Emit summary kwargs only (counts, usage dict from normalize_usage); keep env_var_enabled for HERMES_DUMP_REQUESTS
- Add is_truthy_value/env_var_enabled to utils; wire hermes_cli.plugins._env_enabled through it
- Update Langfuse local setup doc; add scripts/langfuse_smoketest.py and optional ~/.hermes plugin tests
Made-with: Cursor
Plugin context from pre_llm_call hooks was injected into the system
prompt, breaking the prompt cache prefix every turn when content
changed (typical for memory plugins). Now all plugin context goes
into the current turn's user message — the system prompt stays
identical across turns, preserving cached tokens.
The system prompt is reserved for Hermes internals. Plugins
contribute context alongside the user's input.
Also adds comprehensive documentation for all 6 plugin hooks:
pre_tool_call, post_tool_call, pre_llm_call, post_llm_call,
on_session_start, on_session_end — each with full callback
signatures, parameter tables, firing conditions, and examples.
Supersedes #5138 which identified the same cache-busting bug
and proposed an uncached system suffix approach. This fix goes
further by removing system prompt injection entirely.
Co-identified-by: OutThisLife (PR #5138)
The plugin system defined six lifecycle hooks but only pre_tool_call and
post_tool_call were invoked. This activates the remaining four so that
external plugins (e.g. memory systems) can hook into the conversation
loop without touching core code.
Hook semantics:
- on_session_start: fires once when a new session is created
- pre_llm_call: fires once per turn before the tool-calling loop;
plugins can return {"context": "..."} to inject into the ephemeral
system prompt (not cached, not persisted)
- post_llm_call: fires once per turn after the loop completes, with
user_message and assistant_response for sync/storage
- on_session_end: fires at the end of every run_conversation call
invoke_hook() now returns a list of non-None callback return values,
enabling pre_llm_call context injection while remaining backward
compatible (existing hooks that return None are unaffected).
Salvaged from PR #2823.
Co-authored-by: Nicolò Boschi <boschi1997@gmail.com>
- test_plugins.py: remove tests for unimplemented plugin command API
(get_plugin_command_handler, register_command never existed)
- test_redact.py: add autouse fixture to clear HERMES_REDACT_SECRETS
env var leaked by cli.py import in other tests
- test_signal.py: same HERMES_REDACT_SECRETS fix for phone redaction
- test_mattermost.py: add @bot_user_id to test messages after the
mention-only filter was added in #2443
- test_context_token_tracking.py: mock resolve_provider_client for
openai-codex provider that requires real OAuth credentials
Full suite: 5893 passed, 0 failed.
Reverts the sanitizer addition from PR #2466 (originally #2129).
We already have _empty_content_retries handling for reasoning-only
responses. The trailing strip risks silently eating valid messages
and is redundant with existing empty-content handling.