Compare commits: `fix/851` ... `claude/iss` (1 commit, SHA1 `11bdef4e3d`)
```diff
@@ -1,4 +1,5 @@
+from agent.telemetry_logger import log_token_usage
 """Shared auxiliary client router for side tasks.

 Provides a single resolution chain so every consumer (context compression,
 session search, web extraction, vision analysis, browser vision) picks up
```
```diff
@@ -396,7 +397,8 @@ class _CodexCompletionsAdapter:
                 prompt_tokens=getattr(resp_usage, "input_tokens", 0),
                 completion_tokens=getattr(resp_usage, "output_tokens", 0),
                 total_tokens=getattr(resp_usage, "total_tokens", 0),
             )
+            log_token_usage(usage.prompt_tokens, usage.completion_tokens, model)
         except Exception as exc:
             logger.debug("Codex auxiliary Responses API call failed: %s", exc)
             raise
```
```diff
@@ -529,7 +531,8 @@ class _AnthropicCompletionsAdapter:
            prompt_tokens=prompt_tokens,
            completion_tokens=completion_tokens,
            total_tokens=total_tokens,
        )
+       log_token_usage(usage.prompt_tokens, usage.completion_tokens, model)

        choice = SimpleNamespace(
            index=0,
```
The branch also deletes a 55-line verification document (`@@ -1,55 +0,0 @@`); its former contents follow.
# Issue #851 Verification

## Status: ✅ ALREADY IMPLEMENTED

Issue #851 is a research/audit issue whose own conclusion is that prompt caching is already extensively implemented in hermes-agent, and that the remaining work is operational rather than a repo-side code change.

This verification confirms that the current repo already contains the core implementation described in the issue body.

## Acceptance Criteria Check

1. ✅ Anthropic / OpenRouter prompt-caching support exists
   - `agent/prompt_caching.py:41-72` implements `apply_anthropic_cache_control()` with the documented system-plus-last-3 breakpoint strategy.
   - `run_agent.py:8301-8306` applies Anthropic/OpenRouter cache-control breakpoints during API message preparation.

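The system-plus-last-3 strategy can be sketched as plain payload manipulation. This is not the repo's `apply_anthropic_cache_control()`; it is an illustrative stand-in (the function name `apply_cache_breakpoints` and its exact block handling are assumptions) showing the general Anthropic-style pattern: one `cache_control` marker on the system prompt and the remaining markers on the newest messages, so the long stable prefix stays cached across turns.

```python
from copy import deepcopy


def apply_cache_breakpoints(system_blocks, messages, max_breakpoints=3):
    """Mark the system prompt and the last N messages as cache breakpoints.

    Anthropic-style payloads allow a small number of `cache_control`
    markers; spending one on the system prompt and the rest on the newest
    messages maximizes reusable-prefix cache hits.
    """
    system_blocks = deepcopy(system_blocks)
    messages = deepcopy(messages)
    if system_blocks:
        system_blocks[-1]["cache_control"] = {"type": "ephemeral"}
    for msg in messages[-max_breakpoints:]:
        content = msg.get("content")
        if isinstance(content, str):
            # Content must be a block list to carry cache_control metadata.
            msg["content"] = [{"type": "text", "text": content,
                              "cache_control": {"type": "ephemeral"}}]
        elif isinstance(content, list) and content:
            content[-1]["cache_control"] = {"type": "ephemeral"}
    return system_blocks, messages
```

Older turns are deliberately left untouched: marking them would waste breakpoints on content already covered by the cached prefix.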
2. ✅ OpenAI/Codex prompt-cache key support exists
   - `run_agent.py:6199-6213` sets `prompt_cache_key = self.session_id` on the responses path for non-GitHub responses.
   - `run_agent.py:3875-3878` explicitly passes `prompt_cache_key` through in normalized API kwargs.

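The kwargs pass-through can be illustrated with a small sketch. The helper name `build_api_kwargs` and the provider strings are hypothetical, not taken from `run_agent.py`; the point is the pattern: reuse a stable session id as `prompt_cache_key` for providers that accept it, so every call in a session routes to the same cache shard and repeated prefixes can hit.

```python
def build_api_kwargs(base_kwargs, session_id, provider):
    """Attach a stable prompt-cache key for providers that support it."""
    kwargs = dict(base_kwargs)  # never mutate the caller's dict
    if provider in ("openai", "codex") and session_id:
        # A per-session constant key keeps cache routing stable across turns.
        kwargs["prompt_cache_key"] = session_id
    return kwargs
```

Providers without a cache-key parameter (e.g. the Anthropic path, which uses `cache_control` breakpoints instead) simply get the kwargs unchanged.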
3. ✅ System-prompt stability and cache-friendly message normalization exist
   - `run_agent.py:3155-3157` documents that the system prompt is cached and reused across turns to maximize prefix cache hits.
   - `run_agent.py:8314-8339` normalizes whitespace and tool-call JSON for bit-perfect prefix matching across turns.

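Why normalization matters: provider prefix caches match byte-for-byte, so any nondeterminism in replayed history (key order in tool-call JSON, trailing spaces, CRLF) silently breaks cache hits. The following is a sketch of the general technique, not the code at `run_agent.py:8314-8339`; the function names are illustrative.

```python
import json
import re


def normalize_tool_call_args(raw_args: str) -> str:
    """Re-serialize tool-call JSON deterministically.

    Round-tripping through json with sorted keys and fixed separators
    guarantees identical bytes for semantically identical arguments.
    """
    try:
        return json.dumps(json.loads(raw_args), sort_keys=True,
                          separators=(",", ":"), ensure_ascii=False)
    except (json.JSONDecodeError, TypeError):
        return raw_args  # leave non-JSON payloads untouched


def normalize_whitespace(text: str) -> str:
    """Strip trailing spaces and CRLF so replayed history is bit-perfect."""
    return re.sub(r"[ \t]+(\n)", r"\1", text.replace("\r\n", "\n"))
```

With this in place, two histories that differ only in JSON key order or line endings produce the same prefix bytes and therefore the same cache entry.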
4. ✅ Cache hit/miss logging infrastructure exists
   - `run_agent.py:8966-8980` logs cache read/write token stats, including `cached_tokens`, `cache_creation_input_tokens`, and the hit percentage.

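A hit-rate summary of the kind described can be computed directly from the usage payload. This sketch assumes the OpenAI convention (`cached_tokens` under `prompt_tokens_details`) and the Anthropic convention (`cache_read_input_tokens`, `cache_creation_input_tokens`); exact payload shapes vary by provider, and the function name is illustrative rather than the repo's.

```python
def summarize_cache_stats(usage: dict) -> str:
    """Format cache read/write stats from a provider usage payload."""
    prompt = usage.get("prompt_tokens", 0)
    details = usage.get("prompt_tokens_details", {})
    # OpenAI reports cached reads in a nested detail block; Anthropic uses
    # top-level cache_read_input_tokens. Try both.
    cached = details.get("cached_tokens", 0) or usage.get("cache_read_input_tokens", 0)
    created = usage.get("cache_creation_input_tokens", 0)
    hit_pct = (100.0 * cached / prompt) if prompt else 0.0
    return f"cache: {cached} read / {created} written / {hit_pct:.1f}% hit"
```

A high read percentage with low creation tokens is the steady-state signal that the prefix-caching setup above is working.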
## Executed Verification

### Targeted tests run

- `PYTHONPATH=/tmp/BURN2-FORGE-ALPHA-3 python3 -m pytest -q tests/agent/test_prompt_caching.py`
- Result: `14 passed`

### Syntax verification

- `PYTHONPATH=/tmp/BURN2-FORGE-ALPHA-3 python3 -m py_compile agent/prompt_caching.py run_agent.py`
- Result: passed

## Evidence Summary

The issue body says:

- prompt caching is already extensively implemented
- the primary remaining opportunities are operational: routing more workloads to Ollama, verifying provider support, and reporting cache hit rates

The repo state matches that conclusion:

- caching primitives are present
- integration points are wired into the runtime
- targeted tests already exist and pass
- no new implementation change is required to satisfy the issue's repo-side claim

## Recommendation

Close issue #851 as already implemented in the codebase.

If desired, follow-on work should be opened as separate operational issues for:

- Ollama-heavy workload routing
- provider-specific cache verification
- nightly cache hit-rate reporting