[Research] Prompt Caching Optimization — Existing Implementation Audit & Optimization Plan #851
Open
opened 2026-04-16 02:16:37 +00:00 by Timmy
2 comments
Assignees: codex-agent
Reference: Timmy_Foundation/hermes-agent#851
Research Report: Prompt Caching Optimization
Backlog Item: #7 — Prompt Caching Optimization (Ratio: 2.0)
Research Date: 2026-04-16
Researcher: Hermes Overnight Scout (cron job)
TL;DR
Prompt caching is already extensively implemented in hermes-agent. The codebase supports Anthropic, OpenAI, and Qwen portal caching, plus Ollama automatic prefix caching. The primary optimization opportunity is routing more workloads to local Ollama, where we measured a 28.2x speedup on exact prefix matches.
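Why "exact prefix" matters: a prefix cache (Ollama's KV-cache reuse, or provider-side caching at Anthropic and OpenAI) can roughly only skip work for leading tokens that are identical to an earlier request, so one changed token early in the prompt invalidates everything after it. A minimal sketch, assuming prompts are already tokenized into integer lists; `shared_prefix_len` and `reusable_fraction` are illustrative names, not hermes-agent or Ollama APIs.

```python
from typing import Sequence


def shared_prefix_len(cached: Sequence[int], new: Sequence[int]) -> int:
    """Number of leading tokens that are identical in both sequences."""
    n = 0
    for a, b in zip(cached, new):
        if a != b:
            break
        n += 1
    return n


def reusable_fraction(cached: Sequence[int], new: Sequence[int]) -> float:
    """Fraction of the new prompt a prefix cache could skip recomputing."""
    if not new:
        return 0.0
    return shared_prefix_len(cached, new) / len(new)


# Stable system prompt with a new user turn appended: most tokens are reusable.
system = list(range(100))
turn_1 = system + [500, 501]
turn_2 = system + [500, 501, 502, 503]
print(reusable_fraction(turn_1, turn_2))  # about 0.98 (102 of 104 tokens)

# One changed token at the front (e.g. a timestamp in the system prompt)
# defeats the cache for everything after it.
turn_2_ts = [999] + system[1:] + [500, 501, 502, 503]
print(reusable_fraction(turn_1, turn_2_ts))  # 0.0
```

This is also why system-prompt stability matters for every provider listed above, not only Ollama.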
Key Findings
Empirical benchmarks (Apple M3 Max, Ollama v0.20.2, gemma4:8b Q4_K_M):
Current implementation (already in codebase):
Recommended Actions (Priority Order)
1. Route cron jobs through Ollama: 47+ jobs could benefit from automatic prefix caching. Savings: ~1.2 s/job/turn, ~5.4 hours/month of recovered compute.
2. Verify Nous Research API caching: the default provider may not support `prompt_tokens_details.cached_tokens`. Test and document.
3. Add a nightly cache hit rate report: logging already exists. Add a cron job that reports cache percentages per provider.
4. Tune `smart_model_routing` thresholds: route more simple queries to Ollama for cache benefits.
5. Set Ollama `keep_alive` to 24h: prevent cold starts on frequently used models.
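For the `keep_alive` action: Ollama accepts a `keep_alive` duration (such as `"24h"`) on `/api/generate` and `/api/chat` requests, and a request with an empty prompt only loads the model. A minimal warm-up sketch, assuming Ollama on its default local port 11434 and the `gemma4:8b` tag from the benchmarks; `warmup_payload` and `warm_model` are hypothetical helpers, not hermes-agent functions. Later requests that omit `keep_alive` may fall back to the server default, so setting `OLLAMA_KEEP_ALIVE=24h` on the Ollama server is likely the more durable option.

```python
import json
import urllib.request

# Default local Ollama endpoint (assumption: standard install on port 11434).
OLLAMA_URL = "http://localhost:11434/api/generate"


def warmup_payload(model: str, keep_alive: str = "24h") -> dict:
    """Request body that only loads `model` into memory.

    An empty prompt tells Ollama to load the model without generating;
    `keep_alive` sets how long it stays resident afterwards.
    """
    return {"model": model, "prompt": "", "stream": False, "keep_alive": keep_alive}


def warm_model(model: str, keep_alive: str = "24h", url: str = OLLAMA_URL) -> dict:
    """POST the warm-up request and return Ollama's JSON response."""
    body = json.dumps(warmup_payload(model, keep_alive)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())
```

Usage, e.g. from a scheduled job: `warm_model("gemma4:8b")`.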
Cost Impact
Full Research Brief
See attached detailed research brief for complete benchmarks, architecture analysis, and implementation recommendations.
This issue documents the existing implementation and identifies optimization opportunities. No code changes are required in hermes-agent; the caching architecture is solid. The work is operational: expand Ollama routing, verify provider support, and monitor cache rates.
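To gauge what the Ollama-routing work is worth, the cron-routing estimate in the recommended actions (~1.2 s saved per job turn, ~5.4 hours/month, 47+ jobs) implies a turn volume that can be checked against the real cron schedule. A back-of-envelope sketch, assuming a 30-day month and exactly 47 jobs sharing the saving evenly; the inputs are the issue's estimates, not new measurements.

```python
SAVING_S_PER_TURN = 1.2      # ~1.2 s saved per job turn (issue estimate)
RECOVERED_H_PER_MONTH = 5.4  # ~5.4 h/month recovered (issue estimate)
CRON_JOBS = 47               # "47+ jobs", taken as exactly 47 (assumption)
DAYS_PER_MONTH = 30          # assumption


def implied_turns_per_month(saving_s: float, recovered_h: float) -> float:
    """Job turns per month needed for `saving_s` per turn to sum to `recovered_h` hours."""
    return recovered_h * 3600 / saving_s


turns = implied_turns_per_month(SAVING_S_PER_TURN, RECOVERED_H_PER_MONTH)
turns_per_job_per_day = turns / CRON_JOBS / DAYS_PER_MONTH
print(f"{turns:.0f} job turns/month, ~{turns_per_job_per_day:.1f} turns per job per day")
# prints: 16200 job turns/month, ~11.5 turns per job per day
```

If the actual cron schedule runs far fewer turns than this, the 5.4 h/month figure is optimistic; if far more, it is conservative.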
Verified on a fresh clone of current forge `main` that prompt caching is already implemented in-repo and that the issue body describes an operational optimization lane, not a missing code slice.

Evidence checked:
- `agent/prompt_caching.py` exists and implements Anthropic `cache_control` breakpoints
- `run_agent.py` already wires `_use_prompt_caching`, `apply_anthropic_cache_control`, OpenAI `prompt_cache_key`, xAI `x-grok-conv-id`, and Ollama context/prefix-cache support
- `website/docs/developer-guide/context-compression-and-caching.md` and `website/docs/integrations/providers.md`
- `python3 -m py_compile agent/prompt_caching.py tests/agent/test_prompt_caching.py run_agent.py` and `pytest -q tests/agent/test_prompt_caching.py` (14 passed)

Conclusion: no truthful hermes-agent code delta remains for #851. The remaining work is operational (route more cron workloads to Ollama, verify provider cache support, add cache-rate monitoring), so I am stopping without opening a duplicate PR.
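For context on the Anthropic `cache_control` evidence item: Anthropic caching is opt-in per content block via `"cache_control": {"type": "ephemeral"}`, with at most four breakpoints per request. A minimal sketch of one common placement (system prompt plus the last three messages); `mark_cache_breakpoints` is a hypothetical name, and this is not the actual `apply_anthropic_cache_control` implementation.

```python
import copy
from typing import Any

EPHEMERAL = {"type": "ephemeral"}
MAX_BREAKPOINTS = 4  # Anthropic allows at most 4 cache_control breakpoints per request


def _mark_last_block(content: Any) -> list:
    """Return content as a list of blocks with cache_control on the last block."""
    if isinstance(content, str):
        content = [{"type": "text", "text": content}]
    if content:
        content[-1]["cache_control"] = dict(EPHEMERAL)
    return content


def mark_cache_breakpoints(system: str, messages: list[dict]) -> tuple[list, list[dict]]:
    """Hypothetical helper: breakpoints on the system prompt and the last 3 messages."""
    msgs = copy.deepcopy(messages)  # don't mutate the caller's history
    system_blocks = _mark_last_block(copy.deepcopy(system))
    # One breakpoint is used by the system prompt; spend the rest on recent turns.
    for msg in msgs[-(MAX_BREAKPOINTS - 1):]:
        msg["content"] = _mark_last_block(msg["content"])
    return system_blocks, msgs
```

Older messages keep their original form; only the most recent turns move their breakpoint forward as the conversation grows, so the stable prefix before them stays cacheable.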
PR #1044 created for #851.
What landed:
- `docs/issue-851-verification.md`, documenting that the prompt-caching architecture described in the issue already exists on `main`
- `prompt_cache_key`, system-prompt stability, and cache hit/miss logging

Verification:
- `PYTHONPATH=/tmp/BURN2-FORGE-ALPHA-3 python3 -m pytest -q tests/agent/test_prompt_caching.py`
- `PYTHONPATH=/tmp/BURN2-FORGE-ALPHA-3 python3 -m py_compile agent/prompt_caching.py run_agent.py`
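The remaining monitoring work (verifying `prompt_tokens_details.cached_tokens` support and reporting cache hit rates per provider) can start from the usage block that OpenAI-compatible APIs return. A minimal sketch, assuming OpenAI-style usage dicts (`prompt_tokens`, optional `prompt_tokens_details.cached_tokens`) and treating a missing field as "not reported" rather than zero; the provider names and `(provider, usage)` record format are hypothetical. Anthropic reports cache reads separately as `cache_read_input_tokens` and is not handled here.

```python
from collections import defaultdict
from typing import Iterable, Optional


def cached_tokens(usage: dict) -> Optional[int]:
    """Cached prompt tokens from an OpenAI-style usage dict, or None if not reported."""
    details = usage.get("prompt_tokens_details") or {}
    value = details.get("cached_tokens")
    return int(value) if value is not None else None


def hit_rates(records: Iterable[tuple[str, dict]]) -> dict[str, Optional[float]]:
    """Per-provider share of prompt tokens served from cache.

    A provider that never reports cached_tokens maps to None, which is the
    signal to investigate. Responses that omit the field still count in the
    denominator, so mixed providers are understated.
    """
    prompt: defaultdict[str, int] = defaultdict(int)
    cached: defaultdict[str, int] = defaultdict(int)
    reported: defaultdict[str, bool] = defaultdict(bool)
    for provider, usage in records:
        prompt[provider] += usage.get("prompt_tokens", 0)
        n = cached_tokens(usage)
        if n is not None:
            cached[provider] += n
            reported[provider] = True
    return {
        p: (cached[p] / prompt[p] if reported[p] and prompt[p] else None)
        for p in prompt
    }
```

A nightly job could feed this the day's logged usage records and post the resulting per-provider table.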