Timmy_Foundation/hermes-agent: agent/ directory at commit 86960cdbb0148145890e2ee90b4e157fa899f6e1 (Hermes Gitea)

Teknium c8a5e36be8 feat(prompting): self-optimized GPT/Codex tool-use guidance via automated behavioral benchmarking (#6120)

Hermes Agent identified and patched its own prompting blind spots through
automated self-evaluation — running 64+ tool-use benchmarks across GPT-5.4
and Codex-5.3, diagnosing 5 failure modes, writing targeted prompt patches,
and verifying the fix in a closed loop.

Failure modes discovered and fixed:
- Mental arithmetic (e.g. 39,152,053 given instead of the correct 39,151,253)
- User profile hallucination ('Windows 11' when running on Linux)
- Time guessing without verification
- Clarification-seeking instead of acting ('open where?' for port checks)
- Hash computation from memory (SHA-256, encodings)
- Confusing system RAM with the agent's own persistent memory store

Two new XML sections added to OPENAI_MODEL_EXECUTION_GUIDANCE:
- <mandatory_tool_use>: explicit categories that must always use tools
- <act_dont_ask>: default to action on obvious interpretations

Results:
  gpt-5.4:       68.8% → 100% tool compliance (+31.2pp)
  gpt-5.3-codex: 62.5% → 100% tool compliance (+37.5pp)
  Regression:    0/8 conversational prompts over-tooled

2026-04-08 04:06:42 -07:00
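The commit message names the two new sections but does not quote their text. The following is a hypothetical paraphrase built from the failure modes listed above. It shows one way such guidance could be kept as XML-tagged blocks and appended only for GPT/Codex-family models. Apart from OPENAI_MODEL_EXECUTION_GUIDANCE and the two tag names, every identifier, all of the section wording, and the model-matching rule are assumptions. None of it reproduces the actual contents of prompt_builder.py.

```python
# Hypothetical paraphrase of the two sections named in the commit message;
# the real wording in agent/prompt_builder.py is not reproduced here.
MANDATORY_TOOL_USE = """<mandatory_tool_use>
Always use a tool instead of answering from memory for:
- arithmetic and other calculations
- hashes, checksums, and encodings (e.g. SHA-256, base64)
- the current date and time
- facts about the user's system (OS, hardware, open ports)
</mandatory_tool_use>"""

ACT_DONT_ASK = """<act_dont_ask>
When a request has an obvious interpretation, act on it instead of asking
for clarification (e.g. check a port on the local machine).
</act_dont_ask>"""

# In this sketch the guidance holds only the two new sections; the real
# constant presumably contains other guidance as well.
OPENAI_MODEL_EXECUTION_GUIDANCE = "\n\n".join([MANDATORY_TOOL_USE, ACT_DONT_ASK])


def build_system_prompt(base_prompt: str, model: str) -> str:
    """Append execution guidance for GPT/Codex-family models (assumed matching rule)."""
    name = model.lower()
    if "gpt" in name or "codex" in name:
        return f"{base_prompt}\n\n{OPENAI_MODEL_EXECUTION_GUIDANCE}"
    # Other model families get the base prompt unchanged.
    return base_prompt


if __name__ == "__main__":
    print(build_system_prompt("You are Hermes.", "gpt-5.4"))
```

Keeping each rule in its own tagged block is one way to let individual sections be revised and re-benchmarked independently, as in the closed loop the commit describes.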

__init__.py

Refactor Terminal and AIAgent cleanup

2026-02-21 22:31:43 -08:00

anthropic_adapter.py

fix(anthropic): smart thinking block signature management (#6112)

2026-04-08 03:38:08 -07:00

auxiliary_client.py

fix(minimax): correct context lengths, model catalog, thinking guard, aux model, and config base_url

2026-04-08 02:20:46 -07:00

builtin_memory_provider.py

refactor: add tool_error/tool_result helpers + read_raw_config, migrate 129 callsites

2026-04-07 13:36:38 -07:00

context_compressor.py

Fix compaction summary retries for temperature-restricted models

2026-04-06 16:49:57 -07:00

context_references.py

refactor: replace inline HERMES_HOME re-implementations with get_hermes_home()

2026-04-07 10:40:34 -07:00

copilot_acp_client.py

fix: bridge tool-calls in copilot-acp adapter

2026-04-06 01:47:57 -07:00

credential_pool.py

refactor: codebase-wide lint cleanup of unused imports, dead code, and inefficient patterns (#5821)

2026-04-07 10:25:31 -07:00

display.py

refactor: remove 24 confirmed dead functions — 432 lines of unused code

2026-04-07 11:41:26 -07:00

insights.py

fix(insights): show cache tokens in overview so total adds up (#4428)

2026-04-01 03:06:47 -07:00

memory_manager.py

refactor: add tool_error/tool_result helpers + read_raw_config, migrate 129 callsites

2026-04-07 13:36:38 -07:00

memory_provider.py

refactor: codebase-wide lint cleanup of unused imports, dead code, and inefficient patterns (#5821)

2026-04-07 10:25:31 -07:00

model_metadata.py

fix(minimax): correct context lengths, model catalog, thinking guard, aux model, and config base_url

2026-04-08 02:20:46 -07:00

models_dev.py

refactor: replace inline HERMES_HOME re-implementations with get_hermes_home()

2026-04-07 10:40:34 -07:00

prompt_builder.py

feat(prompting): self-optimized GPT/Codex tool-use guidance via automated behavioral benchmarking (#6120)

2026-04-08 04:06:42 -07:00

prompt_caching.py

fix(prompt-caching): skip top-level cache_control on role:tool for OpenRouter

2026-03-21 16:54:43 -07:00

redact.py

fix: mem0 API v2 compat, prefetch context fencing, secret redaction (#5423)

2026-04-05 22:43:33 -07:00

retry_utils.py

feat(agent): add jittered retry backoff

2026-04-08 00:41:36 -07:00

skill_commands.py

feat(skills): add skill config interface + llm-wiki skill (#5635)

2026-04-06 13:49:13 -07:00

skill_utils.py

refactor: codebase-wide lint cleanup of unused imports, dead code, and inefficient patterns (#5821)

2026-04-07 10:25:31 -07:00

smart_model_routing.py

Merge branch 'main' into rewbs/tool-use-charge-to-subscription

2026-04-02 11:00:35 +11:00

subdirectory_hints.py

refactor: codebase-wide lint cleanup of unused imports, dead code, and inefficient patterns (#5821)

2026-04-07 10:25:31 -07:00

title_generator.py

feat(agent): configurable timeouts for auxiliary LLM calls via config.yaml (#3597)

2026-03-28 14:35:28 -07:00

trajectory.py

Refactor Terminal and AIAgent cleanup

2026-02-21 22:31:43 -08:00

usage_pricing.py

fix: status bar shows 26K instead of 260K for token counts with trailing zeros (#3024)

2026-03-25 12:45:58 -07:00
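The usage_pricing.py fix above points at a common compact-number bug. If trailing zeros are stripped from the whole formatted string, "260" becomes "26". The following is a minimal sketch of a formatter that trims only a trailing ".0". It assumes the status bar uses K/M/B suffixes with one decimal place. format_token_count is a hypothetical name, not the function in usage_pricing.py.

```python
def format_token_count(n: int) -> str:
    """Render a token count compactly, e.g. 260000 -> '260K', 1500 -> '1.5K'.

    Hypothetical helper; the suffixes and one-decimal precision are assumptions.
    """
    for threshold, suffix in ((1_000_000_000, "B"), (1_000_000, "M"), (1_000, "K")):
        if n >= threshold:
            text = f"{n / threshold:.1f}"
            # Trim only a trailing ".0". A bare text.rstrip("0") would turn
            # "260" into "26", which is the kind of bug the commit describes.
            if text.endswith(".0"):
                text = text[:-2]
            return f"{text}{suffix}"
    # Counts below 1,000 are shown as-is.
    return str(n)


if __name__ == "__main__":
    for count in (999, 1500, 260_000, 2_000_000):
        print(count, format_token_count(count))
```

Because only the decimal part is trimmed, 260000 renders as 260K and 1500 as 1.5K, while values under 1,000 are shown unchanged.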