Commit Graph

17 Commits

Teknium
5c479eedf1 feat: improve context compaction handoff summaries (#1273)
Adapt PR #916 onto current main by replacing the old context summary marker
with a clearer handoff wrapper, updating the summarization prompt for
resume-oriented summaries, and preserving the current call_llm-based
compression path.
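
A minimal sketch of the handoff-wrapper idea (the marker strings and the function name here are illustrative, not the exact ones used in the commit):

    HANDOFF_HEADER = "=== CONTEXT HANDOFF: summary of the conversation so far ==="
    HANDOFF_FOOTER = "=== END HANDOFF: resume from the state described above ==="

    def wrap_handoff_summary(summary: str) -> str:
        """Wrap a compaction summary so a resumed session can tell it apart
        from ordinary user text."""
        return f"{HANDOFF_HEADER}\n{summary.strip()}\n{HANDOFF_FOOTER}"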
2026-03-14 02:33:31 -07:00
Teknium
07927f6bf2 feat(stt): add free local whisper transcription via faster-whisper (#1185)
* fix: Home Assistant event filtering now closed by default

Previously, when no watch_domains or watch_entities were configured,
ALL state_changed events passed through to the agent, causing users
to be flooded with notifications for every HA entity change.

Now events are dropped by default unless the user explicitly configures:
- watch_domains: list of domains to monitor (e.g. climate, light)
- watch_entities: list of specific entity IDs to monitor
- watch_all: true (new option — opt-in to receive all events)

A warning is logged at connect time if no filters are configured,
guiding users to set up their HA platform config.

All 49 gateway HA tests + 52 HA tool tests pass.
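
A rough sketch of the closed-by-default filtering rule (config keys follow the list above; the function and event shape are illustrative):

    def should_forward(event: dict, config: dict) -> bool:
        """Drop state_changed events unless the user opted in."""
        if config.get("watch_all"):
            return True
        entity_id = event.get("data", {}).get("entity_id", "")
        domain = entity_id.split(".", 1)[0]
        if entity_id in set(config.get("watch_entities") or []):
            return True
        if domain in set(config.get("watch_domains") or []):
            return True
        return False  # no filters configured: closed by default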

* docs: update Home Assistant integration documentation

- homeassistant.md: Fix event filtering docs to reflect closed-by-default
  behavior. Add watch_all option. Replace Python dict config example with
  YAML. Fix defaults table (was incorrectly showing 'all'). Add required
  configuration warning admonition.
- environment-variables.md: Add HASS_TOKEN and HASS_URL to Messaging section.
- messaging/index.md: Add Home Assistant to description, architecture
  diagram, platform toolsets table, and Next Steps links.

* fix(terminal): strip provider env vars from background and PTY subprocesses

Extends the env var blocklist from #1157 to also cover the two remaining
leaky paths in process_registry.py:

- spawn_local() PTY path (line 156)
- spawn_local() background Popen path (line 197)

Both were still using raw os.environ, leaking provider vars to background
processes and interactive PTY sessions. Now uses the same dynamic
_HERMES_PROVIDER_ENV_BLOCKLIST from local.py.

Explicit env_vars passed to spawn_local() still override the blocklist,
matching the existing behavior for callers that intentionally need these.

Gap identified by PR #1004 (@PeterFile).
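
Roughly what the shared filtering looks like (the blocklist contents below are examples; the real list is the dynamic _HERMES_PROVIDER_ENV_BLOCKLIST from local.py):

    import os

    EXAMPLE_BLOCKLIST = {"OPENAI_API_KEY", "ANTHROPIC_API_KEY", "OPENROUTER_API_KEY"}

    def build_subprocess_env(env_vars: dict | None = None) -> dict:
        """Copy os.environ minus provider secrets; explicit env_vars still win."""
        env = {k: v for k, v in os.environ.items() if k not in EXAMPLE_BLOCKLIST}
        if env_vars:
            env.update(env_vars)
        return env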

* feat(delegate): add observability metadata to subagent results

Enrich delegate_task results with metadata from the child AIAgent:

- model: which model the child used
- exit_reason: completed | interrupted | max_iterations
- tokens.input / tokens.output: token counts
- tool_trace: per-tool-call trace with byte sizes and ok/error status

Tool trace uses tool_call_id matching to correctly pair parallel tool
calls with their results, with a fallback for messages without IDs.

Cherry-picked from PR #872 by @omerkaz, with fixes:
- Fixed parallel tool call trace pairing (was always updating last entry)
- Removed redundant 'iterations' field (identical to existing 'api_calls')
- Added test for parallel tool call trace correctness

Co-authored-by: omerkaz <omerkaz@users.noreply.github.com>
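
An illustrative sketch of the id-based pairing (field names are simplified, not the exact delegate_task schema):

    def build_tool_trace(messages: list[dict]) -> list[dict]:
        """Pair assistant tool calls with tool results via tool_call_id."""
        trace, by_id = [], {}
        for msg in messages:
            for call in msg.get("tool_calls") or []:
                entry = {"tool": call["function"]["name"], "ok": None, "bytes": 0}
                trace.append(entry)
                if call.get("id"):
                    by_id[call["id"]] = entry
            if msg.get("role") == "tool":
                entry = by_id.get(msg.get("tool_call_id"))
                if entry is None:  # fallback for results without an id
                    entry = next((e for e in trace if e["ok"] is None), None)
                if entry is not None:
                    content = msg.get("content") or ""
                    entry["ok"] = not msg.get("is_error", False)
                    entry["bytes"] = len(content.encode("utf-8"))
        return trace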

* feat(stt): add free local whisper transcription via faster-whisper

Replace OpenAI-only STT with a dual-provider system mirroring the TTS
architecture (Edge TTS free / ElevenLabs paid):

  STT: faster-whisper local (free, default) / OpenAI Whisper API (paid)

Changes:
- tools/transcription_tools.py: Full rewrite with provider dispatch,
  config loading, local faster-whisper backend, and OpenAI API backend.
  Auto-downloads model (~150MB for 'base') on first voice message.
  Singleton model instance reused across calls.
- pyproject.toml: Add faster-whisper>=1.0.0 as core dependency
- hermes_cli/config.py: Expand stt config to match TTS pattern with
  provider selection and per-provider model settings
- agent/context_compressor.py: Fix .strip() crash when LLM returns
  non-string content (dict from llama.cpp, None). Fixes #1100 partially.
- tests/: 23 new tests for STT providers + 2 for compressor fix
- docs/: Updated Voice & TTS page with STT provider table, model sizes,
  config examples, and fallback behavior

Fallback behavior:
- Local not installed → OpenAI API (if key set)
- OpenAI key not set → local whisper (if installed)
- Neither → graceful error message to user

Co-authored-by: Jah-yee <Jah-yee@users.noreply.github.com>
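
The fallback chain above boils down to something like this (the helper name and return values are illustrative):

    import importlib.util
    import os

    def pick_stt_provider(preferred: str = "local") -> str | None:
        local_ok = importlib.util.find_spec("faster_whisper") is not None
        openai_ok = bool(os.environ.get("OPENAI_API_KEY"))
        if preferred == "local" and local_ok:
            return "local"
        if preferred == "openai" and openai_ok:
            return "openai"
        if local_ok:
            return "local"   # OpenAI key not set, local whisper installed
        if openai_ok:
            return "openai"  # local not installed, API key available
        return None          # neither: caller surfaces a graceful error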

---------

Co-authored-by: omerkaz <omerkaz@users.noreply.github.com>
Co-authored-by: Jah-yee <Jah-yee@users.noreply.github.com>
2026-03-13 11:11:05 -07:00
Teknium
1bb8ed4495 chore: lower default compression threshold from 85% to 50% (#1096)
* fix: ClawHub skill install — use /download ZIP endpoint

The ClawHub API v1 version endpoint only returns file metadata
(path, size, sha256, contentType) without inline content or download
URLs. Our code was looking for inline content in the metadata, which
never existed, causing all ClawHub installs to fail with:
'no inline/raw file content was available'

Fix: Use the /api/v1/download endpoint (same as the official clawhub
CLI) to download skills as ZIP bundles and extract files in-memory.

Changes:
- Add _download_zip() method that downloads and extracts ZIP bundles
- Retry on 429 rate limiting with Retry-After header support
- Path sanitization and binary file filtering for security
- Keep _extract_files() as a fallback for inline/raw content
- Also fix nested file lookup (version_data.version.files)
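
In outline, the download path looks roughly like this (the endpoint parameters and retry limits are assumptions, not the exact ClawHub API surface):

    import io
    import time
    import zipfile

    import requests

    def download_skill_zip(base_url: str, slug: str, version: str) -> dict[str, str]:
        url = f"{base_url}/api/v1/download"
        for _ in range(3):
            resp = requests.get(url, params={"slug": slug, "version": version}, timeout=30)
            if resp.status_code == 429:  # rate limited: honor Retry-After
                time.sleep(int(resp.headers.get("Retry-After", "5")))
                continue
            resp.raise_for_status()
            break
        else:
            raise RuntimeError("rate limited after retries")
        files = {}
        with zipfile.ZipFile(io.BytesIO(resp.content)) as zf:
            for name in zf.namelist():
                if name.startswith("/") or ".." in name or name.endswith("/"):
                    continue  # path sanitization
                data = zf.read(name)
                if b"\x00" in data:
                    continue  # skip binary files
                files[name] = data.decode("utf-8", errors="replace")
        return files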

* chore: lower default compression threshold from 85% to 50%

Triggers context compression earlier — at 50% of the model's context
window instead of 85%. Updated in all five places where the default
is defined: context_compressor.py, cli.py, run_agent.py, config.py,
and gateway/run.py.
2026-03-12 15:51:50 -07:00
teknium1
0aa31cd3cb feat: call_llm/async_call_llm + config slots + migrate all consumers
Add centralized call_llm() and async_call_llm() functions that own the
full LLM request lifecycle:
  1. Resolve provider + model from task config or explicit args
  2. Get or create a cached client for that provider
  3. Format request args (max_tokens handling, provider extra_body)
  4. Make the API call with max_tokens/max_completion_tokens retry
  5. Return the response

Config: expanded auxiliary section with provider:model slots for all
tasks (compression, vision, web_extract, session_search, skills_hub,
mcp, flush_memories). Config version bumped to 7.

Migrated all auxiliary consumers:
- context_compressor.py: uses call_llm(task='compression')
- vision_tools.py: uses async_call_llm(task='vision')
- web_tools.py: uses async_call_llm(task='web_extract')
- session_search_tool.py: uses async_call_llm(task='session_search')
- browser_tool.py: uses call_llm(task='vision'/'web_extract')
- mcp_tool.py: uses call_llm(task='mcp')
- skills_guard.py: uses call_llm(provider='openrouter')
- run_agent.py flush_memories: uses call_llm(task='flush_memories')

Tests updated for context_compressor and MCP tool. Some test mocks
still need updating (15 remaining failures from mock pattern changes,
2 pre-existing).
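
A condensed sketch of that lifecycle (the slot table, provider choice, and retry shape are illustrative; the real function covers more providers plus the async variant):

    from openai import BadRequestError, OpenAI

    _CLIENTS: dict[str, OpenAI] = {}
    _TASK_SLOTS = {"compression": ("openai", "gpt-4o-mini")}  # provider:model slot

    def call_llm(task: str, messages: list[dict], max_tokens: int = 1024):
        provider, model = _TASK_SLOTS[task]                  # 1. resolve slot
        client = _CLIENTS.setdefault(provider, OpenAI())     # 2. cached client
        kwargs = {"model": model, "max_tokens": max_tokens}  # 3. format args
        try:
            return client.chat.completions.create(messages=messages, **kwargs)
        except BadRequestError:
            # 4. retry with the alternate output-length parameter
            kwargs["max_completion_tokens"] = kwargs.pop("max_tokens")
            return client.chat.completions.create(messages=messages, **kwargs)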
2026-03-11 20:52:19 -07:00
teknium1
8805e705a7 feat: centralized provider router + fix Codex vision bypass + vision error handling
Three interconnected fixes for auxiliary client infrastructure:

1. CENTRALIZED PROVIDER ROUTER (auxiliary_client.py)
   Add resolve_provider_client(provider, model, async_mode) — a single
   entry point for creating properly configured clients. Given a provider
   name and optional model, it handles auth lookup (env vars, OAuth
   tokens, auth.json), base URL resolution, provider-specific headers,
   and API format differences (Chat Completions vs Responses API for
   Codex). All auxiliary consumers should route through this instead of
   ad-hoc env var lookups.

   Refactored get_text_auxiliary_client, get_async_text_auxiliary_client,
   and get_vision_auxiliary_client to use the router internally.

2. FIX CODEX VISION BYPASS (vision_tools.py)
   vision_tools.py was constructing a raw AsyncOpenAI client from the
   sync vision client's api_key/base_url, completely bypassing the Codex
   Responses API adapter. When the vision provider resolved to Codex,
   the raw client would hit chatgpt.com/backend-api/codex with
   chat.completions.create(), even though that endpoint only supports
   the Responses API.

   Fix: Added get_async_vision_auxiliary_client() which properly wraps
   Codex into AsyncCodexAuxiliaryClient. vision_tools.py now uses this
   instead of manual client construction.

3. FIX COMPRESSION FALLBACK + VISION ERROR HANDLING
   - context_compressor.py: Removed _get_fallback_client() which blindly
     looked for OPENAI_API_KEY + OPENAI_BASE_URL (fails for Codex OAuth,
     API-key providers, users without OPENAI_BASE_URL set). Replaced
     with fallback loop through resolve_provider_client() for each
     known provider, with same-provider dedup.

   - vision_tools.py: Added error detection for vision capability
     failures. Returns clear message to the model when the configured
     model doesn't support vision, instead of a generic error.

Addresses #886
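
The core of the router is roughly this shape (the provider table and credential sources below are placeholders; the real router also handles OAuth tokens, auth.json, custom headers, and the Codex Responses-API adapter):

    import os
    from openai import AsyncOpenAI, OpenAI

    _PROVIDERS = {
        "openai":     {"base_url": "https://api.openai.com/v1", "key_env": "OPENAI_API_KEY"},
        "openrouter": {"base_url": "https://openrouter.ai/api/v1", "key_env": "OPENROUTER_API_KEY"},
    }

    def resolve_provider_client(provider: str, async_mode: bool = False):
        spec = _PROVIDERS[provider]
        api_key = os.environ.get(spec["key_env"])
        if not api_key:
            raise RuntimeError(f"no credentials found for provider {provider!r}")
        cls = AsyncOpenAI if async_mode else OpenAI
        return cls(api_key=api_key, base_url=spec["base_url"])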
2026-03-11 19:46:47 -07:00
teknium1
77da3bbc95 fix: use correct role for summary message in context compressor
The summary message was always injected as 'user' role, which causes
consecutive user messages when the last preserved head message is also
'user'. Some APIs reject this (400 error), and it produces malformed
training data.

Fix: check the role of the last head message and pick the opposite role
for the summary — 'user' after assistant/tool, 'assistant' after user.

Based on PR #328 by johnh4098. Closes #328.
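
The rule in code form (a small sketch, message shape simplified):

    def summary_role(head_messages: list[dict]) -> str:
        """Pick the role opposite to the last preserved head message."""
        last_role = head_messages[-1]["role"] if head_messages else "assistant"
        return "assistant" if last_role == "user" else "user"

    # summary_role([{"role": "user", "content": "hi"}])        -> "assistant"
    # summary_role([{"role": "assistant", "content": "done"}]) -> "user"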
2026-03-08 23:09:04 -07:00
teknium1
d9f373654b feat: enhance auxiliary model configuration and environment variable handling
- Added support for auxiliary model overrides in the configuration, allowing users to specify providers and models for vision and web extraction tasks.
- Updated the CLI configuration example to include new auxiliary model settings.
- Enhanced the environment variable mapping in the CLI to accommodate auxiliary model configurations.
- Improved the resolution logic for auxiliary clients to support task-specific provider overrides.
- Updated relevant documentation and comments for clarity on the new features and their usage.
2026-03-08 18:06:47 -07:00
teknium1
306d92a9d7 refactor(context_compressor): improve summary generation logic and error handling
Updated the _generate_summary method to attempt summary generation using the auxiliary model first, with a fallback to the main model. If both attempts fail, the method now returns None instead of a placeholder, allowing the caller to handle missing summaries appropriately. This change enhances the robustness of context compression and improves logging for failure scenarios.
2026-03-07 11:54:51 -08:00
teknium1
5da55ea1e3 fix: sanitize orphaned tool-call/result pairs in message compression
Enhance message compression by adding a method to clean up orphaned tool-call and tool-result pairs. This ensures that the API receives well-formed messages, preventing errors related to mismatched IDs. The new functionality includes removing orphaned results and adding stub results for missing calls, improving overall message integrity during compression.
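
A simplified sketch of the cleanup pass (the stub text and its placement are illustrative; a real pass would keep each stub adjacent to its call):

    def sanitize_tool_pairs(messages: list[dict]) -> list[dict]:
        call_ids = {c["id"] for m in messages
                    for c in (m.get("tool_calls") or []) if c.get("id")}
        result_ids = {m.get("tool_call_id") for m in messages if m.get("role") == "tool"}
        # Drop tool results whose call was compressed away.
        cleaned = [m for m in messages
                   if m.get("role") != "tool" or m.get("tool_call_id") in call_ids]
        # Stub a result for any call left without one.
        for missing in sorted(call_ids - result_ids):
            cleaned.append({"role": "tool", "tool_call_id": missing,
                            "content": "[tool result removed during compression]"})
        return cleaned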
2026-03-07 08:08:00 -08:00
teknium1
c886333d32 feat: smart context length probing with persistent caching + banner display
Replaces the unsafe 128K fallback for unknown models with a descending
probe strategy (2M → 1M → 512K → 200K → 128K → 64K → 32K). When a
context-length error occurs, the agent steps down tiers and retries.
The discovered limit is cached per model+provider combo in
~/.hermes/context_length_cache.yaml so subsequent sessions skip probing.

Also parses API error messages to extract the actual context limit
(e.g. 'maximum context length is 32768 tokens') for instant resolution.

The CLI banner now displays the context window size next to the model
name (e.g. 'claude-opus-4 · 200K context · Nous Research').

Changes:
- agent/model_metadata.py: CONTEXT_PROBE_TIERS, persistent cache
  (save/load/get), parse_context_limit_from_error(), get_next_probe_tier()
- agent/context_compressor.py: accepts base_url, passes to metadata
- run_agent.py: step-down logic in context error handler, caches on success
- cli.py + hermes_cli/banner.py: context length in welcome banner
- tests: 22 new tests for probing, parsing, and caching

Addresses #132. PR #319's approach (8K default) rejected — too conservative.
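
Sketch of the probing helpers (names follow the commit; the tier values and the error regex are approximations):

    import re

    CONTEXT_PROBE_TIERS = [2_000_000, 1_000_000, 512_000, 200_000, 128_000, 64_000, 32_000]

    def parse_context_limit_from_error(message: str) -> int | None:
        m = re.search(r"maximum context length is (\d+) tokens", message)
        return int(m.group(1)) if m else None

    def get_next_probe_tier(current: int) -> int | None:
        lower = [t for t in CONTEXT_PROBE_TIERS if t < current]
        return lower[0] if lower else None  # tiers are descending, so take the largest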
2026-03-05 16:09:57 -08:00
teknium1
3e2ed18ad0 fix: fallback to main model endpoint when auxiliary summary client fails
When the auxiliary client (used for context compression summaries) fails
— e.g. due to a stale OpenRouter API key after switching to a local LLM
— fall back to the user's active endpoint (OPENAI_BASE_URL) instead of
returning a useless static summary string.

This handles the common scenario where a user switches providers via
'hermes model' but the old provider's API key remains in .env. The
auxiliary client picks up the stale key, fails (402/auth error), and
previously compression would produce garbage. Now it gracefully retries
with the working endpoint.

On successful fallback, the working client is cached for future
compressions in the same session so the fallback cost is paid only once.

Ref: #348
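
Approximately the behavior described, as a sketch (the caching variable and call shape are illustrative):

    import os
    from openai import OpenAI

    _fallback_client: OpenAI | None = None

    def summarize(messages: list[dict], aux_client: OpenAI, model: str):
        global _fallback_client
        client = _fallback_client or aux_client
        try:
            resp = client.chat.completions.create(model=model, messages=messages)
        except Exception:
            # Auxiliary provider failed (e.g. stale key): retry on the active endpoint.
            client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"),
                            base_url=os.environ.get("OPENAI_BASE_URL"))
            resp = client.chat.completions.create(model=model, messages=messages)
            _fallback_client = client  # pay the fallback cost only once per session
        return resp.choices[0].message.content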
2026-03-04 17:58:09 -08:00
teknium1
25c65bc99e fix(agent): handle None content in context compressor (fixes #211)
The OpenAI API returns content: null on assistant messages that only
contain tool calls. msg.get('content', '') returns None (not '') when
the key exists with value None, causing TypeError on len() and string
concatenation in _generate_summary and compress.

Fix: msg.get('content') or '' — handles both missing keys and None.

Tests from PR #216 (@Farukest). Fix also in PR #215 (@cutepawss).
Both PRs had stale branches and couldn't be merged directly.

Closes #211
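
The distinction the fix relies on, in two lines:

    msg = {"role": "assistant", "content": None, "tool_calls": [{"id": "call_1"}]}

    broken = msg.get("content", "")   # None: the default only applies when the key is absent
    fixed = msg.get("content") or ""  # "": covers both a missing key and an explicit None

    assert broken is None and fixed == ""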
2026-03-02 01:35:52 -08:00
teknium1
500f0eab4a refactor(cli): Finalize OpenAI Codex Integration with OAuth
- Enhanced Codex model discovery by fetching available models from the API, with fallback to local cache and defaults.
- Updated the context compressor's summary target tokens to 2500 for improved performance.
- Added external credential detection for Codex CLI to streamline authentication.
- Refactored various components to ensure consistent handling of authentication and model selection across the application.
2026-02-28 21:47:51 -08:00
teknium1
6366177118 refactor: update context compression configuration to use config.yaml and improve model handling 2026-02-28 04:46:38 -08:00
teknium1
58fce0a37b feat(api): implement dynamic max tokens handling for various providers
- Added _max_tokens_param method in AIAgent to return appropriate max tokens parameter based on the provider (OpenAI vs. others).
- Updated API calls in AIAgent to utilize the new max tokens handling.
- Introduced auxiliary_max_tokens_param function in auxiliary_client for consistent max tokens management across auxiliary clients.
- Refactored multiple tools to use auxiliary_max_tokens_param for improved compatibility with different models and providers.
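
In essence (the helper name and the exact provider check are simplified here):

    def max_tokens_param(provider: str, limit: int) -> dict:
        """Return the request kwarg the provider expects for output length."""
        if provider == "openai":
            return {"max_completion_tokens": limit}
        return {"max_tokens": limit}

    # client.chat.completions.create(model=model, messages=msgs, **max_tokens_param(provider, 1024))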
2026-02-26 20:23:56 -08:00
teknium1
ededaaa874 Hermes Agent UX Improvements 2026-02-22 02:16:11 -08:00
teknium1
9123cfb5dd Refactor Terminal and AIAgent cleanup 2026-02-21 22:31:43 -08:00