hermes-agent

Author	SHA1	Message	Date
teknium1	c886333d32	feat: smart context length probing with persistent caching + banner display Replaces the unsafe 128K fallback for unknown models with a descending probe strategy (2M → 1M → 512K → 200K → 128K → 64K → 32K). When a context-length error occurs, the agent steps down tiers and retries. The discovered limit is cached per model+provider combo in ~/.hermes/context_length_cache.yaml so subsequent sessions skip probing. Also parses API error messages to extract the actual context limit (e.g. 'maximum context length is 32768 tokens') for instant resolution. The CLI banner now displays the context window size next to the model name (e.g. 'claude-opus-4 · 200K context · Nous Research'). Changes: - agent/model_metadata.py: CONTEXT_PROBE_TIERS, persistent cache (save/load/get), parse_context_limit_from_error(), get_next_probe_tier() - agent/context_compressor.py: accepts base_url, passes to metadata - run_agent.py: step-down logic in context error handler, caches on success - cli.py + hermes_cli/banner.py: context length in welcome banner - tests: 22 new tests for probing, parsing, and caching Addresses #132. PR #319's approach (8K default) rejected — too conservative.	2026-03-05 16:09:57 -08:00
teknium1	3e2ed18ad0	fix: fallback to main model endpoint when auxiliary summary client fails When the auxiliary client (used for context compression summaries) fails — e.g. due to a stale OpenRouter API key after switching to a local LLM — fall back to the user's active endpoint (OPENAI_BASE_URL) instead of returning a useless static summary string. This handles the common scenario where a user switches providers via 'hermes model' but the old provider's API key remains in .env. The auxiliary client picks up the stale key, fails (402/auth error), and previously compression would produce garbage. Now it gracefully retries with the working endpoint. On successful fallback, the working client is cached for future compressions in the same session so the fallback cost is paid only once. Ref: #348	2026-03-04 17:58:09 -08:00
teknium1	25c65bc99e	fix(agent): handle None content in context compressor (fixes #211 ) The OpenAI API returns content: null on assistant messages that only contain tool calls. msg.get('content', '') returns None (not '') when the key exists with value None, causing TypeError on len() and string concatenation in _generate_summary and compress. Fix: msg.get('content') or '' — handles both missing keys and None. Tests from PR #216 (@Farukest). Fix also in PR #215 (@cutepawss). Both PRs had stale branches and couldn't be merged directly. Closes #211	2026-03-02 01:35:52 -08:00
teknium1	500f0eab4a	refactor(cli): Finalize OpenAI Codex Integration with OAuth - Enhanced Codex model discovery by fetching available models from the API, with fallback to local cache and defaults. - Updated the context compressor's summary target tokens to 2500 for improved performance. - Added external credential detection for Codex CLI to streamline authentication. - Refactored various components to ensure consistent handling of authentication and model selection across the application.	2026-02-28 21:47:51 -08:00
teknium1	6366177118	refactor: update context compression configuration to use config.yaml and improve model handling	2026-02-28 04:46:38 -08:00
teknium1	58fce0a37b	feat(api): implement dynamic max tokens handling for various providers - Added _max_tokens_param method in AIAgent to return appropriate max tokens parameter based on the provider (OpenAI vs. others). - Updated API calls in AIAgent to utilize the new max tokens handling. - Introduced auxiliary_max_tokens_param function in auxiliary_client for consistent max tokens management across auxiliary clients. - Refactored multiple tools to use auxiliary_max_tokens_param for improved compatibility with different models and providers.	2026-02-26 20:23:56 -08:00
teknium1	ededaaa874	Hermes Agent UX Improvements	2026-02-22 02:16:11 -08:00
teknium1	9123cfb5dd	Refactor Terminal and AIAgent cleanup	2026-02-21 22:31:43 -08:00

8 Commits