hermes-agent

Author	SHA1	Message	Date
teknium1	bf8350ac18	fix: evaluate() uses full agent loop with tools, not single-turn The evaluate method was doing single-turn chat_completion (no tools), which defeats the purpose of an agentic research benchmark. Fixed to run the full HermesAgentLoop with web_search/web_extract tools. Results comparison (Claude Sonnet 4.5, FRAMES benchmark): Without tools (broken): 0.56 mean correctness With agent loop + tools: 1.00 mean correctness, 0.994 reward New eval metrics: mean_correctness, mean_reward, mean_tool_calls, tool_usage_rate — all logged via evaluate_log() in lighteval format.	2026-03-09 19:53:28 -07:00
teknium1	a5c6348d41	Merge: WebResearchEnv compute_reward fix (verified with live test)	2026-03-09 19:29:19 -07:00
teknium1	320f881e0b	fix: WebResearchEnv compute_reward extracts from AgentResult.messages AgentResult has .messages (list of dicts), not .final_response or .tool_calls. Fixed compute_reward to extract the final response and tool names from the message history. Verified with live process mode test: - Agent used 7 tool calls (web_search, web_extract) - Produced a 1106-char researched response about Winter Olympics - Reward: 0.384 (partial correctness via LLM judge) - JSONL output contains valid tokens, masks, scores, messages	2026-03-09 19:29:12 -07:00
Brooklyn Nicholson	0d96f1991c	test: parallelize test suite with pytest-xdist ~2min sequential runs were painful. Added pytest-xdist and -n auto to run across all available cores. Tests already isolate state via tmp_path fixtures so no changes needed to test code. Local: 2677 passed in ~30s. CI gets 4 vCPUs on ubuntu-latest.	2026-03-09 20:47:34 -05:00
teknium1	172a38c344	fix: Docker persistent bind mounts fail with Permission denied cap-drop ALL removes DAC_OVERRIDE, which root needs to write to bind-mounted directories owned by the host user (uid 1000). This broke persistent Docker sandboxes — the container couldn't write to /workspace or /root. Add back the minimum capabilities needed: - DAC_OVERRIDE: root can write to bind-mounted dirs owned by host user - CHOWN: package managers (pip, npm, apt) need to set file ownership - FOWNER: needed for operations on files owned by other users Still drops all other capabilities (NET_RAW, SYS_ADMIN, etc.) and keeps no-new-privileges. Security boundary is the container itself. Verified end-to-end: create files → destroy container → new container with same task_id → files persist on host and are accessible in the new container.	2026-03-09 17:52:33 -07:00
teknium1	8bc0d4f77d	Merge: WebResearchEnv Atropos standards compliance	2026-03-09 17:45:57 -07:00
teknium1	8eabdefa8a	fix: bring WebResearchEnv up to Atropos environment standards The environment was merged missing several standard components. Updated to match the patterns established by 82 Atropos environments and our own HermesAgentBaseEnv contract. Added: - WebResearchEnvConfig — custom Pydantic config with reward weights, efficiency thresholds, eval settings, dataset config (all tunable via CLI/YAML without code changes) - config_init() classmethod — default server config (OpenRouter + Claude) so the env works out of the box - wandb_log() override — logs reward breakdown metrics (correctness, tool_usage, efficiency, diversity, correct_rate, tool_usage_rate) with proper buffer management and super() call - evaluate() — uses server.chat_completion instead of broken stub _run_agent_on_item(). Logs via evaluate_log() for lighteval-compatible output. Fixed: - Removed broken _run_agent_on_item() stub that returned empty results - evaluate() now uses server.chat_completion (same pattern as TerminalTestEnv) for actual model evaluation - compute_reward reads tool calls from AgentResult properly - LLM judge uses self.server.chat_completion instead of ctx Reward config is now tunable without code changes: --env.correctness_weight 0.6 --env.tool_usage_weight 0.2 --env.efficiency_weight 0.2 --env.diversity_bonus 0.1 --env.efficient_max_calls 5	2026-03-09 17:45:50 -07:00
teknium1	f658af45c2	Merge PR #446: fix(cli): use correct visibility filter string in codex API model fetch Authored by PercyDikec. Fixes #445. Changes 'hide' to 'hidden' in _fetch_models_from_api to match _read_cache_models and the actual API response format.	2026-03-09 17:42:39 -07:00
teknium1	5212644861	fix(security): prevent shell injection in tilde-username path expansion Validate that the username portion of ~username paths contains only valid characters (alphanumeric, dot, hyphen, underscore) before passing to shell echo for expansion. Previously, paths like '~; rm -rf /' would be passed unquoted to self._exec(f'echo {path}'), allowing arbitrary command execution. The approach validates the username rather than using shlex.quote(), which would prevent tilde expansion from working at all since echo '~user' outputs the literal string instead of expanding it. Added tests for injection blocking and valid ~username/path expansion. Credit to @alireza78a for reporting (PR #442, issue #442).	2026-03-09 17:33:19 -07:00
teknium1	1151f84351	Merge PR #434: feat: add WebResearchEnv RL environment for multi-step web research Authored by jackx707. Adds web_research_env.py (Atropos RL environment for multi-step web research using FRAMES benchmark) and batch generation config.	2026-03-09 17:24:20 -07:00
teknium1	9abd6bf342	fix: gateway missing docker_volumes config bridge + list serialization bug The gateway's config.yaml → env var bridge was missing docker_volumes, so Docker volume mounts configured in config.yaml were ignored for gateway sessions (Telegram, Discord, etc.) while working in CLI. Also fixes list serialization: str() produces Python repr with single quotes which json.loads() in terminal_tool.py can't parse. Now uses json.dumps() for list values. Based on PR #431 by @manuelschipper (applied manually due to stale branch).	2026-03-09 17:24:00 -07:00
Teknium	d2c7ef6b41	Merge pull request #792 from NousResearch/hermes/hermes-d2f5523a Merge PR #428: Improve type hints and error diagnostics in vision_tools + add 42 tests	2026-03-09 17:21:44 -07:00
0xbyt4	4e3a8a0637	fix: handle empty choices in MCP sampling callback SamplingHandler.__call__ accessed response.choices[0] without checking if the list was non-empty. LLM APIs can return empty choices on content filtering, provider errors, or rate limits, causing an unhandled IndexError that propagates to the MCP SDK and may crash the connection. Add a defensive guard that returns a proper ErrorData when choices is empty, None, or missing. Includes three test cases covering all variants.	2026-03-10 02:24:53 +03:00
teknium1	a34102049b	Merge: vision auto-detection fallback to local endpoints	2026-03-09 15:36:27 -07:00
teknium1	ef5d811aba	fix: vision auto-detection now falls back to custom/local endpoints Vision auto-mode previously only tried OpenRouter, Nous, and Codex for multimodal — deliberately skipping custom endpoints with the assumption they 'may not handle vision input.' This caused silent failures for users running local multimodal models (Qwen-VL, LLaVA, Pixtral, etc.) without any cloud API keys. Now custom endpoints are tried as a last resort in auto mode. If the model doesn't support vision, the API call fails gracefully — but users with local vision models no longer need to manually set auxiliary.vision.provider: main in config.yaml. Reported by @Spadav and @kotyKD.	2026-03-09 15:36:19 -07:00
teknium1	2d44ed1c5b	test: add comprehensive tests for vision_tools (42 tests) Covers PR #428 changes and existing vision_tools functionality: - _validate_image_url: 20 tests for urlparse-based validation - _determine_mime_type: 6 tests for MIME type detection - _image_to_base64_data_url: 3 tests for base64 conversion - _handle_vision_analyze: 5 tests for type hints, prompt building, AUXILIARY_VISION_MODEL env var override - Error logging exc_info: 3 async tests verifying stack traces are logged on download failure, analysis error, and cleanup error - check_vision_requirements & get_debug_session_info: 2 basic tests - Registry integration: 3 tests for tool registration	2026-03-09 15:32:02 -07:00
teknium1	fa2e72ae9c	docs: document docker_volumes config for shared host directories The Docker backend already supports user-configured volume mounts via docker_volumes, but it was undocumented — missing from DEFAULT_CONFIG, cli.py defaults, and configuration docs. Changes: - hermes_cli/config.py: Add docker_volumes to DEFAULT_CONFIG with inline documentation and examples - cli.py: Add docker_volumes to load_cli_config defaults - configuration.md: Full Docker Volume Mounts section with YAML examples, use cases (providing files, receiving outputs, shared workspaces), and env var alternative	2026-03-09 15:29:34 -07:00
teknium1	5bfc4ed53b	Merge PR #428: Improve type hints and error diagnostics in vision_tools Authored by aydnOktay. Improves URL validation with urlparse, adds exc_info to error logs for full stack traces, and tightens type hints. Resolved merge conflict in _handle_vision_analyze: kept PR's string formatting with our AUXILIARY_VISION_MODEL env var logic.	2026-03-09 15:27:54 -07:00
teknium1	520aec20e0	fix: add mcp to dev dependencies for test suite MCP tests import from mcp.types but mcp wasn't in the dev optional dependencies. Fresh 'pip install -e .[dev]' setups failed 3 tests. Based on PR #427 by @teyrebaz33 (applied manually due to stale branch).	2026-03-09 15:12:54 -07:00
teknium1	64bec1d060	fix: Slack gateway setup missing event subscriptions and scopes The 'hermes gateway setup' instructions for Slack were missing: - The 'Subscribe to Events' step entirely (message.im, message.channels, app_mention, message.groups) - Several required scopes (app_mentions:read, groups:history, users:read, files:write) - Warning about bot only working in DMs without message.channels - Step to invite the bot to channels The 'hermes setup' flow (setup.py) and the website docs (slack.md) already had the correct information — only gateway.py was outdated. Reported by JordanB on Slack.	2026-03-09 14:31:19 -07:00
teknium1	ac58309dbd	docs: improve Slack setup guide with channel event subscriptions and scopes The #1 support issue with Slack is 'bot works in DMs but not channels'. This is almost always caused by missing event subscriptions (message.channels, message.groups) or missing OAuth scopes (channels:history, groups:history). Changes: - slack.md: Move channels:history and groups:history from optional to required scopes. Move message.channels and message.groups to required events. Add new 'How the Bot Responds' section explaining DM vs channel behavior. Add Step 8 for inviting bot to channels. Expand troubleshooting table with specific 'works in DMs not channels' entry. Add quick checklist for channel debugging. - setup.py: Expand Slack setup wizard with all required scopes, event subscriptions, and a warning that without message.channels/message.groups the bot only works in DMs. Add link to full docs. Improve Member ID discovery instructions. - config.py: Update SLACK_BOT_TOKEN and SLACK_APP_TOKEN descriptions to list required scopes and event subscriptions inline.	2026-03-09 14:00:11 -07:00
teyrebaz33	94023e6a85	feat: conditional skill activation based on tool availability Skills can now declare fallback_for_toolsets, fallback_for_tools, requires_toolsets, and requires_tools in their SKILL.md frontmatter. The system prompt builder filters skills automatically based on which tools are available in the current session. - Add _read_skill_conditions() to parse conditional frontmatter fields - Add _skill_should_show() to evaluate conditions against available tools - Update build_skills_system_prompt() to accept and apply tool availability - Pass valid_tool_names and available toolsets from run_agent.py - Backward compatible: skills without conditions always show; calling build_skills_system_prompt() with no args preserves existing behavior Closes #539	2026-03-09 23:13:39 +03:00
teknium1	5eaf4a3f32	feat: Telegram send_document and send_video for native file attachments Implement send_document() and send_video() overrides in TelegramAdapter so the agent can deliver files (PDFs, CSVs, docs, etc.) and videos as native Telegram attachments instead of just printing the file path as text. The base adapter already routes MEDIA:<path> tags by extension — audio goes to send_voice(), images to send_image_file(), and everything else falls through to send_document(). But TelegramAdapter didn't override send_document() or send_video(), so those fell back to plain text. Now when the agent includes MEDIA:/path/to/report.pdf in its response, users get a proper downloadable file attachment in Telegram. Features: - send_document: sends files via bot.send_document with display name, caption (truncated to 1024), and reply_to support - send_video: sends videos via bot.send_video with inline playback - Both fall back to base class text if the Telegram API call fails - 10 new tests covering success, custom filename, file-not-found, not-connected, caption truncation, API error fallback, and reply_to Requested by @TigerHixTang on Twitter.	2026-03-09 13:07:10 -07:00
Teknium	a5a5d82a21	Merge pull request #784 from NousResearch/feat/slack-app-mention-and-documents feat(slack): fix app_mention 404 + add document/video support	2026-03-09 13:04:50 -07:00
teknium1	34e8d088c2	feat(slack): fix app_mention 404 + add document/video support - Register no-op app_mention event handler to suppress Bolt 404 errors. The 'message' handler already processes @mentions in channels, so app_mention is acknowledged without duplicate processing. - Add send_document() for native file attachments (PDFs, CSVs, etc.) via files_upload_v2, matching the pattern from Telegram PR #779. - Add send_video() for native video uploads via files_upload_v2. - Handle incoming document attachments from users: download, cache, and inject text content for .txt/.md files (capped at 100KB), following the same pattern as the Telegram adapter. - Add _download_slack_file_bytes() helper for raw byte downloads. - Add 24 new tests covering all new functionality. Fixes the unhandled app_mention events reported in gateway logs.	2026-03-09 13:02:59 -07:00
memosr.eth	b78b605ba9	fix: replace print() with logger.error() in file_tools	2026-03-09 22:29:16 +03:00
teyrebaz33	c3cf88b202	feat(cli,gateway): add /personality none and custom personality support Closes #643 Changes: - /personality none|default|neutral — clears system prompt overlay - Custom personalities in config.yaml support dict format with: name, description, system_prompt, tone, style directives - Backwards compatible — existing string format still works - CLI + gateway both updated - 18 tests covering none/default/neutral, dict format, string format, list display, save to config	2026-03-09 17:31:54 +03:00
0xbyt4	58b756f04c	fix: clean up empty file after failed wl-paste clipboard extraction When wl-paste produces empty output, the destination file was left on disk as a 0-byte orphan. Now explicitly removed before returning False.	2026-03-09 17:17:10 +03:00
0xbyt4	34f8ac2d85	fix: replace blocking time.sleep with await asyncio.sleep in WhatsApp connect time.sleep(1) inside async def connect() blocks the entire event loop for 1 second. Replaced with await asyncio.sleep(1) to yield control back to the event loop while waiting for the killed port process to release.	2026-03-09 17:16:26 +03:00
0xbyt4	1a10eb8cd9	fix: off-by-one in setup toggle selection error message Error message said "between 1 and N+1" for N items, showing a max value that would itself be rejected. Now correctly says "between 1 and N".	2026-03-09 17:15:23 +03:00
luisv-1	59705b80cd	Add tools summary flag to Hermes CLI Made-with: Cursor	2026-03-09 16:50:53 +03:00
aydnOktay	46a7d6aeb2	Improve Telegram gateway error handling and logging	2026-03-09 15:58:01 +03:00
teknium1	c754135965	fix: banner wraps in narrow terminals (Kitty, small windows) The full HERMES-AGENT ASCII logo needs ~95 columns, and the side-by-side caduceus + tools panel needs ~80. In narrow terminals (Kitty default, resized windows) everything wraps into visual garbage. Fixes: - show_banner() auto-detects terminal width and falls back to compact banner when < 80 columns - build_welcome_banner() skips the ASCII logo when < 95 columns - Compact banner now dynamically sized via _build_compact_banner() instead of a hardcoded 64-char box that also wrapped in narrow terms - Same width checks applied to /clear command's banner refresh The up/down arrow key issue in Kitty terminal for multiline input is a known Kitty keyboard protocol (CSI u) vs prompt_toolkit compatibility gap — arrow keys work correctly in standard terminals and tmux. Users can work around it by running in tmux or setting TERM=xterm-256color.	2026-03-09 05:57:36 -07:00
teknium1	c6b75baad0	feat: find-nearby skill and Telegram location support Adds a 'find-nearby' skill for discovering nearby places using OpenStreetMap (Overpass + Nominatim). No API keys needed. Works with: - Coordinates (from Telegram location pins) - Addresses, cities, zip codes, landmarks (auto-geocoded) - Multiple place types (restaurant, cafe, bar, pharmacy, etc.) Returns names, distances, cuisine, hours, addresses, and Google Maps links (pin + directions). 184-line stdlib-only script. Also adds Telegram location message handling: - New MessageType.LOCATION in gateway base - Telegram adapter handles LOCATION and VENUE messages - Injects lat/lon coordinates into conversation context - Prompts agent to ask what the user wants nearby Inspired by PR #422 (reimplemented with simpler script and broader skill scope — addresses/cities/zips, not just Telegram coordinates).	2026-03-09 05:31:10 -07:00
teknium1	a7ad6f6d28	Merge: custom providers instant activation + model persistence	2026-03-09 05:08:01 -07:00
teknium1	1a2141d04d	fix: custom providers activate immediately, save model name Selecting a saved custom provider now switches instantly without probing /models — the model name is stored in the config entry as a complete profile (name + url + key + model). Changes: - custom_providers entries now include 'model' field - Selecting a saved provider with a model just activates it - Only probes /models if no model is saved (first-time setup) - Menu shows saved model name: 'Local (localhost:8000) — llama-70b' - Dedup on re-entry: still activates the model, just doesn't add a duplicate config entry (updates model name if changed)	2026-03-09 05:07:53 -07:00
teknium1	ff3f3169b2	Merge: auto-save custom endpoints + removal option	2026-03-09 04:58:27 -07:00
teknium1	f4580b6010	feat: auto-save custom endpoints + removal option When a user adds a custom endpoint via 'hermes model' → 'Custom endpoint', it now automatically saves to custom_providers in config.yaml so it persists and appears in the provider menu on subsequent runs. Deduplicates by base_url. Auto-generated names based on URL: http://localhost:8000/v1 → 'Local (localhost:8000)' https://xyz.runpod.ai/v1 → 'RunPod (xyz.runpod.ai)' https://api.example.com/v1 → 'Api.example.com' Also adds 'Remove a saved custom provider' option to the menu (only shown when custom providers exist) with a selection UI to pick which one to remove. Users can also manually edit custom_providers in config.yaml for full control over names and settings.	2026-03-09 04:58:20 -07:00
aydnOktay	d82fcef91b	Improve Discord gateway error handling and logging	2026-03-09 14:33:21 +03:00
teknium1	7b63a787b3	Merge: named custom providers in hermes model	2026-03-09 03:45:26 -07:00
teknium1	069570d103	feat: support multiple named custom providers in `hermes model` Users with multiple local servers or custom endpoints can now define them all in config.yaml and switch between them from the model selection menu: custom_providers: - name: 'Local Llama 70B' base_url: 'http://localhost:8000/v1' api_key: 'not-needed' - name: 'RunPod vLLM' base_url: 'https://xyz.runpod.ai/v1' api_key: 'rp_xxxxx' These appear in `hermes model` provider selection alongside the built-in providers. When selected, the endpoint's /models API is probed to show available models in a selection menu. Previously only a single 'Custom endpoint' option existed, requiring manual URL entry each time you wanted to switch between local servers. Requested by @ZiarnoBobu on Twitter.	2026-03-09 03:45:17 -07:00
teknium1	0dafdcab86	Merge: skill reorganization + sub-category support - Sub-category support in prompt_builder.py (backwards-compatible) - Split mlops (40 skills) into 7 logical sub-categories - Merged 8 singleton categories into logical parents - Fixed 2 misplaced skills (code-review, ml-paper-writing)	2026-03-09 03:40:11 -07:00
Teknium	654e16187e	feat(mcp): add sampling support — server-initiated LLM requests (#753) Add MCP sampling/createMessage capability via SamplingHandler class. Text-only sampling + tool use in sampling with governance (rate limits, model whitelist, token caps, tool loop limits). Per-server audit metrics. Based on concept from PR #366 by eren-karakus0. Restructured as class-based design with bug fixes and tests using real MCP SDK types. 50 new tests, 2600 total passing.	2026-03-09 03:37:38 -07:00
teknium1	732c66b0f3	refactor: reorganize skills into sub-categories The skills directory was getting disorganized — mlops alone had 40 skills in a flat list, and 12 categories were singletons with just one skill each. Code change: - prompt_builder.py: Support sub-categories in skill scanner. skills/mlops/training/axolotl/SKILL.md now shows as category 'mlops/training' instead of just 'mlops'. Backwards-compatible with existing flat structure. Split mlops (40 skills) into 7 sub-categories: - mlops/training (12): accelerate, axolotl, flash-attention, grpo-rl-training, peft, pytorch-fsdp, pytorch-lightning, simpo, slime, torchtitan, trl-fine-tuning, unsloth - mlops/inference (8): gguf, guidance, instructor, llama-cpp, obliteratus, outlines, tensorrt-llm, vllm - mlops/models (6): audiocraft, clip, llava, segment-anything, stable-diffusion, whisper - mlops/vector-databases (4): chroma, faiss, pinecone, qdrant - mlops/evaluation (5): huggingface-tokenizers, lm-evaluation-harness, nemo-curator, saelens, weights-and-biases - mlops/cloud (2): lambda-labs, modal - mlops/research (1): dspy Merged singleton categories: - gifs → media (gif-search joins youtube-content) - music-creation → media (heartmula, songsee) - diagramming → creative (excalidraw joins ascii-art) - ocr-and-documents → productivity - domain → research (domain-intel) - feeds → research (blogwatcher) - market-data → research (polymarket) Fixed misplaced skills: - mlops/code-review → software-development (not ML-specific) - mlops/ml-paper-writing → research (academic writing) Added DESCRIPTION.md files for all new/updated categories.	2026-03-09 03:35:53 -07:00
teknium1	1f0944de21	fix: handle non-string content from OpenAI-compatible servers (#759) Some local LLM servers (llama-server, etc.) return message.content as a dict or list instead of a plain string. This caused AttributeError 'dict object has no attribute strip' on every API call. Normalizes content to string immediately after receiving the response: - dict: extracts 'text' or 'content' field, falls back to json.dumps - list: extracts text parts (OpenAI multimodal content format) - other: str() conversion Applied at the single point where response.choices[0].message is read in the main agent loop, so all downstream .strip()/.startswith()/[:100] operations work regardless of server implementation. Closes #759	2026-03-09 03:32:32 -07:00
0xbyt4	912efe11b5	fix(tests): add content attribute to fake result objects _FakeReadResult and _FakeSearchResult now expose the attributes that read_file_tool/search_tool access after the redact_sensitive_text integration from main.	2026-03-09 13:25:52 +03:00
0xbyt4	4684aaffdc	merge: resolve file_tools.py conflict with origin/main Combine read/search loop detection with main's redact_sensitive_text and truncation hint features. Add tracker reset to TestSearchHints to prevent cross-test state leakage.	2026-03-09 13:21:46 +03:00
teknium1	f1a1b58319	fix: hermes setup doesn't update provider when switching to OpenRouter When switching FROM Codex/Nous/custom TO OpenRouter via 'hermes setup', the old provider stayed active because setup only saved the API key but never updated config.yaml or auth.json. This caused resolve_provider() to keep returning the old provider (e.g. openai-codex) even after the user selected OpenRouter. Fix: the OpenRouter path in setup now deactivates any OAuth provider in auth.json and writes model.provider='openrouter' to config.yaml, matching what all other provider paths already do.	2026-03-09 03:14:22 -07:00
teknium1	c21d77ca08	Merge: OBLITERATUS skill v2.0 + unified gateway compression OBLITERATUS skill (PR #408 updated): - 9 CLI methods, 28 analysis modules, 116 model presets - Default method: advanced (multi-direction SVD, norm-preserving) - Live-tested: Qwen2.5-3B 75%→0% refusal, Qwen2.5-0.5B 60%→20% - References, templates, and real-world pitfalls included Gateway compression fix (PR #739): - Unified session hygiene with agent compression config - Uses model context length × compression.threshold from config.yaml - Removed hardcoded 100k/200-msg thresholds	2026-03-09 02:59:41 -07:00
teknium1	d6c710706f	docs: add real-world testing findings to OBLITERATUS skill Added pitfalls discovered during live abliteration testing: - Models < 1B have fragmented refusal, respond poorly (0.5B: 60%→20%) - Models 3B+ work much better (3B: 75%→0% with advanced defaults) - aggressive method can backfire on small models (made it worse) - Spectral certification RED is common even when refusal rate is 0% - Fixed torch property: total_mem → total_memory	2026-03-09 02:52:54 -07:00
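
Note on commit 5212644861 (tilde-username path expansion): the message describes validating the username portion of a ~username path before it reaches the shell, rather than quoting the whole path. A minimal sketch of that idea follows. The function name split_tilde_path, the regex constant, and the choice to split at the first slash are illustrative assumptions, not the repository's actual code.

```python
import re

# Allowed username characters per the commit message:
# alphanumeric, dot, hyphen, underscore.
_VALID_USERNAME = re.compile(r"[A-Za-z0-9._-]+")


def split_tilde_path(path):
    """Split '~user/rest' into ('~user', '/rest').

    Hypothetical helper: raises ValueError when the username part
    contains anything outside the allowed character set, so only a
    validated '~user' prefix would ever be handed to a shell.
    """
    if not path.startswith("~"):
        raise ValueError("not a tilde path")
    prefix, sep, rest = path.partition("/")
    username = prefix[1:]  # empty for a bare '~'
    if username and not _VALID_USERNAME.fullmatch(username):
        raise ValueError(f"unsafe username in path: {path!r}")
    return prefix, sep + rest
```

With this sketch, split_tilde_path("~alice/projects") returns the prefix and remainder separately, while a path such as '~; rm -rf /' is rejected before any expansion is attempted.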


1369 Commits
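
Note on commit 1f0944de21 (non-string content from OpenAI-compatible servers): the message lists three normalization rules (dict, list, other). The sketch below shows one way those rules could look. The function name normalize_content is hypothetical, and treating None as an empty string is an added assumption that the commit message does not state.

```python
import json


def normalize_content(content):
    """Coerce a chat message's content field to a plain string."""
    if isinstance(content, str):
        return content
    if content is None:
        # Assumption: a missing content field becomes an empty string.
        return ""
    if isinstance(content, dict):
        # Prefer a 'text' or 'content' field, else serialize the dict.
        for key in ("text", "content"):
            value = content.get(key)
            if isinstance(value, str):
                return value
        return json.dumps(content)
    if isinstance(content, list):
        # Multimodal-style list: keep only the text parts.
        parts = []
        for part in content:
            if isinstance(part, str):
                parts.append(part)
            elif isinstance(part, dict) and isinstance(part.get("text"), str):
                parts.append(part["text"])
        return "".join(parts)
    return str(content)
```

Normalizing once, right where the response is read, means later calls such as .strip() or .startswith() always see a string.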