Commit Graph

20 Commits

Author SHA1 Message Date
teknium1
c1775de56f feat: filesystem checkpoints and /rollback command
Automatic filesystem snapshots before destructive file operations,
with user-facing rollback.  Inspired by PR #559 (by @alireza78a).

Architecture:
- Shadow git repos at ~/.hermes/checkpoints/{hash}/ via GIT_DIR
- CheckpointManager: take/list/restore, turn-scoped dedup, pruning
- Transparent — the LLM never sees it, no tool schema, no tokens
- Once per turn — only first write_file/patch triggers a snapshot

Integration:
- Config: checkpoints.enabled + checkpoints.max_snapshots
- CLI flag: hermes --checkpoints
- Trigger: run_agent.py _execute_tool_calls() before write_file/patch
- /rollback slash command in CLI + gateway (list, restore by number)
- Pre-rollback snapshot auto-created on restore (undo the undo)

Safety:
- Never blocks file operations — all errors silently logged
- Skips root dir, home dir, dirs >50K files
- Disables gracefully when git not installed
- Shadow repo completely isolated from project git

Tests: 35 new tests, all passing (2798 total suite)
Docs: feature page, config reference, CLI commands reference
2026-03-10 00:49:15 -07:00
teknium1
fa2e72ae9c docs: document docker_volumes config for shared host directories
The Docker backend already supports user-configured volume mounts via
docker_volumes, but it was undocumented — missing from DEFAULT_CONFIG,
cli.py defaults, and configuration docs.

Changes:
- hermes_cli/config.py: Add docker_volumes to DEFAULT_CONFIG with
  inline documentation and examples
- cli.py: Add docker_volumes to load_cli_config defaults
- configuration.md: Full Docker Volume Mounts section with YAML
  examples, use cases (providing files, receiving outputs, shared
  workspaces), and env var alternative
2026-03-09 15:29:34 -07:00
teknium1
3ffaac00dd feat: bell_on_complete — terminal bell when agent finishes
Adds a simple config option to play the terminal bell (\a) when the
agent finishes a response. Useful for long-running tasks — switch to
another window and your terminal will ding when done.

Works over SSH since the bell character propagates through the
connection. Most terminal emulators can be configured to flash the
taskbar, play a sound, or show a visual indicator on bell.

Config (default: off):
  display:
    bell_on_complete: true

Closes #318
2026-03-08 21:30:48 -07:00
teknium1
a8bf414f4a feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.

## New tool: browser_console

Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.

## Enhanced tool: browser_vision(annotate=True)

New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.

## Config: browser.record_sessions

Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default

## Built-in skill: dogfood

Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
   (Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence

Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template

## Tests

21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation

Addresses #315.
2026-03-08 21:28:12 -07:00
teknium1
2d1a1c1c47 refactor: remove redundant 'openai' auxiliary provider, clean up docs
The 'openai' provider was redundant — using OPENAI_BASE_URL +
OPENAI_API_KEY with provider: 'main' already covers direct OpenAI API.

Provider options are now: auto, openrouter, nous, codex, main.

- Removed _try_openai(), _OPENAI_AUX_MODEL, _OPENAI_BASE_URL
- Replaced openai tests with codex provider tests
- Updated all docs to remove 'openai' option and clarify 'main'
- 'main' description now explicitly mentions it works with OpenAI API,
  local models, and any OpenAI-compatible endpoint

Tests: 2467 passed.
2026-03-08 18:50:26 -07:00
teknium1
71e81728ac feat: Codex OAuth vision support + multimodal content adapter
The Codex Responses API (chatgpt.com/backend-api/codex) supports
vision via gpt-5.3-codex. This was verified with real API calls
using image analysis.

Changes to _CodexCompletionsAdapter:
- Added _convert_content_for_responses() to translate chat.completions
  multimodal format to Responses API format:
  - {type: 'text'} → {type: 'input_text'}
  - {type: 'image_url', image_url: {url: '...'}} → {type: 'input_image', image_url: '...'}
- Fixed: removed 'stream' from resp_kwargs (responses.stream() handles it)
- Fixed: removed max_output_tokens and temperature (Codex endpoint rejects them)

Provider changes:
- Added 'codex' as explicit auxiliary provider option
- Vision auto-fallback now includes Codex (OpenRouter → Nous → Codex)
  since gpt-5.3-codex supports multimodal input
- Updated docs with Codex OAuth examples

Tested with real Codex OAuth token + ~/.hermes/image2.png — confirmed
working end-to-end through the full adapter pipeline.

Tests: 2459 passed.
2026-03-08 18:44:33 -07:00
teknium1
ae4a674c84 feat: add 'openai' as auxiliary provider option
Users can now set provider: "openai" for auxiliary tasks (vision, web
extract, compression) to use OpenAI's API directly with their
OPENAI_API_KEY. This hits api.openai.com/v1 with gpt-4o-mini as the
default model — supports vision since GPT-4o handles image input.

Provider options are now: auto, openrouter, nous, openai, main.

Changes:
- agent/auxiliary_client.py: added _try_openai(), "openai" case in
  _resolve_forced_provider(), updated auxiliary_max_tokens_param()
  to use max_completion_tokens for OpenAI
- Updated docs: cli-config.yaml.example, AGENTS.md, and user-facing
  configuration.md with Common Setups section showing OpenAI,
  OpenRouter, and local model examples
- 3 new tests for OpenAI provider resolution

Tests: 2459 passed (was 2429).
2026-03-08 18:25:30 -07:00
teknium1
169615abc8 docs: add Auxiliary Models section to user-facing configuration docs
Adds clear how-to documentation for changing the vision model, web
extraction model, and compression model to the user-facing docs site
(website/docs/user-guide/configuration.md).

Includes:
- Full auxiliary config.yaml example
- 'Changing the Vision Model' walkthrough with config + env var options
- Provider options table (auto/openrouter/nous/main)
- Multimodal safety warning for vision
- Environment variable reference table
- Updated the warning about OpenRouter-dependent tools to mention
  auxiliary model configuration
2026-03-08 18:10:55 -07:00
teknium1
cf63b2471f docs: add resume history display to sessions, CLI, config, and AGENTS docs
- sessions.md: New 'Conversation Recap on Resume' subsection with visual
  example, feature bullet points, and config snippet
- cli.md: New 'Session Resume Display' subsection with cross-reference
- configuration.md: Add resume_display to display settings YAML block
- AGENTS.md: Add _preload_resumed_session() and _display_resumed_history()
  to key components, add UX note about resume panel
2026-03-08 17:55:14 -07:00
teknium1
4be783446a fix: wire worktree flag into hermes CLI entry point + docs + tests
Critical fixes:
- Add --worktree/-w to hermes_cli/main.py argparse (both chat
  subcommand and top-level parser) so 'hermes -w' works via the
  actual CLI entry point, not just 'python cli.py -w'
- Pass worktree flag through cmd_chat() kwargs to cli_main()
- Handle worktree attr in bare 'hermes' and --resume/--continue paths

Bug fixes in cli.py:
- Skip worktree creation for --list-tools/--list-toolsets (wasteful)
- Wrap git worktree subprocess.run in try/except (crash on timeout)
- Add stale worktree pruning on startup (_prune_stale_worktrees):
  removes clean worktrees older than 24h left by crashed/killed sessions

Documentation updates:
- AGENTS.md: add --worktree to CLI commands table
- cli-config.yaml.example: add worktree config section
- website/docs/reference/cli-commands.md: add to core commands
- website/docs/user-guide/cli.md: add usage examples
- website/docs/user-guide/configuration.md: add config docs

Test improvements (17 → 31 tests):
- Stale worktree pruning (prune old clean, keep recent, keep dirty)
- Directory symlink via .worktreeinclude
- Edge cases (no commits, not a repo, pre-existing .worktrees/)
- CLI flag/config OR logic
- TERMINAL_CWD integration
- System prompt injection format
2026-03-07 21:05:40 -08:00
teknium1
b84f9e410c feat: default reasoning effort from xhigh to medium
Reduces token usage and latency for most tasks by defaulting to
medium reasoning effort instead of xhigh. Users can still override
via config or CLI flag. Updates code, tests, example config, and docs.
2026-03-07 10:14:19 -08:00
teknium1
388dd4789c feat: add z.ai/GLM, Kimi/Moonshot, MiniMax as first-class providers
Adds 4 new direct API-key providers (zai, kimi-coding, minimax, minimax-cn)
to the inference provider system. All use standard OpenAI-compatible
chat/completions endpoints with Bearer token auth.

Core changes:
- auth.py: Extended ProviderConfig with api_key_env_vars and base_url_env_var
  fields. Added providers to PROVIDER_REGISTRY. Added provider aliases
  (glm, z-ai, zhipu, kimi, moonshot). Added auto-detection of API-key
  providers in resolve_provider(). Added resolve_api_key_provider_credentials()
  and get_api_key_provider_status() helpers.
- runtime_provider.py: Added generic API-key provider branch in
  resolve_runtime_provider() — any provider with auth_type='api_key'
  is automatically handled.
- main.py: Added providers to hermes model menu with generic
  _model_flow_api_key_provider() flow. Updated _has_any_provider_configured()
  to check all provider env vars. Updated argparse --provider choices.
- setup.py: Added providers to setup wizard with API key prompts and
  curated model lists.
- config.py: Added env vars (GLM_API_KEY, KIMI_API_KEY, MINIMAX_API_KEY,
  etc.) to OPTIONAL_ENV_VARS.
- status.py: Added API key display and provider status section.
- doctor.py: Added connectivity checks for each provider endpoint.
- cli.py: Updated provider docstrings.

Docs: Updated README.md, .env.example, cli-config.yaml.example,
cli-commands.md, environment-variables.md, configuration.md.

Tests: 50 new tests covering registry, aliases, resolution, auto-detection,
credential resolution, and runtime provider dispatch.

Inspired by PR #33 (numman-ali) which proposed a provider registry approach.
Credit to tars90percent (PR #473) and manuelschipper (PR #420) for related
provider improvements merged earlier in this changeset.
2026-03-06 18:55:18 -08:00
teknium1
68fbae5692 docs: add Custom & Self-Hosted LLM Providers guide
Comprehensive guide for using Hermes Agent with alternative LLM backends:
- Ollama (local models, zero config)
- vLLM (high-performance GPU inference)
- SGLang (RadixAttention, prefix caching)
- llama.cpp / llama-server (CPU & Metal inference)
- LiteLLM Proxy (multi-provider gateway)
- ClawRouter (cost-optimized routing with complexity scoring)
- 10+ other compatible providers table (Together, Groq, DeepSeek, etc.)
- Choosing the Right Setup decision table
- General custom endpoint setup instructions

All of these work via the existing OPENAI_BASE_URL + OPENAI_API_KEY
custom endpoint support — no code changes needed.
2026-03-06 14:16:06 -08:00
teknium1
39299e2de4 Merge PR #451: feat: Add Daytona environment backend
Authored by rovle. Adds Daytona as the sixth terminal execution backend
with cloud sandboxes, persistent workspaces, and full CLI/gateway integration.
Includes 24 unit tests and 8 integration tests.
2026-03-06 03:32:40 -08:00
teknium1
363633e2ba fix: allow self-hosted Firecrawl without API key + add self-hosting docs
On top of PR #460: self-hosted Firecrawl instances don't require an API
key (USE_DB_AUTHENTICATION=false), so don't force users to set a dummy
FIRECRAWL_API_KEY when FIRECRAWL_API_URL is set. Also adds a proper
self-hosting section to the configuration docs explaining what you get,
what you lose, and how to set it up (Docker stack, tradeoffs vs cloud).

Added 2 more tests (URL-only without key, neither-set raises).
2026-03-05 16:44:21 -08:00
caentzminger
d7d10b14cd feat(tools): add support for self-hosted firecrawl
Adds optional FIRECRAWL_API_URL environment variable to support
self-hosted Firecrawl deployments alongside the cloud service.

- Add FIRECRAWL_API_URL to optional env vars in hermes_cli/config.py
- Update _get_firecrawl_client() in tools/web_tools.py to accept custom API URL
- Add tests for client initialization with/without URL
- Document new env var in installation and config guides
2026-03-05 16:16:18 -06:00
rovle
74a36b0729 docs: add Daytona to backend lists in docs
Signed-off-by: rovle <lovre.pesut@gmail.com>
2026-03-05 11:55:41 -08:00
teknium1
19016497ef docs: fix all remaining minor accuracy issues
- updating.md: Note that 'hermes update' auto-handles config migration
- cli.md: Add summary_model to compression config, fix display config
  (add personality/compact), remove unverified pastes/ claim
- configuration.md: Add 5 missing config sections (stt, human_delay,
  code_execution, delegation, clarify), fix display defaults,
  fix reasoning_effort default to empty/unset
- messaging/index.md: Add GATEWAY_ALLOWED_USERS to security section
- skills.md: Add category field to skills_list return value
- mcp.md: Document auto-registered utility tools (resources/prompts)
- architecture.md: Fix file_tools.py reference, base_url default to None,
  synchronous agent loop pseudocode
- cli-commands.md: Fix hermes logout description
- environment-variables.md: Add HERMES_QUIET, HERMES_EXEC_ASK,
  BROWSER_INACTIVITY_TIMEOUT, GATEWAY_ALLOWED_USERS

Verification scan: 27/27 checks passed, zero issues remaining.
2026-03-05 07:00:51 -08:00
teknium1
d578d06f59 docs: comprehensive accuracy audit fixes (35+ corrections)
CRITICAL fixes:
- Installation: Remove false prerequisites (installer auto-installs everything except git)
- Tools: Remove non-existent 'web_crawl' tool from tools table
- Memory: Remove non-existent 'read' action (only add/replace/remove exist)
- Code execution: Fix 'search' to 'search_files' in sandbox tools list
- CLI commands: Fix --model/--provider/--toolsets/--verbose as chat subcommand flags

IMPORTANT fixes:
- Installation: Add missing installer features (Node.js, ripgrep, ffmpeg, skills seeding)
- Installation: Add 6 missing package extras to table (mcp, honcho, tts-premium, etc)
- Installation: Fix mkdir to include all directories the installer creates
- Quickstart: Add OpenAI Codex to provider table
- CLI: Fix all 'hermes --flag' to 'hermes chat --flag' across all docs
- Configuration: Remove non-existent --max-turns CLI flag
- Tools: Fix 'search' to 'search_files', add missing 'process' tool
- Skills: Remove skills_categories() (not a registered tool)
- Cron: Remove unsupported 'daily at 9am' schedule format
- TTS: Fix output directory to ~/.hermes/audio_cache/
- Delegation: Clarify depth limit wording
- Architecture: Fix default model, chat() signature, file names
- Contributing: Fix Python requirement from 3.11+ to 3.10+
- CLI reference: Add missing commands (login, tools, sessions subcommands)
- Env vars: Fix TERMINAL_DOCKER_IMAGE default, add HERMES_MODEL
2026-03-05 06:50:22 -08:00
teknium1
ada3713e77 feat: add documentation website (Docusaurus)
- 25 documentation pages covering Getting Started, User Guide, Developer Guide, and Reference
- Docusaurus with custom amber/gold theme matching the landing page branding
- GitHub Actions workflow to deploy landing page + docs to GitHub Pages
- Landing page at root, docs at /docs/ on hermes-agent.nousresearch.com
- Content extracted and restructured from existing repo docs (README, AGENTS.md, CONTRIBUTING.md, docs/)
- Auto-deploy on push to main when website/ or landingpage/ changes
2026-03-05 05:24:55 -08:00