diff --git a/README.md b/README.md index 6d4b436a5..c05b65e8d 100644 --- a/README.md +++ b/README.md @@ -4,7 +4,29 @@ # Hermes Agent ⚕ -An AI agent with advanced tool-calling capabilities, featuring a flexible toolsets system, messaging integrations, and scheduled tasks. +

+ Discord + License: MIT + Built by Nous Research +

+
+**An open-source AI agent you can actually live with.** Install it on a machine, give it your messaging accounts, and it becomes a persistent personal agent that grows with you — learning your projects, building its own skills, running tasks on a schedule, and reaching you wherever you are. It's not a coding copilot tethered to an IDE or a chatbot wrapper around a single API. It's an autonomous agent that lives on your server, remembers what it learns, and gets more capable the longer it runs.
+
+Use any model you want — log in with a [Nous Portal](https://portal.nousresearch.com) subscription for zero-config access, connect an [OpenRouter](https://openrouter.ai) key for 200+ models, or point it at your own VLLM/SGLang endpoint. Switch with `hermes model` — no code changes, no lock-in.
+
+Built by [Nous Research](https://nousresearch.com). Under the hood, the same architecture powers [batch data generation](#batch-processing) and [RL training environments](#-atropos-rl-environments) for training the next generation of tool-calling models.
+
+| | |
+|---|---|
+| **A real terminal interface** | Not a web UI — a full TUI with multiline editing, slash-command autocomplete, conversation history, interrupt-and-redirect, and streaming tool output. Built for people who live in the terminal and want an agent that keeps up. |
+| **Lives where you do** | Telegram, Discord, Slack, WhatsApp, and CLI — all from a single gateway process. Send it a voice memo from your phone, get a researched answer with citations. Cross-platform message mirroring means a conversation started on Telegram can continue on Discord. |
+| **Grows the longer it runs** | Persistent memory across sessions — the agent remembers your preferences, your projects, your environment. When it solves a hard problem, it writes a skill document for next time. Skills are searchable, shareable, and compatible with the agentskills.io open standard. A Skills Hub lets you install community skills or publish your own. |
+| **Scheduled automations** | Built-in cron scheduler with delivery to any platform. Set up a daily AI funding report delivered to Telegram, a nightly backup verification on Discord, a weekly dependency audit that opens PRs, or a morning news briefing — all in natural language. The gateway runs them unattended. |
+| **Delegates and parallelizes** | Spawn isolated subagents for parallel workstreams — each gets its own conversation and terminal. The agent can also write Python scripts that call its own tools via RPC, collapsing multi-step pipelines into a single turn with zero intermediate context cost. |
+| **Real sandboxing** | Five terminal backends — local, Docker, SSH, Singularity, and Modal — with persistent workspaces, background process management, and the option to make these machines ephemeral. Run it against a remote machine so it can't modify its own code. |
+| **Research-ready** | Batch runner for generating thousands of tool-calling trajectories in parallel. Atropos RL environments for training models with reinforcement learning on agentic tasks. Trajectory compression for fitting training data into token budgets. |
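The "scripts that call its own tools via RPC" pattern described above can be sketched roughly as follows. Note this is an illustrative stand-in only — `call_tool` and the tool names are hypothetical, not the actual Hermes API; the point is that intermediate results stay inside the script instead of round-tripping through the model's context:

```python
# Hypothetical sketch of the tool-RPC pattern; call_tool and the tool
# names below are illustrative stand-ins, not Hermes' real interface.
def call_tool(name: str, **kwargs) -> dict:
    # Stand-in for the agent's RPC bridge; a real implementation would
    # forward the call to the running agent's tool registry.
    registry = {
        "web_search": lambda query: {"results": [f"result for {query}"]},
        "summarize": lambda text: {"summary": text[:40]},
    }
    return registry[name](**kwargs)

# One script = one model turn: search, then summarize, without feeding
# each intermediate result back through the conversation context.
hits = call_tool("web_search", query="agent frameworks")
digest = call_tool("summarize", text=" ".join(hits["results"]))
```

This is why the table can claim "zero intermediate context cost": only the script and its final output occupy tokens, however many tool calls run in between.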
+
+---

 ## Quick Install

@@ -29,8 +51,9 @@ The installer will:

 After installation, reload your shell and run:

 ```bash
-hermes setup   # Configure API keys (if you skipped during install)
-hermes         # Start chatting!
+source ~/.bashrc   # or: source ~/.zshrc
+hermes setup       # Configure API keys (if you skipped during install)
+hermes             # Start chatting!
 ```

 ---

@@ -41,44 +64,20 @@ The installer (`hermes setup`) walks you through selecting a provider and model.

 ```bash
 hermes         # Start chatting!
+hermes model   # Switch provider or model interactively
+hermes tools   # See all available tools
 ```

-To change your provider or model later:
-
-```bash
-hermes model   # Interactive provider & model selector
-```
-
-This lets you switch between **Nous Portal** (subscription), **OpenRouter** (100+ models, pay-per-use), or a **custom endpoint** (VLLM, SGLang, any OpenAI-compatible API) at any time.
+This lets you switch between **Nous Portal** (subscription), **OpenRouter** (200+ models, pay-per-use), or a **custom endpoint** (VLLM, SGLang, any OpenAI-compatible API) at any time.

 ---

 ## Updating

-**Quick update (installer version):**
 ```bash
 hermes update   # Update to latest version (prompts for new config)
 ```

-**Manual update (if you cloned the repo yourself):**
-```bash
-cd /path/to/hermes-agent
-export VIRTUAL_ENV="$(pwd)/venv"
-
-# Pull latest code and submodules
-git pull origin main
-git submodule update --init --recursive
-
-# Reinstall (picks up new dependencies)
-uv pip install -e ".[all]"
-uv pip install -e "./mini-swe-agent"
-uv pip install -e "./tinker-atropos"
-
-# Check for new config options added since your last update
-hermes config check
-hermes config migrate   # Interactively add any missing options
-```
-
 **Uninstalling:**
 ```bash
 hermes uninstall   # Uninstall (can keep configs for later reinstall)
 ```

@@ -153,14 +152,12 @@ You need at least one way to connect to an LLM.
 Use `hermes model` to switch providers.

 | Feature | Provider | Env Variable |
 |---------|----------|--------------|
-| Custom OpenAI Endpoint (OAI or VLLM/SGLANG) | [platform.openai.com](https://platform.openai.com/api-keys) | `OPENAI_API_KEY` |
 | Web scraping | [Firecrawl](https://firecrawl.dev/) | `FIRECRAWL_API_KEY` |
 | Browser automation | [Browserbase](https://browserbase.com/) | `BROWSERBASE_API_KEY`, `BROWSERBASE_PROJECT_ID` |
 | Image generation | [FAL](https://fal.ai/) | `FAL_KEY` |
 | Premium TTS voices | [ElevenLabs](https://elevenlabs.io/) | `ELEVENLABS_API_KEY` |
-| OpenAI TTS voices | [OpenAI](https://platform.openai.com/api-keys) | `OPENAI_API_KEY` |
+| OpenAI TTS + voice transcription | [OpenAI](https://platform.openai.com/api-keys) | `VOICE_TOOLS_OPENAI_KEY` |
 | RL Training | [Tinker](https://tinker-console.thinkingmachines.ai/) + [WandB](https://wandb.ai/) | `TINKER_API_KEY`, `WANDB_API_KEY` |
-| Voice transcription | [OpenAI](https://platform.openai.com/api-keys) | `OPENAI_API_KEY` |
 | Slack integration | [Slack](https://api.slack.com/apps) | `SLACK_BOT_TOKEN`, `SLACK_APP_TOKEN` |
 | Messaging | Telegram, Discord | `TELEGRAM_BOT_TOKEN`, `DISCORD_BOT_TOKEN` |

@@ -1263,6 +1260,30 @@ hermes

 ---

+### Manual Update
+
+If you installed manually (not via `hermes update`):
+
+```bash
+cd /path/to/hermes-agent
+export VIRTUAL_ENV="$(pwd)/venv"
+
+# Pull latest code and submodules
+git pull origin main
+git submodule update --init --recursive
+
+# Reinstall (picks up new dependencies)
+uv pip install -e ".[all]"
+uv pip install -e "./mini-swe-agent"
+uv pip install -e "./tinker-atropos"
+
+# Check for new config options added since your last update
+hermes config check
+hermes config migrate   # Interactively add any missing options
+```
+
+---
+
 ## Batch Processing

 Process multiple prompts in parallel with automatic checkpointing:

@@ -1337,7 +1358,9 @@ All variables go in `~/.hermes/.env`.
 Run `hermes config set VAR value` to set them.

 | Variable | Description |
 |----------|-------------|
 | `OPENROUTER_API_KEY` | OpenRouter API key (recommended for flexibility) |
 | `ANTHROPIC_API_KEY` | Direct Anthropic access |
-| `OPENAI_API_KEY` | Direct OpenAI access |
+| `OPENAI_API_KEY` | API key for custom OpenAI-compatible endpoints (used with `OPENAI_BASE_URL`) |
+| `OPENAI_BASE_URL` | Base URL for custom endpoint (VLLM, SGLang, etc.) |
+| `VOICE_TOOLS_OPENAI_KEY` | OpenAI key for TTS and voice transcription (separate from custom endpoint) |

 **Provider Auth (OAuth):**
 | Variable | Description |

diff --git a/tools/transcription_tools.py b/tools/transcription_tools.py
index e767eace8..7c4b5d36e 100644
--- a/tools/transcription_tools.py
+++ b/tools/transcription_tools.py
@@ -76,7 +76,7 @@ def transcribe_audio(file_path: str, model: Optional[str] = None) -> dict:
     try:
         from openai import OpenAI

-        client = OpenAI(api_key=api_key)
+        client = OpenAI(api_key=api_key, base_url="https://api.openai.com/v1")

         with open(file_path, "rb") as audio_file:
             transcription = client.audio.transcriptions.create(

diff --git a/tools/tts_tool.py b/tools/tts_tool.py
index da2f83a21..34f8dbcfc 100644
--- a/tools/tts_tool.py
+++ b/tools/tts_tool.py
@@ -224,7 +224,7 @@ def _generate_openai_tts(text: str, output_path: str, tts_config: Dict[str, Any]
     else:
         response_format = "mp3"

-    client = OpenAIClient(api_key=api_key)
+    client = OpenAIClient(api_key=api_key, base_url="https://api.openai.com/v1")

     response = client.audio.speech.create(
         model=model,
         voice=voice,