feat(skills): add optional telephony skill with Twilio, SMS, and AI calls (#1289)

* feat: improve context compaction handoff summaries

Adapt PR #916 onto current main by replacing the old context summary marker
with a clearer handoff wrapper, updating the summarization prompt for
resume-oriented summaries, and preserving the current call_llm-based
compression path.

* fix: clearer error when docker backend is unavailable

* fix: preserve docker discovery in backend preflight

Follow up on salvaged PR #940 by reusing find_docker() during the new
availability check so non-PATH Docker Desktop installs still work. Add
a regression test covering the resolved executable path.

* test: make gateway async tests xdist-safe

Replace sync test usage of asyncio.get_event_loop().run_until_complete()
with asyncio.run() so tests do not depend on an ambient current event loop.
Also create the email disconnect poll task inside a running loop. This fixes
xdist/CI failures where workers have no current loop in MainThread.

* feat(skills): add phone-calls skill for outbound AI voice calls

Reformulated from core tool (PR #847 feedback) into a skill with a
standalone helper script. No new dependencies — uses only Python stdlib.

Two providers supported:
- Bland.ai (default): simple setup, one API key
- Vapi: flexible, better voice quality via ElevenLabs/Deepgram + Twilio

Includes:
- SKILL.md with full procedure, safety rules, provider docs, pitfalls
- scripts/phone_call.py CLI helper (call, status, diagnose commands)

* feat(skills): expand phone-calls into optional telephony skill

Follow up on salvaged PR #965 by moving the capability into optional-skills
and broadening it from outbound AI calling to a full telephony skill. Add
Twilio number provisioning, env/state persistence, SMS/MMS, inbound SMS
polling, Vapi import helpers, and a provider decision tree while keeping
telephony out of core runtime code.

* docs(skills): clarify Hermes TTS telephony workflow

---------

Co-authored-by: aydnOktay <xaydinoktay@gmail.com>
Co-authored-by: mormio <morganemoss@gmai.com>
This commit is contained in:
Teknium
2026-03-14 04:16:48 -07:00
committed by GitHub
parent 02752c83b4
commit 1a857123b3
3 changed files with 1989 additions and 0 deletions


@@ -0,0 +1,417 @@
---
name: telephony
description: Give Hermes phone capabilities without core tool changes. Provision and persist a Twilio number, send and receive SMS/MMS, make direct calls, and place AI-driven outbound calls through Bland.ai or Vapi.
version: 1.0.0
author: Nous Research
license: MIT
metadata:
  hermes:
    tags: [telephony, phone, sms, mms, voice, twilio, bland.ai, vapi, calling, texting]
    related_skills: [find-nearby, google-workspace, agentmail]
    category: productivity
---
# Telephony — Numbers, Calls, and Texts without Core Tool Changes
This optional skill gives Hermes practical phone capabilities while keeping telephony out of the core tool list.
It ships with a helper script, `scripts/telephony.py`, that can:
- save provider credentials into `~/.hermes/.env`
- search for and buy a Twilio phone number
- remember that owned number for later sessions
- send SMS / MMS from the owned number
- poll inbound SMS for that number with no webhook server required
- make direct Twilio calls using TwiML `<Say>` or `<Play>`
- import the owned Twilio number into Vapi
- place outbound AI calls through Bland.ai or Vapi
## What this solves
This skill is meant to cover the practical phone tasks users actually want:
- outbound calls
- texting
- owning a reusable agent number
- checking messages that arrive at that number later
- preserving that number and related IDs between sessions
- future-friendly telephony identity for inbound SMS polling and other automations
It does **not** turn Hermes into a real-time inbound phone gateway. Inbound SMS is handled by polling the Twilio REST API. That is enough for many workflows, including notifications and some one-time-code retrieval, without adding core webhook infrastructure.
## Safety rules — mandatory
1. Always confirm before placing a call or sending a text.
2. Never dial emergency numbers.
3. Never use telephony for harassment, spam, impersonation, or anything illegal.
4. Treat third-party phone numbers as sensitive operational data:
   - do not save them to Hermes memory
   - do not include them in skill docs, summaries, or follow-up notes unless the user explicitly wants that
5. It is fine to persist the **agent-owned Twilio number** because that is part of the user's configuration.
6. VoIP numbers are **not guaranteed** to work for all third-party 2FA flows. Use with caution and set user expectations clearly.
## Decision tree — which service to use?
Use this logic instead of hardcoded provider routing:
### 1) "I want Hermes to own a real phone number"
Use **Twilio**.
Why:
- easiest path to buying and keeping a number
- best SMS / MMS support
- simplest inbound SMS polling story
- cleanest future path to inbound webhooks or call handling
Use cases:
- receive texts later
- send deployment alerts / cron notifications
- maintain a reusable phone identity for the agent
- experiment with phone-based auth flows later
### 2) "I only need the easiest outbound AI phone call right now"
Use **Bland.ai**.
Why:
- quickest setup
- one API key
- no need to first buy/import a number yourself
Tradeoff:
- less flexible
- voice quality is decent, but not the best
### 3) "I want the best conversational AI voice quality"
Use **Twilio + Vapi**.
Why:
- Twilio gives you the owned number
- Vapi gives you better conversational AI call quality and more voice/model flexibility
Recommended flow:
1. Buy/save a Twilio number
2. Import it into Vapi
3. Save the returned `VAPI_PHONE_NUMBER_ID`
4. Use `ai-call --provider vapi`
### 4) "I want to call with a custom prerecorded voice message"
Use **Twilio direct call** with a public audio URL.
Why:
- easiest way to play a custom MP3
- pairs well with Hermes `text_to_speech` plus a public file host or tunnel
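The four branches above can be sketched as a small routing function. This is an illustrative sketch only: the name `choose_service`, its boolean parameters, and the order in which the branches are checked are assumptions, not code taken from `scripts/telephony.py`.

```python
def choose_service(
    wants_owned_number: bool = False,
    wants_prerecorded_audio: bool = False,
    wants_best_voice: bool = False,
) -> str:
    """Map a request onto one of the four decision-tree branches (hypothetical helper)."""
    if wants_prerecorded_audio:
        # Branch 4: play a hosted MP3 through a direct Twilio call.
        return "Twilio direct call"
    if wants_best_voice:
        # Branch 3: owned Twilio number imported into Vapi.
        return "Twilio + Vapi"
    if wants_owned_number:
        # Branch 1: buy and keep a number, SMS/MMS, inbox polling.
        return "Twilio"
    # Branch 2: quickest outbound AI call with a single API key.
    return "Bland.ai"
```

With no flags set, the sketch falls through to Bland.ai, which matches the "easiest outbound AI phone call right now" case.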
## Files and persistent state
The skill persists telephony state in two places:
### `~/.hermes/.env`
Used for long-lived provider credentials and owned-number IDs, for example:
- `TWILIO_ACCOUNT_SID`
- `TWILIO_AUTH_TOKEN`
- `TWILIO_PHONE_NUMBER`
- `TWILIO_PHONE_NUMBER_SID`
- `BLAND_API_KEY`
- `VAPI_API_KEY`
- `VAPI_PHONE_NUMBER_ID`
- `PHONE_PROVIDER` (AI call provider: bland or vapi)
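Persisting these keys amounts to an "upsert" on a `KEY=value` file: replace the line for a key that already exists, append the key otherwise, and leave unrelated lines alone. The sketch below shows one way to do that with the standard library. The function name `upsert_env` is hypothetical and the helper script's own implementation may differ.

```python
from pathlib import Path
from typing import Mapping


def upsert_env(env_path: Path, updates: Mapping[str, str]) -> None:
    """Update KEY=value lines in place and append keys that are missing."""
    lines = env_path.read_text(encoding="utf-8").splitlines() if env_path.exists() else []
    remaining = dict(updates)
    out = []
    for line in lines:
        key = line.split("=", 1)[0].strip()
        is_assignment = "=" in line and not line.lstrip().startswith("#")
        if is_assignment and key in remaining:
            # Existing key: rewrite the line with the new value.
            out.append(f"{key}={remaining.pop(key)}")
        else:
            # Comments, blank lines, and unrelated keys are kept verbatim.
            out.append(line)
    # Keys that were not present yet go at the end of the file.
    out.extend(f"{key}={value}" for key, value in remaining.items())
    env_path.parent.mkdir(parents=True, exist_ok=True)
    env_path.write_text("\n".join(out) + "\n", encoding="utf-8")
```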
### `~/.hermes/telephony_state.json`
Used for skill-only state that should survive across sessions, for example:
- remembered default Twilio number / SID
- remembered Vapi phone number ID
- last inbound message SID/date for inbox polling checkpoints
This means:
- the next time the skill is loaded, `diagnose` can tell you what number is already configured
- `twilio-inbox --since-last --mark-seen` can continue from the previous checkpoint
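The inbox checkpoint can be pictured as a filter over a newest-first message list: walk the list from the top and stop at the last message SID that was already seen. The sketch below assumes that ordering and uses a hypothetical function name. If the checkpoint SID is not in the list, this sketch returns every message, which is an assumption rather than documented behavior of the skill.

```python
def messages_after_checkpoint(messages: list, last_seen_sid: str) -> list:
    """Return messages newer than the checkpoint, assuming newest-first order."""
    if not last_seen_sid:
        # No checkpoint yet: everything counts as new.
        return list(messages)
    newer = []
    for message in messages:
        if message.get("sid") == last_seen_sid:
            # Reached the last message already seen; stop collecting.
            break
        newer.append(message)
    return newer
```

After reading, storing the SID of the first (newest) returned message as the new checkpoint lets the next poll continue from there.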
## Locate the helper script
After installing this skill, locate the script like this:
```bash
SCRIPT="$(find ~/.hermes/skills -path '*/telephony/scripts/telephony.py' -print -quit)"
```
If `SCRIPT` is empty, the skill is not installed yet.
## Install
This is an official optional skill, so install it from the Skills Hub:
```bash
hermes skills search telephony
hermes skills install official/productivity/telephony
```
## Provider setup
### Twilio — owned number, SMS/MMS, direct calls, inbound SMS polling
Sign up at:
- https://www.twilio.com/try-twilio
Then save credentials into Hermes:
```bash
python3 "$SCRIPT" save-twilio ACXXXXXXXXXXXXXXXXXXXXXXXXXXXX your_auth_token_here
```
Search for available numbers:
```bash
python3 "$SCRIPT" twilio-search --country US --area-code 702 --limit 5
```
Buy and remember a number:
```bash
python3 "$SCRIPT" twilio-buy "+17025551234" --save-env
```
List owned numbers:
```bash
python3 "$SCRIPT" twilio-owned
```
Set one of them as the default later:
```bash
python3 "$SCRIPT" twilio-set-default "+17025551234" --save-env
# or
python3 "$SCRIPT" twilio-set-default PNXXXXXXXXXXXXXXXXXXXXXXXXXXXX --save-env
```
### Bland.ai — easiest outbound AI calling
Sign up at:
- https://app.bland.ai
Save config:
```bash
python3 "$SCRIPT" save-bland your_bland_api_key --voice mason
```
### Vapi — better conversational voice quality
Sign up at:
- https://dashboard.vapi.ai
Save the API key first:
```bash
python3 "$SCRIPT" save-vapi your_vapi_api_key
```
Import your owned Twilio number into Vapi and persist the returned phone number ID:
```bash
python3 "$SCRIPT" vapi-import-twilio --save-env
```
If you already know the Vapi phone number ID, save it directly:
```bash
python3 "$SCRIPT" save-vapi your_vapi_api_key --phone-number-id vapi_phone_number_id_here
```
## Diagnose current state
At any time, inspect what the skill already knows:
```bash
python3 "$SCRIPT" diagnose
```
Use this first when resuming work in a later session.
## Common workflows
### A. Buy an agent number and keep using it later
1. Save Twilio credentials:
```bash
python3 "$SCRIPT" save-twilio AC... auth_token_here
```
2. Search for a number:
```bash
python3 "$SCRIPT" twilio-search --country US --area-code 702 --limit 10
```
3. Buy it and save it into `~/.hermes/.env` + state:
```bash
python3 "$SCRIPT" twilio-buy "+17025551234" --save-env
```
4. Next session, run:
```bash
python3 "$SCRIPT" diagnose
```
This shows the remembered default number and inbox checkpoint state.
### B. Send a text from the agent number
```bash
python3 "$SCRIPT" twilio-send-sms "+15551230000" "Your deployment completed successfully."
```
With media:
```bash
python3 "$SCRIPT" twilio-send-sms "+15551230000" "Here is the chart." --media-url "https://example.com/chart.png"
```
### C. Check inbound texts later with no webhook server
Poll the inbox for the default Twilio number:
```bash
python3 "$SCRIPT" twilio-inbox --limit 20
```
Only show messages that arrived after the last checkpoint, and advance the checkpoint when you're done reading:
```bash
python3 "$SCRIPT" twilio-inbox --since-last --mark-seen
```
This is the main answer to “how do I access messages the number receives next time the skill is loaded?”
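Polling without a webhook server comes down to an authenticated `GET` against Twilio's Messages list endpoint, filtered by the owned number. The sketch below only builds that request with the standard library and does not send it. The function name `build_inbox_request` is hypothetical, and the helper script may construct its requests differently.

```python
import base64
import urllib.parse
import urllib.request

TWILIO_API_BASE = "https://api.twilio.com/2010-04-01"


def build_inbox_request(account_sid, auth_token, to_number, limit=20):
    """Build a GET request listing messages sent to `to_number` (not sent here)."""
    # "To" filters by recipient; "PageSize" caps how many messages come back.
    query = urllib.parse.urlencode({"To": to_number, "PageSize": limit})
    url = f"{TWILIO_API_BASE}/Accounts/{account_sid}/Messages.json?{query}"
    # Twilio uses HTTP Basic auth with the account SID and auth token.
    token = base64.b64encode(f"{account_sid}:{auth_token}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"}, method="GET")
```

Passing the result to `urllib.request.urlopen` and decoding the JSON body would yield an object with a `messages` list, which is the shape the checkpoint logic works on.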
### D. Make a direct Twilio call with built-in TTS
```bash
python3 "$SCRIPT" twilio-call "+15551230000" --message "Hello! This is Hermes calling with your status update." --voice Polly.Joanna
```
### E. Call with a prerecorded / custom voice message
This is the main path for reusing Hermes's existing `text_to_speech` support.
Use this when:
- you want the call to use Hermes's configured TTS voice rather than Twilio `<Say>`
- you want a one-way voice delivery (briefing, alert, joke, reminder, status update)
- you do **not** need a live conversational phone call
Generate or host audio separately, then:
```bash
python3 "$SCRIPT" twilio-call "+15551230000" --audio-url "https://example.com/briefing.mp3"
```
Recommended Hermes TTS -> Twilio Play workflow:
1. Generate the audio with Hermes `text_to_speech`.
2. Make the resulting MP3 publicly reachable.
3. Place the Twilio call with `--audio-url`.
Example agent flow:
- Ask Hermes to create the message audio with `text_to_speech`
- If needed, expose the file with a temporary static host / tunnel / object storage URL
- Use `twilio-call --audio-url ...` to deliver it by phone
Good hosting options for the MP3:
- a temporary public object/storage URL
- a short-lived tunnel to a local static file server
- any existing HTTPS URL the phone provider can fetch directly
Important note:
- Hermes TTS is great for prerecorded outbound messages
- Bland/Vapi are better for **live conversational AI calls** because they handle the real-time telephony audio stack themselves
- Hermes STT/TTS alone is not being used here as a full duplex phone conversation engine; that would require a much heavier streaming/webhook integration than this skill is trying to introduce
### F. Navigate a phone tree / IVR with Twilio direct calling
If you need to press digits after the call connects, use `--send-digits`.
Twilio interprets `w` as a short wait.
```bash
python3 "$SCRIPT" twilio-call "+18005551234" --message "Connecting to billing now." --send-digits "ww1w2w3"
```
This is useful for reaching a specific menu branch before handing off to a human or delivering a short status message.
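Workflows D through F all reduce to a Twilio call created with a small TwiML document (`<Say>` for built-in TTS, `<Play>` for a hosted audio file) plus an optional digit sequence. The sketch below shows how such a payload could be assembled. The function names are hypothetical, and using the Call resource's `Twiml` and `SendDigits` form fields is an assumption about how the helper script talks to Twilio.

```python
import xml.etree.ElementTree as ET


def build_twiml(message=None, audio_url=None, voice=None):
    """Return TwiML that either speaks `message` or plays `audio_url`."""
    if bool(message) == bool(audio_url):
        raise ValueError("provide exactly one of message or audio_url")
    response = ET.Element("Response")
    if audio_url:
        # <Play> fetches and plays a publicly reachable audio file.
        ET.SubElement(response, "Play").text = audio_url
    else:
        # <Say> uses Twilio's built-in TTS; the voice attribute is optional.
        attrs = {"voice": voice} if voice else {}
        ET.SubElement(response, "Say", attrs).text = message
    return ET.tostring(response, encoding="unicode")


def build_call_form(to_number, from_number, twiml, send_digits=None):
    """Assemble form fields for creating a call (assumed field names)."""
    form = {"To": to_number, "From": from_number, "Twiml": twiml}
    if send_digits:
        # Digits to press after the call connects; "w" inserts a short wait.
        form["SendDigits"] = send_digits
    return form
```

Building the TwiML with an XML library rather than string formatting means characters such as `&` in a message or URL are escaped correctly.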
### G. Outbound AI phone call with Bland.ai
```bash
python3 "$SCRIPT" ai-call "+15551230000" "Call the dental office, ask for a cleaning appointment on Tuesday afternoon, and if they do not have Tuesday availability, ask for Wednesday or Thursday instead." --provider bland --voice mason --max-duration 3
```
Check status:
```bash
python3 "$SCRIPT" ai-status <call_id> --provider bland
```
Ask Bland analysis questions after completion:
```bash
python3 "$SCRIPT" ai-status <call_id> --provider bland --analyze "Was the appointment confirmed?,What date and time?,Any special instructions?"
```
### H. Outbound AI phone call with Vapi on your owned number
1. Import your Twilio number into Vapi:
```bash
python3 "$SCRIPT" vapi-import-twilio --save-env
```
2. Place the call:
```bash
python3 "$SCRIPT" ai-call "+15551230000" "You are calling to make a dinner reservation for two at 7:30 PM. If that is unavailable, ask for the nearest time between 6:30 and 8:30 PM." --provider vapi --max-duration 4
```
3. Check result:
```bash
python3 "$SCRIPT" ai-status <call_id> --provider vapi
```
## Suggested agent procedure
When the user asks for a call or text:
1. Determine which path fits the request via the decision tree.
2. Run `diagnose` if configuration state is unclear.
3. Gather the full task details.
4. Confirm with the user before dialing or texting.
5. Use the correct command.
6. Poll for results if needed.
7. Summarize the outcome without persisting third-party numbers to Hermes memory.
## What this skill still does not do
- real-time inbound call answering
- webhook-based live SMS push into the agent loop
- guaranteed support for arbitrary third-party 2FA providers
Those would require more infrastructure than a pure optional skill.
## Pitfalls
- Twilio trial accounts and regional rules can restrict who you can call/text.
- Some services reject VoIP numbers for 2FA.
- `twilio-inbox` polls the REST API; it is not instant push delivery.
- Vapi outbound calling still depends on having a valid imported number.
- Bland is easiest, but not always the best-sounding.
- Do not store arbitrary third-party phone numbers in Hermes memory.
## Verification checklist
After setup, you should be able to do all of the following with just this skill:
1. `diagnose` shows provider readiness and remembered state
2. search and buy a Twilio number
3. persist that number to `~/.hermes/.env`
4. send an SMS from the owned number
5. poll inbound texts for the owned number later
6. place a direct Twilio call
7. place an AI call via Bland or Vapi
## References
- Twilio phone numbers: https://www.twilio.com/docs/phone-numbers/api
- Twilio messaging: https://www.twilio.com/docs/messaging/api/message-resource
- Twilio voice: https://www.twilio.com/docs/voice/api/call-resource
- Vapi docs: https://docs.vapi.ai/
- Bland.ai: https://app.bland.ai/

File diff suppressed because it is too large


@@ -0,0 +1,229 @@
from __future__ import annotations

import importlib.util
import json
import os
import sys
from pathlib import Path

SCRIPT_PATH = (
    Path(__file__).resolve().parents[2]
    / "optional-skills"
    / "productivity"
    / "telephony"
    / "scripts"
    / "telephony.py"
)


def load_module():
    spec = importlib.util.spec_from_file_location("telephony_skill", SCRIPT_PATH)
    module = importlib.util.module_from_spec(spec)
    assert spec.loader is not None
    sys.modules[spec.name] = module
    spec.loader.exec_module(module)
    return module


def test_save_twilio_writes_env_and_state(tmp_path: Path, monkeypatch):
    mod = load_module()
    monkeypatch.setenv("HERMES_HOME", str(tmp_path / ".hermes"))
    result = mod.save_twilio(
        "AC123",
        "secret-token",
        phone_number="+1 (702) 555-1234",
        phone_sid="PN123",
    )
    env_text = (tmp_path / ".hermes" / ".env").read_text(encoding="utf-8")
    state = json.loads((tmp_path / ".hermes" / "telephony_state.json").read_text(encoding="utf-8"))
    assert result["success"] is True
    assert "TWILIO_ACCOUNT_SID=AC123" in env_text
    assert "TWILIO_AUTH_TOKEN=secret-token" in env_text
    assert "TWILIO_PHONE_NUMBER=+17025551234" in env_text
    assert "TWILIO_PHONE_NUMBER_SID=PN123" in env_text
    assert state["twilio"]["default_phone_number"] == "+17025551234"
    assert state["twilio"]["default_phone_sid"] == "PN123"


def test_upsert_env_updates_existing_values(tmp_path: Path):
    mod = load_module()
    env_path = tmp_path / ".env"
    env_path.write_text("TWILIO_PHONE_NUMBER=+15550000000\nOTHER=keep\n", encoding="utf-8")
    mod._upsert_env_file(
        {
            "TWILIO_PHONE_NUMBER": "+15551112222",
            "TWILIO_PHONE_NUMBER_SID": "PN999",
        },
        env_path=env_path,
    )
    env_text = env_path.read_text(encoding="utf-8")
    assert "TWILIO_PHONE_NUMBER=+15551112222" in env_text
    assert "TWILIO_PHONE_NUMBER_SID=PN999" in env_text
    assert "OTHER=keep" in env_text


def test_messages_after_checkpoint_returns_only_newer_items():
    mod = load_module()
    messages = [
        {"sid": "SM3", "body": "newest"},
        {"sid": "SM2", "body": "middle"},
        {"sid": "SM1", "body": "oldest"},
    ]
    assert mod._messages_after_checkpoint(messages, "") == messages
    assert mod._messages_after_checkpoint(messages, "SM2") == [{"sid": "SM3", "body": "newest"}]
    assert mod._messages_after_checkpoint(messages, "SM3") == []


def test_twilio_buy_number_saves_env_and_state(tmp_path: Path):
    mod = load_module()
    state_path = tmp_path / "telephony_state.json"
    env_path = tmp_path / ".env"
    mod._twilio_request = lambda method, path, params=None, form=None: {
        "sid": "PN111",
        "phone_number": "+17025550123",
        "friendly_name": "Test Number",
        "capabilities": {"voice": True, "sms": True},
    }
    result = mod._twilio_buy_number(
        "+17025550123",
        save_env=True,
        state_path=state_path,
        env_path=env_path,
    )
    state = json.loads(state_path.read_text(encoding="utf-8"))
    env_text = env_path.read_text(encoding="utf-8")
    assert result["phone_sid"] == "PN111"
    assert state["twilio"]["default_phone_number"] == "+17025550123"
    assert state["twilio"]["default_phone_sid"] == "PN111"
    assert "TWILIO_PHONE_NUMBER=+17025550123" in env_text
    assert "TWILIO_PHONE_NUMBER_SID=PN111" in env_text


def test_twilio_inbox_marks_seen_checkpoint(tmp_path: Path):
    mod = load_module()
    state_path = tmp_path / "telephony_state.json"
    mod._save_state(
        {
            "version": 1,
            "twilio": {
                "default_phone_number": "+17025550123",
                "default_phone_sid": "PN111",
                "last_inbound_message_sid": "SM1",
            },
        },
        state_path,
    )
    mod._twilio_owned_numbers = lambda limit=50: [
        mod.OwnedTwilioNumber(
            sid="PN111",
            phone_number="+17025550123",
            friendly_name="Main",
            capabilities={"voice": True, "sms": True},
        )
    ]
    mod._twilio_request = lambda method, path, params=None, form=None: {
        "messages": [
            {
                "sid": "SM3",
                "direction": "inbound",
                "status": "received",
                "from": "+15551230000",
                "to": "+17025550123",
                "date_sent": "Tue, 14 Mar 2026 09:00:00 +0000",
                "body": "new message",
                "num_media": "0",
            },
            {
                "sid": "SM1",
                "direction": "inbound",
                "status": "received",
                "from": "+15551110000",
                "to": "+17025550123",
                "date_sent": "Tue, 14 Mar 2026 08:00:00 +0000",
                "body": "old message",
                "num_media": "0",
            },
        ]
    }
    result = mod._twilio_inbox(limit=10, since_last=True, mark_seen=True, state_path=state_path)
    state = json.loads(state_path.read_text(encoding="utf-8"))
    assert result["count"] == 1
    assert result["messages"][0]["sid"] == "SM3"
    assert state["twilio"]["last_inbound_message_sid"] == "SM3"


def test_vapi_import_twilio_number_saves_phone_number_id(tmp_path: Path):
    mod = load_module()
    state_path = tmp_path / "telephony_state.json"
    env_path = tmp_path / ".env"
    mod._vapi_api_key = lambda: "vapi-key"
    mod._twilio_creds = lambda: ("AC123", "token123")
    mod._resolve_twilio_number = lambda identifier=None: mod.OwnedTwilioNumber(
        sid="PN111",
        phone_number="+17025550123",
        friendly_name="Main",
        capabilities={"voice": True, "sms": True},
    )
    mod._json_request = lambda method, url, headers=None, params=None, form=None, json_body=None: {
        "id": "vapi-phone-xyz"
    }
    result = mod._vapi_import_twilio_number(
        save_env=True,
        state_path=state_path,
        env_path=env_path,
    )
    state = json.loads(state_path.read_text(encoding="utf-8"))
    env_text = env_path.read_text(encoding="utf-8")
    assert result["phone_number_id"] == "vapi-phone-xyz"
    assert state["vapi"]["phone_number_id"] == "vapi-phone-xyz"
    assert "VAPI_PHONE_NUMBER_ID=vapi-phone-xyz" in env_text


def test_diagnose_includes_decision_tree_and_saved_state(tmp_path: Path, monkeypatch):
    mod = load_module()
    hermes_home = tmp_path / ".hermes"
    monkeypatch.setenv("HERMES_HOME", str(hermes_home))
    mod._save_state(
        {
            "version": 1,
            "twilio": {
                "default_phone_number": "+17025550123",
                "last_inbound_message_sid": "SM123",
            },
            "vapi": {
                "phone_number_id": "vapi-abc",
            },
        },
        hermes_home / "telephony_state.json",
    )
    (hermes_home / ".env").parent.mkdir(parents=True, exist_ok=True)
    (hermes_home / ".env").write_text(
        "TWILIO_ACCOUNT_SID=AC123\nTWILIO_AUTH_TOKEN=token\nBLAND_API_KEY=bland\n",
        encoding="utf-8",
    )
    result = mod.diagnose()
    assert result["providers"]["twilio"]["default_phone_number"] == "+17025550123"
    assert result["providers"]["twilio"]["last_inbound_message_sid"] == "SM123"
    assert result["providers"]["bland"]["configured"] is True
    assert result["providers"]["vapi"]["phone_number_id"] == "vapi-abc"
    assert any(item["use"] == "Twilio" for item in result["decision_tree"])