Move vision items to GitHub issues (#314, #315)

Voice Mode → #314 Dogfood Skill → #315 The VISION.md doc is removed in favor of detailed, trackable GitHub issues. Issues are assignable, discussable, and linkable to PRs.
2026-03-03 01:26:05 -08:00
parent 535b46f813
commit f084538cb9
1 changed files with 0 additions and 75 deletions
--- a/VISION.md
+++ b/VISION.md
@@ -1,75 +0,0 @@
-# Hermes Agent — Vision Board & Roadmap
-
-A living brainstorming doc for features, ideas, and strategic direction.
-Last updated: March 2, 2026
-
---
-
-## Voice Mode
-
-**Inspiration:** Claude Code's /voice rollout (March 2026) — lets users talk
-to the coding agent instead of typing, toggled with a slash command.
-
-### CLI UX (primary target)
-
-The voice mode lives inside the existing CLI terminal experience:
-
-1. **Activation:** User types `/voice` in the Hermes CLI to toggle voice on/off
-2. **Status indicator:** A persistent banner appears at the top of the prompt
-   area: `Voice mode enabled — hold Space to speak`
-3. **Push-to-talk:** User holds the Space bar to record. Releasing sends the
-   audio for transcription. The input prompt placeholder changes to guide:
-   `> hold space bar to speak`
-4. **Transcription:** Speech is transcribed to text and submitted as a normal
-   user message — the agent processes it identically to typed input
-5. **Agent response:** Text response streams to the terminal as usual.
-   Optionally, TTS can read the response aloud (we already have
-   text_to_speech). Could be a `/voice tts` sub-toggle.
-6. **Deactivation:** `/voice` again to toggle off, returns to normal typing
-
-**Implementation notes:**
- Push-to-talk needs raw terminal/keyboard input (prompt_toolkit has key
-  binding support — we already use it for the CLI input)
- Audio capture via PyAudio or sounddevice, stream to STT provider
- Visual feedback while recording: waveform animation or pulsing indicator
-  in the terminal (could use rich/textual for this)
- Space bar hold must NOT conflict with normal typing when voice is off
-
-### Gateway Platforms
-
- **Telegram:** Already receives voice messages natively — transcribe them
-  automatically with STT and process as text. Users already send voice
-  notes; we just need to handle the audio file.
- **Discord:** Similar — voice messages come as attachments, transcribe and
-  process
- **WhatsApp:** Voice notes are a primary interaction mode, same approach
-
-### Ideas
-
- Agent can already do TTS output (text_to_speech tool exists) — pair with
-  voice input for a full conversational loop
- Latency matters — voice conversations feel bad above ~2s response time
- Could adjust system prompt in voice mode to be more concise/conversational
- Audio cues for tool call confirmations, errors, completion
- Streaming STT (transcribe while user is still speaking) for lower latency
-
-### Open Questions
-
- Which STT provider? (Whisper local, Deepgram, AssemblyAI, etc.)
-  - Local Whisper = no API dependency but needs GPU for speed
-  - Deepgram/AssemblyAI = fast streaming, but adds a service dependency
- Should voice mode change the system prompt to be more conversational/concise?
- How to handle tool call confirmations in voice — audio cues?
- Do we want full duplex (agent can interrupt/be interrupted) or half-duplex?
-
---
-
-## Ideas Backlog
-
-*(New ideas get added here, then organized into sections as they mature)*
-
---
-
-## Shipped
-
-*(Track completed vision items here for posterity)*