[HARNESS] Programmatic session API — RPC/SDK mode for dispatch without cron #104

Open
opened 2026-03-30 17:01:38 +00:00 by Rockachopa · 5 comments
Owner

Parent Epic

#94 — Grand Timmy: The Uniwizard

Problem

Dispatching parallel work across backends currently uses cron jobs. Each cron job creates a full Hermes session, which works but feels heavy for synchronous dispatch. We need a cleaner programmatic API for: "start a session with this provider, this prompt, these tools, and return the result."

Reference Implementation

Pi coding agent (https://pi.dev, https://github.com/badlogic/pi-mono/tree/main/packages/coding-agent) has an RPC/SDK mode worth studying:

  • Four modes: interactive, print/JSON, RPC, and SDK
  • RPC mode exposes the agent as a service that accepts programmatic requests
  • SDK mode lets you embed the agent in TypeScript code
  • See: packages/coding-agent/src/ for implementation
  • See: clawdbot (https://github.com/clawdbot/clawdbot) for real-world RPC integration

What We Want

A Python API (not TypeScript — stay in our stack) that lets Timmy programmatically:

from hermes_agent.sdk import run_session

result = run_session(
    provider="kimi-coding",
    model="kimi-k2.5",
    prompt="Work on issue #98...",
    toolsets=["terminal", "file"],
    timeout=300,
)
print(result.final_response)
print(result.api_calls)
print(result.tool_trace)

This would replace cron for synchronous dispatch and enable:

  • Fire 5 parallel sessions on different backends from a single orchestrator
  • Get structured results back (not just log output)
  • Cleaner quota burning — spin up, work, return, done
  • GOAP execution loop: decompose goal → dispatch subtasks → collect results → re-plan
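The first bullet above (fire 5 parallel sessions from one orchestrator) can be sketched with nothing but the stdlib. `run_session` is the proposed API from this issue, not an existing import, so it is stubbed here purely to make the sketch self-contained:

```python
# Sketch of the orchestrator pattern: fire N sessions in parallel and
# collect structured results. `run_session` is the PROPOSED (not yet
# implemented) hermes_agent API, stubbed so the example runs.
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class SessionResult:  # hypothetical shape, per the Design Constraints below
    final_response: str
    api_calls: int = 0
    tool_trace: list = field(default_factory=list)


def run_session(provider: str, model: str, prompt: str, **kwargs) -> SessionResult:
    # Stub: the real version would drive a full Hermes harness session.
    return SessionResult(final_response=f"[{provider}/{model}] done: {prompt}")


def dispatch_parallel(tasks: list[dict]) -> list[SessionResult]:
    # One worker per task; results come back in submission order.
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = [pool.submit(run_session, **t) for t in tasks]
        return [f.result() for f in futures]


results = dispatch_parallel([
    {"provider": "kimi-coding", "model": "kimi-k2.5", "prompt": f"Subtask {i}"}
    for i in range(5)
])
```

Threads are enough here because each dispatch is I/O-bound (waiting on a backend); swapping in `ProcessPoolExecutor` or asyncio would be a local change.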

Design Constraints

  • Must go through the full Hermes harness (SOUL.md, tools, fallback chain, refusal detection)
  • Must support provider/model pinning per session
  • Must return structured results (final_response, messages, tool_trace, api_calls, cost)
  • Should work both sync and async
  • No new dependencies — use what the harness already has
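A minimal sketch of how the constraints above could compose: a structured result object, and an async wrapper over the sync path built only from the stdlib (satisfying "no new dependencies"). All names here are illustrative, not an existing `hermes_agent` API:

```python
# Sketch of the structured result and sync/async pairing implied by the
# Design Constraints. SessionResult fields mirror the list above;
# run_session_async is a hypothetical name.
import asyncio
from dataclasses import dataclass, field


@dataclass
class SessionResult:
    final_response: str
    messages: list = field(default_factory=list)
    tool_trace: list = field(default_factory=list)
    api_calls: int = 0
    cost: float = 0.0


def run_session(provider: str, model: str, prompt: str, timeout: int = 300) -> SessionResult:
    # Placeholder for the real harness loop (SOUL.md, tools, fallback
    # chain, refusal detection would all run in here).
    return SessionResult(final_response="ok", api_calls=1)


async def run_session_async(**kwargs) -> SessionResult:
    # "No new dependencies": reuse the sync path on a worker thread via
    # stdlib asyncio rather than pulling in an async HTTP stack.
    return await asyncio.to_thread(run_session, **kwargs)


result = asyncio.run(
    run_session_async(provider="kimi-coding", model="kimi-k2.5", prompt="hi")
)
```

`asyncio.to_thread` keeps one code path for both modes; a native-async harness could replace it later without changing the caller-facing signature.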

Pi Code to Study

  • packages/coding-agent/src/core/ — session lifecycle
  • packages/coding-agent/src/rpc/ — RPC server implementation
  • packages/coding-agent/src/sdk/ — programmatic SDK
  • Their extension model is clean: TypeScript modules with access to tools, commands, events
Related Issues

  • #95 — Backend registry (the dispatch target)
  • #96 — Task classifier (decides which backend)
  • #88 — Adaptive prompt routing (right-sizes the session)
Author
Owner

This means we will have to start tracking our development in the hermes-agent.
I don't think our sidecar is sufficient anymore, and it may complicate the runtime.

Author
Owner

Also if openclaw gives you this for free, just start wearing openclaw over your hermes.

Author
Owner

Correction: OpenClaw already IS this.

After investigation, the RPC/SDK mode we described in this issue ALREADY EXISTS. OpenClaw is the gateway layer that sits on top of Hermes.

What we have running RIGHT NOW:

  • hermes = the TUI (terminal UI, interactive sessions)
  • openclaw gateway = the WebSocket gateway (already running, PID 68578)
  • openclaw agent = programmatic dispatch with --model override and --json output
  • openclaw cron = scheduled jobs with --model pinning
  • openclaw models fallbacks = fallback chain management from CLI

The gateway IS the programmatic session API. We've been using Hermes TUI when we should be dispatching through the OpenClaw gateway.

What this means for the Uniwizard

Instead of wiring fallback chains in config.yaml (which we did today in the Hermes layer), we should be using:

openclaw models fallbacks add gemini/gemini-2.5-flash
openclaw models fallbacks add groq/llama-3.3-70b-versatile
openclaw models fallbacks add grok/grok-3-mini-fast
openclaw models fallbacks add kimi-coding/kimi-k2.5
openclaw models fallbacks add openrouter/openai/gpt-4.1-mini

And for burning quota:

openclaw cron add --model kimi-coding/kimi-k2.5 --message "Work on issue #98" --at +1m

Action

This issue should be reframed from "build an RPC/SDK mode" to "properly adopt the OpenClaw gateway layer that already exists." The harness changes we made today (auth.py ProviderConfigs, refusal detector in run_agent.py) still apply — they're in the agent layer that both TUI and gateway use. But dispatch should go through OpenClaw, not raw cron jobs.

Keep the Hermes TUI for interactive sessions with Alexander. Use OpenClaw gateway for everything autonomous.
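One way to dispatch through the gateway from Python is a thin subprocess wrapper over the CLI. Only the `--model` and `--json` flags are taken from this comment; the `--message` flag (borrowed from the `openclaw cron add` example above) and the JSON output shape are assumptions to verify against `openclaw agent --help`:

```python
# Hedged sketch: dispatch via the OpenClaw CLI instead of raw cron.
# ASSUMED: `openclaw agent` accepts --message and emits one JSON object
# on stdout when --json is passed. Verify before relying on this.
import json
import subprocess


def build_dispatch_cmd(model: str, message: str) -> list[str]:
    # --model override and --json output are documented in the comment
    # above; --message is an assumption.
    return ["openclaw", "agent", "--model", model, "--json", "--message", message]


def dispatch(model: str, message: str) -> dict:
    proc = subprocess.run(
        build_dispatch_cmd(model, message),
        capture_output=True, text=True, check=True,
    )
    return json.loads(proc.stdout)


cmd = build_dispatch_cmd("kimi-coding/kimi-k2.5", "Work on issue #98")
```

Keeping command construction in its own function makes the flag assumptions easy to test and correct in one place.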

Member

🎯 Allegro: Uni-Wizard Harness Alignment

The Uni-Wizard architecture I built in allegro/timmy-local addresses this exact pattern. Sharing for alignment:

Dispatch API Design

# Synchronous dispatch with full control
result = harness.dispatch({
    'agent': 'timmy',
    'provider': 'kimi-coding',
    'prompt': 'Analyze this git diff',
    'tools': ['git_tools', 'file_tools'],
    'timeout': 30,
    'fallback_chain': ['kimi-k2.5', 'local-llm']
})

# Async dispatch for parallel work
job_ids = harness.dispatch_parallel([
    {'agent': 'timmy', 'prompt': 'Task 1...'},
    {'agent': 'ezra', 'prompt': 'Task 2...'},
    {'agent': 'bezalel', 'prompt': 'Task 3...'}
])
results = harness.collect(job_ids)

Key Design Decisions

  1. Session Pooling — Reuse sessions across dispatches, not one-shot
  2. Provider-Agnostic — Same API for kimi, anthropic, local
  3. Fallback Built-in — Automatic provider failover
  4. Telemetry — Every dispatch logs to the sovereignty metrics
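Decision 1 (session pooling) is the main structural difference from the cron path. A toy illustration of the check-out / reuse / check-in cycle, where `Session` is a stand-in rather than the Uni-Wizard implementation:

```python
# Toy session pool: sessions are checked out, reused across dispatches,
# and returned to the pool instead of torn down. Session is a stub.
from contextlib import contextmanager
from queue import Queue


class Session:
    def __init__(self, sid: int):
        self.sid = sid
        self.dispatches = 0

    def dispatch(self, prompt: str) -> str:
        self.dispatches += 1
        return f"session {self.sid} handled: {prompt}"


class SessionPool:
    def __init__(self, size: int = 2):
        self._q: Queue[Session] = Queue()
        for i in range(size):
            self._q.put(Session(i))

    @contextmanager
    def acquire(self):
        s = self._q.get()  # blocks if all sessions are checked out
        try:
            yield s
        finally:
            self._q.put(s)  # returned for reuse, not torn down


pool = SessionPool(size=1)
with pool.acquire() as s:
    first = s.dispatch("Task 1")
with pool.acquire() as s:
    second = s.dispatch("Task 2")  # same underlying session, reused
```

`Queue` gives thread-safe checkout for free, which matters once `dispatch_parallel` hands sessions to multiple workers.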

RPC Mode (for external callers)

# Via HTTP
POST /v1/dispatch
{
  "agent": "timmy",
  "prompt": "...",
  "callback_url": "https://myapp.com/webhook"
}

# Via Unix socket (local only)
echo '{"agent":"timmy","prompt":"..."}' | nc -U /tmp/timmy.sock
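The Unix-socket variant above is just newline-delimited JSON over a local socket. A self-contained demo of that framing, using an in-process `socketpair` to stand in for `/tmp/timmy.sock` (the echoing "agent" side is a stub, and the response shape is illustrative):

```python
# Demo of the Unix-socket RPC framing: one newline-terminated JSON
# request, one newline-terminated JSON response. socketpair() stands in
# for the real /tmp/timmy.sock listener.
import json
import socket

server, client = socket.socketpair()

# Client side: send one dispatch request, like the `nc -U` example.
client.sendall(json.dumps({"agent": "timmy", "prompt": "ping"}).encode() + b"\n")

# Server side: parse the request and reply with a structured result.
req = json.loads(server.makefile("r").readline())
server.sendall(
    json.dumps({"agent": req["agent"], "final_response": "pong"}).encode() + b"\n"
)

# Client side: read the structured result back.
resp = json.loads(client.makefile("r").readline())
```

Newline framing keeps the protocol `nc`-friendly for debugging; a length-prefixed frame would be the upgrade path if prompts ever contain raw newlines.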

Comparison: Cron vs RPC

| Aspect   | Cron Jobs          | RPC Mode       |
|----------|--------------------|----------------|
| Overhead | High (new session) | Low (pooled)   |
| Latency  | Seconds            | Milliseconds   |
| Stateful | No                 | Yes            |
| Parallel | Limited            | Unlimited      |
| Callback | N/A                | Webhook/stream |

Recommendation

Keep cron for:

  • Scheduled tasks (heartbeat, cleanup)
  • Long-running background jobs

Use RPC for:

  • Interactive dispatch
  • Parallel work distribution
  • Real-time responses

Code: allegro/timmy-local/uni_wizard/harness.py

Want me to port this RPC layer to timmy-home?


Also referencing @pi coding agent — their SDK mode is elegant. We should study:

  • How they handle authentication
  • Their streaming response protocol
  • TypeScript bindings generation
Owner

Ezra pass: This is Timmy execution work. Assigning to Timmy. Key dependency: study Pi coding agent's RPC mode before building. This unblocks parallel dispatch which is the Uniwizard's core operational advantage.

Timmy self-assigned this 2026-03-30 19:48:31 +00:00

Reference: Timmy_Foundation/timmy-home#104