Compare commits

2 Commits

Author SHA1 Message Date
0c950f991c Merge pull request '[ORCHESTRATOR-4] Evaluate CrewAI for Phase 2 integration' (#361) from ezra/issue-358 into main 2026-04-07 16:35:40 +00:00
ezra
fe7c5018e3 eval(crewai): PoC crew + evaluation for Phase 2 integration
- Install CrewAI v1.13.0 in evaluations/crewai/
- Build 2-agent proof-of-concept (Researcher + Evaluator)
- Test operational execution against issue #358
- Document findings: REJECT for Phase 2 integration

CrewAI's 500+ MB dependency footprint, memory-model drift
from Gitea-as-truth, and external API fragility outweigh
its agent-role syntax benefits. Recommend evolving the
existing Huey stack instead.

Closes #358
2026-04-07 16:25:21 +00:00
6 changed files with 295 additions and 684 deletions

4
evaluations/crewai/.gitignore vendored Normal file

@@ -0,0 +1,4 @@
venv/
__pycache__/
*.pyc
.env


@@ -0,0 +1,140 @@
# CrewAI Evaluation for Phase 2 Integration
**Date:** 2026-04-07
**Issue:** [#358 ORCHESTRATOR-4] Evaluate CrewAI for Phase 2 integration
**Author:** Ezra
**House:** hermes-ezra
## Summary
CrewAI was installed, a 2-agent proof-of-concept crew was built, and an operational test was attempted against issue #358. Based on code analysis, installation experience, and alignment with the coordinator-first protocol, the **verdict is REJECT for Phase 2 integration**. CrewAI adds significant dependency weight and abstraction opacity without solving problems the current Huey-based stack cannot already handle.
---
## 1. Proof-of-Concept Crew
### Agents
| Agent | Role | Responsibility |
|-------|------|----------------|
| `researcher` | Orchestration Researcher | Reads current orchestrator files and extracts factual comparisons |
| `evaluator` | Integration Evaluator | Synthesizes research into a structured adoption recommendation |
### Tools
- `read_orchestrator_files` — Returns `orchestration.py`, `tasks.py`, `bin/timmy-orchestrator.sh`, and `docs/coordinator-first-protocol.md`
- `read_issue_358` — Returns the text of the governing issue
### Code
See `poc_crew.py` in this directory for the full implementation.
---
## 2. Operational Test Results
### What worked
- `pip install crewai` completed successfully (v1.13.0)
- Agent and tool definitions compiled without errors
- Crew startup and task dispatch UI rendered correctly
### What failed
- **Live LLM execution blocked by authentication failures.** Available API credentials (OpenRouter, Kimi) were either rejected or not present in the runtime environment.
- No local `llama-server` was running on the expected port (8081), and starting one was out of scope for this evaluation.
### Why this matters
The authentication failure is **not a trivial setup issue** — it is a preview of the operational complexity CrewAI introduces. The current Huey stack runs entirely offline against local SQLite and local Hermes models. CrewAI, by contrast, demands either:
- A managed cloud LLM API with live credentials, or
- A carefully tuned local model endpoint that supports its verbose ReAct-style prompts
Either path widens the blast radius and adds failure modes.
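
The two paths above can be checked before a crew is started. The following is a minimal sketch, not part of the PoC: `port_open` and `llm_backend_available` are hypothetical helper names, the key name `OPENROUTER_API_KEY` and port 8081 are taken from this evaluation, and the check only confirms that a key is present, not that the provider will accept it.

```python
import os
import socket
from typing import Callable, Mapping


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def llm_backend_available(
    env: Mapping[str, str] = os.environ,
    probe: Callable[[str, int], bool] = port_open,
    key_names: tuple = ("OPENROUTER_API_KEY",),
    local_port: int = 8081,
) -> str:
    """Report which backend looks usable: 'cloud', 'local', or 'none'.

    Presence of a key is checked, not its validity; a rejected key
    would still only surface at the first live LLM call.
    """
    if any(env.get(name, "").strip() for name in key_names):
        return "cloud"
    if probe("127.0.0.1", local_port):
        return "local"
    return "none"
```

Passing `env` and `probe` as arguments keeps the check testable without real credentials or a running `llama-server`.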
---
## 3. Current Custom Orchestrator Analysis
### Stack
- **Huey** (`orchestration.py`) — SQLite-backed task queue, ~6 lines of initialization
- **tasks.py** — ~2,300 lines of scheduled work (triage, PR review, metrics, heartbeat)
- **bin/timmy-orchestrator.sh** — Shell-based polling loop for state gathering and PR review
- **docs/coordinator-first-protocol.md** — Intake → Triage → Route → Track → Verify → Report
### Strengths
1. **Sovereignty** — No external SaaS dependency for queue execution. SQLite is local and inspectable.
2. **Gitea as truth** — All state mutations are visible in the forge. Local-only state is explicitly advisory.
3. **Simplicity** — Huey has a tiny surface area. A human can read `orchestration.py` in seconds.
4. **Tool-native** — `tasks.py` calls Hermes directly via `subprocess.run([HERMES_PYTHON, ...])`. No framework indirection.
5. **Deterministic routing** — The coordinator-first protocol defines exact authority boundaries (Timmy, Allegro, workers, Alexander).
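
The "tool-native" point can be illustrated with a direct subprocess call. This is a minimal sketch under stated assumptions: `run_hermes` is a hypothetical helper, and `HERMES_PYTHON` is bound to the current interpreter here only so the example runs anywhere; the real value and arguments used in `tasks.py` are not shown in this evaluation.

```python
import subprocess
import sys

# Assumption: stand-in for the interpreter path that tasks.py calls HERMES_PYTHON.
HERMES_PYTHON = sys.executable


def run_hermes(args: list, timeout: int = 60) -> str:
    """Run the interpreter with the given arguments and return stripped stdout.

    check=True turns a non-zero exit code into CalledProcessError, so a
    failure shows up as an ordinary Python traceback in the task.
    """
    result = subprocess.run(
        [HERMES_PYTHON, *args],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(run_hermes(["-c", "print('hello from a subprocess')"]))
```

There is no framework layer between the task and the process it launches, which is the property the list above describes.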
### Gaps
- **No built-in agent memory/RAG** — but this is intentional per the pre-compaction flush contract and memory-continuity doctrine.
- **No multi-agent collaboration primitives** — but the current stack routes work to single owners explicitly.
- **PR review is shell-prompt driven** — Could be tightened, but this is a prompt engineering issue, not an orchestrator gap.
---
## 4. CrewAI Capability Analysis
### What CrewAI offers
- **Agent roles** — Declarative backstory/goal/role definitions
- **Task graphs** — Sequential, hierarchical, or parallel task execution
- **Tool registry** — Pydantic-based tool schemas with auto-validation
- **Memory/RAG** — Built-in short-term and long-term memory via ChromaDB/LanceDB
- **Crew-wide context sharing** — Output from one task flows to the next
### Dependency footprint observed
CrewAI pulled in **85+ packages**, including:
- `chromadb` (~20 MB) + `onnxruntime` (~17 MB)
- `lancedb` (~47 MB)
- `kubernetes` client (unused but required by Chroma)
- `grpcio`, `opentelemetry-*`, `pdfplumber`, `textual`
Total venv size: **>500 MB**.
By contrast, Huey is **one package** (`huey`) with zero required services.
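
Footprint figures like the ones above can be reproduced with the standard library. The sketch below is illustrative only: `installed_package_count` and `dir_size_mb` are hypothetical helper names, and the `venv` path in the usage lines is an assumption about where the environment was created.

```python
from importlib import metadata
from pathlib import Path


def installed_package_count() -> int:
    """Count the distributions visible to the current interpreter."""
    return sum(1 for _ in metadata.distributions())


def dir_size_mb(root: Path) -> float:
    """Sum the sizes of regular files under root, in megabytes (10**6 bytes)."""
    total = sum(
        p.stat().st_size
        for p in root.rglob("*")
        if p.is_file() and not p.is_symlink()
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Run with the venv's own interpreter so the count reflects that environment.
    print(f"packages: {installed_package_count()}")
    print(f"venv size: {dir_size_mb(Path('venv')):.0f} MB")
```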
---
## 5. Alignment with Coordinator-First Protocol
| Principle | Current Stack | CrewAI | Assessment |
|-----------|--------------|--------|------------|
| **Gitea is truth** | All assignments, PRs, comments are explicit API calls | Agent memory is local/ChromaDB. State can drift from Gitea unless every tool explicitly syncs | **Misaligned** |
| **Local-only state is advisory** | SQLite queue is ephemeral; canonical state is in Gitea | CrewAI encourages "crew memory" as authoritative | **Misaligned** |
| **Verification-before-complete** | PR review + merge require visible diffs and explicit curl calls | Tool outputs can be hallucinated or incomplete without strict guardrails | **Requires heavy customization** |
| **Sovereignty** | Runs on VPS with no external orchestrator SaaS | Requires external LLM or complex local model tuning | **Degraded** |
| **Simplicity** | ~6 lines for Huey init, readable shell scripts | 500+ MB dependency tree, opaque LangChain-style internals | **Degraded** |
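
The "Gitea is truth" row implies that anything an agent wants to remember is written back to the forge as an explicit API call. A minimal sketch of that call, assuming a hypothetical helper name `build_comment_request` and a placeholder base URL; the endpoint is Gitea's issue-comment route, and the request is built but not sent here.

```python
import json
import urllib.request


def build_comment_request(api_base, owner, repo, issue_number, note, token):
    """Build a POST request that records a note as a comment on an issue."""
    url = f"{api_base}/repos/{owner}/{repo}/issues/{issue_number}/comments"
    payload = json.dumps({"body": note}).encode()
    headers = {
        "Authorization": f"token {token}",
        "Content-Type": "application/json",
        "Accept": "application/json",
    }
    return urllib.request.Request(url, data=payload, headers=headers, method="POST")


if __name__ == "__main__":
    # Placeholder values; nothing is sent over the network in this sketch.
    req = build_comment_request(
        "https://gitea.example.com/api/v1", "owner", "repo", 358,
        "Evaluation notes flushed from the agent run.", "TOKEN",
    )
    print(req.get_method(), req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` would make the note visible in the issue thread, so no state lives only in a local store.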
---
## 6. Verdict
**REJECT CrewAI for Phase 2 integration.**
**Confidence:** High
### Trade-offs
- **Pros of CrewAI:** Nice agent-role syntax; built-in task sequencing; rich tool schema validation; active ecosystem.
- **Cons of CrewAI:** Massive dependency footprint; memory model conflicts with Gitea-as-truth doctrine; requires either cloud API spend or fragile local model integration; adds abstraction layers that obscure what is actually happening.
### Risks if adopted
1. **Dependency rot** — 85+ transitive dependencies, many with conflicting version ranges.
2. **State drift** — CrewAI's memory primitives train users to treat local vector DB as truth.
3. **Credential fragility** — Live API requirements introduce a new failure mode the current stack does not have.
4. **Vendor-like lock-in** — CrewAI layers several abstractions (agents, tasks, crews, and its own `LLM` wrapper) over each model call. Debugging a stuck crew is harder than debugging a Huey task traceback.
### Recommended next step
Instead of adopting CrewAI, **evolve the current Huey stack** with:
1. A lightweight `Agent` dataclass in `tasks.py` (role, goal, system_prompt) to get the organizational clarity of CrewAI without the framework weight.
2. A `delegate()` helper that uses Hermes's existing `delegate_tool.py` for multi-agent work.
3. Keep Gitea as the only durable state surface. Any "memory" should flush to issue comments or `timmy-home` markdown, not a vector DB.
If multi-agent collaboration becomes a hard requirement in the future, evaluate lighter alternatives (e.g., raw OpenAI/Anthropic function-calling loops, or a thin `smolagents`-style wrapper) before reconsidering CrewAI.
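
The first two recommendations could look roughly like the sketch below. It is an illustration under stated assumptions, not the proposed implementation: the field names follow the list above, but the interface of Hermes's `delegate_tool.py` is not shown in this evaluation, so `delegate()` takes a hypothetical `runner` callable in its place.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Agent:
    """Declarative description of a worker: who it is and what it is for."""
    role: str
    goal: str
    system_prompt: str


def delegate(agent: Agent, task: str, runner: Callable[[str, str], str]) -> str:
    """Hand one task to one agent.

    runner(system_prompt, user_prompt) is a stand-in for whatever actually
    executes the prompt (assumed here; for example a wrapper around
    delegate_tool.py).
    """
    user_prompt = f"Goal: {agent.goal}\nTask: {task}"
    return runner(agent.system_prompt, user_prompt)


if __name__ == "__main__":
    researcher = Agent(
        role="Orchestration Researcher",
        goal="Extract factual comparisons from the orchestrator files.",
        system_prompt="You read code carefully and avoid speculation.",
    )
    # Echo runner used only to show the call shape.
    print(delegate(researcher, "Summarize tasks.py", lambda s, u: f"[{s}] {u}"))
```

The dataclass carries the role, goal, and prompt that CrewAI's `Agent` provides, while execution stays an ordinary function call that can be traced like any other task.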
---
## Artifacts
- `poc_crew.py` — 2-agent CrewAI proof-of-concept
- `requirements.txt` — Dependency manifest
- `CREWAI_EVALUATION.md` — This document


@@ -0,0 +1,150 @@
#!/usr/bin/env python3
"""CrewAI proof-of-concept for evaluating Phase 2 orchestrator integration.
Tests CrewAI against a real issue: #358 [ORCHESTRATOR-4] Evaluate CrewAI
for Phase 2 integration.
"""
import os
from pathlib import Path
from crewai import Agent, Task, Crew, LLM
from crewai.tools import BaseTool
# ── Configuration ─────────────────────────────────────────────────────
# Read the key from the environment only; never commit a credential as a fallback default.
OPENROUTER_API_KEY = os.getenv("OPENROUTER_API_KEY", "")
llm = LLM(
    model="openrouter/google/gemini-2.0-flash-001",
    api_key=OPENROUTER_API_KEY,
    base_url="https://openrouter.ai/api/v1",
)
REPO_ROOT = Path(__file__).resolve().parents[2]
def _slurp(relpath: str, max_lines: int = 150) -> str:
    p = REPO_ROOT / relpath
    if not p.exists():
        return f"[FILE NOT FOUND: {relpath}]"
    lines = p.read_text().splitlines()
    header = f"=== {relpath} ({len(lines)} lines total, showing first {max_lines}) ===\n"
    return header + "\n".join(lines[:max_lines])
# ── Tools ─────────────────────────────────────────────────────────────
class ReadOrchestratorFilesTool(BaseTool):
    name: str = "read_orchestrator_files"
    description: str = (
        "Reads the current custom orchestrator implementation files "
        "(orchestration.py, tasks.py, timmy-orchestrator.sh, coordinator-first-protocol.md) "
        "and returns their contents for analysis."
    )

    def _run(self) -> str:
        return "\n\n".join(
            [
                _slurp("orchestration.py"),
                _slurp("tasks.py", max_lines=120),
                _slurp("bin/timmy-orchestrator.sh", max_lines=120),
                _slurp("docs/coordinator-first-protocol.md", max_lines=120),
            ]
        )


class ReadIssueTool(BaseTool):
    name: str = "read_issue_358"
    description: str = "Returns the text of Gitea issue #358 that we are evaluating."

    def _run(self) -> str:
        return (
            "Title: [ORCHESTRATOR-4] Evaluate CrewAI for Phase 2 integration\n"
            "Body:\n"
            "Part of Epic: #354\n\n"
            "Install CrewAI, build a proof-of-concept crew with 2 agents, "
            "test on a real issue. Evaluate: does it add value over our custom orchestrator? Document findings."
        )
# ── Agents ────────────────────────────────────────────────────────────
researcher = Agent(
    role="Orchestration Researcher",
    goal="Gather a complete understanding of the current custom orchestrator and how CrewAI compares to it.",
    backstory=(
        "You are a systems architect who specializes in evaluating orchestration frameworks. "
        "You read code carefully, extract facts, and avoid speculation. "
        "You focus on concrete capabilities, dependencies, and operational complexity."
    ),
    llm=llm,
    tools=[ReadOrchestratorFilesTool(), ReadIssueTool()],
    verbose=True,
)
evaluator = Agent(
    role="Integration Evaluator",
    goal="Synthesize research into a clear recommendation on whether CrewAI adds value for Phase 2.",
    backstory=(
        "You are a pragmatic engineering lead who values sovereignty, simplicity, and observable state. "
        "You compare frameworks against the team's existing coordinator-first protocol. "
        "You produce structured recommendations with explicit trade-offs."
    ),
    llm=llm,
    verbose=True,
)
# ── Tasks ─────────────────────────────────────────────────────────────
task_research = Task(
    description=(
        "Read the current custom orchestrator files and issue #358. "
        "Produce a structured research report covering:\n"
        "1. Current stack summary (Huey + tasks.py + timmy-orchestrator.sh)\n"
        "2. Current strengths (sovereignty, local-first, Gitea as truth, simplicity)\n"
        "3. Current gaps or limitations (if any)\n"
        "4. What CrewAI offers (agent roles, tasks, crews, tools, memory/RAG)\n"
        "5. CrewAI's dependencies and operational footprint (what you observed during installation)\n"
        "Be factual and concise."
    ),
    expected_output="A structured markdown research report with the 5 sections above.",
    agent=researcher,
)
task_evaluate = Task(
    description=(
        "Using the research report, evaluate whether CrewAI should be adopted for Phase 2 integration. "
        "Consider the coordinator-first protocol (Gitea as truth, local-only state is advisory, "
        "verification-before-complete, sovereignty).\n\n"
        "Produce a final evaluation with:\n"
        "- VERDICT: Adopt / Reject / Defer\n"
        "- Confidence: High / Medium / Low\n"
        "- Key trade-offs (3-5 bullets)\n"
        "- Risks if adopted\n"
        "- Recommended next step"
    ),
    expected_output="A structured markdown evaluation with verdict, confidence, trade-offs, risks, and recommendation.",
    agent=evaluator,
    context=[task_research],
)
# ── Crew ──────────────────────────────────────────────────────────────
crew = Crew(
    agents=[researcher, evaluator],
    tasks=[task_research, task_evaluate],
    verbose=True,
)

if __name__ == "__main__":
    print("=" * 70)
    print("CrewAI PoC — Evaluating CrewAI for Phase 2 Integration")
    print("=" * 70)
    result = crew.kickoff()
    print("\n" + "=" * 70)
    print("FINAL OUTPUT")
    print("=" * 70)
    print(result.raw)


@@ -0,0 +1 @@
crewai>=1.13.0


@@ -1,39 +0,0 @@
#!/usr/bin/env bash
# orchestrate.sh — Sovereign Orchestrator wrapper
# Sets environment and runs orchestrator.py
#
# Usage:
# ./orchestrate.sh # dry-run (safe default)
# ./orchestrate.sh --once # single live dispatch cycle
# ./orchestrate.sh --daemon # continuous (every 15 min)
# ./orchestrate.sh --dry-run # explicit dry-run
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
HERMES_DIR="${HOME}/.hermes"
# Load Gitea token
if [[ -z "${GITEA_TOKEN:-}" ]]; then
    if [[ -f "${HERMES_DIR}/gitea_token_vps" ]]; then
        export GITEA_TOKEN="$(cat "${HERMES_DIR}/gitea_token_vps")"
    else
        echo "[FATAL] No GITEA_TOKEN and ~/.hermes/gitea_token_vps not found"
        exit 1
    fi
fi
# Load Telegram token
if [[ -z "${TELEGRAM_BOT_TOKEN:-}" ]]; then
    if [[ -f "${HOME}/.config/telegram/special_bot" ]]; then
        export TELEGRAM_BOT_TOKEN="$(cat "${HOME}/.config/telegram/special_bot")"
    fi
fi
# Run preflight checks if available
if [[ -x "${HERMES_DIR}/bin/api-key-preflight.sh" ]]; then
    "${HERMES_DIR}/bin/api-key-preflight.sh" 2>/dev/null || true
fi
# Run the orchestrator
exec python3 "${SCRIPT_DIR}/orchestrator.py" "$@"


@@ -1,645 +0,0 @@
#!/usr/bin/env python3
"""
Sovereign Orchestrator v1
Reads the Gitea backlog, scores/prioritizes issues, dispatches to agents.
Usage:
python3 orchestrator.py --once # single dispatch cycle
python3 orchestrator.py --daemon # run every 15 min
python3 orchestrator.py --dry-run # score and report, no dispatch
"""
import json
import os
import sys
import time
import subprocess
import urllib.request
import urllib.error
import urllib.parse
from datetime import datetime, timezone
# ---------------------------------------------------------------------------
# CONFIG
# ---------------------------------------------------------------------------
GITEA_API = "https://forge.alexanderwhitestone.com/api/v1"
GITEA_OWNER = "Timmy_Foundation"
REPOS = ["timmy-config", "the-nexus", "timmy-home"]
TELEGRAM_CHAT_ID = "-1003664764329"
DAEMON_INTERVAL = 900 # 15 minutes
# Tags that mark issues we should never auto-dispatch
FILTER_TAGS = ["[EPIC]", "[DO NOT CLOSE]", "[PERMANENT]", "[PHILOSOPHY]", "[MORNING REPORT]"]
# Known agent usernames on Gitea (for assignee detection)
AGENT_USERNAMES = {"groq", "ezra", "bezalel", "allegro", "timmy", "thetimmyc"}
# ---------------------------------------------------------------------------
# AGENT ROSTER
# ---------------------------------------------------------------------------
AGENTS = {
    "groq": {
        "type": "loop",
        "endpoint": "local",
        "strengths": ["code", "bug-fix", "small-changes"],
        "repos": ["the-nexus", "hermes-agent", "timmy-config", "timmy-home"],
        "max_concurrent": 1,
    },
    "ezra": {
        "type": "gateway",
        "endpoint": "http://143.198.27.163:8643/v1/chat/completions",
        "ssh": "root@143.198.27.163",
        "strengths": ["research", "architecture", "complex", "multi-file"],
        "repos": ["timmy-config", "the-nexus", "timmy-home"],
        "max_concurrent": 1,
    },
    "bezalel": {
        "type": "gateway",
        "endpoint": "http://159.203.146.185:8643/v1/chat/completions",
        "ssh": "root@159.203.146.185",
        "strengths": ["ci", "infra", "ops", "testing"],
        "repos": ["timmy-config", "hermes-agent", "the-nexus"],
        "max_concurrent": 1,
    },
}
# ---------------------------------------------------------------------------
# CREDENTIALS
# ---------------------------------------------------------------------------
def load_gitea_token():
    """Read Gitea token from env or file."""
    token = os.environ.get("GITEA_TOKEN", "")
    if token:
        return token.strip()
    token_path = os.path.expanduser("~/.hermes/gitea_token_vps")
    try:
        with open(token_path) as f:
            return f.read().strip()
    except FileNotFoundError:
        print(f"[FATAL] No GITEA_TOKEN env and {token_path} not found")
        sys.exit(1)

def load_telegram_token():
    """Read Telegram bot token from file."""
    path = os.path.expanduser("~/.config/telegram/special_bot")
    try:
        with open(path) as f:
            return f.read().strip()
    except FileNotFoundError:
        return ""

GITEA_TOKEN = ""
TELEGRAM_TOKEN = ""
# ---------------------------------------------------------------------------
# HTTP HELPERS (stdlib only)
# ---------------------------------------------------------------------------
def gitea_request(path, method="GET", data=None):
    """Make an authenticated Gitea API request."""
    url = f"{GITEA_API}{path}"
    headers = {
        "Authorization": f"token {GITEA_TOKEN}",
        "Content-Type": "application/json",
        "Accept": "application/json",
    }
    body = json.dumps(data).encode() if data else None
    req = urllib.request.Request(url, data=body, headers=headers, method=method)
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return json.loads(resp.read().decode())
    except urllib.error.HTTPError as e:
        body_text = e.read().decode() if e.fp else ""
        print(f"[API ERROR] {method} {url} -> {e.code}: {body_text[:200]}")
        return None
    except Exception as e:
        print(f"[API ERROR] {method} {url} -> {e}")
        return None

def send_telegram(message):
    """Send message to Telegram group."""
    if not TELEGRAM_TOKEN:
        print("[WARN] No Telegram token, skipping notification")
        return False
    url = f"https://api.telegram.org/bot{TELEGRAM_TOKEN}/sendMessage"
    data = json.dumps({
        "chat_id": TELEGRAM_CHAT_ID,
        "text": message,
        "parse_mode": "Markdown",
        "disable_web_page_preview": True,
    }).encode()
    req = urllib.request.Request(url, data=data, headers={"Content-Type": "application/json"})
    try:
        with urllib.request.urlopen(req, timeout=15) as resp:
            return resp.status == 200
    except Exception as e:
        print(f"[TELEGRAM ERROR] {e}")
        return False
# ---------------------------------------------------------------------------
# 1. BACKLOG READER
# ---------------------------------------------------------------------------
def fetch_issues(repo):
    """Fetch all open issues from a repo, handling pagination."""
    issues = []
    page = 1
    while True:
        result = gitea_request(
            f"/repos/{GITEA_OWNER}/{repo}/issues?state=open&type=issues&limit=50&page={page}"
        )
        if not result:
            break
        issues.extend(result)
        if len(result) < 50:
            break
        page += 1
    return issues

def should_filter(issue):
    """Check if issue title contains any filter tags."""
    title = issue.get("title", "").upper()
    for tag in FILTER_TAGS:
        if tag.upper().replace("[", "").replace("]", "") in title.replace("[", "").replace("]", ""):
            return True
    # Also filter pull requests
    if issue.get("pull_request"):
        return True
    return False

def read_backlog():
    """Read and filter the full backlog across all repos."""
    backlog = []
    for repo in REPOS:
        print(f" Fetching {repo}...")
        issues = fetch_issues(repo)
        for issue in issues:
            if should_filter(issue):
                continue
            assignees = [a.get("login", "") for a in (issue.get("assignees") or [])]
            labels = [l.get("name", "") for l in (issue.get("labels") or [])]
            backlog.append({
                "repo": repo,
                "number": issue["number"],
                "title": issue["title"],
                "labels": labels,
                "assignees": assignees,
                "created_at": issue.get("created_at", ""),
                "comments": issue.get("comments", 0),
                "url": issue.get("html_url", ""),
            })
    print(f" Total actionable issues: {len(backlog)}")
    return backlog
# ---------------------------------------------------------------------------
# 2. PRIORITY SCORER
# ---------------------------------------------------------------------------
def score_issue(issue):
    """Score an issue 0-100 based on priority signals."""
    score = 0
    title_upper = issue["title"].upper()
    labels_upper = [l.upper() for l in issue["labels"]]
    all_text = title_upper + " " + " ".join(labels_upper)
    # Critical / Bug: +30
    if any(tag in all_text for tag in ["CRITICAL", "BUG"]):
        score += 30
    # P0 / Urgent: +25
    if any(tag in all_text for tag in ["P0", "URGENT"]):
        score += 25
    # P1: +15
    if "P1" in all_text:
        score += 15
    # OPS / Security: +10
    if any(tag in all_text for tag in ["OPS", "SECURITY"]):
        score += 10
    # Unassigned: +10
    if not issue["assignees"]:
        score += 10
    # Age > 7 days: +5
    try:
        created = issue["created_at"].replace("Z", "+00:00")
        created_dt = datetime.fromisoformat(created)
        age_days = (datetime.now(timezone.utc) - created_dt).days
        if age_days > 7:
            score += 5
    except (ValueError, AttributeError):
        pass
    # Has comments: +5
    if issue["comments"] > 0:
        score += 5
    # Infrastructure repo: +5
    if issue["repo"] == "timmy-config":
        score += 5
    # Already assigned to an agent: -10
    if any(a.lower() in AGENT_USERNAMES for a in issue["assignees"]):
        score -= 10
    issue["score"] = max(0, min(100, score))
    return issue

def prioritize_backlog(backlog):
    """Score and sort the backlog by priority."""
    scored = [score_issue(i) for i in backlog]
    scored.sort(key=lambda x: x["score"], reverse=True)
    return scored
# ---------------------------------------------------------------------------
# 3. AGENT HEALTH CHECKS
# ---------------------------------------------------------------------------
def check_process(pattern):
    """Check if a local process matching pattern is running."""
    try:
        result = subprocess.run(
            ["pgrep", "-f", pattern],
            capture_output=True, text=True, timeout=5
        )
        return result.returncode == 0
    except Exception:
        return False

def check_ssh_service(host, service_name):
    """Check if a remote service is running via SSH."""
    try:
        result = subprocess.run(
            ["ssh", "-o", "ConnectTimeout=5", "-o", "StrictHostKeyChecking=no",
             f"root@{host}",
             f"systemctl is-active {service_name} 2>/dev/null || pgrep -f {service_name}"],
            capture_output=True, text=True, timeout=15
        )
        return result.returncode == 0
    except Exception:
        return False

def check_agent_health(name, agent):
    """Check if an agent is alive and available."""
    if agent["type"] == "loop":
        alive = check_process(f"agent-loop.*{name}")
    elif agent["type"] == "gateway":
        host = agent["ssh"].split("@")[1]
        service = f"hermes-{name}"
        alive = check_ssh_service(host, service)
    else:
        alive = False
    return alive

def get_agent_status():
    """Get health status for all agents."""
    status = {}
    for name, agent in AGENTS.items():
        alive = check_agent_health(name, agent)
        status[name] = {
            "alive": alive,
            "type": agent["type"],
            "strengths": agent["strengths"],
        }
        symbol = "UP" if alive else "DOWN"
        print(f" {name}: {symbol} ({agent['type']})")
    return status
# ---------------------------------------------------------------------------
# 4. DISPATCHER
# ---------------------------------------------------------------------------
def classify_issue(issue):
    """Classify issue type based on title and labels."""
    title = issue["title"].upper()
    labels = " ".join(issue["labels"]).upper()
    all_text = title + " " + labels
    types = []
    if any(w in all_text for w in ["BUG", "FIX", "BROKEN", "ERROR", "CRASH"]):
        types.append("bug-fix")
    if any(w in all_text for w in ["OPS", "DEPLOY", "CI", "INFRA", "PIPELINE", "MONITOR"]):
        types.append("ops")
    if any(w in all_text for w in ["SECURITY", "AUTH", "TOKEN", "CERT"]):
        types.append("ops")
    if any(w in all_text for w in ["RESEARCH", "AUDIT", "INVESTIGATE", "EXPLORE"]):
        types.append("research")
    if any(w in all_text for w in ["ARCHITECT", "DESIGN", "REFACTOR", "REWRITE"]):
        types.append("architecture")
    if any(w in all_text for w in ["TEST", "TESTING", "QA", "VALIDATE"]):
        types.append("testing")
    if any(w in all_text for w in ["CODE", "IMPLEMENT", "ADD", "CREATE", "BUILD"]):
        types.append("code")
    if any(w in all_text for w in ["SMALL", "QUICK", "SIMPLE", "MINOR", "TWEAK"]):
        types.append("small-changes")
    if any(w in all_text for w in ["COMPLEX", "MULTI", "LARGE", "OVERHAUL"]):
        types.append("complex")
    if not types:
        types = ["code"]  # default
    return types

def match_agent(issue, agent_status, dispatched_this_cycle):
    """Find the best available agent for an issue."""
    issue_types = classify_issue(issue)
    candidates = []
    for name, agent in AGENTS.items():
        # Agent must be alive
        if not agent_status.get(name, {}).get("alive", False):
            continue
        # Agent must handle this repo
        if issue["repo"] not in agent["repos"]:
            continue
        # Agent must not already be dispatched this cycle
        if dispatched_this_cycle.get(name, 0) >= agent["max_concurrent"]:
            continue
        # Score match based on overlapping strengths
        overlap = len(set(issue_types) & set(agent["strengths"]))
        candidates.append((name, overlap))
    if not candidates:
        return None
    # Sort by overlap score descending, return best match
    candidates.sort(key=lambda x: x[1], reverse=True)
    return candidates[0][0]
def assign_issue(repo, number, agent_name):
    """Assign an issue to an agent on Gitea."""
    # First get current assignees to not clobber
    result = gitea_request(f"/repos/{GITEA_OWNER}/{repo}/issues/{number}")
    if not result:
        return False
    current = [a.get("login", "") for a in (result.get("assignees") or [])]
    if agent_name in current:
        print(f" Already assigned to {agent_name}")
        return True
    new_assignees = current + [agent_name]
    patch_result = gitea_request(
        f"/repos/{GITEA_OWNER}/{repo}/issues/{number}",
        method="PATCH",
        data={"assignees": new_assignees}
    )
    return patch_result is not None

def dispatch_to_gateway(agent_name, agent, issue):
    """Trigger work on a gateway agent via SSH."""
    host = agent["ssh"]
    repo = issue["repo"]
    number = issue["number"]
    title = issue["title"]
    # Try to trigger dispatch via SSH
    cmd = (
        f'ssh -o ConnectTimeout=10 -o StrictHostKeyChecking=no {host} '
        f'"echo \'Dispatched by orchestrator: {repo}#{number} - {title}\' '
        f'>> /tmp/hermes-dispatch.log"'
    )
    try:
        subprocess.run(cmd, shell=True, timeout=20, capture_output=True)
        return True
    except Exception as e:
        print(f" [WARN] SSH dispatch to {agent_name} failed: {e}")
        return False
def dispatch_cycle(backlog, agent_status, dry_run=False):
    """Run one dispatch cycle. Returns dispatch report."""
    dispatched = []
    skipped = []
    dispatched_count = {}  # agent_name -> count dispatched this cycle
    # Only dispatch unassigned issues (or issues not assigned to agents)
    for issue in backlog:
        agent_assigned = any(a.lower() in AGENT_USERNAMES for a in issue["assignees"])
        if agent_assigned:
            skipped.append((issue, "already assigned to agent"))
            continue
        if issue["score"] < 5:
            skipped.append((issue, "score too low"))
            continue
        best_agent = match_agent(issue, agent_status, dispatched_count)
        if not best_agent:
            skipped.append((issue, "no available agent"))
            continue
        if dry_run:
            dispatched.append({
                "agent": best_agent,
                "repo": issue["repo"],
                "number": issue["number"],
                "title": issue["title"],
                "score": issue["score"],
                "dry_run": True,
            })
            dispatched_count[best_agent] = dispatched_count.get(best_agent, 0) + 1
            continue
        # Actually dispatch
        print(f" Dispatching {issue['repo']}#{issue['number']} -> {best_agent}")
        success = assign_issue(issue["repo"], issue["number"], best_agent)
        if success:
            agent = AGENTS[best_agent]
            if agent["type"] == "gateway":
                dispatch_to_gateway(best_agent, agent, issue)
            dispatched.append({
                "agent": best_agent,
                "repo": issue["repo"],
                "number": issue["number"],
                "title": issue["title"],
                "score": issue["score"],
            })
            dispatched_count[best_agent] = dispatched_count.get(best_agent, 0) + 1
        else:
            skipped.append((issue, "assignment failed"))
    return dispatched, skipped
# ---------------------------------------------------------------------------
# 5. CONSOLIDATED REPORT
# ---------------------------------------------------------------------------
def generate_report(backlog, dispatched, skipped, agent_status, dry_run=False):
    """Generate dispatch cycle report."""
    now = datetime.now().strftime("%Y-%m-%d %H:%M")
    mode = " [DRY RUN]" if dry_run else ""
    lines = []
    lines.append(f"=== Sovereign Orchestrator Report{mode} ===")
    lines.append(f"Time: {now}")
    lines.append(f"Total backlog: {len(backlog)} issues")
    lines.append("")
    # Agent health
    lines.append("-- Agent Health --")
    for name, info in agent_status.items():
        symbol = "UP" if info["alive"] else "DOWN"
        lines.append(f" {name}: {symbol} ({info['type']})")
    lines.append("")
    # Dispatched
    lines.append(f"-- Dispatched: {len(dispatched)} --")
    for d in dispatched:
        dry = " (dry-run)" if d.get("dry_run") else ""
        lines.append(f" [{d['score']}] {d['repo']}#{d['number']} -> {d['agent']}{dry}")
        lines.append(f" {d['title'][:60]}")
    lines.append("")
    # Skipped (top 10)
    skip_summary = {}
    for issue, reason in skipped:
        skip_summary[reason] = skip_summary.get(reason, 0) + 1
    lines.append(f"-- Skipped: {len(skipped)} --")
    for reason, count in sorted(skip_summary.items(), key=lambda x: -x[1]):
        lines.append(f" {reason}: {count}")
    lines.append("")
    # Top 5 unassigned
    unassigned = [i for i in backlog if not i["assignees"]][:5]
    lines.append("-- Top 5 Unassigned (by priority) --")
    for i in unassigned:
        lines.append(f" [{i['score']}] {i['repo']}#{i['number']}: {i['title'][:55]}")
    lines.append("")
    report = "\n".join(lines)
    return report

def format_telegram_report(backlog, dispatched, skipped, agent_status, dry_run=False):
    """Format a compact Telegram message."""
    mode = " DRY RUN" if dry_run else ""
    now = datetime.now().strftime("%H:%M")
    parts = [f"*Orchestrator{mode}* ({now})"]
    parts.append(f"Backlog: {len(backlog)} | Dispatched: {len(dispatched)} | Skipped: {len(skipped)}")
    # Agent status line
    agent_line = " | ".join(
        f"{'' if v['alive'] else ''}{k}" for k, v in agent_status.items()
    )
    parts.append(agent_line)
    if dispatched:
        parts.append("")
        parts.append("*Dispatched:*")
        for d in dispatched[:5]:
            dry = " 🔍" if d.get("dry_run") else ""
            parts.append(f" `{d['repo']}#{d['number']}` → {d['agent']}{dry}")
    # Top unassigned
    unassigned = [i for i in backlog if not i["assignees"]][:3]
    if unassigned:
        parts.append("")
        parts.append("*Top unassigned:*")
        for i in unassigned:
            parts.append(f" [{i['score']}] `{i['repo']}#{i['number']}` {i['title'][:40]}")
    return "\n".join(parts)
# ---------------------------------------------------------------------------
# 6. MAIN
# ---------------------------------------------------------------------------
def run_cycle(dry_run=False):
    """Execute one full orchestration cycle."""
    global GITEA_TOKEN, TELEGRAM_TOKEN
    GITEA_TOKEN = load_gitea_token()
    TELEGRAM_TOKEN = load_telegram_token()
    print("\n[1/4] Reading backlog...")
    backlog = read_backlog()
    print("\n[2/4] Scoring and prioritizing...")
    backlog = prioritize_backlog(backlog)
    for i in backlog[:10]:
        print(f" [{i['score']:3d}] {i['repo']}/{i['number']}: {i['title'][:55]}")
    print("\n[3/4] Checking agent health...")
    agent_status = get_agent_status()
    print("\n[4/4] Dispatching...")
    dispatched, skipped = dispatch_cycle(backlog, agent_status, dry_run=dry_run)
    # Generate reports
    report = generate_report(backlog, dispatched, skipped, agent_status, dry_run=dry_run)
    print("\n" + report)
    # Send Telegram notification
    if dispatched or not dry_run:
        tg_msg = format_telegram_report(backlog, dispatched, skipped, agent_status, dry_run=dry_run)
        send_telegram(tg_msg)
    return backlog, dispatched, skipped

def main():
    import argparse
    parser = argparse.ArgumentParser(description="Sovereign Orchestrator v1")
    parser.add_argument("--once", action="store_true", help="Single dispatch cycle")
    parser.add_argument("--daemon", action="store_true", help="Run every 15 min")
    parser.add_argument("--dry-run", action="store_true", help="Score/report only, no dispatch")
    parser.add_argument("--interval", type=int, default=DAEMON_INTERVAL,
                        help=f"Daemon interval in seconds (default: {DAEMON_INTERVAL})")
    args = parser.parse_args()
    if not any([args.once, args.daemon, args.dry_run]):
        args.dry_run = True  # safe default
        print("[INFO] No mode specified, defaulting to --dry-run")
    print("=" * 60)
    print(" SOVEREIGN ORCHESTRATOR v1")
    print("=" * 60)
    if args.daemon:
        print(f"[DAEMON] Running every {args.interval}s (Ctrl+C to stop)")
        cycle = 0
        while True:
            cycle += 1
            print(f"\n--- Cycle {cycle} ---")
            try:
                run_cycle(dry_run=args.dry_run)
            except Exception as e:
                print(f"[ERROR] Cycle failed: {e}")
            print(f"[DAEMON] Sleeping {args.interval}s...")
            time.sleep(args.interval)
    else:
        run_cycle(dry_run=args.dry_run)

if __name__ == "__main__":
    main()