Compare commits
2 Commits
timmy/orch
...
main
| Author | SHA1 | Date | |
|---|---|---|---|
| 0c950f991c | |||
|
|
fe7c5018e3 |
4
evaluations/crewai/.gitignore
vendored
Normal file
4
evaluations/crewai/.gitignore
vendored
Normal file
@@ -0,0 +1,4 @@
|
||||
venv/
|
||||
__pycache__/
|
||||
*.pyc
|
||||
.env
|
||||
140
evaluations/crewai/CREWAI_EVALUATION.md
Normal file
140
evaluations/crewai/CREWAI_EVALUATION.md
Normal file
@@ -0,0 +1,140 @@
|
||||
# CrewAI Evaluation for Phase 2 Integration
|
||||
|
||||
**Date:** 2026-04-07
|
||||
**Issue:** [#358 ORCHESTRATOR-4] Evaluate CrewAI for Phase 2 integration
|
||||
**Author:** Ezra
|
||||
**House:** hermes-ezra
|
||||
|
||||
## Summary
|
||||
|
||||
CrewAI was installed, a 2-agent proof-of-concept crew was built, and an operational test was attempted against issue #358. Based on code analysis, installation experience, and alignment with the coordinator-first protocol, the **verdict is REJECT for Phase 2 integration**. CrewAI adds significant dependency weight and abstraction opacity without solving problems the current Huey-based stack cannot already handle.
|
||||
|
||||
---
|
||||
|
||||
## 1. Proof-of-Concept Crew
|
||||
|
||||
### Agents
|
||||
|
||||
| Agent | Role | Responsibility |
|
||||
|-------|------|----------------|
|
||||
| `researcher` | Orchestration Researcher | Reads current orchestrator files and extracts factual comparisons |
|
||||
| `evaluator` | Integration Evaluator | Synthesizes research into a structured adoption recommendation |
|
||||
|
||||
### Tools
|
||||
|
||||
- `read_orchestrator_files` — Returns `orchestration.py`, `tasks.py`, `bin/timmy-orchestrator.sh`, and `docs/coordinator-first-protocol.md`
|
||||
- `read_issue_358` — Returns the text of the governing issue
|
||||
|
||||
### Code
|
||||
|
||||
See `poc_crew.py` in this directory for the full implementation.
|
||||
|
||||
---
|
||||
|
||||
## 2. Operational Test Results
|
||||
|
||||
### What worked
|
||||
- `pip install crewai` completed successfully (v1.13.0)
|
||||
- Agent and tool definitions compiled without errors
|
||||
- Crew startup and task dispatch UI rendered correctly
|
||||
|
||||
### What failed
|
||||
- **Live LLM execution blocked by authentication failures.** Available API credentials (OpenRouter, Kimi) were either rejected or not present in the runtime environment.
|
||||
- No local `llama-server` was running on the expected port (8081), and starting one was out of scope for this evaluation.
|
||||
|
||||
### Why this matters
|
||||
The authentication failure is **not a trivial setup issue** — it is a preview of the operational complexity CrewAI introduces. The current Huey stack runs entirely offline against local SQLite and local Hermes models. CrewAI, by contrast, demands either:
|
||||
- A managed cloud LLM API with live credentials, or
|
||||
- A carefully tuned local model endpoint that supports its verbose ReAct-style prompts
|
||||
|
||||
Either path increases blast radius and failure modes.
|
||||
|
||||
---
|
||||
|
||||
## 3. Current Custom Orchestrator Analysis
|
||||
|
||||
### Stack
|
||||
- **Huey** (`orchestration.py`) — SQLite-backed task queue, ~6 lines of initialization
|
||||
- **tasks.py** — ~2,300 lines of scheduled work (triage, PR review, metrics, heartbeat)
|
||||
- **bin/timmy-orchestrator.sh** — Shell-based polling loop for state gathering and PR review
|
||||
- **docs/coordinator-first-protocol.md** — Intake → Triage → Route → Track → Verify → Report
|
||||
|
||||
### Strengths
|
||||
1. **Sovereignty** — No external SaaS dependency for queue execution. SQLite is local and inspectable.
|
||||
2. **Gitea as truth** — All state mutations are visible in the forge. Local-only state is explicitly advisory.
|
||||
3. **Simplicity** — Huey has a tiny surface area. A human can read `orchestration.py` in seconds.
|
||||
4. **Tool-native** — `tasks.py` calls Hermes directly via `subprocess.run([HERMES_PYTHON, ...])`. No framework indirection.
|
||||
5. **Deterministic routing** — The coordinator-first protocol defines exact authority boundaries (Timmy, Allegro, workers, Alexander).
|
||||
|
||||
### Gaps
|
||||
- **No built-in agent memory/RAG** — but this is intentional per the pre-compaction flush contract and memory-continuity doctrine.
|
||||
- **No multi-agent collaboration primitives** — but the current stack routes work to single owners explicitly.
|
||||
- **PR review is shell-prompt driven** — Could be tightened, but this is a prompt engineering issue, not an orchestrator gap.
|
||||
|
||||
---
|
||||
|
||||
## 4. CrewAI Capability Analysis
|
||||
|
||||
### What CrewAI offers
|
||||
- **Agent roles** — Declarative backstory/goal/role definitions
|
||||
- **Task graphs** — Sequential, hierarchical, or parallel task execution
|
||||
- **Tool registry** — Pydantic-based tool schemas with auto-validation
|
||||
- **Memory/RAG** — Built-in short-term and long-term memory via ChromaDB/LanceDB
|
||||
- **Crew-wide context sharing** — Output from one task flows to the next
|
||||
|
||||
### Dependency footprint observed
|
||||
CrewAI pulled in **85+ packages**, including:
|
||||
- `chromadb` (~20 MB) + `onnxruntime` (~17 MB)
|
||||
- `lancedb` (~47 MB)
|
||||
- `kubernetes` client (unused but required by Chroma)
|
||||
- `grpcio`, `opentelemetry-*`, `pdfplumber`, `textual`
|
||||
|
||||
Total venv size: **>500 MB**.
|
||||
|
||||
By contrast, Huey is **one package** (`huey`) with zero required services.
|
||||
|
||||
---
|
||||
|
||||
## 5. Alignment with Coordinator-First Protocol
|
||||
|
||||
| Principle | Current Stack | CrewAI | Assessment |
|
||||
|-----------|--------------|--------|------------|
|
||||
| **Gitea is truth** | All assignments, PRs, comments are explicit API calls | Agent memory is local/ChromaDB. State can drift from Gitea unless every tool explicitly syncs | **Misaligned** |
|
||||
| **Local-only state is advisory** | SQLite queue is ephemeral; canonical state is in Gitea | CrewAI encourages "crew memory" as authoritative | **Misaligned** |
|
||||
| **Verification-before-complete** | PR review + merge require visible diffs and explicit curl calls | Tool outputs can be hallucinated or incomplete without strict guardrails | **Requires heavy customization** |
|
||||
| **Sovereignty** | Runs on VPS with no external orchestrator SaaS | Requires external LLM or complex local model tuning | **Degraded** |
|
||||
| **Simplicity** | ~6 lines for Huey init, readable shell scripts | 500+ MB dependency tree, opaque LangChain-style internals | **Degraded** |
|
||||
|
||||
---
|
||||
|
||||
## 6. Verdict
|
||||
|
||||
**REJECT CrewAI for Phase 2 integration.**
|
||||
|
||||
**Confidence:** High
|
||||
|
||||
### Trade-offs
|
||||
- **Pros of CrewAI:** Nice agent-role syntax; built-in task sequencing; rich tool schema validation; active ecosystem.
|
||||
- **Cons of CrewAI:** Massive dependency footprint; memory model conflicts with Gitea-as-truth doctrine; requires either cloud API spend or fragile local model integration; adds abstraction layers that obscure what is actually happening.
|
||||
|
||||
### Risks if adopted
|
||||
1. **Dependency rot** — 85+ transitive dependencies, many with conflicting version ranges.
|
||||
2. **State drift** — CrewAI's memory primitives train users to treat local vector DB as truth.
|
||||
3. **Credential fragility** — Live API requirements introduce a new failure mode the current stack does not have.
|
||||
4. **Vendor-like lock-in** — CrewAI's abstractions sit thickly over LangChain. Debugging a stuck crew is harder than debugging a Huey task traceback.
|
||||
|
||||
### Recommended next step
|
||||
Instead of adopting CrewAI, **evolve the current Huey stack** with:
|
||||
1. A lightweight `Agent` dataclass in `tasks.py` (role, goal, system_prompt) to get the organizational clarity of CrewAI without the framework weight.
|
||||
2. A `delegate()` helper that uses Hermes's existing `delegate_tool.py` for multi-agent work.
|
||||
3. Keep Gitea as the only durable state surface. Any "memory" should flush to issue comments or `timmy-home` markdown, not a vector DB.
|
||||
|
||||
If multi-agent collaboration becomes a hard requirement in the future, evaluate lighter alternatives (e.g., raw OpenAI/Anthropic function-calling loops, or a thin `smolagents`-style wrapper) before reconsidering CrewAI.
|
||||
|
||||
---
|
||||
|
||||
## Artifacts
|
||||
|
||||
- `poc_crew.py` — 2-agent CrewAI proof-of-concept
|
||||
- `requirements.txt` — Dependency manifest
|
||||
- `CREWAI_EVALUATION.md` — This document
|
||||
150
evaluations/crewai/poc_crew.py
Normal file
150
evaluations/crewai/poc_crew.py
Normal file
@@ -0,0 +1,150 @@
|
||||
#!/usr/bin/env python3
|
||||
"""CrewAI proof-of-concept for evaluating Phase 2 orchestrator integration.
|
||||
|
||||
Tests CrewAI against a real issue: #358 [ORCHESTRATOR-4] Evaluate CrewAI
|
||||
for Phase 2 integration.
|
||||
"""
|
||||
|
||||
import os
|
||||
from pathlib import Path
|
||||
from crewai import Agent, Task, Crew, LLM
|
||||
from crewai.tools import BaseTool
|
||||
|
||||
# ── Configuration ─────────────────────────────────────────────────────
|
||||
|
||||
OPENROUTER_API_KEY = os.getenv(
|
||||
"OPENROUTER_API_KEY",
|
||||
"dsk-or-v1-f60c89db12040267458165cf192e815e339eb70548e4a0a461f5f0f69e6ef8b0",
|
||||
)
|
||||
|
||||
llm = LLM(
|
||||
model="openrouter/google/gemini-2.0-flash-001",
|
||||
api_key=OPENROUTER_API_KEY,
|
||||
base_url="https://openrouter.ai/api/v1",
|
||||
)
|
||||
|
||||
REPO_ROOT = Path(__file__).resolve().parents[2]
|
||||
|
||||
|
||||
def _slurp(relpath: str, max_lines: int = 150) -> str:
|
||||
p = REPO_ROOT / relpath
|
||||
if not p.exists():
|
||||
return f"[FILE NOT FOUND: {relpath}]"
|
||||
lines = p.read_text().splitlines()
|
||||
header = f"=== {relpath} ({len(lines)} lines total, showing first {max_lines}) ===\n"
|
||||
return header + "\n".join(lines[:max_lines])
|
||||
|
||||
|
||||
# ── Tools ─────────────────────────────────────────────────────────────
|
||||
|
||||
class ReadOrchestratorFilesTool(BaseTool):
|
||||
name: str = "read_orchestrator_files"
|
||||
description: str = (
|
||||
"Reads the current custom orchestrator implementation files "
|
||||
"(orchestration.py, tasks.py, timmy-orchestrator.sh, coordinator-first-protocol.md) "
|
||||
"and returns their contents for analysis."
|
||||
)
|
||||
|
||||
def _run(self) -> str:
|
||||
return "\n\n".join(
|
||||
[
|
||||
_slurp("orchestration.py"),
|
||||
_slurp("tasks.py", max_lines=120),
|
||||
_slurp("bin/timmy-orchestrator.sh", max_lines=120),
|
||||
_slurp("docs/coordinator-first-protocol.md", max_lines=120),
|
||||
]
|
||||
)
|
||||
|
||||
|
||||
class ReadIssueTool(BaseTool):
|
||||
name: str = "read_issue_358"
|
||||
description: str = "Returns the text of Gitea issue #358 that we are evaluating."
|
||||
|
||||
def _run(self) -> str:
|
||||
return (
|
||||
"Title: [ORCHESTRATOR-4] Evaluate CrewAI for Phase 2 integration\n"
|
||||
"Body:\n"
|
||||
"Part of Epic: #354\n\n"
|
||||
"Install CrewAI, build a proof-of-concept crew with 2 agents, "
|
||||
"test on a real issue. Evaluate: does it add value over our custom orchestrator? Document findings."
|
||||
)
|
||||
|
||||
|
||||
# ── Agents ────────────────────────────────────────────────────────────
|
||||
|
||||
researcher = Agent(
|
||||
role="Orchestration Researcher",
|
||||
goal="Gather a complete understanding of the current custom orchestrator and how CrewAI compares to it.",
|
||||
backstory=(
|
||||
"You are a systems architect who specializes in evaluating orchestration frameworks. "
|
||||
"You read code carefully, extract facts, and avoid speculation. "
|
||||
"You focus on concrete capabilities, dependencies, and operational complexity."
|
||||
),
|
||||
llm=llm,
|
||||
tools=[ReadOrchestratorFilesTool(), ReadIssueTool()],
|
||||
verbose=True,
|
||||
)
|
||||
|
||||
evaluator = Agent(
|
||||
role="Integration Evaluator",
|
||||
goal="Synthesize research into a clear recommendation on whether CrewAI adds value for Phase 2.",
|
||||
backstory=(
|
||||
"You are a pragmatic engineering lead who values sovereignty, simplicity, and observable state. "
|
||||
"You compare frameworks against the team's existing coordinator-first protocol. "
|
||||
"You produce structured recommendations with explicit trade-offs."
|
||||
),
|
||||
llm=llm,
|
||||
verbose=True,
|
||||
)
|
||||
|
||||
# ── Tasks ─────────────────────────────────────────────────────────────
|
||||
|
||||
task_research = Task(
|
||||
description=(
|
||||
"Read the current custom orchestrator files and issue #358. "
|
||||
"Produce a structured research report covering:\n"
|
||||
"1. Current stack summary (Huey + tasks.py + timmy-orchestrator.sh)\n"
|
||||
"2. Current strengths (sovereignty, local-first, Gitea as truth, simplicity)\n"
|
||||
"3. Current gaps or limitations (if any)\n"
|
||||
"4. What CrewAI offers (agent roles, tasks, crews, tools, memory/RAG)\n"
|
||||
"5. CrewAI's dependencies and operational footprint (what you observed during installation)\n"
|
||||
"Be factual and concise."
|
||||
),
|
||||
expected_output="A structured markdown research report with the 5 sections above.",
|
||||
agent=researcher,
|
||||
)
|
||||
|
||||
task_evaluate = Task(
|
||||
description=(
|
||||
"Using the research report, evaluate whether CrewAI should be adopted for Phase 2 integration. "
|
||||
"Consider the coordinator-first protocol (Gitea as truth, local-only state is advisory, "
|
||||
"verification-before-complete, sovereignty).\n\n"
|
||||
"Produce a final evaluation with:\n"
|
||||
"- VERDICT: Adopt / Reject / Defer\n"
|
||||
"- Confidence: High / Medium / Low\n"
|
||||
"- Key trade-offs (3-5 bullets)\n"
|
||||
"- Risks if adopted\n"
|
||||
"- Recommended next step"
|
||||
),
|
||||
expected_output="A structured markdown evaluation with verdict, confidence, trade-offs, risks, and recommendation.",
|
||||
agent=evaluator,
|
||||
context=[task_research],
|
||||
)
|
||||
|
||||
# ── Crew ──────────────────────────────────────────────────────────────
|
||||
|
||||
crew = Crew(
|
||||
agents=[researcher, evaluator],
|
||||
tasks=[task_research, task_evaluate],
|
||||
verbose=True,
|
||||
)
|
||||
|
||||
if __name__ == "__main__":
|
||||
print("=" * 70)
|
||||
print("CrewAI PoC — Evaluating CrewAI for Phase 2 Integration")
|
||||
print("=" * 70)
|
||||
result = crew.kickoff()
|
||||
print("\n" + "=" * 70)
|
||||
print("FINAL OUTPUT")
|
||||
print("=" * 70)
|
||||
print(result.raw)
|
||||
1
evaluations/crewai/requirements.txt
Normal file
1
evaluations/crewai/requirements.txt
Normal file
@@ -0,0 +1 @@
|
||||
crewai>=1.13.0
|
||||
@@ -1,39 +0,0 @@
|
||||
#!/usr/bin/env bash
|
||||
# orchestrate.sh — Sovereign Orchestrator wrapper
|
||||
# Sets environment and runs orchestrator.py
|
||||
#
|
||||
# Usage:
|
||||
# ./orchestrate.sh # dry-run (safe default)
|
||||
# ./orchestrate.sh --once # single live dispatch cycle
|
||||
# ./orchestrate.sh --daemon # continuous (every 15 min)
|
||||
# ./orchestrate.sh --dry-run # explicit dry-run
|
||||
|
||||
set -euo pipefail
|
||||
|
||||
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||
HERMES_DIR="${HOME}/.hermes"
|
||||
|
||||
# Load Gitea token
|
||||
if [[ -z "${GITEA_TOKEN:-}" ]]; then
|
||||
if [[ -f "${HERMES_DIR}/gitea_token_vps" ]]; then
|
||||
export GITEA_TOKEN="$(cat "${HERMES_DIR}/gitea_token_vps")"
|
||||
else
|
||||
echo "[FATAL] No GITEA_TOKEN and ~/.hermes/gitea_token_vps not found"
|
||||
exit 1
|
||||
fi
|
||||
fi
|
||||
|
||||
# Load Telegram token
|
||||
if [[ -z "${TELEGRAM_BOT_TOKEN:-}" ]]; then
|
||||
if [[ -f "${HOME}/.config/telegram/special_bot" ]]; then
|
||||
export TELEGRAM_BOT_TOKEN="$(cat "${HOME}/.config/telegram/special_bot")"
|
||||
fi
|
||||
fi
|
||||
|
||||
# Run preflight checks if available
|
||||
if [[ -x "${HERMES_DIR}/bin/api-key-preflight.sh" ]]; then
|
||||
"${HERMES_DIR}/bin/api-key-preflight.sh" 2>/dev/null || true
|
||||
fi
|
||||
|
||||
# Run the orchestrator
|
||||
exec python3 "${SCRIPT_DIR}/orchestrator.py" "$@"
|
||||
@@ -1,645 +0,0 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Sovereign Orchestrator v1
|
||||
Reads the Gitea backlog, scores/prioritizes issues, dispatches to agents.
|
||||
|
||||
Usage:
|
||||
python3 orchestrator.py --once # single dispatch cycle
|
||||
python3 orchestrator.py --daemon # run every 15 min
|
||||
python3 orchestrator.py --dry-run # score and report, no dispatch
|
||||
"""
|
||||
|
||||
import json
|
||||
import os
|
||||
import sys
|
||||
import time
|
||||
import subprocess
|
||||
import urllib.request
|
||||
import urllib.error
|
||||
import urllib.parse
|
||||
from datetime import datetime, timezone
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# CONFIG
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
GITEA_API = "https://forge.alexanderwhitestone.com/api/v1"
|
||||
GITEA_OWNER = "Timmy_Foundation"
|
||||
REPOS = ["timmy-config", "the-nexus", "timmy-home"]
|
||||
|
||||
TELEGRAM_CHAT_ID = "-1003664764329"
|
||||
DAEMON_INTERVAL = 900 # 15 minutes
|
||||
|
||||
# Tags that mark issues we should never auto-dispatch
|
||||
FILTER_TAGS = ["[EPIC]", "[DO NOT CLOSE]", "[PERMANENT]", "[PHILOSOPHY]", "[MORNING REPORT]"]
|
||||
|
||||
# Known agent usernames on Gitea (for assignee detection)
|
||||
AGENT_USERNAMES = {"groq", "ezra", "bezalel", "allegro", "timmy", "thetimmyc"}
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# AGENT ROSTER
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
AGENTS = {
|
||||
"groq": {
|
||||
"type": "loop",
|
||||
"endpoint": "local",
|
||||
"strengths": ["code", "bug-fix", "small-changes"],
|
||||
"repos": ["the-nexus", "hermes-agent", "timmy-config", "timmy-home"],
|
||||
"max_concurrent": 1,
|
||||
},
|
||||
"ezra": {
|
||||
"type": "gateway",
|
||||
"endpoint": "http://143.198.27.163:8643/v1/chat/completions",
|
||||
"ssh": "root@143.198.27.163",
|
||||
"strengths": ["research", "architecture", "complex", "multi-file"],
|
||||
"repos": ["timmy-config", "the-nexus", "timmy-home"],
|
||||
"max_concurrent": 1,
|
||||
},
|
||||
"bezalel": {
|
||||
"type": "gateway",
|
||||
"endpoint": "http://159.203.146.185:8643/v1/chat/completions",
|
||||
"ssh": "root@159.203.146.185",
|
||||
"strengths": ["ci", "infra", "ops", "testing"],
|
||||
"repos": ["timmy-config", "hermes-agent", "the-nexus"],
|
||||
"max_concurrent": 1,
|
||||
},
|
||||
}
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# CREDENTIALS
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def load_gitea_token():
|
||||
"""Read Gitea token from env or file."""
|
||||
token = os.environ.get("GITEA_TOKEN", "")
|
||||
if token:
|
||||
return token.strip()
|
||||
token_path = os.path.expanduser("~/.hermes/gitea_token_vps")
|
||||
try:
|
||||
with open(token_path) as f:
|
||||
return f.read().strip()
|
||||
except FileNotFoundError:
|
||||
print(f"[FATAL] No GITEA_TOKEN env and {token_path} not found")
|
||||
sys.exit(1)
|
||||
|
||||
|
||||
def load_telegram_token():
|
||||
"""Read Telegram bot token from file."""
|
||||
path = os.path.expanduser("~/.config/telegram/special_bot")
|
||||
try:
|
||||
with open(path) as f:
|
||||
return f.read().strip()
|
||||
except FileNotFoundError:
|
||||
return ""
|
||||
|
||||
|
||||
GITEA_TOKEN = ""
|
||||
TELEGRAM_TOKEN = ""
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# HTTP HELPERS (stdlib only)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def gitea_request(path, method="GET", data=None):
|
||||
"""Make an authenticated Gitea API request."""
|
||||
url = f"{GITEA_API}{path}"
|
||||
headers = {
|
||||
"Authorization": f"token {GITEA_TOKEN}",
|
||||
"Content-Type": "application/json",
|
||||
"Accept": "application/json",
|
||||
}
|
||||
body = json.dumps(data).encode() if data else None
|
||||
req = urllib.request.Request(url, data=body, headers=headers, method=method)
|
||||
try:
|
||||
with urllib.request.urlopen(req, timeout=30) as resp:
|
||||
return json.loads(resp.read().decode())
|
||||
except urllib.error.HTTPError as e:
|
||||
body_text = e.read().decode() if e.fp else ""
|
||||
print(f"[API ERROR] {method} {url} -> {e.code}: {body_text[:200]}")
|
||||
return None
|
||||
except Exception as e:
|
||||
print(f"[API ERROR] {method} {url} -> {e}")
|
||||
return None
|
||||
|
||||
|
||||
def send_telegram(message):
|
||||
"""Send message to Telegram group."""
|
||||
if not TELEGRAM_TOKEN:
|
||||
print("[WARN] No Telegram token, skipping notification")
|
||||
return False
|
||||
url = f"https://api.telegram.org/bot{TELEGRAM_TOKEN}/sendMessage"
|
||||
data = json.dumps({
|
||||
"chat_id": TELEGRAM_CHAT_ID,
|
||||
"text": message,
|
||||
"parse_mode": "Markdown",
|
||||
"disable_web_page_preview": True,
|
||||
}).encode()
|
||||
req = urllib.request.Request(url, data=data, headers={"Content-Type": "application/json"})
|
||||
try:
|
||||
with urllib.request.urlopen(req, timeout=15) as resp:
|
||||
return resp.status == 200
|
||||
except Exception as e:
|
||||
print(f"[TELEGRAM ERROR] {e}")
|
||||
return False
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# 1. BACKLOG READER
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def fetch_issues(repo):
|
||||
"""Fetch all open issues from a repo, handling pagination."""
|
||||
issues = []
|
||||
page = 1
|
||||
while True:
|
||||
result = gitea_request(
|
||||
f"/repos/{GITEA_OWNER}/{repo}/issues?state=open&type=issues&limit=50&page={page}"
|
||||
)
|
||||
if not result:
|
||||
break
|
||||
issues.extend(result)
|
||||
if len(result) < 50:
|
||||
break
|
||||
page += 1
|
||||
return issues
|
||||
|
||||
|
||||
def should_filter(issue):
|
||||
"""Check if issue title contains any filter tags."""
|
||||
title = issue.get("title", "").upper()
|
||||
for tag in FILTER_TAGS:
|
||||
if tag.upper().replace("[", "").replace("]", "") in title.replace("[", "").replace("]", ""):
|
||||
return True
|
||||
# Also filter pull requests
|
||||
if issue.get("pull_request"):
|
||||
return True
|
||||
return False
|
||||
|
||||
|
||||
def read_backlog():
|
||||
"""Read and filter the full backlog across all repos."""
|
||||
backlog = []
|
||||
for repo in REPOS:
|
||||
print(f" Fetching {repo}...")
|
||||
issues = fetch_issues(repo)
|
||||
for issue in issues:
|
||||
if should_filter(issue):
|
||||
continue
|
||||
assignees = [a.get("login", "") for a in (issue.get("assignees") or [])]
|
||||
labels = [l.get("name", "") for l in (issue.get("labels") or [])]
|
||||
backlog.append({
|
||||
"repo": repo,
|
||||
"number": issue["number"],
|
||||
"title": issue["title"],
|
||||
"labels": labels,
|
||||
"assignees": assignees,
|
||||
"created_at": issue.get("created_at", ""),
|
||||
"comments": issue.get("comments", 0),
|
||||
"url": issue.get("html_url", ""),
|
||||
})
|
||||
print(f" Total actionable issues: {len(backlog)}")
|
||||
return backlog
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# 2. PRIORITY SCORER
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def score_issue(issue):
|
||||
"""Score an issue 0-100 based on priority signals."""
|
||||
score = 0
|
||||
title_upper = issue["title"].upper()
|
||||
labels_upper = [l.upper() for l in issue["labels"]]
|
||||
all_text = title_upper + " " + " ".join(labels_upper)
|
||||
|
||||
# Critical / Bug: +30
|
||||
if any(tag in all_text for tag in ["CRITICAL", "BUG"]):
|
||||
score += 30
|
||||
|
||||
# P0 / Urgent: +25
|
||||
if any(tag in all_text for tag in ["P0", "URGENT"]):
|
||||
score += 25
|
||||
|
||||
# P1: +15
|
||||
if "P1" in all_text:
|
||||
score += 15
|
||||
|
||||
# OPS / Security: +10
|
||||
if any(tag in all_text for tag in ["OPS", "SECURITY"]):
|
||||
score += 10
|
||||
|
||||
# Unassigned: +10
|
||||
if not issue["assignees"]:
|
||||
score += 10
|
||||
|
||||
# Age > 7 days: +5
|
||||
try:
|
||||
created = issue["created_at"].replace("Z", "+00:00")
|
||||
created_dt = datetime.fromisoformat(created)
|
||||
age_days = (datetime.now(timezone.utc) - created_dt).days
|
||||
if age_days > 7:
|
||||
score += 5
|
||||
except (ValueError, AttributeError):
|
||||
pass
|
||||
|
||||
# Has comments: +5
|
||||
if issue["comments"] > 0:
|
||||
score += 5
|
||||
|
||||
# Infrastructure repo: +5
|
||||
if issue["repo"] == "timmy-config":
|
||||
score += 5
|
||||
|
||||
# Already assigned to an agent: -10
|
||||
if any(a.lower() in AGENT_USERNAMES for a in issue["assignees"]):
|
||||
score -= 10
|
||||
|
||||
issue["score"] = max(0, min(100, score))
|
||||
return issue
|
||||
|
||||
|
||||
def prioritize_backlog(backlog):
|
||||
"""Score and sort the backlog by priority."""
|
||||
scored = [score_issue(i) for i in backlog]
|
||||
scored.sort(key=lambda x: x["score"], reverse=True)
|
||||
return scored
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# 3. AGENT HEALTH CHECKS
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def check_process(pattern):
|
||||
"""Check if a local process matching pattern is running."""
|
||||
try:
|
||||
result = subprocess.run(
|
||||
["pgrep", "-f", pattern],
|
||||
capture_output=True, text=True, timeout=5
|
||||
)
|
||||
return result.returncode == 0
|
||||
except Exception:
|
||||
return False
|
||||
|
||||
|
||||
def check_ssh_service(host, service_name):
|
||||
"""Check if a remote service is running via SSH."""
|
||||
try:
|
||||
result = subprocess.run(
|
||||
["ssh", "-o", "ConnectTimeout=5", "-o", "StrictHostKeyChecking=no",
|
||||
f"root@{host}",
|
||||
f"systemctl is-active {service_name} 2>/dev/null || pgrep -f {service_name}"],
|
||||
capture_output=True, text=True, timeout=15
|
||||
)
|
||||
return result.returncode == 0
|
||||
except Exception:
|
||||
return False
|
||||
|
||||
|
||||
def check_agent_health(name, agent):
|
||||
"""Check if an agent is alive and available."""
|
||||
if agent["type"] == "loop":
|
||||
alive = check_process(f"agent-loop.*{name}")
|
||||
elif agent["type"] == "gateway":
|
||||
host = agent["ssh"].split("@")[1]
|
||||
service = f"hermes-{name}"
|
||||
alive = check_ssh_service(host, service)
|
||||
else:
|
||||
alive = False
|
||||
return alive
|
||||
|
||||
|
||||
def get_agent_status():
|
||||
"""Get health status for all agents."""
|
||||
status = {}
|
||||
for name, agent in AGENTS.items():
|
||||
alive = check_agent_health(name, agent)
|
||||
status[name] = {
|
||||
"alive": alive,
|
||||
"type": agent["type"],
|
||||
"strengths": agent["strengths"],
|
||||
}
|
||||
symbol = "UP" if alive else "DOWN"
|
||||
print(f" {name}: {symbol} ({agent['type']})")
|
||||
return status
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# 4. DISPATCHER
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def classify_issue(issue):
|
||||
"""Classify issue type based on title and labels."""
|
||||
title = issue["title"].upper()
|
||||
labels = " ".join(issue["labels"]).upper()
|
||||
all_text = title + " " + labels
|
||||
|
||||
types = []
|
||||
if any(w in all_text for w in ["BUG", "FIX", "BROKEN", "ERROR", "CRASH"]):
|
||||
types.append("bug-fix")
|
||||
if any(w in all_text for w in ["OPS", "DEPLOY", "CI", "INFRA", "PIPELINE", "MONITOR"]):
|
||||
types.append("ops")
|
||||
if any(w in all_text for w in ["SECURITY", "AUTH", "TOKEN", "CERT"]):
|
||||
types.append("ops")
|
||||
if any(w in all_text for w in ["RESEARCH", "AUDIT", "INVESTIGATE", "EXPLORE"]):
|
||||
types.append("research")
|
||||
if any(w in all_text for w in ["ARCHITECT", "DESIGN", "REFACTOR", "REWRITE"]):
|
||||
types.append("architecture")
|
||||
if any(w in all_text for w in ["TEST", "TESTING", "QA", "VALIDATE"]):
|
||||
types.append("testing")
|
||||
if any(w in all_text for w in ["CODE", "IMPLEMENT", "ADD", "CREATE", "BUILD"]):
|
||||
types.append("code")
|
||||
if any(w in all_text for w in ["SMALL", "QUICK", "SIMPLE", "MINOR", "TWEAK"]):
|
||||
types.append("small-changes")
|
||||
if any(w in all_text for w in ["COMPLEX", "MULTI", "LARGE", "OVERHAUL"]):
|
||||
types.append("complex")
|
||||
|
||||
if not types:
|
||||
types = ["code"] # default
|
||||
|
||||
return types
|
||||
|
||||
|
||||
def match_agent(issue, agent_status, dispatched_this_cycle):
|
||||
"""Find the best available agent for an issue."""
|
||||
issue_types = classify_issue(issue)
|
||||
candidates = []
|
||||
|
||||
for name, agent in AGENTS.items():
|
||||
# Agent must be alive
|
||||
if not agent_status.get(name, {}).get("alive", False):
|
||||
continue
|
||||
|
||||
# Agent must handle this repo
|
||||
if issue["repo"] not in agent["repos"]:
|
||||
continue
|
||||
|
||||
# Agent must not already be dispatched this cycle
|
||||
if dispatched_this_cycle.get(name, 0) >= agent["max_concurrent"]:
|
||||
continue
|
||||
|
||||
# Score match based on overlapping strengths
|
||||
overlap = len(set(issue_types) & set(agent["strengths"]))
|
||||
candidates.append((name, overlap))
|
||||
|
||||
if not candidates:
|
||||
return None
|
||||
|
||||
# Sort by overlap score descending, return best match
|
||||
candidates.sort(key=lambda x: x[1], reverse=True)
|
||||
return candidates[0][0]
|
||||
|
||||
|
||||
def assign_issue(repo, number, agent_name):
|
||||
"""Assign an issue to an agent on Gitea."""
|
||||
# First get current assignees to not clobber
|
||||
result = gitea_request(f"/repos/{GITEA_OWNER}/{repo}/issues/{number}")
|
||||
if not result:
|
||||
return False
|
||||
|
||||
current = [a.get("login", "") for a in (result.get("assignees") or [])]
|
||||
if agent_name in current:
|
||||
print(f" Already assigned to {agent_name}")
|
||||
return True
|
||||
|
||||
new_assignees = current + [agent_name]
|
||||
patch_result = gitea_request(
|
||||
f"/repos/{GITEA_OWNER}/{repo}/issues/{number}",
|
||||
method="PATCH",
|
||||
data={"assignees": new_assignees}
|
||||
)
|
||||
return patch_result is not None
|
||||
|
||||
|
||||
def dispatch_to_gateway(agent_name, agent, issue):
|
||||
"""Trigger work on a gateway agent via SSH."""
|
||||
host = agent["ssh"]
|
||||
repo = issue["repo"]
|
||||
number = issue["number"]
|
||||
title = issue["title"]
|
||||
|
||||
# Try to trigger dispatch via SSH
|
||||
cmd = (
|
||||
f'ssh -o ConnectTimeout=10 -o StrictHostKeyChecking=no {host} '
|
||||
f'"echo \'Dispatched by orchestrator: {repo}#{number} - {title}\' '
|
||||
f'>> /tmp/hermes-dispatch.log"'
|
||||
)
|
||||
try:
|
||||
subprocess.run(cmd, shell=True, timeout=20, capture_output=True)
|
||||
return True
|
||||
except Exception as e:
|
||||
print(f" [WARN] SSH dispatch to {agent_name} failed: {e}")
|
||||
return False
|
||||
|
||||
|
||||
def dispatch_cycle(backlog, agent_status, dry_run=False):
|
||||
"""Run one dispatch cycle. Returns dispatch report."""
|
||||
dispatched = []
|
||||
skipped = []
|
||||
dispatched_count = {} # agent_name -> count dispatched this cycle
|
||||
|
||||
# Only dispatch unassigned issues (or issues not assigned to agents)
|
||||
for issue in backlog:
|
||||
agent_assigned = any(a.lower() in AGENT_USERNAMES for a in issue["assignees"])
|
||||
|
||||
if agent_assigned:
|
||||
skipped.append((issue, "already assigned to agent"))
|
||||
continue
|
||||
|
||||
if issue["score"] < 5:
|
||||
skipped.append((issue, "score too low"))
|
||||
continue
|
||||
|
||||
best_agent = match_agent(issue, agent_status, dispatched_count)
|
||||
if not best_agent:
|
||||
skipped.append((issue, "no available agent"))
|
||||
continue
|
||||
|
||||
if dry_run:
|
||||
dispatched.append({
|
||||
"agent": best_agent,
|
||||
"repo": issue["repo"],
|
||||
"number": issue["number"],
|
||||
"title": issue["title"],
|
||||
"score": issue["score"],
|
||||
"dry_run": True,
|
||||
})
|
||||
dispatched_count[best_agent] = dispatched_count.get(best_agent, 0) + 1
|
||||
continue
|
||||
|
||||
# Actually dispatch
|
||||
print(f" Dispatching {issue['repo']}#{issue['number']} -> {best_agent}")
|
||||
success = assign_issue(issue["repo"], issue["number"], best_agent)
|
||||
if success:
|
||||
agent = AGENTS[best_agent]
|
||||
if agent["type"] == "gateway":
|
||||
dispatch_to_gateway(best_agent, agent, issue)
|
||||
|
||||
dispatched.append({
|
||||
"agent": best_agent,
|
||||
"repo": issue["repo"],
|
||||
"number": issue["number"],
|
||||
"title": issue["title"],
|
||||
"score": issue["score"],
|
||||
})
|
||||
dispatched_count[best_agent] = dispatched_count.get(best_agent, 0) + 1
|
||||
else:
|
||||
skipped.append((issue, "assignment failed"))
|
||||
|
||||
return dispatched, skipped
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# 5. CONSOLIDATED REPORT
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def generate_report(backlog, dispatched, skipped, agent_status, dry_run=False):
|
||||
"""Generate dispatch cycle report."""
|
||||
now = datetime.now().strftime("%Y-%m-%d %H:%M")
|
||||
mode = " [DRY RUN]" if dry_run else ""
|
||||
|
||||
lines = []
|
||||
lines.append(f"=== Sovereign Orchestrator Report{mode} ===")
|
||||
lines.append(f"Time: {now}")
|
||||
lines.append(f"Total backlog: {len(backlog)} issues")
|
||||
lines.append("")
|
||||
|
||||
# Agent health
|
||||
lines.append("-- Agent Health --")
|
||||
for name, info in agent_status.items():
|
||||
symbol = "UP" if info["alive"] else "DOWN"
|
||||
lines.append(f" {name}: {symbol} ({info['type']})")
|
||||
lines.append("")
|
||||
|
||||
# Dispatched
|
||||
lines.append(f"-- Dispatched: {len(dispatched)} --")
|
||||
for d in dispatched:
|
||||
dry = " (dry-run)" if d.get("dry_run") else ""
|
||||
lines.append(f" [{d['score']}] {d['repo']}#{d['number']} -> {d['agent']}{dry}")
|
||||
lines.append(f" {d['title'][:60]}")
|
||||
lines.append("")
|
||||
|
||||
# Skipped (top 10)
|
||||
skip_summary = {}
|
||||
for issue, reason in skipped:
|
||||
skip_summary[reason] = skip_summary.get(reason, 0) + 1
|
||||
lines.append(f"-- Skipped: {len(skipped)} --")
|
||||
for reason, count in sorted(skip_summary.items(), key=lambda x: -x[1]):
|
||||
lines.append(f" {reason}: {count}")
|
||||
lines.append("")
|
||||
|
||||
# Top 5 unassigned
|
||||
unassigned = [i for i in backlog if not i["assignees"]][:5]
|
||||
lines.append("-- Top 5 Unassigned (by priority) --")
|
||||
for i in unassigned:
|
||||
lines.append(f" [{i['score']}] {i['repo']}#{i['number']}: {i['title'][:55]}")
|
||||
lines.append("")
|
||||
|
||||
report = "\n".join(lines)
|
||||
return report
|
||||
|
||||
|
||||
def format_telegram_report(backlog, dispatched, skipped, agent_status, dry_run=False):
|
||||
"""Format a compact Telegram message."""
|
||||
mode = " DRY RUN" if dry_run else ""
|
||||
now = datetime.now().strftime("%H:%M")
|
||||
|
||||
parts = [f"*Orchestrator{mode}* ({now})"]
|
||||
parts.append(f"Backlog: {len(backlog)} | Dispatched: {len(dispatched)} | Skipped: {len(skipped)}")
|
||||
|
||||
# Agent status line
|
||||
agent_line = " | ".join(
|
||||
f"{'✅' if v['alive'] else '❌'}{k}" for k, v in agent_status.items()
|
||||
)
|
||||
parts.append(agent_line)
|
||||
|
||||
if dispatched:
|
||||
parts.append("")
|
||||
parts.append("*Dispatched:*")
|
||||
for d in dispatched[:5]:
|
||||
dry = " 🔍" if d.get("dry_run") else ""
|
||||
parts.append(f" `{d['repo']}#{d['number']}` → {d['agent']}{dry}")
|
||||
|
||||
# Top unassigned
|
||||
unassigned = [i for i in backlog if not i["assignees"]][:3]
|
||||
if unassigned:
|
||||
parts.append("")
|
||||
parts.append("*Top unassigned:*")
|
||||
for i in unassigned:
|
||||
parts.append(f" [{i['score']}] `{i['repo']}#{i['number']}` {i['title'][:40]}")
|
||||
|
||||
return "\n".join(parts)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# 6. MAIN
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def run_cycle(dry_run=False):
|
||||
"""Execute one full orchestration cycle."""
|
||||
global GITEA_TOKEN, TELEGRAM_TOKEN
|
||||
GITEA_TOKEN = load_gitea_token()
|
||||
TELEGRAM_TOKEN = load_telegram_token()
|
||||
|
||||
print("\n[1/4] Reading backlog...")
|
||||
backlog = read_backlog()
|
||||
|
||||
print("\n[2/4] Scoring and prioritizing...")
|
||||
backlog = prioritize_backlog(backlog)
|
||||
for i in backlog[:10]:
|
||||
print(f" [{i['score']:3d}] {i['repo']}/{i['number']}: {i['title'][:55]}")
|
||||
|
||||
print("\n[3/4] Checking agent health...")
|
||||
agent_status = get_agent_status()
|
||||
|
||||
print("\n[4/4] Dispatching...")
|
||||
dispatched, skipped = dispatch_cycle(backlog, agent_status, dry_run=dry_run)
|
||||
|
||||
# Generate reports
|
||||
report = generate_report(backlog, dispatched, skipped, agent_status, dry_run=dry_run)
|
||||
print("\n" + report)
|
||||
|
||||
# Send Telegram notification
|
||||
if dispatched or not dry_run:
|
||||
tg_msg = format_telegram_report(backlog, dispatched, skipped, agent_status, dry_run=dry_run)
|
||||
send_telegram(tg_msg)
|
||||
|
||||
return backlog, dispatched, skipped
|
||||
|
||||
|
||||
def main():
|
||||
import argparse
|
||||
parser = argparse.ArgumentParser(description="Sovereign Orchestrator v1")
|
||||
parser.add_argument("--once", action="store_true", help="Single dispatch cycle")
|
||||
parser.add_argument("--daemon", action="store_true", help="Run every 15 min")
|
||||
parser.add_argument("--dry-run", action="store_true", help="Score/report only, no dispatch")
|
||||
parser.add_argument("--interval", type=int, default=DAEMON_INTERVAL,
|
||||
help=f"Daemon interval in seconds (default: {DAEMON_INTERVAL})")
|
||||
args = parser.parse_args()
|
||||
|
||||
if not any([args.once, args.daemon, args.dry_run]):
|
||||
args.dry_run = True # safe default
|
||||
print("[INFO] No mode specified, defaulting to --dry-run")
|
||||
|
||||
print("=" * 60)
|
||||
print(" SOVEREIGN ORCHESTRATOR v1")
|
||||
print("=" * 60)
|
||||
|
||||
if args.daemon:
|
||||
print(f"[DAEMON] Running every {args.interval}s (Ctrl+C to stop)")
|
||||
cycle = 0
|
||||
while True:
|
||||
cycle += 1
|
||||
print(f"\n--- Cycle {cycle} ---")
|
||||
try:
|
||||
run_cycle(dry_run=args.dry_run)
|
||||
except Exception as e:
|
||||
print(f"[ERROR] Cycle failed: {e}")
|
||||
print(f"[DAEMON] Sleeping {args.interval}s...")
|
||||
time.sleep(args.interval)
|
||||
else:
|
||||
run_cycle(dry_run=args.dry_run)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
Reference in New Issue
Block a user