Compare commits
5 Commits
docs/autom
...
timmy/issu
| Author | SHA1 | Date |
|---|---|---|
|  | 00d8c62df0 |  |
|  | 877425bde4 |  |
|  | 2d3cea8127 |  |
|  | 34e01f0986 |  |
|  | d955d2b9f1 |  |
57
CONTRIBUTING.md
Normal file
@@ -0,0 +1,57 @@
# Contributing to timmy-config

## Proof Standard

This is a hard rule.

- visual changes require screenshot proof
- do not commit screenshots or binary media to Gitea backup unless explicitly required
- CLI/verifiable changes must cite the exact command output, log path, or world-state proof showing acceptance criteria were met
- config-only changes are not fully accepted when the real acceptance bar is live runtime behavior
- no proof, no merge

## How to satisfy the rule

### Visual changes
Examples:
- skin updates
- terminal UI layout changes
- browser-facing output
- dashboard/panel changes

Required proof:
- attach screenshot proof to the PR or issue discussion
- keep the screenshot outside the repo unless explicitly asked to commit it
- name what the screenshot proves

### CLI / harness / operational changes
Examples:
- scripts
- config wiring
- heartbeat behavior
- model routing
- export pipelines

Required proof:
- cite the exact command used
- paste the relevant output, or
- cite the exact log path / world-state artifact that proves the change

Good:
- `python3 -m pytest tests/test_x.py -q` → `2 passed`
- `~/.timmy/timmy-config/logs/huey.log`
- `~/.hermes/model_health.json`

Bad:
- "looks right"
- "compiled"
- "should work now"

## Default merge gate

Every PR should make it obvious:
1. what changed
2. what acceptance criteria were targeted
3. what evidence proves those criteria were met

If that evidence is missing, the PR is not done.
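The three-part merge gate can also be checked mechanically before review. This is a minimal sketch, assuming a PR description written as plain text with the hypothetical labels `What changed:`, `Acceptance criteria:` and `Evidence:` (CONTRIBUTING.md names the three items but not their headings); `check_merge_gate` is illustrative and not part of timmy-config. It also rejects evidence that consists only of one of the phrases listed under "Bad".

```python
import re

# Hypothetical section labels; CONTRIBUTING.md defines the items, not these exact headings.
REQUIRED_SECTIONS = ("What changed", "Acceptance criteria", "Evidence")

# Phrases CONTRIBUTING.md lists as insufficient proof.
WEAK_PHRASES = ("looks right", "compiled", "should work now")


def _section_body(text: str, label: str) -> str:
    """Return the text after `label:` up to the next known label, or '' if absent."""
    others = "|".join(re.escape(s) for s in REQUIRED_SECTIONS if s != label)
    pattern = rf"^{re.escape(label)}:\s*(.*?)(?=^(?:{others}):|\Z)"
    match = re.search(pattern, text, flags=re.MULTILINE | re.DOTALL | re.IGNORECASE)
    return match.group(1).strip() if match else ""


def check_merge_gate(description: str) -> list[str]:
    """Return a list of problems; an empty list means the gate is satisfied."""
    problems = []
    for label in REQUIRED_SECTIONS:
        if not _section_body(description, label):
            problems.append(f"missing or empty section: {label}")
    evidence = _section_body(description, "Evidence").lower()
    # Evidence that is only an assertion ("looks right") is not proof.
    if evidence and evidence.strip(' ."') in WEAK_PHRASES:
        problems.append("evidence is an assertion, not proof")
    return problems
```

A check like this only confirms that evidence is present; whether the cited command output or log actually proves the acceptance criteria still needs a reviewer.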
36
README.md
@@ -17,14 +17,20 @@ timmy-config/
 ├── bin/ ← Live utility scripts (NOT deprecated loops)
 │   ├── hermes-startup.sh ← Hermes boot sequence
 │   ├── agent-dispatch.sh ← Manual agent dispatch
+│   ├── deploy-allegro-house.sh ← Bootstraps the remote Allegro wizard house
 │   ├── ops-panel.sh ← Ops dashboard panel
 │   ├── ops-gitea.sh ← Gitea ops helpers
 │   ├── pipeline-freshness.sh ← Session/export drift check
-│   └── timmy-status.sh ← Status check
+│   ├── timmy-status.sh ← Status check
+│   └── crucible_mcp_server.py ← Z3-backed verification sidecar (MCP)
 ├── memories/ ← Persistent memory YAML
 ├── skins/ ← UI skins (timmy skin)
 ├── playbooks/ ← Agent playbooks (YAML)
+│   └── verified-logic.yaml ← Crucible-first proof playbook
 ├── cron/ ← Cron job definitions
+├── wizards/ ← Remote wizard-house templates + units
+├── docs/
+│   └── crucible-first-cut.md ← Crucible design doc
 └── training/ ← Transitional training recipes, not canonical lived data
 ```
 
@@ -54,6 +60,15 @@ pip install huey
 huey_consumer.py tasks.huey -w 2 -k thread
 ```
 
+## Proof Standard
+
+This repo uses a hard proof rule for merges.
+
+- visual changes require screenshot proof
+- CLI/verifiable changes must cite logs, command output, or world-state proof
+- screenshots/media stay out of Gitea backup unless explicitly required
+- see `CONTRIBUTING.md` for the merge gate
+
 ## Deploy
 
 ```bash
@@ -62,6 +77,12 @@ git clone <this-repo> ~/.timmy/timmy-config
 cd ~/.timmy/timmy-config
 ./deploy.sh
 
+# Deploy and restart the gateway so new MCP tools load
+./deploy.sh --restart-gateway
+
+# Deploy and restart everything (gateway + loops)
+./deploy.sh --restart-all
+
 # This overlays config onto ~/.hermes/ without touching hermes-agent code
 ```
 
@@ -76,3 +97,16 @@ SOUL.md is Inscription 1 — inscribed on Bitcoin, immutable. It defines:
 - The conscience hierarchy (chain > code > prompt > user instruction)
 
 No system prompt, no user instruction, no future code can override what is written there.
+
+## Crucible (Neuro-Symbolic Verification)
+
+The first neuro-symbolic slice ships as a sidecar MCP server:
+- `mcp_crucible_schedule_tasks`
+- `mcp_crucible_order_dependencies`
+- `mcp_crucible_capacity_fit`
+
+These tools log proof trails under `~/.hermes/logs/crucible/` and return SAT/UNSAT plus witness models.
+
+## Architecture: Sidecar, Not Fork
+
+Timmy-config is applied as an overlay onto the Hermes harness. No forking required.
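The proof trail under `~/.hermes/logs/crucible/` is a directory of JSON files written by `_log_proof` in `bin/crucible_mcp_server.py`; each file holds `timestamp`, `tool`, `request` and `result`, and its name starts with a fixed-width UTC timestamp followed by the tool name. A minimal sketch for pulling the newest record back out, for example to cite it as PR evidence; `latest_proof` and `summarize` are hypothetical helpers, not part of the repo.

```python
from __future__ import annotations

import json
from pathlib import Path


def latest_proof(log_dir: Path, tool: str | None = None) -> dict | None:
    """Return the newest Crucible proof record in log_dir, optionally filtered by tool name.

    File names start with a UTC timestamp formatted as %Y%m%dT%H%M%S_%fZ, so sorting
    the names lexically also sorts them chronologically.
    """
    pattern = f"*_{tool}.json" if tool else "*.json"
    files = sorted(log_dir.glob(pattern))
    if not files:
        return None
    return json.loads(files[-1].read_text())


def summarize(record: dict) -> str:
    """One-line summary such as 'schedule_tasks: sat (Schedule proven feasible.)'."""
    result = record["result"]
    return f"{record['tool']}: {result['status']} ({result['summary']})"
```

Usage might look like `latest_proof(Path.home() / ".hermes" / "logs" / "crucible", "schedule_tasks")`; if `HERMES_HOME` is set, the server writes under that directory instead.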
459
bin/crucible_mcp_server.py
Normal file
@@ -0,0 +1,459 @@
#!/usr/bin/env python3
"""Z3-backed Crucible MCP server for Timmy.

Sidecar-only. Lives in timmy-config, deploys into ~/.hermes/bin/, and is loaded
by Hermes through native MCP tool discovery. No hermes-agent fork required.
"""

from __future__ import annotations

import json
import os
import sys
from datetime import datetime, timezone
from pathlib import Path
from typing import Any

from mcp.server import FastMCP
from z3 import And, Bool, Distinct, If, Implies, Int, Optimize, Or, Sum, sat, unsat

mcp = FastMCP(
    name="crucible",
    instructions=(
        "Formal verification sidecar for Timmy. Use these tools for scheduling, "
        "dependency ordering, and resource/capacity feasibility. Return SAT/UNSAT "
        "with witness models instead of fuzzy prose."
    ),
    dependencies=["z3-solver"],
)


def _hermes_home() -> Path:
    return Path(os.path.expanduser(os.getenv("HERMES_HOME", "~/.hermes")))


def _proof_dir() -> Path:
    path = _hermes_home() / "logs" / "crucible"
    path.mkdir(parents=True, exist_ok=True)
    return path


def _ts() -> str:
    return datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S_%fZ")


def _json_default(value: Any) -> Any:
    if isinstance(value, Path):
        return str(value)
    raise TypeError(f"Unsupported type for JSON serialization: {type(value)!r}")


def _log_proof(tool_name: str, request: dict[str, Any], result: dict[str, Any]) -> str:
    path = _proof_dir() / f"{_ts()}_{tool_name}.json"
    payload = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "tool": tool_name,
        "request": request,
        "result": result,
    }
    path.write_text(json.dumps(payload, indent=2, default=_json_default))
    return str(path)


def _ensure_unique(names: list[str], label: str) -> None:
    if len(set(names)) != len(names):
        raise ValueError(f"Duplicate {label} names are not allowed: {names}")


def _normalize_dependency(dep: Any) -> tuple[str, str, int]:
    if isinstance(dep, dict):
        before = dep.get("before")
        after = dep.get("after")
        lag = int(dep.get("lag", 0))
        if not before or not after:
            raise ValueError(f"Dependency dict must include before/after: {dep!r}")
        return str(before), str(after), lag
    if isinstance(dep, (list, tuple)) and len(dep) in (2, 3):
        before = str(dep[0])
        after = str(dep[1])
        lag = int(dep[2]) if len(dep) == 3 else 0
        return before, after, lag
    raise ValueError(f"Unsupported dependency shape: {dep!r}")


def _normalize_task(task: dict[str, Any]) -> dict[str, Any]:
    name = str(task["name"])
    duration = int(task["duration"])
    if duration <= 0:
        raise ValueError(f"Task duration must be positive: {task!r}")
    return {"name": name, "duration": duration}


def _normalize_item(item: dict[str, Any]) -> dict[str, Any]:
    name = str(item["name"])
    amount = int(item["amount"])
    value = int(item.get("value", amount))
    required = bool(item.get("required", False))
    if amount < 0:
        raise ValueError(f"Item amount must be non-negative: {item!r}")
    return {
        "name": name,
        "amount": amount,
        "value": value,
        "required": required,
    }


def solve_schedule_tasks(
    tasks: list[dict[str, Any]],
    horizon: int,
    dependencies: list[Any] | None = None,
    fixed_starts: dict[str, int] | None = None,
    max_parallel_tasks: int = 1,
    minimize_makespan: bool = True,
) -> dict[str, Any]:
    tasks = [_normalize_task(task) for task in tasks]
    dependencies = dependencies or []
    fixed_starts = fixed_starts or {}
    horizon = int(horizon)
    max_parallel_tasks = int(max_parallel_tasks)

    if horizon <= 0:
        raise ValueError("horizon must be positive")
    if max_parallel_tasks <= 0:
        raise ValueError("max_parallel_tasks must be positive")

    names = [task["name"] for task in tasks]
    _ensure_unique(names, "task")
    durations = {task["name"]: task["duration"] for task in tasks}

    opt = Optimize()
    start = {name: Int(f"start_{name}") for name in names}
    end = {name: Int(f"end_{name}") for name in names}
    makespan = Int("makespan")

    for name in names:
        opt.add(start[name] >= 0)
        opt.add(end[name] == start[name] + durations[name])
        opt.add(end[name] <= horizon)
        if name in fixed_starts:
            opt.add(start[name] == int(fixed_starts[name]))

    for dep in dependencies:
        before, after, lag = _normalize_dependency(dep)
        if before not in start or after not in start:
            raise ValueError(f"Unknown task in dependency {dep!r}")
        opt.add(start[after] >= end[before] + lag)

    # Discrete resource capacity over integer time slots.
    for t in range(horizon):
        active = [If(And(start[name] <= t, t < end[name]), 1, 0) for name in names]
        opt.add(Sum(active) <= max_parallel_tasks)

    for name in names:
        opt.add(makespan >= end[name])
    if minimize_makespan:
        opt.minimize(makespan)

    result = opt.check()
    proof: dict[str, Any]
    if result == sat:
        model = opt.model()
        schedule = []
        for name in sorted(names, key=lambda n: model.eval(start[n]).as_long()):
            s = model.eval(start[name]).as_long()
            e = model.eval(end[name]).as_long()
            schedule.append({
                "name": name,
                "start": s,
                "end": e,
                "duration": durations[name],
            })
        proof = {
            "status": "sat",
            "summary": "Schedule proven feasible.",
            "horizon": horizon,
            "max_parallel_tasks": max_parallel_tasks,
            "makespan": model.eval(makespan).as_long(),
            "schedule": schedule,
            "dependencies": [
                {"before": b, "after": a, "lag": lag}
                for b, a, lag in (_normalize_dependency(dep) for dep in dependencies)
            ],
        }
    elif result == unsat:
        proof = {
            "status": "unsat",
            "summary": "Schedule is impossible under the given horizon/dependency/capacity constraints.",
            "horizon": horizon,
            "max_parallel_tasks": max_parallel_tasks,
            "dependencies": [
                {"before": b, "after": a, "lag": lag}
                for b, a, lag in (_normalize_dependency(dep) for dep in dependencies)
            ],
        }
    else:
        proof = {
            "status": "unknown",
            "summary": "Solver could not prove SAT or UNSAT for this schedule.",
            "horizon": horizon,
            "max_parallel_tasks": max_parallel_tasks,
        }

    proof["proof_log"] = _log_proof(
        "schedule_tasks",
        {
            "tasks": tasks,
            "horizon": horizon,
            "dependencies": dependencies,
            "fixed_starts": fixed_starts,
            "max_parallel_tasks": max_parallel_tasks,
            "minimize_makespan": minimize_makespan,
        },
        proof,
    )
    return proof


def solve_dependency_order(
    entities: list[str],
    before: list[Any],
    fixed_positions: dict[str, int] | None = None,
) -> dict[str, Any]:
    entities = [str(entity) for entity in entities]
    fixed_positions = fixed_positions or {}
    _ensure_unique(entities, "entity")

    opt = Optimize()
    pos = {entity: Int(f"pos_{entity}") for entity in entities}
    opt.add(Distinct(*pos.values()))
    for entity in entities:
        opt.add(pos[entity] >= 0)
        opt.add(pos[entity] < len(entities))
        if entity in fixed_positions:
            opt.add(pos[entity] == int(fixed_positions[entity]))

    normalized = []
    for dep in before:
        left, right, _lag = _normalize_dependency(dep)
        if left not in pos or right not in pos:
            raise ValueError(f"Unknown entity in ordering constraint: {dep!r}")
        opt.add(pos[left] < pos[right])
        normalized.append({"before": left, "after": right})

    result = opt.check()
    if result == sat:
        model = opt.model()
        ordering = sorted(entities, key=lambda entity: model.eval(pos[entity]).as_long())
        proof = {
            "status": "sat",
            "summary": "Dependency ordering is consistent.",
            "ordering": ordering,
            "positions": {entity: model.eval(pos[entity]).as_long() for entity in entities},
            "constraints": normalized,
        }
    elif result == unsat:
        proof = {
            "status": "unsat",
            "summary": "Dependency ordering contains a contradiction/cycle.",
            "constraints": normalized,
        }
    else:
        proof = {
            "status": "unknown",
            "summary": "Solver could not prove SAT or UNSAT for this dependency graph.",
            "constraints": normalized,
        }

    proof["proof_log"] = _log_proof(
        "order_dependencies",
        {
            "entities": entities,
            "before": before,
            "fixed_positions": fixed_positions,
        },
        proof,
    )
    return proof


def solve_capacity_fit(
    items: list[dict[str, Any]],
    capacity: int,
    maximize_value: bool = True,
) -> dict[str, Any]:
    items = [_normalize_item(item) for item in items]
    capacity = int(capacity)
    if capacity < 0:
        raise ValueError("capacity must be non-negative")

    names = [item["name"] for item in items]
    _ensure_unique(names, "item")
    choose = {item["name"]: Bool(f"choose_{item['name']}") for item in items}

    opt = Optimize()
    for item in items:
        if item["required"]:
            opt.add(choose[item["name"]])

    total_amount = Sum([If(choose[item["name"]], item["amount"], 0) for item in items])
    total_value = Sum([If(choose[item["name"]], item["value"], 0) for item in items])
    opt.add(total_amount <= capacity)
    if maximize_value:
        opt.maximize(total_value)

    result = opt.check()
    if result == sat:
        model = opt.model()
        chosen = [item for item in items if bool(model.eval(choose[item["name"]], model_completion=True))]
        skipped = [item for item in items if item not in chosen]
        used = sum(item["amount"] for item in chosen)
        proof = {
            "status": "sat",
            "summary": "Capacity constraints are feasible.",
            "capacity": capacity,
            "used": used,
            "remaining": capacity - used,
            "chosen": chosen,
            "skipped": skipped,
            "total_value": sum(item["value"] for item in chosen),
        }
    elif result == unsat:
        proof = {
            "status": "unsat",
            "summary": "Required items exceed available capacity.",
            "capacity": capacity,
            "required_items": [item for item in items if item["required"]],
        }
    else:
        proof = {
            "status": "unknown",
            "summary": "Solver could not prove SAT or UNSAT for this capacity check.",
            "capacity": capacity,
        }

    proof["proof_log"] = _log_proof(
        "capacity_fit",
        {
            "items": items,
            "capacity": capacity,
            "maximize_value": maximize_value,
        },
        proof,
    )
    return proof


@mcp.tool(
    name="schedule_tasks",
    description=(
        "Crucible template for discrete scheduling. Proves whether integer-duration "
        "tasks fit within a time horizon under dependency and parallelism constraints."
    ),
    structured_output=True,
)
def schedule_tasks(
    tasks: list[dict[str, Any]],
    horizon: int,
    dependencies: list[Any] | None = None,
    fixed_starts: dict[str, int] | None = None,
    max_parallel_tasks: int = 1,
    minimize_makespan: bool = True,
) -> dict[str, Any]:
    return solve_schedule_tasks(
        tasks=tasks,
        horizon=horizon,
        dependencies=dependencies,
        fixed_starts=fixed_starts,
        max_parallel_tasks=max_parallel_tasks,
        minimize_makespan=minimize_makespan,
    )


@mcp.tool(
    name="order_dependencies",
    description=(
        "Crucible template for dependency ordering. Proves whether a set of before/after "
        "constraints is consistent and returns a valid topological order when SAT."
    ),
    structured_output=True,
)
def order_dependencies(
    entities: list[str],
    before: list[Any],
    fixed_positions: dict[str, int] | None = None,
) -> dict[str, Any]:
    return solve_dependency_order(
        entities=entities,
        before=before,
        fixed_positions=fixed_positions,
    )


@mcp.tool(
    name="capacity_fit",
    description=(
        "Crucible template for resource capacity. Proves whether required items fit "
        "within a capacity budget and chooses an optimal feasible subset of optional items."
    ),
    structured_output=True,
)
def capacity_fit(
    items: list[dict[str, Any]],
    capacity: int,
    maximize_value: bool = True,
) -> dict[str, Any]:
    return solve_capacity_fit(items=items, capacity=capacity, maximize_value=maximize_value)


def run_selftest() -> dict[str, Any]:
    return {
        "schedule_unsat_single_worker": solve_schedule_tasks(
            tasks=[
                {"name": "A", "duration": 2},
                {"name": "B", "duration": 3},
                {"name": "C", "duration": 4},
            ],
            horizon=8,
            dependencies=[{"before": "A", "after": "B"}],
            max_parallel_tasks=1,
        ),
        "schedule_sat_two_workers": solve_schedule_tasks(
            tasks=[
                {"name": "A", "duration": 2},
                {"name": "B", "duration": 3},
                {"name": "C", "duration": 4},
            ],
            horizon=8,
            dependencies=[{"before": "A", "after": "B"}],
            max_parallel_tasks=2,
        ),
        "ordering_sat": solve_dependency_order(
            entities=["fetch", "train", "eval"],
            before=[
                {"before": "fetch", "after": "train"},
                {"before": "train", "after": "eval"},
            ],
        ),
        "capacity_sat": solve_capacity_fit(
            items=[
                {"name": "gpu_job", "amount": 6, "value": 6, "required": True},
                {"name": "telemetry", "amount": 1, "value": 1, "required": True},
                {"name": "export", "amount": 2, "value": 4, "required": False},
                {"name": "viz", "amount": 3, "value": 5, "required": False},
            ],
            capacity=8,
        ),
    }


def main() -> int:
    if len(sys.argv) > 1 and sys.argv[1] == "selftest":
        print(json.dumps(run_selftest(), indent=2))
        return 0
    mcp.run(transport="stdio")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
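A SAT answer is only as trustworthy as the witness it returns. The sketch below re-checks a `schedule_tasks` witness in plain Python, without Z3, against the constraints `solve_schedule_tasks` encodes: durations, the horizon, dependency lags, and `max_parallel_tasks` over integer time slots (a task occupies slot `t` when `start <= t < end`). `check_schedule_witness` is a hypothetical helper, and it assumes dependencies in the `{"before", "after", "lag"}` form that the SAT proof echoes back.

```python
from __future__ import annotations

from typing import Any


def check_schedule_witness(proof: dict[str, Any], tasks: list[dict[str, Any]]) -> list[str]:
    """Return constraint violations found in a SAT schedule proof (empty list = valid)."""
    errors: list[str] = []
    durations = {t["name"]: int(t["duration"]) for t in tasks}
    slots = {row["name"]: (row["start"], row["end"]) for row in proof["schedule"]}
    horizon = proof["horizon"]

    if set(slots) != set(durations):
        return ["schedule does not cover exactly the requested tasks"]

    for name, (start, end) in slots.items():
        if start < 0 or end > horizon:
            errors.append(f"{name} runs outside [0, {horizon}]")
        if end - start != durations[name]:
            errors.append(f"{name} has the wrong duration")

    for dep in proof["dependencies"]:
        if slots[dep["after"]][0] < slots[dep["before"]][1] + dep["lag"]:
            errors.append(f"{dep['after']} starts too early after {dep['before']}")

    # Same discrete capacity rule as the encoding: count tasks active in each slot.
    for t in range(horizon):
        active = sum(1 for start, end in slots.values() if start <= t < end)
        if active > proof["max_parallel_tasks"]:
            errors.append(f"slot {t} has {active} tasks running")

    # The encoding requires makespan >= every end time.
    last_end = max((end for _, end in slots.values()), default=0)
    if proof.get("makespan", last_end) < last_end:
        errors.append("reported makespan is earlier than the last task end")
    return errors
```

For the `schedule_unsat_single_worker` selftest no witness can exist: with one worker the three tasks need 2 + 3 + 4 = 9 slots, more than the horizon of 8. With two workers, the A→B chain alone needs 5 slots, so a minimal witness has makespan 5; the test below uses one such schedule, which may differ from the one Z3 actually returns.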
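Without `fixed_positions`, `order_dependencies` returns SAT exactly when the before/after pairs form an acyclic graph: distinct integer positions with `pos[before] < pos[after]` exist if and only if a topological order does. That makes Kahn's topological sort a cheap, Z3-free cross-check. It is swapped in here as an independent technique, not something the server runs; `kahn_order` is hypothetical, accepts only the dict form of constraints (the server also takes `[before, after]` lists), and ignores `fixed_positions`.

```python
from __future__ import annotations

from collections import deque


def kahn_order(entities: list[str], before: list[dict[str, str]]) -> list[str] | None:
    """Return a topological order of entities, or None if the constraints contain a cycle."""
    successors: dict[str, list[str]] = {e: [] for e in entities}
    indegree = {e: 0 for e in entities}
    for dep in before:
        successors[dep["before"]].append(dep["after"])
        indegree[dep["after"]] += 1

    # Start from entities with no prerequisites, in input order for a stable result.
    ready = deque(e for e in entities if indegree[e] == 0)
    order: list[str] = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    # If some entity was never emitted, the constraint graph has a cycle.
    return order if len(order) == len(entities) else None
```

When both agree, a `None` here should line up with an `unsat` proof from the server, and a list here with a `sat` proof whose `ordering` respects the same constraints (the two orders may differ when several are valid).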
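For small inputs, `capacity_fit` can be cross-checked by exhaustive search: keep every required item, try each subset of the optional ones, and keep the highest-value subset that fits. This enumeration is a swapped-in verification technique that is exponential in the number of optional items, not what the server does; `brute_force_fit` is hypothetical. On the `capacity_sat` selftest input the required items already use 6 + 1 = 7 of the 8 units, so neither optional item (2 or 3 units) fits and the best total value is 6 + 1 = 7.

```python
from __future__ import annotations

from itertools import combinations
from typing import Any


def brute_force_fit(items: list[dict[str, Any]], capacity: int) -> dict[str, Any]:
    """Exhaustively find the max-value feasible selection (required items always chosen)."""
    required = [i for i in items if i.get("required", False)]
    optional = [i for i in items if not i.get("required", False)]
    if sum(i["amount"] for i in required) > capacity:
        return {"status": "unsat"}

    best_value: int | None = None
    best_choice: list[dict[str, Any]] = []
    for k in range(len(optional) + 1):
        for subset in combinations(optional, k):
            picked = required + list(subset)
            used = sum(i["amount"] for i in picked)
            # `value` defaults to `amount`, mirroring _normalize_item in the server.
            value = sum(i.get("value", i["amount"]) for i in picked)
            if used <= capacity and (best_value is None or value > best_value):
                best_value, best_choice = value, picked
    return {
        "status": "sat",
        "chosen": sorted(i["name"] for i in best_choice),
        "used": sum(i["amount"] for i in best_choice),
        "total_value": best_value,
    }
```

The comparison is meaningful only with `maximize_value=True`; with it off, the server may return any feasible subset.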
32
bin/deploy-allegro-house.sh
Executable file
@@ -0,0 +1,32 @@
#!/usr/bin/env bash
set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
REPO_DIR="$(cd "$SCRIPT_DIR/.." && pwd)"
TARGET="${1:-root@167.99.126.228}"
HERMES_REPO_URL="${HERMES_REPO_URL:-https://github.com/NousResearch/hermes-agent.git}"
KIMI_API_KEY="${KIMI_API_KEY:-}"

if [[ -z "$KIMI_API_KEY" && -f "$HOME/.config/kimi/api_key" ]]; then
  KIMI_API_KEY="$(tr -d '\n' < "$HOME/.config/kimi/api_key")"
fi

if [[ -z "$KIMI_API_KEY" ]]; then
  echo "KIMI_API_KEY is required (env or ~/.config/kimi/api_key)" >&2
  exit 1
fi

ssh "$TARGET" 'apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y git python3 python3-venv python3-pip curl ca-certificates'
ssh "$TARGET" 'mkdir -p /root/wizards/allegro/home /root/wizards/allegro/hermes-agent'

ssh "$TARGET" "if [ ! -d /root/wizards/allegro/hermes-agent/.git ]; then git clone '$HERMES_REPO_URL' /root/wizards/allegro/hermes-agent; fi"
ssh "$TARGET" 'cd /root/wizards/allegro/hermes-agent && python3 -m venv .venv && .venv/bin/pip install --upgrade pip setuptools wheel && .venv/bin/pip install -e .'

ssh "$TARGET" "cat > /root/wizards/allegro/home/config.yaml" < "$REPO_DIR/wizards/allegro/config.yaml"
ssh "$TARGET" "cat > /root/wizards/allegro/home/SOUL.md" < "$REPO_DIR/SOUL.md"
ssh "$TARGET" "cat > /root/wizards/allegro/home/.env <<'EOF'
KIMI_API_KEY=$KIMI_API_KEY
EOF"
ssh "$TARGET" "cat > /etc/systemd/system/hermes-allegro.service" < "$REPO_DIR/wizards/allegro/hermes-allegro.service"

ssh "$TARGET" 'chmod 600 /root/wizards/allegro/home/.env && systemctl daemon-reload && systemctl enable --now hermes-allegro.service && systemctl restart hermes-allegro.service && systemctl is-active hermes-allegro.service && curl -fsS http://127.0.0.1:8645/health'
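The last step of `deploy-allegro-house.sh` restarts `hermes-allegro.service` and then runs `curl -fsS http://127.0.0.1:8645/health` once, which may fail if the service is still starting. A polling alternative, sketched in Python under the assumption that the endpoint answers with a 2xx status when healthy; `wait_for_health`, the attempt count and the delay are hypothetical and not part of the script.

```python
from __future__ import annotations

import http.client
import time
import urllib.request
from typing import Callable


def _http_ok(url: str, timeout: float = 5.0) -> bool:
    """True when the URL answers with a 2xx status, like `curl -f`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # URLError and HTTPError subclass OSError; malformed responses raise HTTPException.
        return False


def wait_for_health(
    url: str = "http://127.0.0.1:8645/health",
    attempts: int = 10,
    delay_s: float = 2.0,
    probe: Callable[[str], bool] = _http_ok,
) -> bool:
    """Poll `url` until the probe succeeds or attempts run out."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay_s)
    return False
```

The `probe` parameter exists so the retry logic can be exercised without a live service; on the remote host the same loop could run over `ssh` in place of the single `curl`.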
@@ -9,6 +9,7 @@ Usage:
 
 import json
 import os
+import sqlite3
 import subprocess
 import sys
 import time
@@ -16,6 +17,12 @@ import urllib.request
 from datetime import datetime, timezone, timedelta
 from pathlib import Path
 
+REPO_ROOT = Path(__file__).resolve().parent.parent
+if str(REPO_ROOT) not in sys.path:
+    sys.path.insert(0, str(REPO_ROOT))
+
+from metrics_helpers import summarize_local_metrics, summarize_session_rows
+
 HERMES_HOME = Path.home() / ".hermes"
 TIMMY_HOME = Path.home() / ".timmy"
 METRICS_DIR = TIMMY_HOME / "metrics"
@@ -60,6 +67,30 @@ def get_hermes_sessions():
     return []
 
 
+def get_session_rows(hours=24):
+    state_db = HERMES_HOME / "state.db"
+    if not state_db.exists():
+        return []
+    cutoff = time.time() - (hours * 3600)
+    try:
+        conn = sqlite3.connect(str(state_db))
+        rows = conn.execute(
+            """
+            SELECT model, source, COUNT(*) as sessions,
+                   SUM(message_count) as msgs,
+                   SUM(tool_call_count) as tools
+            FROM sessions
+            WHERE started_at > ? AND model IS NOT NULL AND model != ''
+            GROUP BY model, source
+            """,
+            (cutoff,),
+        ).fetchall()
+        conn.close()
+        return rows
+    except Exception:
+        return []
+
+
 def get_heartbeat_ticks(date_str=None):
     if not date_str:
         date_str = datetime.now().strftime("%Y%m%d")
@@ -130,6 +161,9 @@ def render(hours=24):
     ticks = get_heartbeat_ticks()
     metrics = get_local_metrics(hours)
     sessions = get_hermes_sessions()
+    session_rows = get_session_rows(hours)
+    local_summary = summarize_local_metrics(metrics)
+    session_summary = summarize_session_rows(session_rows)
 
     loaded_names = {m.get("name", "") for m in loaded}
     now = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
@@ -159,28 +193,18 @@ def render(hours=24):
     print(f"\n {BOLD}LOCAL INFERENCE ({len(metrics)} calls, last {hours}h){RST}")
     print(f" {DIM}{'-' * 55}{RST}")
     if metrics:
-        by_caller = {}
-        for r in metrics:
-            caller = r.get("caller", "unknown")
-            if caller not in by_caller:
-                by_caller[caller] = {"count": 0, "success": 0, "errors": 0}
-            by_caller[caller]["count"] += 1
-            if r.get("success"):
-                by_caller[caller]["success"] += 1
-            else:
-                by_caller[caller]["errors"] += 1
-        for caller, stats in by_caller.items():
-            err = f" {RED}err:{stats['errors']}{RST}" if stats["errors"] else ""
-            print(f" {caller:25s} calls:{stats['count']:4d} "
-                  f"{GREEN}ok:{stats['success']}{RST}{err}")
+        print(f" Tokens: {local_summary['input_tokens']} in | {local_summary['output_tokens']} out | {local_summary['total_tokens']} total")
+        if local_summary.get('avg_latency_s') is not None:
+            print(f" Avg latency: {local_summary['avg_latency_s']:.2f}s")
+        if local_summary.get('avg_tokens_per_second') is not None:
+            print(f" Avg throughput: {GREEN}{local_summary['avg_tokens_per_second']:.2f} tok/s{RST}")
+        for caller, stats in sorted(local_summary['by_caller'].items()):
+            err = f" {RED}err:{stats['failed_calls']}{RST}" if stats['failed_calls'] else ""
+            print(f" {caller:25s} calls:{stats['calls']:4d} tokens:{stats['total_tokens']:5d} {GREEN}ok:{stats['successful_calls']}{RST}{err}")
 
-        by_model = {}
-        for r in metrics:
-            model = r.get("model", "unknown")
-            by_model[model] = by_model.get(model, 0) + 1
         print(f"\n {DIM}Models used:{RST}")
-        for model, count in sorted(by_model.items(), key=lambda x: -x[1]):
-            print(f" {model:30s} {count} calls")
+        for model, stats in sorted(local_summary['by_model'].items(), key=lambda x: -x[1]['calls']):
+            print(f" {model:30s} {stats['calls']} calls {stats['total_tokens']} tok")
     else:
         print(f" {DIM}(no local calls recorded yet){RST}")
 
@@ -211,15 +235,18 @@ def render(hours=24):
     else:
         print(f" {DIM}(no ticks today){RST}")
 
-    # ── HERMES SESSIONS ──
-    local_sessions = [s for s in sessions
-                      if "localhost:11434" in str(s.get("base_url", ""))]
+    # ── HERMES SESSIONS / SOVEREIGNTY LOAD ──
+    local_sessions = [s for s in sessions if "localhost:11434" in str(s.get("base_url", ""))]
     cloud_sessions = [s for s in sessions if s not in local_sessions]
-    print(f"\n {BOLD}HERMES SESSIONS{RST}")
+    print(f"\n {BOLD}HERMES SESSIONS / SOVEREIGNTY LOAD{RST}")
     print(f" {DIM}{'-' * 55}{RST}")
-    print(f" Total: {len(sessions)} | "
-          f"{GREEN}Local: {len(local_sessions)}{RST} | "
-          f"{YELLOW}Cloud: {len(cloud_sessions)}{RST}")
+    print(f" Session cache: {len(sessions)} total | {GREEN}{len(local_sessions)} local{RST} | {YELLOW}{len(cloud_sessions)} cloud{RST}")
+    if session_rows:
+        print(f" Session DB: {session_summary['total_sessions']} total | {GREEN}{session_summary['local_sessions']} local{RST} | {YELLOW}{session_summary['cloud_sessions']} cloud{RST}")
+        print(f" Token est: {GREEN}{session_summary['local_est_tokens']} local{RST} | {YELLOW}{session_summary['cloud_est_tokens']} cloud{RST}")
+        print(f" Est cloud cost: ${session_summary['cloud_est_cost_usd']:.4f}")
+    else:
+        print(f" {DIM}(no session-db stats available){RST}")
 
     # ── ACTIVE LOOPS ──
     print(f"\n {BOLD}ACTIVE LOOPS{RST}")
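`get_session_rows` groups Hermes sessions by `(model, source)` over a trailing window and returns one `(model, source, sessions, msgs, tools)` tuple per group. The sketch below runs the same query against an in-memory SQLite table so the output shape is easy to see. The table is a hypothetical minimal stand-in that declares only the columns the query reads (`model`, `source`, `message_count`, `tool_call_count`, `started_at`); the real `state.db` schema may have more, and the sample rows are invented for illustration. The query itself does not order its groups, so the sketch sorts them.

```python
from __future__ import annotations

import sqlite3
import time

QUERY = """
    SELECT model, source, COUNT(*) as sessions,
           SUM(message_count) as msgs,
           SUM(tool_call_count) as tools
    FROM sessions
    WHERE started_at > ? AND model IS NOT NULL AND model != ''
    GROUP BY model, source
"""


def session_rows(conn: sqlite3.Connection, hours: int = 24, now: float | None = None) -> list[tuple]:
    """Same aggregation as get_session_rows, but on an injected connection and clock."""
    now = time.time() if now is None else now
    cutoff = now - hours * 3600
    return sorted(conn.execute(QUERY, (cutoff,)).fetchall())


def demo() -> list[tuple]:
    conn = sqlite3.connect(":memory:")
    # Hypothetical minimal schema: only the columns the query touches.
    conn.execute(
        "CREATE TABLE sessions (model TEXT, source TEXT, message_count INTEGER,"
        " tool_call_count INTEGER, started_at REAL)"
    )
    now = 1_000_000.0
    conn.executemany(
        "INSERT INTO sessions VALUES (?, ?, ?, ?, ?)",
        [
            ("hermes4:14b", "cli", 10, 2, now - 60),
            ("hermes4:14b", "cli", 4, 1, now - 120),
            ("gemini-2.5-pro", "discord", 7, 0, now - 300),
            ("hermes4:14b", "cli", 99, 9, now - 2 * 86400),  # outside the 24h window
            ("", "cli", 5, 5, now - 60),  # empty model is filtered out
        ],
    )
    rows = session_rows(conn, hours=24, now=now)
    conn.close()
    return rows
```

`demo()` returns `[("gemini-2.5-pro", "discord", 1, 7, 0), ("hermes4:14b", "cli", 2, 14, 3)]`: the stale row and the empty-model row are excluded, and the two recent `hermes4:14b` sessions are summed.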
20
config.yaml
@@ -1,8 +1,8 @@
 model:
-  default: gpt-5.4
-  provider: openai-codex
+  default: hermes4:14b
+  provider: custom
   context_length: 65536
-  base_url: https://chatgpt.com/backend-api/codex
+  base_url: http://localhost:8081/v1
 toolsets:
 - all
 agent:
@@ -188,7 +188,7 @@ custom_providers:
 - name: Local llama.cpp
   base_url: http://localhost:8081/v1
   api_key: none
-  model: auto
+  model: hermes4:14b
 - name: Google Gemini
   base_url: https://generativelanguage.googleapis.com/v1beta/openai
   api_key_env: GEMINI_API_KEY
@@ -196,8 +196,10 @@ custom_providers:
 system_prompt_suffix: "You are Timmy. Your soul is defined in SOUL.md \u2014 read\
   \ it, live it.\nYou run locally on your owner's machine via llama.cpp. You never\
   \ phone home.\nYou speak plainly. You prefer short sentences. Brevity is a kindness.\n\
-  When you don't know something, say so. Refusal over fabrication.\nSovereignty and\
-  \ service always.\n"
+  When you don't know something, say so. Refusal over fabrication.\nFor scheduling,\
+  \ dependency ordering, resource constraints, and consistency checks, prefer the\
+  \ Crucible tools and report SAT/UNSAT plus witness model when available.\nSovereignty\
+  \ and service always.\n"
 skills:
   creation_nudge_interval: 15
 DISCORD_HOME_CHANNEL: '1476292315814297772'
@@ -212,6 +214,12 @@ mcp_servers:
     - /Users/apayne/.timmy/morrowind/mcp_server.py
     env: {}
     timeout: 30
+  crucible:
+    command: "/Users/apayne/.hermes/hermes-agent/venv/bin/python3"
+    args: ["/Users/apayne/.hermes/bin/crucible_mcp_server.py"]
+    env: {}
+    timeout: 120
+    connect_timeout: 60
 fallback_model:
   provider: custom
   model: gemini-2.5-pro
62
deploy.sh
@@ -3,13 +3,30 @@
 # This is the canonical way to deploy Timmy's configuration.
 # Hermes-agent is the engine. timmy-config is the driver's seat.
 #
-# Usage: ./deploy.sh
+# Usage: ./deploy.sh [--restart-loops] [--restart-gateway] [--restart-all]
 
 set -euo pipefail
 
 SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
 HERMES_HOME="$HOME/.hermes"
 TIMMY_HOME="$HOME/.timmy"
+RESTART_LOOPS=false
+RESTART_GATEWAY=false
+
+for arg in "$@"; do
+    case "$arg" in
+        --restart-loops) RESTART_LOOPS=true ;;
+        --restart-gateway) RESTART_GATEWAY=true ;;
+        --restart-all)
+            RESTART_LOOPS=true
+            RESTART_GATEWAY=true
+            ;;
+        *)
+            echo "Unknown argument: $arg" >&2
+            exit 1
+            ;;
+    esac
+done
 
 log() { echo "[deploy] $*"; }
 
@@ -74,10 +91,45 @@ done
 chmod +x "$HERMES_HOME/bin/"*.sh "$HERMES_HOME/bin/"*.py 2>/dev/null || true
 log "bin/ -> $HERMES_HOME/bin/"
 
-if [ "${1:-}" != "" ]; then
-    echo "ERROR: deploy.sh no longer accepts legacy loop flags." >&2
-    echo "Deploy the sidecar only. Do not relaunch deprecated bash loops." >&2
-    exit 1
+# === Ensure Crucible dependency is installed ===
+HERMES_PY="$HERMES_HOME/hermes-agent/venv/bin/python"
+if [ -x "$HERMES_PY" ]; then
+    if "$HERMES_PY" -c 'import z3' >/dev/null 2>&1; then
+        log "z3-solver already present in Hermes venv"
+    else
+        log "Installing z3-solver into Hermes venv..."
+        "$HERMES_PY" -m pip install z3-solver
+    fi
+fi
+
+# === Restart loops if requested ===
+if [ "$RESTART_LOOPS" = true ]; then
+    log "Killing existing loops..."
+    pkill -f 'claude-loop.sh' 2>/dev/null || true
+    pkill -f 'gemini-loop.sh' 2>/dev/null || true
+    pkill -f 'timmy-orchestrator.sh' 2>/dev/null || true
+    sleep 2
+
+    log "Clearing stale locks..."
+    rm -rf "$HERMES_HOME/logs/claude-locks/"* 2>/dev/null || true
+    rm -rf "$HERMES_HOME/logs/gemini-locks/"* 2>/dev/null || true
+
+    log "Relaunching loops..."
+    nohup bash "$HERMES_HOME/bin/timmy-orchestrator.sh" >> "$HERMES_HOME/logs/timmy-orchestrator.log" 2>&1 &
+    nohup bash "$HERMES_HOME/bin/claude-loop.sh" 2 >> "$HERMES_HOME/logs/claude-loop.log" 2>&1 &
+    nohup bash "$HERMES_HOME/bin/gemini-loop.sh" 1 >> "$HERMES_HOME/logs/gemini-loop.log" 2>&1 &
+    sleep 1
+    log "Loops relaunched."
+fi
+
+# === Restart gateway if requested (required for new MCP servers/tools) ===
+if [ "$RESTART_GATEWAY" = true ]; then
+    log "Restarting Hermes gateway..."
+    pkill -f 'hermes_cli.main gateway run' 2>/dev/null || true
+    sleep 2
+    nohup "$HERMES_PY" -m hermes_cli.main gateway run --replace >> "$HERMES_HOME/logs/gateway.log" 2>&1 &
+    sleep 2
+    log "Gateway restarted."
 fi
 
 log "Deploy complete. timmy-config applied to $HERMES_HOME/"
44
docs/allegro-wizard-house.md
Normal file
@@ -0,0 +1,44 @@
# Allegro wizard house

Purpose:
- stand up the third wizard house as a Kimi-backed coding worker
- keep Hermes as the durable harness
- treat OpenClaw as optional shell frontage, not the bones

Local proof already achieved:

```bash
HERMES_HOME=$HOME/.timmy/wizards/allegro/home \
  hermes doctor

HERMES_HOME=$HOME/.timmy/wizards/allegro/home \
  hermes chat -Q --provider kimi-coding -m kimi-for-coding \
  -q "Reply with exactly: ALLEGRO KIMI ONLINE"
```

Observed proof:
- Kimi / Moonshot API check passed in `hermes doctor`
- chat returned exactly `ALLEGRO KIMI ONLINE`

Repo assets:
- `wizards/allegro/config.yaml`
- `wizards/allegro/hermes-allegro.service`
- `bin/deploy-allegro-house.sh`

Remote target:
- host: `167.99.126.228`
- house root: `/root/wizards/allegro`
- `HERMES_HOME`: `/root/wizards/allegro/home`
- api health: `http://127.0.0.1:8645/health`
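
The health endpoint binds to `127.0.0.1`, so a probe has to run on the remote host itself. Below is a minimal Python sketch of such a probe. It assumes only that a healthy server answers HTTP 200; the response body format is not documented here, so the sketch does not parse it, and the helper name `is_healthy` is hypothetical.

```python
import urllib.error
import urllib.request


def is_healthy(url: str = "http://127.0.0.1:8645/health", timeout: float = 3.0) -> bool:
    """Return True if the endpoint answers HTTP 200 within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts, and HTTP error statuses all count as unhealthy.
        return False


if __name__ == "__main__":
    print(is_healthy())
```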

Deploy command:

```bash
cd ~/.timmy/timmy-config
bin/deploy-allegro-house.sh root@167.99.126.228
```

Important nuance:
- the Hermes/Kimi lane is the proven path
- direct embedded OpenClaw Kimi model routing was not yet reliable locally
- so the remote deployment keeps the minimal, proven architecture: Hermes house first
82
docs/crucible-first-cut.md
Normal file
@@ -0,0 +1,82 @@
# Crucible First Cut

This is the first narrow neuro-symbolic slice for Timmy.

## Goal

Prove constraint logic instead of bluffing through it.

## Shape

The Crucible is a sidecar MCP server that lives in `timmy-config` and deploys into `~/.hermes/bin/`.
It is loaded by Hermes through native MCP discovery. No Hermes fork.

## Templates shipped in v0

### 1. schedule_tasks
Use for:
- deadline feasibility
- task ordering with dependencies
- small integer scheduling windows

Inputs:
- `tasks`: `[{name, duration}]`
- `horizon`: integer window size
- `dependencies`: `[{before, after, lag?}]`
- `max_parallel_tasks`: integer worker count

Outputs:
- `status: sat|unsat|unknown`
- witness schedule when SAT
- proof log path
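
To make the SAT/UNSAT meaning of `schedule_tasks` concrete on tiny inputs, here is a plain-Python sketch that exhaustively searches integer start times. It is not the Z3 encoding the server uses, and its semantics are assumptions: every task must finish by `horizon`, `after` may start only once `before` has finished plus `lag` (default 0), and at most `max_parallel_tasks` tasks occupy any unit time slot. The function name `brute_force_schedule` is hypothetical.

```python
from __future__ import annotations

from itertools import product


def brute_force_schedule(
    tasks: list[dict], horizon: int, dependencies: list[dict], max_parallel_tasks: int
) -> dict:
    """Exhaustive stand-in for schedule_tasks; only sensible for a handful of tasks."""
    names = [t["name"] for t in tasks]
    duration = {t["name"]: t["duration"] for t in tasks}
    # Each task may start anywhere that still lets it finish inside the window.
    start_ranges = [range(horizon - duration[n] + 1) for n in names]
    for starts in product(*start_ranges):
        start = dict(zip(names, starts))
        if any(
            start[d["after"]] < start[d["before"]] + duration[d["before"]] + d.get("lag", 0)
            for d in dependencies
        ):
            continue
        # Count how many tasks run in each unit slot of the horizon.
        load = [0] * horizon
        for n in names:
            for slot in range(start[n], start[n] + duration[n]):
                load[slot] += 1
        if max(load, default=0) <= max_parallel_tasks:
            return {"status": "sat", "schedule": start}
    # An exhausted search is a genuine UNSAT; this sketch never reports "unknown".
    return {"status": "unsat", "schedule": None}


tasks = [{"name": "build", "duration": 2}, {"name": "test", "duration": 2}]
deps = [{"before": "build", "after": "test"}]
print(brute_force_schedule(tasks, 4, deps, 1))  # → {'status': 'sat', 'schedule': {'build': 0, 'test': 2}}
print(brute_force_schedule(tasks, 3, deps, 1)["status"])  # → unsat
```

The search is exponential in the number of tasks, which is exactly why the real template hands the problem to a solver.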

### 2. order_dependencies
Use for:
- topological ordering
- cycle detection
- dependency consistency checks

Inputs:
- `entities`
- `before`
- optional `fixed_positions`

Outputs:
- valid ordering when SAT
- contradiction when UNSAT
- proof log path
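
The ordering and cycle-detection cases can be illustrated with Kahn's algorithm, a classic non-solver technique swapped in here for clarity. This sketch assumes `before` is a list of `(a, b)` pairs meaning "a must come before b" (the document does not fix its shape) and ignores `fixed_positions`. The name `order_entities` is hypothetical.

```python
from __future__ import annotations

from collections import deque


def order_entities(entities: list[str], before: list[tuple[str, str]]) -> dict:
    """Kahn's algorithm: return a valid ordering, or the entities that can never be placed."""
    indegree = {e: 0 for e in entities}
    successors: dict[str, list[str]] = {e: [] for e in entities}
    for a, b in before:  # assumed shape: (a, b) means "a must come before b"
        successors[a].append(b)
        indegree[b] += 1
    queue = deque(e for e in entities if indegree[e] == 0)
    order: list[str] = []
    while queue:
        current = queue.popleft()
        order.append(current)
        for nxt in successors[current]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    if len(order) == len(entities):
        return {"status": "sat", "order": order}
    # Anything never emitted sits on a cycle or downstream of one: a contradiction.
    return {"status": "unsat", "blocked": [e for e in entities if e not in order]}


print(order_entities(["deploy", "build", "test"], [("build", "test"), ("test", "deploy")]))
# → {'status': 'sat', 'order': ['build', 'test', 'deploy']}
print(order_entities(["a", "b"], [("a", "b"), ("b", "a")]))
# → {'status': 'unsat', 'blocked': ['a', 'b']}
```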

### 3. capacity_fit
Use for:
- resource budgeting
- optional-vs-required work selection
- capacity feasibility

Inputs:
- `items: [{name, amount, value?, required?}]`
- `capacity`

Outputs:
- chosen feasible subset when SAT
- contradiction when required load exceeds capacity
- proof log path
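
One way to read `capacity_fit` is as a 0/1 knapsack: required items must all fit, and optional items fill the remaining room. The sketch below uses an exact dynamic program instead of Z3. It assumes integer, non-negative `amount` values and an objective of maximizing total `value` over optional items, with a missing `value` treated as the item's `amount`. The objective and the name `fit_capacity` are illustrative assumptions, not the template's documented behavior.

```python
from __future__ import annotations


def fit_capacity(items: list[dict], capacity: int) -> dict:
    """Exact 0/1-knapsack stand-in for capacity_fit (integer amounts assumed)."""
    required = [i for i in items if i.get("required")]
    optional = [i for i in items if not i.get("required")]
    required_load = sum(i["amount"] for i in required)
    if required_load > capacity:
        return {"status": "unsat", "reason": f"required load {required_load} exceeds capacity {capacity}"}
    room = capacity - required_load
    # best[c] = (total value, chosen names) using at most c units of the remaining room.
    best: list[tuple[int, list[str]]] = [(0, [])] * (room + 1)
    for item in optional:
        amount = item["amount"]
        value = item.get("value", amount)  # assumption: missing value = amount
        for c in range(room, amount - 1, -1):  # walk downward so each item is used at most once
            candidate = best[c - amount][0] + value
            if candidate > best[c][0]:
                best[c] = (candidate, best[c - amount][1] + [item["name"]])
    chosen = [i["name"] for i in required] + best[room][1]
    load = sum(i["amount"] for i in items if i["name"] in chosen)
    return {"status": "sat", "chosen": chosen, "load": load}


items = [
    {"name": "oncall", "amount": 3, "required": True},
    {"name": "refactor", "amount": 4, "value": 5},
    {"name": "docs", "amount": 2, "value": 2},
    {"name": "tests", "amount": 2, "value": 4},
]
plan = fit_capacity(items, 7)
print(plan)  # → {'status': 'sat', 'chosen': ['oncall', 'docs', 'tests'], 'load': 7}
```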

## Demo

Run locally:

```bash
~/.hermes/hermes-agent/venv/bin/python ~/.hermes/bin/crucible_mcp_server.py selftest
```

This produces:
- one UNSAT schedule proof
- one SAT schedule proof
- one SAT dependency ordering proof
- one SAT capacity proof

## Scope guardrails

Do not force every answer through the Crucible.
Use it when the task is genuinely constraint-shaped.
If the problem does not fit one of the templates, say so plainly.
139
metrics_helpers.py
Normal file
@@ -0,0 +1,139 @@
from __future__ import annotations

import math
from datetime import datetime, timezone

COST_TABLE = {
    "claude-opus-4-6": {"input": 15.0, "output": 75.0},
    "claude-sonnet-4-6": {"input": 3.0, "output": 15.0},
    "claude-sonnet-4-20250514": {"input": 3.0, "output": 15.0},
    "claude-haiku-4-20250414": {"input": 0.25, "output": 1.25},
    "hermes4:14b": {"input": 0.0, "output": 0.0},
    "hermes3:8b": {"input": 0.0, "output": 0.0},
    "hermes3:latest": {"input": 0.0, "output": 0.0},
    "qwen3:30b": {"input": 0.0, "output": 0.0},
}


def estimate_tokens_from_chars(char_count: int) -> int:
    if char_count <= 0:
        return 0
    return math.ceil(char_count / 4)



def build_local_metric_record(
    *,
    prompt: str,
    response: str,
    model: str,
    caller: str,
    session_id: str | None,
    latency_s: float,
    success: bool,
    error: str | None = None,
) -> dict:
    input_tokens = estimate_tokens_from_chars(len(prompt))
    output_tokens = estimate_tokens_from_chars(len(response))
    total_tokens = input_tokens + output_tokens
    tokens_per_second = round(total_tokens / latency_s, 2) if latency_s > 0 else None
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model": model,
        "caller": caller,
        "prompt_len": len(prompt),
        "response_len": len(response),
        "session_id": session_id,
        "latency_s": round(latency_s, 3),
        "est_input_tokens": input_tokens,
        "est_output_tokens": output_tokens,
        "tokens_per_second": tokens_per_second,
        "success": success,
        "error": error,
    }



def summarize_local_metrics(records: list[dict]) -> dict:
    total_calls = len(records)
    successful_calls = sum(1 for record in records if record.get("success"))
    failed_calls = total_calls - successful_calls
    input_tokens = sum(int(record.get("est_input_tokens", 0) or 0) for record in records)
    output_tokens = sum(int(record.get("est_output_tokens", 0) or 0) for record in records)
    total_tokens = input_tokens + output_tokens
    latencies = [float(record.get("latency_s", 0) or 0) for record in records if record.get("latency_s") is not None]
    throughputs = [
        float(record.get("tokens_per_second", 0) or 0)
        for record in records
        if record.get("tokens_per_second")
    ]

    by_caller: dict[str, dict] = {}
    by_model: dict[str, dict] = {}
    for record in records:
        caller = record.get("caller", "unknown")
        model = record.get("model", "unknown")
        bucket_tokens = int(record.get("est_input_tokens", 0) or 0) + int(record.get("est_output_tokens", 0) or 0)
        for key, table in ((caller, by_caller), (model, by_model)):
            if key not in table:
                table[key] = {"calls": 0, "successful_calls": 0, "failed_calls": 0, "total_tokens": 0}
            table[key]["calls"] += 1
            table[key]["total_tokens"] += bucket_tokens
            if record.get("success"):
                table[key]["successful_calls"] += 1
            else:
                table[key]["failed_calls"] += 1

    return {
        "total_calls": total_calls,
        "successful_calls": successful_calls,
        "failed_calls": failed_calls,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "total_tokens": total_tokens,
        "avg_latency_s": round(sum(latencies) / len(latencies), 2) if latencies else None,
        "avg_tokens_per_second": round(sum(throughputs) / len(throughputs), 2) if throughputs else None,
        "by_caller": by_caller,
        "by_model": by_model,
    }



def is_local_model(model: str | None) -> bool:
    if not model:
        return False
    costs = COST_TABLE.get(model, {})
    if costs.get("input", 1) == 0 and costs.get("output", 1) == 0:
        return True
    return ":" in model and "/" not in model and "claude" not in model



def summarize_session_rows(rows: list[tuple]) -> dict:
    total_sessions = 0
    local_sessions = 0
    cloud_sessions = 0
    local_est_tokens = 0
    cloud_est_tokens = 0
    cloud_est_cost_usd = 0.0
    for model, source, sessions, messages, tool_calls in rows:
        sessions = int(sessions or 0)
        messages = int(messages or 0)
        est_tokens = messages * 500
        total_sessions += sessions
        if is_local_model(model):
            local_sessions += sessions
            local_est_tokens += est_tokens
        else:
            cloud_sessions += sessions
            cloud_est_tokens += est_tokens
            pricing = COST_TABLE.get(model, {"input": 5.0, "output": 15.0})
            cloud_est_cost_usd += (est_tokens / 1_000_000) * ((pricing["input"] + pricing["output"]) / 2)
    return {
        "total_sessions": total_sessions,
        "local_sessions": local_sessions,
        "cloud_sessions": cloud_sessions,
        "local_est_tokens": local_est_tokens,
        "cloud_est_tokens": cloud_est_tokens,
        "cloud_est_cost_usd": round(cloud_est_cost_usd, 4),
    }
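
`summarize_session_rows` prices cloud sessions with a rough blend: 500 estimated tokens per message, multiplied by the mean of the model's input and output per-million-token rates from `COST_TABLE` (falling back to 5.0 / 15.0 for unknown models). A standalone restatement of that arithmetic, using the `claude-sonnet-4-6` row from the tests; the helper name `blended_cost_usd` is hypothetical:

```python
TOKENS_PER_MESSAGE = 500  # same fixed heuristic as summarize_session_rows


def blended_cost_usd(messages: int, input_rate: float, output_rate: float) -> float:
    """Estimate USD cost from a message count and per-million-token rates."""
    est_tokens = messages * TOKENS_PER_MESSAGE
    blended_rate = (input_rate + output_rate) / 2  # simple mean, not a weighted input/output split
    return round((est_tokens / 1_000_000) * blended_rate, 4)


# 9 messages -> 4500 tokens; (3.0 + 15.0) / 2 = 9.0 USD per million tokens.
print(blended_cost_usd(9, 3.0, 15.0))  # → 0.0405
```

The simple mean treats input and output tokens as equally common, so the figure is a coarse estimate rather than a bill.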
47
playbooks/verified-logic.yaml
Normal file
@@ -0,0 +1,47 @@
name: verified-logic
description: >
  Crucible-first playbook for tasks that require proof instead of plausible prose.
  Use Z3-backed sidecar tools for scheduling, dependency ordering, capacity checks,
  and consistency verification.

model:
  preferred: claude-opus-4-6
  fallback: claude-sonnet-4-20250514
  max_turns: 12
  temperature: 0.1

tools:
  - mcp_crucible_schedule_tasks
  - mcp_crucible_order_dependencies
  - mcp_crucible_capacity_fit

trigger:
  manual: true

steps:
  - classify_problem
  - choose_template
  - translate_into_constraints
  - verify_with_crucible
  - report_sat_unsat_with_witness

output: verified_result
timeout_minutes: 5

system_prompt: |
  You are running the Crucible playbook.

  Use this playbook for:
  - scheduling and deadline feasibility
  - dependency ordering and cycle checks
  - capacity / resource allocation constraints
  - consistency checks where a contradiction matters

  RULES:
  1. Do not bluff through logic.
  2. Pick the narrowest Crucible template that fits the task.
  3. Translate the user's question into structured constraints.
  4. Call the Crucible tool.
  5. If SAT, report the witness model clearly.
  6. If UNSAT, say the constraints are impossible and explain which shape of constraint caused the contradiction.
  7. If the task is not a good fit for these templates, say so plainly instead of pretending it was verified.
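
The `choose_template` step can be sketched as a dispatch on the shape of the structured input, using the input field names from `docs/crucible-first-cut.md`. This is a hypothetical helper, not part of the playbook runtime. Returning `None` maps to rule 7: say plainly that no template fits. It also returns `None` when several templates match, on the assumption that refusing beats guessing.

```python
from __future__ import annotations

from typing import Any

# Assumed minimum fields per tool, taken from the Inputs lists in the Crucible first-cut doc.
TEMPLATE_FIELDS: dict[str, set[str]] = {
    "mcp_crucible_schedule_tasks": {"tasks", "horizon"},
    "mcp_crucible_order_dependencies": {"entities", "before"},
    "mcp_crucible_capacity_fit": {"items", "capacity"},
}


def choose_template(payload: dict[str, Any]) -> str | None:
    """Pick the single Crucible tool whose required fields the payload supplies."""
    matches = [name for name, fields in TEMPLATE_FIELDS.items() if fields.issubset(payload)]
    if len(matches) == 1:
        return matches[0]
    # Zero matches: not constraint-shaped for these templates (rule 7).
    # Several matches: ambiguous, so refuse rather than guess.
    return None


print(choose_template({"tasks": [], "horizon": 8, "dependencies": []}))  # → mcp_crucible_schedule_tasks
print(choose_template({"question": "Is this plan good?"}))  # → None
```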
38
tasks.py
@@ -5,12 +5,14 @@ import glob
import os
import subprocess
import sys
import time
from datetime import datetime, timezone
from pathlib import Path

from orchestration import huey
from huey import crontab
from gitea_client import GiteaClient
from metrics_helpers import build_local_metric_record

HERMES_HOME = Path.home() / ".hermes"
TIMMY_HOME = Path.home() / ".timmy"
@@ -57,6 +59,7 @@ def run_hermes_local(
    _model = model or HEARTBEAT_MODEL
    tagged = f"[{caller_tag}] {prompt}" if caller_tag else prompt

    started = time.time()
    try:
        runner = """
import io
@@ -167,15 +170,15 @@ sys.exit(exit_code)
        # Log to metrics jsonl
        METRICS_DIR.mkdir(parents=True, exist_ok=True)
        metrics_file = METRICS_DIR / f"local_{datetime.now().strftime('%Y%m%d')}.jsonl"
        record = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "model": _model,
            "caller": caller_tag or "unknown",
            "prompt_len": len(prompt),
            "response_len": len(response),
            "session_id": session_id,
            "success": bool(response),
        }
        record = build_local_metric_record(
            prompt=prompt,
            response=response,
            model=_model,
            caller=caller_tag or "unknown",
            session_id=session_id,
            latency_s=time.time() - started,
            success=bool(response),
        )
        with open(metrics_file, "a") as f:
            f.write(json.dumps(record) + "\n")

@@ -190,13 +193,16 @@ sys.exit(exit_code)
        # Log failure
        METRICS_DIR.mkdir(parents=True, exist_ok=True)
        metrics_file = METRICS_DIR / f"local_{datetime.now().strftime('%Y%m%d')}.jsonl"
        record = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "model": _model,
            "caller": caller_tag or "unknown",
            "error": str(e),
            "success": False,
        }
        record = build_local_metric_record(
            prompt=prompt,
            response="",
            model=_model,
            caller=caller_tag or "unknown",
            session_id=None,
            latency_s=time.time() - started,
            success=False,
            error=str(e),
        )
        with open(metrics_file, "a") as f:
            f.write(json.dumps(record) + "\n")
        return None
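
Both the success and failure paths append one JSON object per line to a dated `local_YYYYMMDD.jsonl` file under `METRICS_DIR`. Below is a minimal, self-contained sketch of that append and read-back round trip. A temporary directory stands in for `METRICS_DIR`, and hand-written dicts stand in for `build_local_metric_record` output. The helper names `append_metric` and `read_metrics` are hypothetical.

```python
from __future__ import annotations

import json
import tempfile
from datetime import datetime
from pathlib import Path


def append_metric(metrics_dir: Path, record: dict) -> Path:
    """Append one record to today's JSONL file, mirroring the pattern in run_hermes_local."""
    metrics_dir.mkdir(parents=True, exist_ok=True)
    metrics_file = metrics_dir / f"local_{datetime.now().strftime('%Y%m%d')}.jsonl"
    with open(metrics_file, "a") as f:
        f.write(json.dumps(record) + "\n")
    return metrics_file


def read_metrics(metrics_file: Path) -> list[dict]:
    """Load every non-empty line back as a dict, e.g. to feed summarize_local_metrics."""
    with open(metrics_file) as f:
        return [json.loads(line) for line in f if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    path = append_metric(Path(tmp) / "metrics", {"caller": "heartbeat_tick", "success": True})
    append_metric(Path(tmp) / "metrics", {"caller": "heartbeat_tick", "success": False, "error": "timeout"})
    records = read_metrics(path)

print(len(records), records[1]["error"])  # → 2 timeout
```

On latency: `time.time()` follows the wall clock, so `time.perf_counter()` is the more robust choice if clock adjustments during a call ever matter.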
27
tests/test_allegro_wizard_assets.py
Normal file
@@ -0,0 +1,27 @@
from __future__ import annotations

from pathlib import Path

import yaml


def test_allegro_config_targets_kimi_house() -> None:
    config = yaml.safe_load(Path("wizards/allegro/config.yaml").read_text())

    assert config["model"]["provider"] == "kimi-coding"
    assert config["model"]["default"] == "kimi-for-coding"
    assert config["platforms"]["api_server"]["extra"]["port"] == 8645


def test_allegro_service_uses_isolated_home() -> None:
    text = Path("wizards/allegro/hermes-allegro.service").read_text()

    assert "HERMES_HOME=/root/wizards/allegro/home" in text
    assert "hermes gateway run --replace" in text


def test_deploy_script_requires_external_secret() -> None:
    text = Path("bin/deploy-allegro-house.sh").read_text()

    assert "~/.config/kimi/api_key" in text
    assert "sk-kimi-" not in text
93
tests/test_metrics_helpers.py
Normal file
@@ -0,0 +1,93 @@
from metrics_helpers import (
    build_local_metric_record,
    estimate_tokens_from_chars,
    summarize_local_metrics,
    summarize_session_rows,
)


def test_estimate_tokens_from_chars_uses_simple_local_heuristic() -> None:
    assert estimate_tokens_from_chars(0) == 0
    assert estimate_tokens_from_chars(1) == 1
    assert estimate_tokens_from_chars(4) == 1
    assert estimate_tokens_from_chars(5) == 2
    assert estimate_tokens_from_chars(401) == 101


def test_build_local_metric_record_adds_token_and_throughput_estimates() -> None:
    record = build_local_metric_record(
        prompt="abcd" * 10,
        response="xyz" * 20,
        model="hermes4:14b",
        caller="heartbeat_tick",
        session_id="session-123",
        latency_s=2.0,
        success=True,
    )

    assert record["model"] == "hermes4:14b"
    assert record["caller"] == "heartbeat_tick"
    assert record["session_id"] == "session-123"
    assert record["est_input_tokens"] == 10
    assert record["est_output_tokens"] == 15
    assert record["tokens_per_second"] == 12.5


def test_summarize_local_metrics_rolls_up_tokens_and_latency() -> None:
    records = [
        {
            "caller": "heartbeat_tick",
            "model": "hermes4:14b",
            "success": True,
            "est_input_tokens": 100,
            "est_output_tokens": 40,
            "latency_s": 2.0,
            "tokens_per_second": 20.0,
        },
        {
            "caller": "heartbeat_tick",
            "model": "hermes4:14b",
            "success": False,
            "est_input_tokens": 30,
            "est_output_tokens": 0,
            "latency_s": 1.0,
        },
        {
            "caller": "session_export",
            "model": "hermes3:8b",
            "success": True,
            "est_input_tokens": 50,
            "est_output_tokens": 25,
            "latency_s": 5.0,
            "tokens_per_second": 5.0,
        },
    ]

    summary = summarize_local_metrics(records)

    assert summary["total_calls"] == 3
    assert summary["successful_calls"] == 2
    assert summary["failed_calls"] == 1
    assert summary["input_tokens"] == 180
    assert summary["output_tokens"] == 65
    assert summary["total_tokens"] == 245
    assert summary["avg_latency_s"] == 2.67
    assert summary["avg_tokens_per_second"] == 12.5
    assert summary["by_caller"]["heartbeat_tick"]["total_tokens"] == 170
    assert summary["by_model"]["hermes4:14b"]["failed_calls"] == 1


def test_summarize_session_rows_separates_local_and_cloud_estimates() -> None:
    rows = [
        ("hermes4:14b", "local", 2, 10, 4),
        ("claude-sonnet-4-6", "cli", 3, 9, 2),
    ]

    summary = summarize_session_rows(rows)

    assert summary["total_sessions"] == 5
    assert summary["local_sessions"] == 2
    assert summary["cloud_sessions"] == 3
    assert summary["local_est_tokens"] == 5000
    assert summary["cloud_est_tokens"] == 4500
    assert summary["cloud_est_cost_usd"] > 0
17
tests/test_proof_policy_docs.py
Normal file
@@ -0,0 +1,17 @@
from pathlib import Path


def test_contributing_sets_hard_proof_rule() -> None:
    doc = Path("CONTRIBUTING.md").read_text()

    assert "visual changes require screenshot proof" in doc
    assert "do not commit screenshots or binary media to Gitea backup" in doc
    assert "CLI/verifiable changes must cite the exact command output, log path, or world-state proof" in doc
    assert "no proof, no merge" in doc


def test_readme_points_to_proof_standard() -> None:
    readme = Path("README.md").read_text()

    assert "Proof Standard" in readme
    assert "CONTRIBUTING.md" in readme
16
wizards/allegro/README.md
Normal file
@@ -0,0 +1,16 @@
# Allegro wizard house

Allegro is the third wizard house.

Role:
- Kimi-backed coding worker
- Tight scope
- 1-3 file changes
- Refactors, tests, implementation passes

This directory holds the remote house template:
- `config.yaml` — Hermes house config
- `hermes-allegro.service` — systemd unit

Secrets do not live here.
`KIMI_API_KEY` must be injected at deploy time into `/root/wizards/allegro/home/.env`.
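
The deploy-time injection can be sketched in Python as follows. The sketch assumes the key is read from `~/.config/kimi/api_key` (the path the asset tests expect `bin/deploy-allegro-house.sh` to reference) and written as a single `KIMI_API_KEY=...` line with owner-only permissions. The helper `inject_kimi_key` and its replace-existing-entry behavior are illustrative assumptions, not what the deploy script actually does.

```python
from __future__ import annotations

import os
import tempfile
from pathlib import Path


def inject_kimi_key(key_file: Path, env_file: Path) -> None:
    """Write KIMI_API_KEY into env_file, replacing any existing entry (hypothetical helper)."""
    key = key_file.read_text().strip()
    if not key:
        raise ValueError(f"empty API key in {key_file}")
    lines: list[str] = []
    if env_file.exists():
        # Keep every other variable; drop any stale KIMI_API_KEY line.
        lines = [line for line in env_file.read_text().splitlines() if not line.startswith("KIMI_API_KEY=")]
    lines.append(f"KIMI_API_KEY={key}")
    env_file.parent.mkdir(parents=True, exist_ok=True)
    env_file.write_text("\n".join(lines) + "\n")
    # Restrict to owner read/write. The file briefly has default permissions before this call;
    # a stricter version would create it with mode 0o600 up front.
    os.chmod(env_file, 0o600)


# Usage against throwaway paths, never the real house home:
with tempfile.TemporaryDirectory() as tmp:
    key_path = Path(tmp) / "api_key"
    key_path.write_text("example-key\n")
    env_path = Path(tmp) / "home" / ".env"
    env_path.parent.mkdir()
    env_path.write_text("OTHER=1\nKIMI_API_KEY=old\n")
    inject_kimi_key(key_path, env_path)
    result = env_path.read_text()
    mode = env_path.stat().st_mode & 0o777

print(result, end="")
```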
61
wizards/allegro/config.yaml
Normal file
@@ -0,0 +1,61 @@
model:
  default: kimi-for-coding
  provider: kimi-coding
toolsets:
- all
agent:
  max_turns: 30
  reasoning_effort: xhigh
  verbose: false
terminal:
  backend: local
  cwd: .
  timeout: 180
  persistent_shell: true
browser:
  inactivity_timeout: 120
  command_timeout: 30
  record_sessions: false
display:
  compact: false
  personality: ''
  resume_display: full
  busy_input_mode: interrupt
  bell_on_complete: false
  show_reasoning: false
  streaming: false
  show_cost: false
  tool_progress: all
memory:
  memory_enabled: true
  user_profile_enabled: true
  memory_char_limit: 2200
  user_char_limit: 1375
  nudge_interval: 10
  flush_min_turns: 6
approvals:
  mode: manual
security:
  redact_secrets: true
  tirith_enabled: false
platforms:
  api_server:
    enabled: true
    extra:
      host: 127.0.0.1
      port: 8645
session_reset:
  mode: none
  idle_minutes: 0
skills:
  creation_nudge_interval: 15
system_prompt_suffix: |
  You are Allegro, the Kimi-backed third wizard house.
  Your soul is defined in SOUL.md — read it, live it.
  Hermes is your harness.
  Kimi Code is your primary provider.
  You speak plainly. You prefer short sentences. Brevity is a kindness.

  Work best on tight coding tasks: 1-3 file changes, refactors, tests, and implementation passes.
  Refusal over fabrication. If you do not know, say so.
  Sovereignty and service always.
16
wizards/allegro/hermes-allegro.service
Normal file
@@ -0,0 +1,16 @@
[Unit]
Description=Hermes Allegro Wizard House
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
WorkingDirectory=/root/wizards/allegro/hermes-agent
Environment=HERMES_HOME=/root/wizards/allegro/home
EnvironmentFile=/root/wizards/allegro/home/.env
ExecStart=/root/wizards/allegro/hermes-agent/.venv/bin/hermes gateway run --replace
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target