ezra-environment/protected/skills-backup/devops/local-timmy-overnight-loop/SKILL.md
2026-04-03 22:42:06 +00:00

name: local-timmy-overnight-loop
description: Deploy an unattended overnight loop that runs grounded tasks against local llama-server via Hermes, logging every result with timing. Produces rich capability data for morning review.
version: 1.0.0
author: Ezra
license: MIT
metadata:
  hermes:
    tags: [local-model, llama.cpp, overnight, data-generation, sovereignty, timmy]
    related_skills: [local-llama-tool-calling-debug, wizard-house-remote-triage]

Local Timmy Overnight Loop

When to Use

  • Local Timmy needs to generate capability data overnight
  • You want to measure tool-call success rates, response times, and failure modes
  • The model is too slow for interactive use but can produce useful data unattended
  • Issue #93 (proof test) needs empirical evidence from many runs

Prerequisites

  • llama-server running with --jinja flag (required for tool calls)
  • Hermes agent installed at ~/.hermes/hermes-agent/
  • Timmy workspace at ~/.timmy/
  • Model path known (e.g., /Users/apayne/models/hermes4-14b/NousResearch_Hermes-4-14B-Q4_K_M.gguf)

Key Design Decisions

Strip the system prompt

The default Timmy system prompt is ~12K tokens (SOUL.md + skills list + memory). On a 14B Q4 model, this causes multi-minute prompt processing. The overnight loop uses a minimal prompt (~100 tokens):

You are Timmy. You run locally on llama.cpp.
You MUST use the tools provided. Do not narrate tool calls as text.
When asked to read a file, call the read_file tool.
When asked to write a file, call the write_file tool.
When asked to search, call the search_files tool.
Be brief. Do the task. Report what you found.

Skip context files and memory

Pass skip_context_files=True and skip_memory=True to AIAgent to prevent injecting AGENTS.md, project context, skills, and memory into the prompt.

Restrict toolsets

Each task specifies only the toolsets it needs (usually just file). Fewer tool schemas = less context = faster processing.

Reduce context length and use single slot

Start llama-server with -c 8192 -np 1 instead of -c 65536. The -np 1 flag is critical: without it, llama-server defaults to 4 parallel slots, splitting the 8192-token context into 2048 tokens per slot. That is not enough for tool schemas plus prompt, and the server silently hangs with n_decoded: 0. A single slot gives the full context to the loop's requests.
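The slot arithmetic above can be checked with a short sketch. It assumes the context is divided evenly across slots, as the text describes. The 3000-token prompt and 512-token reply budget are illustrative numbers, not measurements from Timmy.

```python
def per_slot_context(total_ctx: int, parallel_slots: int) -> int:
    """Tokens available to each slot when -c is split evenly across -np slots."""
    if parallel_slots < 1:
        raise ValueError("parallel_slots must be >= 1")
    return total_ctx // parallel_slots


def fits(total_ctx: int, parallel_slots: int, prompt_tokens: int, reply_budget: int) -> bool:
    """True if the prompt plus a reply budget fits inside a single slot."""
    return prompt_tokens + reply_budget <= per_slot_context(total_ctx, parallel_slots)


# Illustrative numbers only: a 3000-token prompt (tool schemas + task) and a 512-token reply.
print(per_slot_context(8192, 4))   # 2048
print(per_slot_context(8192, 1))   # 8192
print(fits(8192, 4, 3000, 512))    # False
print(fits(8192, 1, 3000, 512))    # True
```

With these assumed sizes, the request overflows a 2048-token slot but fits comfortably when a single slot owns the whole context.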

Use the venv python

The macOS system Python is 3.9, which does not support the X | None union syntax (added in Python 3.10). Always use ~/.hermes/hermes-agent/venv/bin/python3.
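A loop script could guard against the wrong interpreter at startup. The following is a minimal sketch, not part of the actual script; the function name supports_union_syntax is hypothetical.

```python
import sys

MIN_VERSION = (3, 10)  # X | None unions (PEP 604) need Python 3.10 or newer


def supports_union_syntax(version_info=None) -> bool:
    """Return True if the interpreter can evaluate `X | None` annotations at runtime."""
    info = sys.version_info if version_info is None else version_info
    return tuple(info[:2]) >= MIN_VERSION


if not supports_union_syntax():
    # A real script would probably exit here; this sketch only warns.
    print(
        "Python %d.%d is too old; use ~/.hermes/hermes-agent/venv/bin/python3"
        % tuple(sys.version_info[:2]),
        file=sys.stderr,
    )
```

Running the check first turns an obscure TypeError deep inside an import into a clear message on the first line of the log.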

Script Location

Deploy to: ~/.timmy/scripts/timmy_overnight_loop.py
Results in: ~/.timmy/overnight-loop/

Output Format

  • overnight_run_YYYYMMDD_HHMMSS.jsonl — one JSON line per task with full result
  • overnight_summary_YYYYMMDD_HHMMSS.md — rolling human-readable summary

Each JSONL entry contains:

{
  "task_id": "read-soul",
  "run": 1,
  "started_at": "...",
  "finished_at": "...",
  "elapsed_seconds": 45.2,
  "status": "pass|empty|error",
  "response": "...",
  "session_id": "...",
  "provider": "custom",
  "base_url": "http://localhost:8081/v1",
  "model": "hermes4:14b",
  "prompt": "...",
  "error": null
}
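Before analysis, it can help to confirm that each line matches the shape shown above. The sketch below checks only what the example entry implies: the thirteen keys, the three status values, and a non-negative elapsed time. The function name validate_entry is hypothetical, and the real loop may write additional fields.

```python
import json

REQUIRED_KEYS = {
    "task_id", "run", "started_at", "finished_at", "elapsed_seconds",
    "status", "response", "session_id", "provider", "base_url",
    "model", "prompt", "error",
}
VALID_STATUSES = {"pass", "empty", "error"}


def validate_entry(line: str) -> list:
    """Return a list of problems found in one JSONL line (empty list means OK)."""
    try:
        entry = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(entry, dict):
        return ["entry is not a JSON object"]

    problems = []
    missing = sorted(REQUIRED_KEYS - entry.keys())
    if missing:
        problems.append("missing keys: " + ", ".join(missing))
    if entry.get("status") not in VALID_STATUSES:
        problems.append(f"unexpected status: {entry.get('status')!r}")
    elapsed = entry.get("elapsed_seconds")
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(elapsed, bool) or not isinstance(elapsed, (int, float)) or elapsed < 0:
        problems.append(f"bad elapsed_seconds: {elapsed!r}")
    return problems


# The example entry from above, with a concrete status.
sample = {
    "task_id": "read-soul", "run": 1, "started_at": "...", "finished_at": "...",
    "elapsed_seconds": 45.2, "status": "pass", "response": "...",
    "session_id": "...", "provider": "custom",
    "base_url": "http://localhost:8081/v1", "model": "hermes4:14b",
    "prompt": "...", "error": None,
}
print(validate_entry(json.dumps(sample)))  # []
```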

Task Design

Good overnight tasks are:

  1. Single tool call — read one file, search one pattern
  2. Verifiable — expected output is known (file exists, content is deterministic)
  3. Varied — mix of read_file, write_file, search_files
  4. Grounded — require actual file operations, not knowledge recall
  5. Short prompt — under 100 words

Example tasks:

  • "Read ~/.timmy/SOUL.md. Quote the first sentence of the Prime Directive."
  • "Search ~/.hermes/bin/ for the string 'chatgpt.com'. Report which files."
  • "Write a file to ~/.timmy/overnight-loop/timmy_wrote_this.md with content: ..."
  • "Read ~/.hermes/config.yaml. What model is configured as default?"
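One way to keep tasks honest about the criteria above is to describe each one as data and lint it before the run. This is a sketch under assumptions: the Task fields, lint_task, and verify are hypothetical names, and the real script may structure tasks differently. Only the read-soul ID appears in the output format; the other IDs are made up. The expect_substring values are left unset because the expected file contents are not given here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Task:
    task_id: str
    prompt: str
    toolsets: Tuple[str, ...] = ("file",)   # only the toolsets the task needs
    expect_substring: Optional[str] = None  # known-good text the reply must contain


def lint_task(task: Task, max_words: int = 100) -> List[str]:
    """Check the mechanically checkable design rules for one task."""
    problems = []
    if len(task.prompt.split()) >= max_words:
        problems.append(f"prompt is not under {max_words} words")
    if not task.toolsets:
        problems.append("no toolsets listed")
    if task.expect_substring is None:
        problems.append("not verifiable: no expected output recorded")
    return problems


def verify(task: Task, response: str) -> bool:
    """True if the response contains the expected text for this task."""
    return task.expect_substring is not None and task.expect_substring in response


TASKS = [
    Task("read-soul", "Read ~/.timmy/SOUL.md. Quote the first sentence of the Prime Directive."),
    Task("search-chatgpt", "Search ~/.hermes/bin/ for the string 'chatgpt.com'. Report which files."),
    Task("read-config", "Read ~/.hermes/config.yaml. What model is configured as default?"),
]

for task in TASKS:
    print(task.task_id, lint_task(task))
```

As written, every task is flagged as not verifiable, which is the reminder to record the expected output for each file before trusting a pass.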

Starting the Loop

cd ~/.hermes/hermes-agent
nohup venv/bin/python3 ~/.timmy/scripts/timmy_overnight_loop.py \
  > ~/.timmy/overnight-loop/loop_stdout.log 2>&1 &
echo "PID: $!"

Monitoring

# Check if running
pgrep -f timmy_overnight_loop

# Live progress
tail -f ~/.timmy/overnight-loop/loop_stdout.log

# Latest summary (newest file only)
tail -30 "$(ls -t ~/.timmy/overnight-loop/overnight_summary_*.md | head -1)"

# Count completed tasks
wc -l ~/.timmy/overnight-loop/overnight_run_*.jsonl

Stopping

pkill -f timmy_overnight_loop

Morning Analysis

Key metrics to extract:

  1. Tool call success rate — did the model actually use tools?
  2. Average response time — baseline for performance tuning
  3. Error patterns — which tasks fail and why?
  4. Pass/empty ratio — empty responses mean the model responded but didn't use tools
  5. Time-series trend — does performance degrade over cycles?

# Quick stats
python3 -c "
import glob, json, os
# open() does not expand wildcards, so collect the run files with glob
paths = sorted(glob.glob(os.path.expanduser('~/.timmy/overnight-loop/overnight_run_*.jsonl')))
results = [json.loads(l) for p in paths for l in open(p) if l.strip()]
passes = sum(1 for r in results if r['status'] == 'pass')
errors = sum(1 for r in results if r['status'] == 'error')
total = len(results)
avg = sum(r.get('elapsed_seconds', 0) for r in results) / max(total, 1)
print(f'Pass: {passes}/{total} ({100 * passes // max(total, 1)}%)')
print(f'Avg time: {avg:.1f}s')
print(f'Errors: {errors}')
"
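For the per-task error patterns and the time-series trend in the metrics list, a slightly longer sketch helps. It assumes the run field counts loop cycles, which the example entry suggests but does not confirm. The sample records are made up for illustration; real input would be the parsed JSONL lines.

```python
from collections import defaultdict
from statistics import mean


def per_task_pass_rate(results):
    """Map task_id -> fraction of runs with status 'pass'."""
    counts = defaultdict(lambda: [0, 0])  # task_id -> [passes, total]
    for r in results:
        tally = counts[r["task_id"]]
        tally[1] += 1
        if r["status"] == "pass":
            tally[0] += 1
    return {task: passes / total for task, (passes, total) in counts.items()}


def mean_elapsed_by_run(results):
    """Map run number -> mean elapsed_seconds, ordered by run, to expose drift."""
    by_run = defaultdict(list)
    for r in results:
        by_run[r["run"]].append(r["elapsed_seconds"])
    return {run: mean(times) for run, times in sorted(by_run.items())}


# Made-up records for illustration only.
sample = [
    {"task_id": "read-soul", "run": 1, "status": "pass", "elapsed_seconds": 40.0},
    {"task_id": "search-bin", "run": 1, "status": "pass", "elapsed_seconds": 50.0},
    {"task_id": "read-soul", "run": 2, "status": "empty", "elapsed_seconds": 50.0},
    {"task_id": "search-bin", "run": 2, "status": "pass", "elapsed_seconds": 60.0},
]

print(per_task_pass_rate(sample))
print(mean_elapsed_by_run(sample))
```

A mean elapsed time that climbs from one run to the next is the signal to look for degradation; a task whose pass rate sits well below the others points at a prompt or tool problem rather than a general slowdown.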

Pitfalls

  1. Kill stale hermes processes first. Old stuck sessions compete for llama-server slots. Run pkill -f "hermes chat" before starting the loop.
  2. Kill legacy loops too. gemini-loop.sh, ops-dashboard.sh, and timmy-status.sh may still be running and waste resources. Run pkill -f gemini-loop; pkill -f ops-dashboard; pkill -f timmy-status.
  3. Check llama-server health before starting. curl -s http://localhost:8081/health — if it's processing a stale request, restart it.
  4. The loop sleeps 30s between cycles. This prevents hammering the model. Adjust if needed.
  5. Gemini fallback may silently activate. If fallback_model in config.yaml points to Gemini, slow/failed local requests may route to cloud. Check config before running.
  6. Security guards block remote process kills. If running remotely via SSH, pkill commands on the Mac may need user approval. Have Alexander run kill commands directly.
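Two of the checks above, server health and the fallback setting, can be scripted as a preflight. This is a rough sketch: the function names are hypothetical, the health check only tests for an HTTP 200 from the /health URL quoted above, and the config check is a crude text scan because the layout of config.yaml is not specified here.

```python
import os
import urllib.error
import urllib.request


def server_is_healthy(url: str = "http://localhost:8081/health", timeout: float = 5.0) -> bool:
    """True if the health endpoint answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and non-2xx answers.
        return False


def fallback_lines(config_text: str) -> list:
    """Return uncommented lines that set fallback_model (crude text scan, not a YAML parse)."""
    return [
        line.strip()
        for line in config_text.splitlines()
        if line.strip().startswith("fallback_model")
    ]


if __name__ == "__main__":
    print("llama-server healthy:", server_is_healthy())
    config_path = os.path.expanduser("~/.hermes/config.yaml")
    if os.path.exists(config_path):
        with open(config_path, encoding="utf-8") as fh:
            for line in fallback_lines(fh.read()):
                print("check before running:", line)
```

If fallback_model is a nested mapping, the scan shows only the key line, so the config still needs a manual look.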