| name | description | version | author | license |
|---|---|---|---|---|
| local-timmy-overnight-loop | Deploy an unattended overnight loop that runs grounded tasks against local llama-server via Hermes, logging every result with timing. Produces rich capability data for morning review. | 1.0.0 | Ezra | MIT |
# Local Timmy Overnight Loop

## When to Use
- Local Timmy needs to generate capability data overnight
- You want to measure tool-call success rates, response times, and failure modes
- The model is too slow for interactive use but can produce useful data unattended
- Issue #93 (proof test) needs empirical evidence from many runs
## Prerequisites

- llama-server running with the `--jinja` flag (required for tool calls)
- Hermes agent installed at `~/.hermes/hermes-agent/`
- Timmy workspace at `~/.timmy/`
- Model path known (e.g., `/Users/apayne/models/hermes4-14b/NousResearch_Hermes-4-14B-Q4_K_M.gguf`)
## Key Design Decisions

### Strip the system prompt
The default Timmy system prompt is ~12K tokens (SOUL.md + skills list + memory). On a 14B Q4 model, this causes multi-minute prompt processing. The overnight loop uses a minimal prompt (~100 tokens):
```
You are Timmy. You run locally on llama.cpp.
You MUST use the tools provided. Do not narrate tool calls as text.
When asked to read a file, call the read_file tool.
When asked to write a file, call the write_file tool.
When asked to search, call the search_files tool.
Be brief. Do the task. Report what you found.
```
### Skip context files and memory

Pass `skip_context_files=True` and `skip_memory=True` to `AIAgent` to prevent injecting `AGENTS.md`, project context, skills, and memory into the prompt.
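A minimal sketch of that construction, assuming `AIAgent` accepts a system prompt and these keyword arguments; the import path and anything beyond the two skip flags are assumptions, not the agent's confirmed API:

```python
# Sketch only: the import path and constructor shape are assumptions;
# skip_context_files / skip_memory are the flags described above.
from hermes_agent import AIAgent  # assumed module name

MINIMAL_PROMPT = (
    "You are Timmy. You run locally on llama.cpp. "
    "You MUST use the tools provided. Do not narrate tool calls as text. "
    "Be brief. Do the task. Report what you found."
)

agent = AIAgent(
    system_prompt=MINIMAL_PROMPT,   # ~100-token prompt instead of the ~12K default
    skip_context_files=True,        # no AGENTS.md, project context, or skills injection
    skip_memory=True,               # no memory injection
)
```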
### Restrict toolsets

Each task specifies only the toolsets it needs (usually just `file`). Fewer tool schemas = less context = faster processing.
### Reduce context length and use single slot

Start llama-server with `-c 8192 -np 1` instead of `-c 65536`. The `-np 1` is critical — without it, llama-server defaults to 4 parallel slots, splitting 8192 into 2048 per slot. That's not enough for tool schemas + prompt, and the server silently hangs with `n_decoded: 0`. A single slot gives the full context to the loop's requests.
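To make the slot arithmetic concrete, this small illustration just divides the context the way the note above describes:

```python
# Illustration of how -c is split across -np parallel slots (per the note above).
ctx_total = 8192
for n_parallel in (1, 4):
    print(f"-np {n_parallel}: {ctx_total // n_parallel} tokens per slot")
# -np 1: 8192 tokens per slot  (full context for the loop's requests)
# -np 4: 2048 tokens per slot  (too small for tool schemas + prompt -> silent hang)
```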
### Use the venv python

macOS system Python is 3.9, which lacks the `X | None` union syntax (added in Python 3.10). Always use `~/.hermes/hermes-agent/venv/bin/python3`.
## Script Location

- Deploy to: `~/.timmy/scripts/timmy_overnight_loop.py`
- Results in: `~/.timmy/overnight-loop/`
## Output Format

- `overnight_run_YYYYMMDD_HHMMSS.jsonl` — one JSON line per task with the full result
- `overnight_summary_YYYYMMDD_HHMMSS.md` — rolling human-readable summary
Each JSONL entry contains:
```json
{
  "task_id": "read-soul",
  "run": 1,
  "started_at": "...",
  "finished_at": "...",
  "elapsed_seconds": 45.2,
  "status": "pass|empty|error",
  "response": "...",
  "session_id": "...",
  "provider": "custom",
  "base_url": "http://localhost:8081/v1",
  "model": "hermes4:14b",
  "prompt": "...",
  "error": null
}
```
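A sketch of how one such record could be assembled and appended; `run_task` is a hypothetical stand-in for the actual call into the Hermes agent, and the pass/empty check is simplified (the real loop flags `empty` when the model answered without using tools):

```python
# Sketch: build and append one JSONL record per task, matching the fields above.
# run_task() is a hypothetical stand-in for the call into the Hermes agent;
# session_id comes from the agent session and is omitted here.
import json
import time
from datetime import datetime, timezone

def record_result(path, task_id, run, prompt, run_task):
    started = datetime.now(timezone.utc)
    t0 = time.monotonic()
    entry = {
        "task_id": task_id,
        "run": run,
        "started_at": started.isoformat(),
        "prompt": prompt,
        "provider": "custom",
        "base_url": "http://localhost:8081/v1",
        "model": "hermes4:14b",
        "error": None,
    }
    try:
        response = run_task(prompt)
        entry["response"] = response
        # Simplified: the real loop marks "empty" when no tools were used.
        entry["status"] = "pass" if response.strip() else "empty"
    except Exception as exc:
        entry["response"] = ""
        entry["status"] = "error"
        entry["error"] = str(exc)
    entry["finished_at"] = datetime.now(timezone.utc).isoformat()
    entry["elapsed_seconds"] = round(time.monotonic() - t0, 1)
    with open(path, "a") as f:
        f.write(json.dumps(entry) + "\n")
```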
## Task Design
Good overnight tasks are:
- Single tool call — read one file, search one pattern
- Verifiable — expected output is known (file exists, content is deterministic)
- Varied — mix of read_file, write_file, search_files
- Grounded — require actual file operations, not knowledge recall
- Short prompt — under 100 words
Example tasks:
- "Read ~/.timmy/SOUL.md. Quote the first sentence of the Prime Directive."
- "Search ~/.hermes/bin/ for the string 'chatgpt.com'. Report which files."
- "Write a file to ~/.timmy/overnight-loop/timmy_wrote_this.md with content: ..."
- "Read ~/.hermes/config.yaml. What model is configured as default?"
## Starting the Loop

```bash
cd ~/.hermes/hermes-agent
nohup venv/bin/python3 ~/.timmy/scripts/timmy_overnight_loop.py \
  > ~/.timmy/overnight-loop/loop_stdout.log 2>&1 &
echo "PID: $!"
```
## Monitoring

```bash
# Check if running
pgrep -f timmy_overnight_loop

# Live progress
tail -f ~/.timmy/overnight-loop/loop_stdout.log

# Latest summary
cat ~/.timmy/overnight-loop/overnight_summary_*.md | tail -30

# Count completed tasks
wc -l ~/.timmy/overnight-loop/overnight_run_*.jsonl
```
## Stopping

```bash
pkill -f timmy_overnight_loop
```
## Morning Analysis
Key metrics to extract:
- Tool call success rate — did the model actually use tools?
- Average response time — baseline for performance tuning
- Error patterns — which tasks fail and why?
- Pass/empty ratio — empty responses mean the model responded but didn't use tools
- Time-series trend — does performance degrade over cycles?
```bash
# Quick stats (run from ~/.timmy/overnight-loop/)
python3 -c "
import json, glob
results = []
for path in glob.glob('overnight_run_*.jsonl'):
    results += [json.loads(line) for line in open(path)]
passes = sum(1 for r in results if r['status'] == 'pass')
total = len(results)
avg = sum(r.get('elapsed_seconds', 0) for r in results) / max(total, 1)
print(f'Pass: {passes}/{total} ({100 * passes // max(total, 1)}%)')
print(f'Avg time: {avg:.1f}s')
print(f'Errors: {sum(1 for r in results if r[\"status\"] == \"error\")}')
"
```
## Pitfalls

- Kill stale hermes processes first. Old stuck sessions compete for llama-server slots. Run `pkill -f "hermes chat"` before starting the loop.
- Also kill legacy loops. `gemini-loop.sh`, `ops-dashboard.sh`, and `timmy-status.sh` may still be running and waste resources: `pkill -f gemini-loop; pkill -f ops-dashboard; pkill -f timmy-status`.
- Check llama-server health before starting: `curl -s http://localhost:8081/health` — if it's processing a stale request, restart it (see the sketch after this list).
- The loop sleeps 30s between cycles. This prevents hammering the model. Adjust if needed.
- Gemini fallback may silently activate. If `fallback_model` in `config.yaml` points to Gemini, slow or failed local requests may route to the cloud. Check the config before running.
- Security guards block remote process kills. If running remotely via SSH, `pkill` commands on the Mac may need user approval. Have Alexander run kill commands directly.
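For the health check, a Python pre-flight sketch (endpoint and port taken from the curl command above) that refuses to start if llama-server does not answer:

```python
# Pre-flight sketch: refuse to start the loop unless llama-server answers /health.
import sys
import urllib.request

try:
    with urllib.request.urlopen("http://localhost:8081/health", timeout=5) as resp:
        print("llama-server health:", resp.status, resp.read().decode().strip())
except Exception as exc:
    sys.exit(f"llama-server not healthy ({exc}); restart it before starting the loop")
```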