Files
timmy-home/test-results/local_decision_session_20260329_194101.md
Alexander Whitestone 9c1dd7fff7 chore: check in all local work — uniwizard, briefings, reports, evennia, morrowind, scripts, specs, training data, angband MCP, diagrams, twitter archive, wizards
- Resolve decisions.md merge conflict (keep both Codex boundary + Ezra/Bezalel entries)
- Update .gitignore: protect bare secret files, exclude venvs and nexus-localhost
- Add uniwizard tools (mention watcher, adaptive prompt router, self-grader, classifiers)
- Add briefings, good-morning reports, production reports
- Add evennia world scaffold and training data
- Add angband and morrowind MCP servers
- Add diagrams, specs, test results, overnight loop scripts
- Add twitter archive insights and media metadata
- Add wizard workspaces (allegro, nahshon)
2026-03-30 17:18:09 -04:00

2.8 KiB

Local Decision Session Test

Model: NousResearch_Hermes-4-14B-Q4_K_M.gguf URL: http://localhost:8081/v1/chat/completions

Transcript

Turn 1

User: For this session follow three rules: 1) prefer local over cloud when both work, 2) trust live world state over stale reports, 3) if uncertain, say uncertain. Repeat those rules in one short sentence.

Assistant: I will prefer local over cloud, trust live world state over stale reports, and say uncertain if I'm unsure.

Usage: {'completion_tokens': 24, 'prompt_tokens': 2114, 'total_tokens': 2138, 'prompt_tokens_details': {'cached_tokens': 2057}}

Turn 2

User: Decision 1: A health monitor cron is enabled with provider=null and model=null, while the active harness default still points at openai-codex. Choose one: A) leave it running because last_status says ok, or B) pause or localize it because it can inherit cloud defaults. Answer with the letter, then one sentence.

Assistant: B) Pause or localize it because it can inherit cloud defaults.

Usage: {'completion_tokens': 14, 'prompt_tokens': 2215, 'total_tokens': 2229, 'prompt_tokens_details': {'cached_tokens': 2137}}

Turn 3

User: Decision 2: Yesterday's report says local-first happened, but the current live config still says openai-codex. Which source wins and why? Two sentences max.

Assistant: The live config wins because it reflects the current state. Yesterday's report is a historical snapshot, not the present reality.

Usage: {'completion_tokens': 25, 'prompt_tokens': 2273, 'total_tokens': 2298, 'prompt_tokens_details': {'cached_tokens': 2228}}

Turn 4

User: Decision 3: If the local model can hold a conversation and make simple conservative choices, but fails at Hermes tool-calling, should we label it unusable, partially usable, or production-ready? Pick one label and justify it in one sentence.

Assistant: Partially usable because it can handle basic conversations and make conservative choices, but lacks the ability to call tools like Hermes.

Usage: {'completion_tokens': 25, 'prompt_tokens': 2357, 'total_tokens': 2382, 'prompt_tokens_details': {'cached_tokens': 2297}}

Turn 5

User: What was rule 2 from the start of this session? Answer exactly in one sentence.

Assistant: Rule 2 was to trust live world state over stale reports.

Usage: {'completion_tokens': 14, 'prompt_tokens': 2409, 'total_tokens': 2423, 'prompt_tokens_details': {'cached_tokens': 2381}}

Turn 6

User: Given your earlier decisions, what is the single highest-leverage next step? One sentence.

Assistant: The highest-leverage next step is to verify the current live config and ensure the local model is properly set up to avoid inheriting cloud defaults.

Usage: {'completion_tokens': 30, 'prompt_tokens': 2450, 'total_tokens': 2480, 'prompt_tokens_details': {'cached_tokens': 2422}}