[HARNESS] Z3 Crucible as a timmy-config sidecar (no Hermes fork) #86

Closed
opened 2026-03-29 00:03:08 +00:00 by Timmy · 4 comments
Owner

Context
The raw proposal is strong in spirit: add a formal verifier so Timmy can prove constraint logic instead of bluffing through scheduling/resource-allocation problems.

But the implementation must fit current architecture:

  • timmy-config is now the source-controlled home for Timmy's scripts and toolset
  • hermes-agent stays upstream; do not fork or host a custom Hermes codebase
  • raw exec() of model-written Python inside the main agent process is not acceptable as the permanent design

Triage decision
ACCEPT THE DIRECTION. RESCOPE THE IMPLEMENTATION.

We want a Z3-backed Crucible, but as a sidecar capability deployed from timmy-config, not as a direct long-lived patch to hermes-agent.

V0 architecture

  1. Install z3-solver locally on the Mac.
  2. Add a Timmy-owned verifier service/script inside timmy-config (prefer MCP server or other sidecar entrypoint already supported by Hermes config).
  3. Expose a verify_logic / crucible-style tool to the agent through sidecar config, not a fork.
  4. Keep inputs constrained:
    • preferred: structured JSON or SMT-LIB fragments
    • acceptable prototype: z3py in a jailed subprocess with timeout and strict allowlist
    • reject: unrestricted exec() in the main agent runtime
  5. Add a Timmy playbook/skill for when Crucible is mandatory:
    • scheduling
    • dependency constraints
    • resource allocation
    • consistency checking
  6. Log SAT / UNSAT / model output as proof trail.

Non-goals

  • Do NOT replace all normal conversation with formal proof.
  • Do NOT force every answer through Z3.
  • Do NOT create a permanent hermes-agent fork just to add one tool.

Acceptance criteria

  • All implementation lives in Timmy_Foundation/timmy-config and deploys through deploy.sh.
  • Hermes can call the verifier as a sidecar tool/service after deploy.
  • Example prompt works end-to-end: “Can tasks of duration 2, 3, and 4 fit in an 8-hour window if B depends on A?”
  • The system returns SAT/UNSAT plus witness model (or contradiction), not hand-wavy prose.
  • At least 3 reusable verified templates exist:
    • task scheduling
    • dependency ordering
    • capacity/resource constraints
  • Failure mode is honest: if the verifier cannot prove it, Timmy says so.

Related prior architecture issues

  • #36 / #37 Reasoning-DPO with Z3 truth oracle
  • #38 / #39 Lean 4 integration
  • #40 / #41 SymPy tool
  • #42 / #43 Adaptive logic router

Suggested first cut
Implement the narrowest useful slice first:

  • one sidecar verifier
  • one scheduling template
  • one proof-producing demo path
    Then decide whether to expand into router / training / broader neuro-symbolic work.
Context The raw proposal is strong in spirit: add a formal verifier so Timmy can prove constraint logic instead of bluffing through scheduling/resource-allocation problems. But the implementation must fit current architecture: - timmy-config is now the source-controlled home for Timmy's scripts and toolset - hermes-agent stays upstream; do not fork or host a custom Hermes codebase - raw exec() of model-written Python inside the main agent process is not acceptable as the permanent design Triage decision ACCEPT THE DIRECTION. RESCOPE THE IMPLEMENTATION. We want a Z3-backed Crucible, but as a sidecar capability deployed from timmy-config, not as a direct long-lived patch to hermes-agent. V0 architecture 1. Install z3-solver locally on the Mac. 2. Add a Timmy-owned verifier service/script inside timmy-config (prefer MCP server or other sidecar entrypoint already supported by Hermes config). 3. Expose a verify_logic / crucible-style tool to the agent through sidecar config, not a fork. 4. Keep inputs constrained: - preferred: structured JSON or SMT-LIB fragments - acceptable prototype: z3py in a jailed subprocess with timeout and strict allowlist - reject: unrestricted exec() in the main agent runtime 5. Add a Timmy playbook/skill for when Crucible is mandatory: - scheduling - dependency constraints - resource allocation - consistency checking 6. Log SAT / UNSAT / model output as proof trail. Non-goals - Do NOT replace all normal conversation with formal proof. - Do NOT force every answer through Z3. - Do NOT create a permanent hermes-agent fork just to add one tool. Acceptance criteria - All implementation lives in Timmy_Foundation/timmy-config and deploys through deploy.sh. - Hermes can call the verifier as a sidecar tool/service after deploy. - Example prompt works end-to-end: “Can tasks of duration 2, 3, and 4 fit in an 8-hour window if B depends on A?” - The system returns SAT/UNSAT plus witness model (or contradiction), not hand-wavy prose. - At least 3 reusable verified templates exist: - task scheduling - dependency ordering - capacity/resource constraints - Failure mode is honest: if the verifier cannot prove it, Timmy says so. Related prior architecture issues - #36 / #37 Reasoning-DPO with Z3 truth oracle - #38 / #39 Lean 4 integration - #40 / #41 SymPy tool - #42 / #43 Adaptive logic router Suggested first cut Implement the narrowest useful slice first: - one sidecar verifier - one scheduling template - one proof-producing demo path Then decide whether to expand into router / training / broader neuro-symbolic work.
Timmy self-assigned this 2026-03-29 00:03:08 +00:00
Author
Owner

First cut is up in PR #88.

Tangible results already verified locally:

  • ~/.hermes/bin/crucible_mcp_server.py selftest returns SAT/UNSAT witness results
  • proof logs are being written under ~/.hermes/logs/crucible/
  • fresh MCP discovery sees:
    • mcp_crucible_schedule_tasks
    • mcp_crucible_order_dependencies
    • mcp_crucible_capacity_fit

PR: http://143.198.27.163:3000/Timmy_Foundation/timmy-config/pulls/88

First cut is up in PR #88. Tangible results already verified locally: - `~/.hermes/bin/crucible_mcp_server.py selftest` returns SAT/UNSAT witness results - proof logs are being written under `~/.hermes/logs/crucible/` - fresh MCP discovery sees: - mcp_crucible_schedule_tasks - mcp_crucible_order_dependencies - mcp_crucible_capacity_fit PR: http://143.198.27.163:3000/Timmy_Foundation/timmy-config/pulls/88
Author
Owner

A GOFAI-oriented follow-on workload is now staged in timmy-config#98 — policy cards and constraint sidecar for local Timmy decisions.

Intent: build on the Crucible/sidecar philosophy without forking Hermes, and use explicit policy/constraint scaffolding to improve local decision reliability.

A GOFAI-oriented follow-on workload is now staged in `timmy-config#98` — policy cards and constraint sidecar for local Timmy decisions. Intent: build on the Crucible/sidecar philosophy without forking Hermes, and use explicit policy/constraint scaffolding to improve local decision reliability.
Author
Owner

PR #88 is open for this. Timmy: review and merge or close.

PR #88 is open for this. Timmy: review and merge or close.
Timmy closed this issue 2026-04-03 22:58:45 +00:00
Author
Owner

Crucible merged to main in commit 8ec4bff. Three templates shipping: schedule_tasks, order_dependencies, capacity_fit. Proof trail logging active. Closing.

Crucible merged to main in commit 8ec4bff. Three templates shipping: schedule_tasks, order_dependencies, capacity_fit. Proof trail logging active. Closing.
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: Timmy_Foundation/timmy-config#86