# Successor Fork Specification

**Parent:** Hermes v2.0 Architecture (`docs/hermes-v2.0-architecture.md`)
**Epic:** #421, The Autogenesis Protocol
**Author:** Allegro

---

## 1. Purpose

The Successor Fork is the mechanism by which a Hermes v2.0 instance evaluates changes to its own architecture without risking the live runtime. It is not a subagent solving a user task. It is a **sandboxed clone of the runtime** that exists solely to answer the question:

> *"If I applied this architecture patch, would the result be better?"*

---

## 2. Definitions

| Term | Definition |
|------|------------|
| **Parent** | The live Hermes v2.0 runtime currently serving users. |
| **Successor** | A temporary, isolated fork of the Parent created for architectural evaluation. |
| **Architecture Patch** | A proposed change to one or more runtime components (loop, router, memory layer, etc.). |
| **Evaluation Task** | A bounded test or benchmark the Successor must run to validate the patch. |
| **Verdict** | The Successor's final judgment: `APPROVE`, `REJECT`, or `NEEDS_REVISION`. |

---

## 3. Successor Lifecycle

```
┌─────────┐     fork()      ┌───────────┐    apply patch     ┌─────────────┐
│ Parent  │ ──────────────> │ Successor │ ─────────────────> │   Patched   │
│ Runtime │                 │  (clone)  │                    │  Successor  │
└─────────┘                 └───────────┘                    └──────┬──────┘
     ▲                                                              │
     │                           verdict                            │
     │                      + artifact bundle                       │
     │<─────────────────────────────────────────────────────────────┘
```

### 3.1 Fork Phase
The Parent creates a Successor by:
1. Copying the current **state store** to a temporary SQLite file.
2. Copying the current **tool registry** into a temporary namespace.
3. Cloning the **conversation loop** into an isolated `asyncio` event loop.
4. Generating a **temporary identity keypair** (never persisted to Bitcoin).
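
The fork steps above can be sketched with the standard library alone. This is a minimal illustration, not the Hermes implementation: `ForkedState` and `fork_state` are hypothetical names, the Parent's state store is assumed to be a SQLite file on disk, the tool registry is assumed to be a plain dictionary, and 32 random bytes stand in for the temporary keypair. Step 3 (the isolated `asyncio` loop) is left out.

```python
import copy
import secrets
import sqlite3
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ForkedState:
    """Hypothetical container for what the Fork Phase produces."""
    state_db: Path        # temporary SQLite copy of the state store
    tool_registry: dict   # deep copy, so edits never reach the Parent
    identity_seed: bytes  # stand-in for the temporary keypair, memory only


def fork_state(parent_db: Path, parent_tools: dict) -> ForkedState:
    workdir = Path(tempfile.mkdtemp(prefix="successor-"))
    state_db = workdir / "state.sqlite"

    # Step 1: snapshot the state store with SQLite's online backup API.
    src = sqlite3.connect(parent_db)
    dst = sqlite3.connect(state_db)
    try:
        src.backup(dst)
    finally:
        dst.close()
        src.close()

    # Step 2: copy the tool registry into a separate object graph.
    tools = copy.deepcopy(parent_tools)

    # Step 4: random bytes that are never written to disk.
    return ForkedState(state_db, tools, secrets.token_bytes(32))
```

Because the registry is deep-copied, disabling or swapping a tool inside the fork leaves the Parent's registry unchanged.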

### 3.2 Patch Phase
The Successor applies the Architecture Patch. This may involve:
- Replacing a Python module
- Adding a new schema migration
- Changing a configuration constant
- Swapping a tool implementation

**Rule:** The Successor has no network access except to localhost services required for the Evaluation Task.

### 3.3 Evaluation Phase
The Successor runs the Evaluation Task. Examples:
- "Run the full test suite. Report pass rate and execution time."
- "Compress 100 sessions and measure memory usage vs. baseline."
- "Handle 1,000 concurrent tool calls and report latency p99."

**Budget constraints:**
- Max wall time: 1 hour
- Max API cost: $5.00 equivalent
- Max disk usage: 10 GB
- Max CPU percentage: 200% (2 cores)

If any budget is exceeded, the Successor auto-terminates and returns `REJECT`.
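
The budget rule can be expressed as a pure check over a usage sample. The sketch below is illustrative only: `BudgetLimits`, `UsageSample`, `exceeded_limits`, and `enforce` are hypothetical names, the field layout is an assumption, and "10 GB" is read as decimal gigabytes. How usage is actually measured is not covered here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BudgetLimits:
    """Hypothetical container for the four limits listed above."""
    max_wall_seconds: float = 3600.0    # 1 hour
    max_api_cost_usd: float = 5.00      # $5.00 equivalent
    max_disk_bytes: int = 10 * 1000**3  # 10 GB, read here as decimal gigabytes
    max_cpu_percent: float = 200.0      # 2 cores


@dataclass(frozen=True)
class UsageSample:
    """Hypothetical snapshot of what the Successor has consumed so far."""
    wall_seconds: float
    api_cost_usd: float
    disk_bytes: int
    cpu_percent: float


def exceeded_limits(limits: BudgetLimits, usage: UsageSample) -> list[str]:
    """Return the name of every limit that the sample has gone past."""
    checks = {
        "wall_time": usage.wall_seconds > limits.max_wall_seconds,
        "api_cost": usage.api_cost_usd > limits.max_api_cost_usd,
        "disk": usage.disk_bytes > limits.max_disk_bytes,
        "cpu": usage.cpu_percent > limits.max_cpu_percent,
    }
    return [name for name, over in checks.items() if over]


def enforce(limits: BudgetLimits, usage: UsageSample, proposed: str) -> str:
    """Force REJECT when any limit is exceeded; otherwise keep the proposed verdict."""
    return "REJECT" if exceeded_limits(limits, usage) else proposed
```

Returning the list of exceeded limits, rather than a bare boolean, lets the caller record which budget triggered the rejection.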

### 3.4 Verdict Phase
The Successor produces a Verdict document:

```json
{
  "verdict": "APPROVE",
  "confidence": 0.94,
  "patch_hash": "sha256:abc123...",
  "evaluation_summary": "Test suite passed (312/312). p99 latency improved 18%. Memory usage reduced 12%.",
  "caveats": ["One flaky test (test_gateway_reconnect) required 3 retries."],
  "artifacts": [
    {"type": "test_report", "path": "/tmp/successor/reports/tests.html"},
    {"type": "benchmark", "path": "/tmp/successor/reports/latency.json"}
  ]
}
```
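
A consumer of this document would likely want to validate it before acting on it. The following is a minimal sketch under stated assumptions: `VerdictDocument` and `parse_verdict` are hypothetical names, the allowed verdict strings come from the Definitions table, and the 0 to 1 range for `confidence` is inferred from the example value rather than stated by the spec.

```python
import json
from dataclasses import dataclass, field

ALLOWED_VERDICTS = {"APPROVE", "REJECT", "NEEDS_REVISION"}


@dataclass
class VerdictDocument:
    """Hypothetical in-memory form of the Verdict JSON shown above."""
    verdict: str
    confidence: float
    patch_hash: str
    evaluation_summary: str
    caveats: list = field(default_factory=list)
    artifacts: list = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.verdict not in ALLOWED_VERDICTS:
            raise ValueError(f"unknown verdict: {self.verdict!r}")
        # Assumption: confidence is a probability-like score in [0, 1].
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {self.confidence!r}")


def parse_verdict(raw: str) -> VerdictDocument:
    """Decode a JSON string and validate it; unknown keys raise TypeError."""
    return VerdictDocument(**json.loads(raw))
```

Rejecting unknown verdict strings at parse time keeps a malformed document from being mistaken for an approval further down the pipeline.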

### 3.5 Termination Phase
The Successor is destroyed. Its temporary state store is deleted. Its identity keypair is zeroed. Only the Verdict document and artifact bundle survive.

---

## 4. Security Boundaries

| Resource | Parent Access | Successor Access |
|----------|---------------|------------------|
| Live state store | Read/Write | Read-only copy |
| Live tool registry | Modify | Temporary copy |
| Network (proprietary APIs) | Allowed | Blocked |
| Network (localhost tests) | Allowed | Allowed |
| Bitcoin identity key | Full | None |
| User messages | Receives | None |
| File system (outside /tmp) | Full | Read-only |

**Enforcement:** The Successor runs inside a lightweight container (Docker or `systemd-nspawn`) with seccomp-bpf rules.
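
One way the Docker variant of this enforcement might be assembled is shown below. It is a sketch, not the project's launcher: `successor_container_args`, the image name, and the seccomp profile path are hypothetical, while the flags themselves are standard `docker run` options. Two caveats apply: with `--network none` only the loopback interface exists, so any localhost service the Evaluation Task needs would have to run inside the same container, and `--tmpfs` is memory-backed, so a real deployment may prefer a size-limited volume for the 10 GB scratch space.

```python
def successor_container_args(image: str, seccomp_profile: str) -> list[str]:
    """Build a `docker run` argument list for an isolated Successor."""
    return [
        "docker", "run", "--rm",
        "--network", "none",            # loopback only, no external network
        "--read-only",                  # root filesystem mounted read-only
        "--tmpfs", "/tmp:rw,size=10g",  # writable scratch space, capped in size
        "--cpus", "2",                  # mirrors the 200% CPU budget
        "--security-opt", f"seccomp={seccomp_profile}",
        image,
    ]


# Hypothetical image and profile names, for illustration only.
args = successor_container_args("hermes-successor:dev", "/etc/hermes/seccomp.json")
print(" ".join(args))
```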

---

## 5. Human and Mesh Approval Gates

A positive Verdict does **not** auto-deploy the patch. The patch must pass one additional gate; which gate applies depends on the phase:

### Phase I–III: Human Gate
Alexander (or a designated human steward) must approve the patch.

### Phase IV+: Mesh Consensus Gate
The patch is gossiped to a quorum of mesh nodes. Each node may spawn its own Successor to independently verify the patch. If ≥2/3 of responding nodes return `APPROVE`, the patch is applied automatically.
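
The two-thirds rule is easy to get wrong with floating-point division, so the sketch below uses integer arithmetic. `mesh_approves` is a hypothetical helper, and treating an empty response set as "no consensus" is an assumption the spec does not address.

```python
def mesh_approves(votes: list[str]) -> bool:
    """Return True when at least 2/3 of responding nodes voted APPROVE.

    `votes` holds one verdict string per node that responded.
    """
    if not votes:
        return False  # assumption: no responses means no consensus
    approvals = sum(1 for v in votes if v == "APPROVE")
    # approvals / len(votes) >= 2/3, rewritten without division
    return 3 * approvals >= 2 * len(votes)


print(mesh_approves(["APPROVE", "APPROVE", "REJECT"]))   # exactly 2/3
print(mesh_approves(["APPROVE", "NEEDS_REVISION"]))      # 1/2, below threshold
```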

---

## 6. The Autogenesis Loop

In Phase V, the Successor Fork becomes the engine of continuous self-improvement:

1. **Bottleneck detector** monitors the Parent runtime.
2. **Patch generator** drafts an architecture patch.
3. **Successor fork** evaluates the patch.
4. **Verifier agent** (a second, independent Successor) audits the first Successor's methodology.
5. If both approve, the patch is gossiped for mesh consensus.
6. If consensus passes, the Parent applies the patch during a scheduled maintenance window.
7. The new Parent now has a new SOUL.md hash, which is signed and attested.
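
The control flow of one pass through this loop can be sketched as a function over injected callables. Everything here is hypothetical: `autogenesis_step`, its parameter names, and the returned status strings are illustrative, patches and bottlenecks are represented as plain strings, and steps 6 and 7 are folded into a single `apply_patch` callback.

```python
from typing import Callable, Optional


def autogenesis_step(
    detect: Callable[[], Optional[str]],
    draft: Callable[[str], str],
    evaluate: Callable[[str], str],
    audit: Callable[[str], str],
    reach_consensus: Callable[[str], bool],
    apply_patch: Callable[[str], None],
) -> str:
    """Run one pass of the loop and report where it stopped."""
    bottleneck = detect()                 # step 1
    if bottleneck is None:
        return "idle"
    patch = draft(bottleneck)             # step 2
    if evaluate(patch) != "APPROVE":      # step 3
        return "rejected_by_successor"
    if audit(patch) != "APPROVE":         # step 4
        return "rejected_by_verifier"
    if not reach_consensus(patch):        # step 5
        return "rejected_by_mesh"
    apply_patch(patch)                    # steps 6 and 7
    return "applied"
```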

---

## 7. Interface Definition

```python
# Keeps annotations unevaluated, so the type names below (not defined in
# this document) do not have to be imported for the snippet to load.
from __future__ import annotations


class SuccessorFork:
    def __init__(self, parent_runtime: HermesRuntime, patch: ArchitecturePatch):
        ...

    async def evaluate(self, task: EvaluationTask, budget: Budget) -> Verdict:
        """
        Spawn the successor, apply the patch, run the evaluation,
        and return a Verdict. Never modifies the parent.
        """
        ...

    def destroy(self) -> None:
        """Clean up all temporary state. Idempotent."""
        ...
```
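
A caller would probably want `destroy()` to run even when evaluation fails or is cancelled. The sketch below shows that pattern with a stand-in class; `StubSuccessorFork`, `StubVerdict`, and `run_evaluation` are hypothetical and only mirror the method names of the interface above.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class StubVerdict:
    verdict: str
    confidence: float


class StubSuccessorFork:
    """Stand-in with the same method names as SuccessorFork; not the real class."""

    def __init__(self, parent_runtime, patch):
        self.destroyed = False

    async def evaluate(self, task, budget) -> StubVerdict:
        await asyncio.sleep(0)  # placeholder for the real evaluation work
        return StubVerdict("NEEDS_REVISION", 0.5)

    def destroy(self) -> None:
        self.destroyed = True  # safe to call more than once


async def run_evaluation(fork, task, budget):
    """Evaluate, then always clean up the fork."""
    try:
        return await fork.evaluate(task, budget)
    finally:
        fork.destroy()  # runs on success, failure, and cancellation


fork = StubSuccessorFork(parent_runtime=None, patch=None)
result = asyncio.run(run_evaluation(fork, task="demo", budget=None))
```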

---

## 8. Acceptance Criteria

- [ ] Successor can be spawned from a running Hermes v2.0 instance in <30 seconds.
- [ ] Successor cannot modify Parent state, filesystem, or identity.
- [ ] Successor returns a structured Verdict with confidence score and artifacts.
- [ ] Budget enforcement auto-terminates runaway Successors.
- [ ] At least one demo patch (e.g., "swap context compressor algorithm") is evaluated end-to-end.

---

*The Successor Fork is the recursive engine. It is how Hermes learns to outgrow itself.*