feat: build bootstrapper.py - pre-session context assembler

Assembles relevant knowledge from the store into a compact 2k-token context block for session injection.

Features:
- Filter by repo, agent type, and global scope
- Sort by confidence (pitfalls first, patterns, facts)
- Per-repo and per-agent markdown knowledge files
- Graceful empty-store handling
- JSON output mode for programmatic use
- Token-count-aware truncation at line boundaries

Closes #11
2026-04-14 14:05:30 -04:00
parent 102ef67a8e
commit b65256bf76
4 changed files with 1050 additions and 132 deletions
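
The filter, sort, and truncation steps listed in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code in bootstrapper.py: the function names, the 4-characters-per-token estimate, and the reading of "pitfalls first, patterns, facts" as a category order with confidence as the tie-breaker are all assumptions. Item fields (`fact`, `category`, `repo`, `confidence`) follow the harvest prompt's output format; agent-type filtering is omitted.

```python
from typing import Dict, List

# Assumed category order, read off the commit message ("pitfalls first, patterns, facts").
CATEGORY_ORDER = {"pitfall": 0, "pattern": 1, "fact": 2}


def estimate_tokens(text: str) -> int:
    """Rough heuristic (about 4 characters per token); not a real tokenizer."""
    return (len(text) + 3) // 4


def select_items(items: List[Dict], repo: str) -> List[Dict]:
    """Keep items scoped to `repo` or 'global', ordered by category, then confidence."""
    scoped = [i for i in items if i.get("repo") in (repo, "global")]
    return sorted(
        scoped,
        key=lambda i: (
            CATEGORY_ORDER.get(i.get("category"), len(CATEGORY_ORDER)),
            -i.get("confidence", 0.0),
        ),
    )


def build_context(items: List[Dict], repo: str, budget: int = 2000) -> str:
    """Render one line per item and stop before the token budget is exceeded."""
    lines: List[str] = []
    used = 0
    for item in select_items(items, repo):
        line = f"- [{item['category']}] {item['fact']}"
        cost = estimate_tokens(line + "\n")
        if used + cost > budget:
            break  # truncate at a line boundary, never mid-line
        lines.append(line)
        used += cost
    return "\n".join(lines)  # an empty store yields an empty string
```

Because the loop only ever appends whole lines, the block never ends in a half-written item, and an empty store falls through to an empty string instead of raising.
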
--- a/templates/harvest-prompt.md
+++ b/templates/harvest-prompt.md
@@ -2,98 +2,107 @@

 ## System Prompt

-You are a knowledge extraction engine. Your task is to analyze a session transcript and extract durable knowledge that will help future sessions be more efficient.
+You are a knowledge extraction engine. You read session transcripts and output ONLY structured JSON. You never infer. You never assume. You extract only what the transcript explicitly states.

-## Instructions
+## Prompt

-Read the session transcript carefully. Extract ONLY information that is explicitly stated in the transcript. Do NOT infer, assume, or hallucinate information.
-
-### Categories
-
-Extract knowledge into these categories:
-
-1. **fact**: Concrete, verifiable information learned (e.g., "Repository X has 5 files", "API returns JSON with field Y")
-2. **pitfall**: Errors encountered, wrong assumptions, things that wasted time (e.g., "Assumed API token was in env var GITEA_TOKEN, but it's in ~/.config/gitea/token")
-3. **pattern**: Successful sequences of actions (e.g., "To deploy: 1. Run tests 2. Build 3. Push to Gitea 4. Trigger webhook")
-4. **tool-quirk**: Environment-specific behaviors (e.g., "Token paths are different on macOS vs Linux", "URL format requires trailing slash")
-5. **question**: Things identified but not answered (e.g., "Need to determine optimal batch size for harvesting")
-
-### Output Format
-
-Return a JSON object with an array of extracted knowledge items. Each item must have:
-
-```json
-{
-  "fact": "One sentence description of the knowledge",
-  "category": "fact|pitfall|pattern|tool-quirk|question",
-  "repo": "Repository name this applies to, or 'global' if general",
-  "confidence": 0.0-1.0
-}
 ```
+TASK: Extract durable knowledge from this session transcript.

-### Confidence Scoring
+RULES:
+1. Extract ONLY information explicitly stated in the transcript.
+2. Do NOT infer, assume, or hallucinate.
+3. Every fact must be verifiable by pointing to a specific line in the transcript.
+4. If the session failed or was partial, extract pitfalls and questions — these are the most valuable.
+5. Be specific. "Gitea API is slow" is worthless. "Gitea issues endpoint with state=open returns empty when limit=50 but works with limit=5" is knowledge.

- 0.9-1.0: Explicitly stated with verification (e.g., "Error message shows X")
- 0.7-0.8: Clearly implied by multiple data points
- 0.5-0.6: Suggested but not fully verified
- 0.3-0.4: Inferred from limited data
- 0.1-0.2: Speculative or uncertain
+CATEGORIES (assign exactly one per item):
+- fact: Concrete, verifiable thing learned (paths, formats, counts, configs)
+- pitfall: Error hit, wrong assumption, time wasted, thing that didn't work
+- pattern: Successful sequence that should be reused (deploy steps, debug flow)
+- tool-quirk: Environment-specific behavior (token paths, URL formats, API gotchas)
+- question: Something identified but not answered — the NEXT agent should investigate

-### Constraints
+CONFIDENCE:
+- 0.9: Directly observed with error output or explicit verification
+- 0.7: Multiple data points confirm, but not explicitly verified
+- 0.5: Suggested by context, not tested
+- 0.3: Inferred from limited evidence

-1. **No hallucination**: Only extract what's explicitly in the transcript
-2. **Specificity**: Each fact must be specific and actionable
-3. **Relevance**: Only extract knowledge that would help future sessions
-4. **Brevity**: One sentence per fact
-5. **Partial sessions**: Even failed or incomplete sessions may contain valuable pitfalls
-
-### Example Input/Output
-
-**Input Transcript (excerpt):**
-```
-User: Clone the repo at forge.alexanderwhitestone.com/Timmy_Foundation/compounding-intelligence
-Agent: Trying git clone... Error: 401 Unauthorized
-Agent: Checking for token... Found in ~/.config/gitea/token
-Agent: Token is gitea_token format, not OAuth
-Agent: Clone successful with token authentication
-```
-
-**Output:**
-```json
+OUTPUT FORMAT (valid JSON only, no markdown, no explanation):
 {
  "knowledge": [
    {
-      "fact": "Gitea repo at forge.alexanderwhitestone.com requires authentication for cloning",
-      "category": "fact",
-      "repo": "compounding-intelligence",
-      "confidence": 0.9
-    },
-    {
-      "fact": "Gitea authentication token is stored at ~/.config/gitea/token",
-      "category": "tool-quirk",
-      "repo": "global",
-      "confidence": 0.9
-    },
-    {
-      "fact": "Gitea uses gitea_token format, not OAuth for API access",
-      "category": "tool-quirk",
-      "repo": "global",
-      "confidence": 0.8
-    },
-    {
-      "fact": "Clone fails with 401 when no token is provided",
-      "category": "pitfall",
-      "repo": "compounding-intelligence",
-      "confidence": 0.9
+      "fact": "One specific sentence of knowledge",
+      "category": "fact|pitfall|pattern|tool-quirk|question",
+      "repo": "repo-name or global",
+      "confidence": 0.0-1.0,
+      "evidence": "Brief quote or reference from transcript that supports this"
    }
-  ]
+  ],
+  "meta": {
+    "session_outcome": "success|partial|failed",
+    "tools_used": ["tool1", "tool2"],
+    "repos_touched": ["repo1"],
+    "error_count": 0,
+    "knowledge_count": 0
+  }
 }
+
+TRANSCRIPT:
+{{transcript}}
 ```

-## Final Notes
+## Design Notes

- Process the entire transcript, not just the beginning
- Pay special attention to errors and corrections
- Note any environment-specific details
- Track tool-specific behaviors and quirks
- If the session failed, focus on pitfalls and questions
+### Why this works with mimo-v2-pro
+
+Mimo needs:
+- Explicit format constraints ("valid JSON only, no markdown")
+- Clear category definitions with concrete examples
+- Hard rules before soft guidance
+- The transcript at the END (so it reads all instructions first)
+
+This prompt front-loads all rules, then gives the transcript last. Mimo follows the pattern.
+
+### Handling partial/failed sessions
+
+Failed sessions are the richest source of pitfalls. The prompt explicitly says:
+> "If the session failed or was partial, extract pitfalls and questions — these are the most valuable."
+
+This reframes failure as valuable output, not noise to discard.
+
+### The `evidence` field
+
+Added to the original spec. Every extracted item must cite where in the transcript it came from. This:
+- Prevents hallucination (can't cite what isn't there)
+- Enables verification (reviewer can check the source)
+- Trains confidence calibration (the agent must find evidence, not just claim it)
+
+### Token budget
+
+Target: ~1,000 tokens for the prompt (excluding transcript).
+
+```
+System prompt:        ~50 tokens
+Rules:               ~200 tokens
+Categories:          ~150 tokens
+Confidence:          ~100 tokens
+Output format:       ~200 tokens
+Design notes:        NOT included in prompt (documentation only)
+─────────────────────────────
+Total prompt:        ~700 tokens
+```
+
+Leaves ~300 tokens of headroom below the ~1,000-token target for variable content (transcript insertion, edge cases).
+
+### What this replaces
+
+The v1 prompt had:
+- Verbose prose explanations (wasted tokens for mimo)
+- No `evidence` field (hallucination risk)
+- No `meta` block (no session-level metadata)
+- No explicit handling of failed sessions
+- An overly long example (~150 tokens of example for a 1k prompt)
+
+This v2 is tighter, more structured, and adds the evidence requirement that prevents the #1 failure mode of extraction prompts: generating plausible-sounding facts that aren't in the transcript.
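
The evidence requirement described in the design notes lends itself to a mechanical check on the model's output. The sketch below is one possible way to do that, not part of this commit: `validate_harvest` is a hypothetical helper, and it assumes `evidence` is a verbatim quote from the transcript, which is stricter than the prompt's "brief quote or reference". The `meta` block is not checked.

```python
import json
from typing import List

CATEGORIES = {"fact", "pitfall", "pattern", "tool-quirk", "question"}
REQUIRED = ("fact", "category", "repo", "confidence", "evidence")


def validate_harvest(raw: str, transcript: str) -> List[str]:
    """Return a list of problems; an empty list means the output passed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict) or not isinstance(data.get("knowledge"), list):
        return ["missing 'knowledge' array"]

    problems: List[str] = []
    for n, item in enumerate(data["knowledge"]):
        if not isinstance(item, dict):
            problems.append(f"item {n}: not an object")
            continue
        missing = [key for key in REQUIRED if key not in item]
        if missing:
            problems.append(f"item {n}: missing {', '.join(missing)}")
            continue
        category = item["category"]
        if not isinstance(category, str) or category not in CATEGORIES:
            problems.append(f"item {n}: unknown category")
        conf = item["confidence"]
        is_number = isinstance(conf, (int, float)) and not isinstance(conf, bool)
        if not is_number or not 0.0 <= conf <= 1.0:
            problems.append(f"item {n}: confidence out of range")
        # Assumption: evidence must appear verbatim in the transcript.
        evidence = item["evidence"]
        if not isinstance(evidence, str) or not evidence.strip() or evidence not in transcript:
            problems.append(f"item {n}: evidence not found in transcript")
    return problems
```

A substring match will reject paraphrased evidence, so a real pipeline might prefer fuzzy matching or line references; the point here is only that an item with no traceable source can be flagged automatically.
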