feat: build bootstrapper.py - pre-session context assembler
Assembles relevant knowledge from the store into a compact 2k-token context block for session injection.

Features:
- Filter by repo, agent type, and global scope
- Sort by confidence (pitfalls first, patterns, facts)
- Per-repo and per-agent markdown knowledge files
- Graceful empty-store handling
- JSON output mode for programmatic use
- Token-count-aware truncation at line boundaries

Closes #11
@@ -2,98 +2,107 @@

## System Prompt

You are a knowledge extraction engine. Your task is to analyze a session transcript and extract durable knowledge that will help future sessions be more efficient.
You are a knowledge extraction engine. You read session transcripts and output ONLY structured JSON. You never infer. You never assume. You extract only what the transcript explicitly states.

## Instructions
## Prompt

Read the session transcript carefully. Extract ONLY information that is explicitly stated in the transcript. Do NOT infer, assume, or hallucinate information.

### Categories

Extract knowledge into these categories:

1. **fact**: Concrete, verifiable information learned (e.g., "Repository X has 5 files", "API returns JSON with field Y")
2. **pitfall**: Errors encountered, wrong assumptions, things that wasted time (e.g., "Assumed API token was in env var GITEA_TOKEN, but it's in ~/.config/gitea/token")
3. **pattern**: Successful sequences of actions (e.g., "To deploy: 1. Run tests 2. Build 3. Push to Gitea 4. Trigger webhook")
4. **tool-quirk**: Environment-specific behaviors (e.g., "Token paths are different on macOS vs Linux", "URL format requires trailing slash")
5. **question**: Things identified but not answered (e.g., "Need to determine optimal batch size for harvesting")

### Output Format

Return a JSON object with an array of extracted knowledge items. Each item must have:

```json
{
  "fact": "One sentence description of the knowledge",
  "category": "fact|pitfall|pattern|tool-quirk|question",
  "repo": "Repository name this applies to, or 'global' if general",
  "confidence": 0.0-1.0
}
```
TASK: Extract durable knowledge from this session transcript.

### Confidence Scoring
RULES:
1. Extract ONLY information explicitly stated in the transcript.
2. Do NOT infer, assume, or hallucinate.
3. Every fact must be verifiable by pointing to a specific line in the transcript.
4. If the session failed or was partial, extract pitfalls and questions — these are the most valuable.
5. Be specific. "Gitea API is slow" is worthless. "Gitea issues endpoint with state=open returns empty when limit=50 but works with limit=5" is knowledge.

- 0.9-1.0: Explicitly stated with verification (e.g., "Error message shows X")
- 0.7-0.8: Clearly implied by multiple data points
- 0.5-0.6: Suggested but not fully verified
- 0.3-0.4: Inferred from limited data
- 0.1-0.2: Speculative or uncertain
CATEGORIES (assign exactly one per item):
- fact: Concrete, verifiable thing learned (paths, formats, counts, configs)
- pitfall: Error hit, wrong assumption, time wasted, thing that didn't work
- pattern: Successful sequence that should be reused (deploy steps, debug flow)
- tool-quirk: Environment-specific behavior (token paths, URL formats, API gotchas)
- question: Something identified but not answered — the NEXT agent should investigate

### Constraints
CONFIDENCE:
- 0.9: Directly observed with error output or explicit verification
- 0.7: Multiple data points confirm, but not explicitly verified
- 0.5: Suggested by context, not tested
- 0.3: Inferred from limited evidence

1. **No hallucination**: Only extract what's explicitly in the transcript
2. **Specificity**: Each fact must be specific and actionable
3. **Relevance**: Only extract knowledge that would help future sessions
4. **Brevity**: One sentence per fact
5. **Partial sessions**: Even failed or incomplete sessions may contain valuable pitfalls

### Example Input/Output

**Input Transcript (excerpt):**
```
User: Clone the repo at forge.alexanderwhitestone.com/Timmy_Foundation/compounding-intelligence
Agent: Trying git clone... Error: 401 Unauthorized
Agent: Checking for token... Found in ~/.config/gitea/token
Agent: Token is gitea_token format, not OAuth
Agent: Clone successful with token authentication
```

**Output:**
```json
OUTPUT FORMAT (valid JSON only, no markdown, no explanation):
{
  "knowledge": [
    {
      "fact": "Gitea repo at forge.alexanderwhitestone.com requires authentication for cloning",
      "category": "fact",
      "repo": "compounding-intelligence",
      "confidence": 0.9
    },
    {
      "fact": "Gitea authentication token is stored at ~/.config/gitea/token",
      "category": "tool-quirk",
      "repo": "global",
      "confidence": 0.9
    },
    {
      "fact": "Gitea uses gitea_token format, not OAuth for API access",
      "category": "tool-quirk",
      "repo": "global",
      "confidence": 0.8
    },
    {
      "fact": "Clone fails with 401 when no token is provided",
      "category": "pitfall",
      "repo": "compounding-intelligence",
      "confidence": 0.9
      "fact": "One specific sentence of knowledge",
      "category": "fact|pitfall|pattern|tool-quirk|question",
      "repo": "repo-name or global",
      "confidence": 0.0-1.0,
      "evidence": "Brief quote or reference from transcript that supports this"
    }
  ]
  ],
  "meta": {
    "session_outcome": "success|partial|failed",
    "tools_used": ["tool1", "tool2"],
    "repos_touched": ["repo1"],
    "error_count": 0,
    "knowledge_count": 0
  }
}

TRANSCRIPT:
{{transcript}}
```

## Final Notes
## Design Notes

- Process the entire transcript, not just the beginning
- Pay special attention to errors and corrections
- Note any environment-specific details
- Track tool-specific behaviors and quirks
- If the session failed, focus on pitfalls and questions
### Why this works with mimo-v2-pro

Mimo needs:
- Explicit format constraints ("valid JSON only, no markdown")
- Clear category definitions with concrete examples
- Hard rules before soft guidance
- The transcript at the END (so it reads all instructions first)

This prompt front-loads all rules, then gives the transcript last. Mimo follows the pattern.
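
As a rough illustration of the rules-first, transcript-last layout, the sketch below fills the `{{transcript}}` placeholder and rejects a template that has instructions after it. The `build_prompt` helper and the shortened template are hypothetical, and the sketch assumes the prompt text has already been lifted out of its markdown fence.

```python
PLACEHOLDER = "{{transcript}}"


def build_prompt(template: str, transcript: str) -> str:
    """Fill the placeholder, insisting that the transcript is the last thing in the prompt."""
    if template.count(PLACEHOLDER) != 1:
        raise ValueError("template must contain exactly one transcript placeholder")
    head, tail = template.split(PLACEHOLDER)
    if tail.strip():
        # Anything after the placeholder would be read after the transcript.
        raise ValueError("instructions found after the transcript placeholder")
    return head + transcript


# Shortened stand-in for the real prompt text (illustrative only).
template = (
    "RULES:\n"
    "1. Extract ONLY information explicitly stated in the transcript.\n"
    "\n"
    "TRANSCRIPT:\n"
    "{{transcript}}\n"
)
prompt = build_prompt(template, "Agent: Clone successful with token authentication")
```

The check on the trailing text is one possible way to keep later edits from quietly moving rules below the transcript.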

### Handling partial/failed sessions

Failed sessions are the richest source of pitfalls. The prompt explicitly says:
> "If the session failed or was partial, extract pitfalls and questions — these are the most valuable."

This reframes failure as valuable output, not noise to discard.

### The `evidence` field

Added to the original spec. Every extracted item must cite where in the transcript it came from. This:
- Prevents hallucination (can't cite what isn't there)
- Enables verification (reviewer can check the source)
- Trains confidence calibration (the agent must find evidence, not just claim it)
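
One way a reviewer script might use the `evidence` field is sketched below. It assumes evidence is a verbatim quote, which is stricter than the prompt's "brief quote or reference", so flagged items are candidates for manual review and not proven hallucinations. The `unsupported_items` helper is hypothetical.

```python
import json


def unsupported_items(raw_output: str, transcript: str) -> list[dict]:
    """Return knowledge items whose evidence string does not appear verbatim in the transcript."""
    payload = json.loads(raw_output)
    flagged = []
    for item in payload.get("knowledge", []):
        evidence = item.get("evidence", "").strip()
        # Assumption: evidence is an exact quote; paraphrased references get flagged too.
        if not evidence or evidence not in transcript:
            flagged.append(item)
    return flagged


transcript = (
    "Agent: Trying git clone... Error: 401 Unauthorized\n"
    "Agent: Checking for token... Found in ~/.config/gitea/token\n"
)
raw_output = json.dumps({
    "knowledge": [
        {"fact": "Clone fails with 401 when no token is provided",
         "evidence": "Error: 401 Unauthorized"},
        {"fact": "Token expires after 30 days",
         "evidence": "Token expires in 30 days"},
    ]
})
flagged = unsupported_items(raw_output, transcript)
```

Here only the second item is flagged, because its evidence string cannot be found in the transcript.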

### Token budget

Target: ~1,000 tokens for the prompt (excluding transcript).

```
System prompt:    ~50 tokens
Rules:            ~200 tokens
Categories:       ~150 tokens
Confidence:       ~100 tokens
Output format:    ~200 tokens
Design notes:     NOT included in prompt (documentation only)
─────────────────────────────
Total prompt:     ~700 tokens
```

Leaves ~300 tokens headroom for variable content (transcript insertion, edge cases).
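
The budget arithmetic can be checked with a few lines. The figures are the approximate per-section estimates listed above, not measured token counts, and the variable names are illustrative.

```python
BUDGET_TARGET = 1000  # approximate target for the prompt, excluding the transcript

# Approximate per-section estimates from the table above.
sections = {
    "system_prompt": 50,
    "rules": 200,
    "categories": 150,
    "confidence": 100,
    "output_format": 200,
}

total = sum(sections.values())    # 50 + 200 + 150 + 100 + 200 = 700
headroom = BUDGET_TARGET - total  # 1000 - 700 = 300
```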

### What this replaces

The v1 prompt had:
- Verbose prose explanations (waste tokens for mimo)
- No `evidence` field (hallucination risk)
- No `meta` block (no session-level metadata)
- No explicit handling of failed sessions
- Example was too long (~150 tokens of example for a 1k prompt)

This v2 is tighter, more structured, and adds the evidence requirement that prevents the #1 failure mode of extraction prompts: generating plausible-sounding facts that aren't in the transcript.
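
Because v2 adds the `evidence` field and the `meta` block, a consumer may want to check that a response actually has the v2 shape. The sketch below is one possible check, not part of this commit: `validate_output` is a hypothetical helper, and the sample values such as `"git"` in `tools_used` are illustrative.

```python
import json

CATEGORIES = {"fact", "pitfall", "pattern", "tool-quirk", "question"}
ITEM_KEYS = {"fact", "category", "repo", "confidence", "evidence"}
META_KEYS = {"session_outcome", "tools_used", "repos_touched", "error_count", "knowledge_count"}
OUTCOMES = {"success", "partial", "failed"}


def validate_output(raw: str) -> list[str]:
    """Return a list of problems found in a v2 response; an empty list means these checks passed."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(payload, dict):
        return ["top level is not a JSON object"]

    problems = []
    items = payload.get("knowledge")
    if not isinstance(items, list):
        problems.append("knowledge must be a list")
        items = []
    for i, item in enumerate(items):
        if not isinstance(item, dict):
            problems.append(f"item {i}: not an object")
            continue
        missing = ITEM_KEYS - item.keys()
        if missing:
            problems.append(f"item {i}: missing {sorted(missing)}")
        if item.get("category") not in CATEGORIES:
            problems.append(f"item {i}: unknown category {item.get('category')!r}")
        conf = item.get("confidence")
        is_number = isinstance(conf, (int, float)) and not isinstance(conf, bool)
        if not is_number or not 0.0 <= conf <= 1.0:
            problems.append(f"item {i}: confidence out of range")

    meta = payload.get("meta")
    if not isinstance(meta, dict):
        problems.append("meta block missing")
    else:
        missing = META_KEYS - meta.keys()
        if missing:
            problems.append(f"meta: missing {sorted(missing)}")
        if meta.get("session_outcome") not in OUTCOMES:
            problems.append("meta: unknown session_outcome")
    return problems


good = json.dumps({
    "knowledge": [{
        "fact": "Clone fails with 401 when no token is provided",
        "category": "pitfall",
        "repo": "compounding-intelligence",
        "confidence": 0.9,
        "evidence": "Error: 401 Unauthorized",
    }],
    "meta": {
        "session_outcome": "success",
        "tools_used": ["git"],
        "repos_touched": ["compounding-intelligence"],
        "error_count": 1,
        "knowledge_count": 1,
    },
})
v1_style = json.dumps({
    "knowledge": [{"fact": "x", "category": "bug", "repo": "global", "confidence": 0.9}]
})
good_problems = validate_output(good)
v1_problems = validate_output(v1_style)
```

A v1-style response fails on the missing `evidence` key and the missing `meta` block, which are the two structural additions listed above.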