# Deep Dive Prompt Engineering — Knowledge Transfer

> **Issue**: [#830](https://forge.alexanderwhitestone.com/Timmy_Foundation/the-nexus/issues/830) — Deep Dive: Sovereign NotebookLM + Daily AI Intelligence Briefing
>
> **Created**: 2026-04-05 by Ezra, Archivist
>
> **Purpose**: Explain how the production synthesis prompt works, how to A/B test it, and how to maintain quality as the fleet evolves.

---

## 1. The Prompt Files

| File | Role | When to Change |
|------|------|----------------|
| `production_briefing_v1.txt` | Default prompt for daily briefing generation | When voice quality degrades or acceptance criteria drift |
| `production_briefing_v2_*.txt` | Experimental variants | During A/B tests |

---

## 2. Design Philosophy

The prompt is engineered around **three non-negotiables** from Alexander:

1. **Grounded in our world first** — Fleet context is not decoration. It must shape the narrative.
2. **Actionable, not encyclopedic** — Every headline needs a "so what" for Timmy Foundation work.
3. **Premium audio experience** — The output is a podcast script, not a report. Structure, pacing, and tone matter.

### Why 1,300–1,950 words?

At a natural speaking pace of ~130 WPM:

- 1,300 words ≈ 10 minutes
- 1,950 words ≈ 15 minutes

This hits the acceptance criterion for default audio runtime.
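The runtime arithmetic above can be sketched as a quick check (a minimal sketch; the ~130 WPM figure comes from this document, and the function name is illustrative, not part of the pipeline):

```python
def words_for_runtime(minutes: int, wpm: int = 130) -> int:
    """Words needed to fill `minutes` of audio at `wpm` words per minute."""
    return minutes * wpm

# The 10- and 15-minute targets at ~130 WPM
low, high = words_for_runtime(10), words_for_runtime(15)
print(low, high)  # 1300 1950
```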
---

## 3. Prompt Architecture

The prompt has four layers:

### Layer 1: Persona

> "You are the voice of Deep Dive..."

This establishes tone, authority, and audience. It prevents the model from slipping into academic summarizer mode.

### Layer 2: Output Schema

> "Write this as a single continuous narrative... Structure the script in exactly these sections..."

The schema forces consistency. Without it, LLMs tend to produce bullet lists or inconsistent section ordering.

### Layer 3: Content Constraints

> "Every headline item MUST include a connection to our work..."

This is the grounding enforcement layer. It raises the cost of generic summaries.

### Layer 4: Dynamic Context

> `{{FLEET_CONTEXT}}` and `{{RESEARCH_ITEMS}}`

These are template variables substituted at runtime by `pipeline.py`. The prompt is **data-agnostic** — it defines how to think about whatever data is injected.

---

## 4. Integration with Pipeline

In `pipeline.py`, the `SynthesisEngine` loads the prompt file (if configured) and performs substitution:

```python
# Pseudo-code from pipeline.py
prompt_template = load_prompt("prompts/production_briefing_v1.txt")
# Inject live fleet state and the day's research items into the template.
prompt = prompt_template.replace("{{FLEET_CONTEXT}}", fleet_ctx.to_prompt_text())
prompt = prompt.replace("{{RESEARCH_ITEMS}}", format_items(items))
synthesis = self._call_llm(prompt)
```
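The substitution step can be reproduced standalone to sanity-check a template (a minimal sketch with stubbed inputs; the template text and fleet/item strings here are made up for illustration, not taken from the real prompt file):

```python
# Stand-in template using the same placeholders as production_briefing_v1.txt
template = (
    "Fleet context:\n{{FLEET_CONTEXT}}\n\n"
    "Research items:\n{{RESEARCH_ITEMS}}\n"
)
# Hypothetical runtime values
fleet_text = "- the-nexus: issue #830 in progress"
items_text = "1. Paper on retrieval-augmented synthesis"

prompt = template.replace("{{FLEET_CONTEXT}}", fleet_text)
prompt = prompt.replace("{{RESEARCH_ITEMS}}", items_text)
assert "{{" not in prompt  # every placeholder was filled
```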
To switch prompts, update `config.yaml`:

```yaml
synthesis:
  llm_endpoint: "http://localhost:4000/v1"
  prompt_file: "prompts/production_briefing_v1.txt"
  max_tokens: 2500
  temperature: 0.7
```

---

## 5. A/B Testing Protocol

### Hypothesis Template

| Variant | Hypothesis | Expected Change |
|---------|------------|-----------------|
| V1 (default) | Neutral podcast script with fleet grounding | Baseline |
| V2 (shorter) | Tighter 8–10 min briefings with sharper implications | Higher actionability score |
| V3 (narrative) | Story-driven opening with character arcs for projects | Higher engagement, risk of lower conciseness |

### Test Procedure

1. Copy `production_briefing_v1.txt` → `production_briefing_v2_test.txt`
2. Make a single controlled change (e.g., tighten word-count target, add explicit "Risk / Opportunity / Watch" subsection)
3. Run the pipeline with both prompts against the **same** set of research items:

   ```bash
   python3 pipeline.py --config config.v1.yaml --today --output briefing_v1.json
   python3 pipeline.py --config config.v2.yaml --today --output briefing_v2.json
   ```

4. Evaluate both with `quality_eval.py`:

   ```bash
   python3 quality_eval.py briefing_v1.json --json > report_v1.json
   python3 quality_eval.py briefing_v2.json --json > report_v2.json
   ```

5. Compare dimension scores. Winner becomes the new default.
6. Record results in `prompts/EXPERIMENTS.md`.
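Step 5's comparison can be scripted rather than eyeballed. A minimal sketch, assuming the reports expose a flat `scores` object keyed by dimension (the actual schema emitted by `quality_eval.py` may differ; the dimension names here are invented for illustration):

```python
import json

# In practice these would be json.load(open("report_v1.json")) etc.
report_v1 = json.loads('{"scores": {"actionability": 3.8, "voice": 4.2, "grounding": 4.0}}')
report_v2 = json.loads('{"scores": {"actionability": 4.3, "voice": 4.1, "grounding": 4.0}}')

# Per-dimension delta: positive means the V2 variant improved that dimension.
delta = {
    dim: round(report_v2["scores"][dim] - report_v1["scores"][dim], 2)
    for dim in report_v1["scores"]
}
print(delta)  # e.g. {'actionability': 0.5, 'voice': -0.1, 'grounding': 0.0}
```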
---

## 6. Common Failure Modes & Fixes

| Symptom | Root Cause | Fix |
|---------|------------|-----|
| Bullet lists instead of narrative | Model defaulting to summarization | Strengthen "single continuous narrative" instruction; add example opening |
| Generic connections ("this could be useful for AI") | Fleet context too abstract or model not penalized | Require explicit repo/issue names; verify `fleet_context` injection |
| Too short (< 1,000 words) | Model being overly efficient | Raise `max_tokens` to 2500+; tighten lower bound in prompt |
| Too long (> 2,200 words) | Model over-explaining each paper | Tighten upper bound; limit to top 4 items instead of 5 |
| Robotic tone | Temperature too low or persona too vague | Raise temperature to 0.75; strengthen voice rules |
| Ignores fleet context | Context injected at wrong position or too long | Move fleet context closer to the research items; truncate to top 3 repos/issues/commits |
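The two length failure modes above can be caught automatically before a script ships. A minimal sketch of a post-generation guard (the bounds come from Section 2; the function name is illustrative and not part of `pipeline.py`):

```python
def check_length(script: str, low: int = 1300, high: int = 1950) -> str:
    """Classify a briefing script against the target word-count window."""
    n = len(script.split())
    if n < low:
        return f"too short ({n} words): consider raising max_tokens"
    if n > high:
        return f"too long ({n} words): consider fewer items"
    return f"ok ({n} words)"

# A 1,500-word script falls inside the 1,300-1,950 window.
print(check_length("word " * 1500))  # ok (1500 words)
```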
---

## 7. Maintenance Checklist

Review this prompt monthly or whenever fleet structure changes significantly:

- [ ] Does the persona still match Alexander's preferred tone?
- [ ] Are the repo names in the examples still current?
- [ ] Does the word-count target still map to desired audio length?
- [ ] Have any new acceptance criteria emerged that need prompt constraints?
- [ ] Is the latest winning A/B variant promoted to `production_briefing_v1.txt`?

---

## 8. Accountability

| Role | Owner |
|------|-------|
| Prompt architecture | @ezra |
| A/B test execution | @gemini or assigned code agent |
| Quality evaluation | Automated via `quality_eval.py` |
| Final tone approval | @rockachopa (Alexander) |

---

*Last updated: 2026-04-05 by Ezra, Archivist*