# Deep Dive Prompt Engineering — Knowledge Transfer
> **Issue**: [#830](https://forge.alexanderwhitestone.com/Timmy_Foundation/the-nexus/issues/830) — Deep Dive: Sovereign NotebookLM + Daily AI Intelligence Briefing
> **Created**: 2026-04-05 by Ezra, Archivist
> **Purpose**: Explain how the production synthesis prompt works, how to A/B test it, and how to maintain quality as the fleet evolves.
---
## 1. The Prompt Files
| File | Role | When to Change |
|------|------|----------------|
| `production_briefing_v1.txt` | Default prompt for daily briefing generation | When voice quality degrades or acceptance criteria drift |
| `production_briefing_v2_*.txt` | Experimental variants | During A/B tests |
---
## 2. Design Philosophy
The prompt is engineered around **three non-negotiables** from Alexander:
1. **Grounded in our world first** — Fleet context is not decoration. It must shape the narrative.
2. **Actionable, not encyclopedic** — Every headline needs a "so what" for Timmy Foundation work.
3. **Premium audio experience** — The output is a podcast script, not a report. Structure, pacing, and tone matter.
### Why 1,300–1,950 words?
At a natural speaking pace of ~130 WPM:
- 1,300 words ≈ 10 minutes
- 1,950 words ≈ 15 minutes
This hits the acceptance criterion for default audio runtime.
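The arithmetic above can be sketched as a small helper (the function name is illustrative, not part of the existing tooling; the 130 WPM pace is the figure stated above):

```python
def estimated_runtime_minutes(word_count: int, wpm: int = 130) -> float:
    """Estimate spoken runtime of a script at a given words-per-minute pace."""
    return word_count / wpm

# 1,300 and 1,950 words bracket the 10-15 minute target window
assert estimated_runtime_minutes(1300) == 10.0
assert estimated_runtime_minutes(1950) == 15.0
```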
---
## 3. Prompt Architecture
The prompt has four layers:
### Layer 1: Persona
> "You are the voice of Deep Dive..."
This establishes tone, authority, and audience. It prevents the model from slipping into academic summarizer mode.
### Layer 2: Output Schema
> "Write this as a single continuous narrative... Structure the script in exactly these sections..."
The schema forces consistency. Without it, LLMs tend to produce bullet lists or inconsistent section ordering.
### Layer 3: Content Constraints
> "Every headline item MUST include a connection to our work..."
This is the grounding enforcement layer. It raises the cost of generic summaries.
### Layer 4: Dynamic Context
> `{{FLEET_CONTEXT}}` and `{{RESEARCH_ITEMS}}`
These are template variables substituted at runtime by `pipeline.py`. The prompt is **data-agnostic** — it defines how to think about whatever data is injected.
---
## 4. Integration with Pipeline
In `pipeline.py`, the `SynthesisEngine` loads the prompt file (if configured) and performs substitution:
```python
# Pseudo-code from pipeline.py
prompt_template = load_prompt("prompts/production_briefing_v1.txt")
prompt = prompt_template.replace("{{FLEET_CONTEXT}}", fleet_ctx.to_prompt_text())
prompt = prompt.replace("{{RESEARCH_ITEMS}}", format_items(items))
synthesis = self._call_llm(prompt)
```
To switch prompts, update `config.yaml`:
```yaml
synthesis:
  llm_endpoint: "http://localhost:4000/v1"
  prompt_file: "prompts/production_briefing_v1.txt"
  max_tokens: 2500
  temperature: 0.7
```
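A minimal sketch of how the `prompt_file` setting might be consumed inside `pipeline.py` — the `DEFAULT_PROMPT` fallback and the helper name are assumptions for illustration, not verified against the actual code:

```python
from pathlib import Path

# Illustrative built-in fallback used when no external prompt file is configured
DEFAULT_PROMPT = "You are the voice of Deep Dive... {{FLEET_CONTEXT}} {{RESEARCH_ITEMS}}"

def load_prompt_template(config: dict) -> str:
    """Load the external prompt named in config.yaml, else fall back to the default."""
    prompt_file = config.get("synthesis", {}).get("prompt_file")
    if prompt_file and Path(prompt_file).is_file():
        return Path(prompt_file).read_text(encoding="utf-8")
    return DEFAULT_PROMPT
```

Keeping a fallback means a missing or mistyped `prompt_file` degrades to the baked-in prompt instead of crashing the daily run.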
---
## 5. A/B Testing Protocol
### Hypothesis Template
| Variant | Hypothesis | Expected Change |
|---------|------------|-----------------|
| V1 (default) | Neutral podcast script with fleet grounding | Baseline |
| V2 (shorter) | Tighter 8–10 min briefings with sharper implications | Higher actionability score |
| V3 (narrative) | Story-driven opening with character arcs for projects | Higher engagement, risk of lower conciseness |
### Test Procedure
1. Copy `production_briefing_v1.txt` → `production_briefing_v2_test.txt`
2. Make a single controlled change (e.g., tighten word-count target, add explicit "Risk / Opportunity / Watch" subsection)
3. Run the pipeline with both prompts against the **same** set of research items:
```bash
python3 pipeline.py --config config.v1.yaml --today --output briefing_v1.json
python3 pipeline.py --config config.v2.yaml --today --output briefing_v2.json
```
4. Evaluate both with `quality_eval.py`:
```bash
python3 quality_eval.py briefing_v1.json --json > report_v1.json
python3 quality_eval.py briefing_v2.json --json > report_v2.json
```
5. Compare dimension scores. Winner becomes the new default.
6. Record results in `prompts/EXPERIMENTS.md`.
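Step 5 can be mechanized with a small diff over the two `--json` reports. The `"scores"` key and dimension names below are an assumed output shape for `quality_eval.py`, not a verified one:

```python
import json
from pathlib import Path

def compare_reports(path_a: str, path_b: str) -> dict:
    """Return per-dimension score deltas (variant B minus variant A)."""
    # Assumed report shape: {"scores": {"grounding": 8.0, "actionability": 7.0, ...}}
    a = json.loads(Path(path_a).read_text())
    b = json.loads(Path(path_b).read_text())
    return {dim: b["scores"][dim] - a["scores"][dim] for dim in a["scores"]}
```

Positive deltas on the dimensions the variant targeted (and no large regressions elsewhere) are the signal to promote it.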
---
## 6. Common Failure Modes & Fixes
| Symptom | Root Cause | Fix |
|---------|------------|-----|
| Bullet lists instead of narrative | Model defaulting to summarization | Strengthen "single continuous narrative" instruction; add example opening |
| Generic connections ("this could be useful for AI") | Fleet context too abstract or model not penalized | Require explicit repo/issue names; verify `fleet_context` injection |
| Too short (< 1,000 words) | Model being overly efficient | Raise `max_tokens` to 2500+; tighten lower bound in prompt |
| Too long (> 2,200 words) | Model over-explaining each paper | Tighten upper bound; limit to top 4 items instead of 5 |
| Robotic tone | Temperature too low or persona too vague | Raise temperature to 0.75; strengthen voice rules |
| Ignores fleet context | Context injected at wrong position or too long | Move fleet context closer to the research items; truncate to top 3 repos/issues/commits |
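The two length-related symptoms above are cheap to catch automatically before a briefing ships. A minimal guardrail sketch, with bounds mirroring the word-count targets in section 2 (the function name is illustrative):

```python
def length_check(script: str, lo: int = 1300, hi: int = 1950) -> str:
    """Classify a briefing script against the 10-15 minute word-count window."""
    n = len(script.split())
    if n < lo:
        return f"too short ({n} words): raise max_tokens or tighten the lower bound"
    if n > hi:
        return f"too long ({n} words): tighten the upper bound or cap the item count"
    return f"ok ({n} words)"
```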
---
## 7. Maintenance Checklist
Review this prompt monthly or whenever fleet structure changes significantly:
- [ ] Does the persona still match Alexander's preferred tone?
- [ ] Are the repo names in the examples still current?
- [ ] Does the word-count target still map to desired audio length?
- [ ] Have any new acceptance criteria emerged that need prompt constraints?
- [ ] Is the latest winning A/B variant promoted to `production_briefing_v1.txt`?
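One checklist item that is cheap to automate: before promoting a winning variant, verify the file still carries both template variables from Layer 4, since a prompt missing either will silently drop its context at substitution time. An illustrative helper, not part of the existing tooling:

```python
REQUIRED_VARS = ("{{FLEET_CONTEXT}}", "{{RESEARCH_ITEMS}}")

def missing_template_vars(prompt_text: str) -> list:
    """Return any required template variables absent from the prompt text."""
    return [v for v in REQUIRED_VARS if v not in prompt_text]
```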
---
## 8. Accountability
| Role | Owner |
|------|-------|
| Prompt architecture | @ezra |
| A/B test execution | @gemini or assigned code agent |
| Quality evaluation | Automated via `quality_eval.py` |
| Final tone approval | @rockachopa (Alexander) |
---
*Last updated: 2026-04-05 by Ezra, Archivist*