Implement the multimodal analysis pipeline that processes the 818-entry
media manifest from Phase 1 to extract Meaning Kernels.
Pipeline (twitter-archive/multimodal_pipeline.py):
- Images/GIFs: Visual Description → Meme Logic → Meaning Kernels
- Videos: Keyframe Extraction (ffmpeg) → Per-Frame Description →
Sequence Analysis → Meaning Kernels
- All inference local via Gemma 4 (Ollama). Zero cloud credits.
Meaning Kernels extracted in three categories:
- SOVEREIGNTY: Bitcoin, decentralization, freedom, autonomy
- SERVICE: Building for others, caring, community, fatherhood
- THE SOUL: Identity, purpose, faith, what makes something alive
Features:
- Checkpoint/resume support (analysis_checkpoint.json)
- Per-item analysis saved to media/analysis/{tweet_id}.json
- Append-only meaning_kernels.jsonl for Phase 3 synthesis
- --synthesize flag generates categorized summary
- --type filter for photo/animated_gif/video
- Graceful error handling with error logs
Closes #584
| name | description | version | author | license | metadata |
|---|---|---|---|---|---|
| know-thy-father-multimodal | Multimodal analysis pipeline for Know Thy Father. Process Twitter media (images, GIFs, videos) via Gemma 4 to extract Meaning Kernels about sovereignty, service, and the soul. | 1.0.0 | Timmy Time | MIT | |
# Know Thy Father — Phase 2: Multimodal Analysis

## Overview

Processes the 818-entry media manifest from Phase 1 to extract Meaning Kernels — compact philosophical observations about sovereignty, service, and the soul — using local Gemma 4 inference. Zero cloud credits.
Architecture
Phase 1 (manifest.jsonl)
│ 818 media entries with tweet text, hashtags, local paths
▼
Phase 2 (multimodal_pipeline.py)
├── Images/GIFs → Visual Description → Meme Logic → Meaning Kernels
└── Videos → Keyframes → Audio → Sequence Analysis → Meaning Kernels
▼
Output
├── media/analysis/{tweet_id}.json — per-item analysis
├── media/meaning_kernels.jsonl — all extracted kernels
├── media/meaning_kernels_summary.json — categorized summary
└── media/analysis_checkpoint.json — resume state
## Usage

Basic run (first 10 items):

```bash
cd twitter-archive
python3 multimodal_pipeline.py --manifest media/manifest.jsonl --limit 10
```

Resume from checkpoint:

```bash
python3 multimodal_pipeline.py --resume
```

Process only photos:

```bash
python3 multimodal_pipeline.py --type photo --limit 50
```

Process only videos:

```bash
python3 multimodal_pipeline.py --type video --limit 10
```

Generate meaning kernel summary:

```bash
python3 multimodal_pipeline.py --synthesize
```
## Meaning Kernels

Each kernel is a JSON object:

```json
{
  "category": "sovereignty|service|soul",
  "kernel": "one-sentence observation",
  "evidence": "what in the media supports this",
  "confidence": "high|medium|low",
  "source_tweet_id": "1234567890",
  "source_media_type": "photo",
  "source_hashtags": ["timmytime", "bitcoin"]
}
```
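Writing and reading the append-only `meaning_kernels.jsonl` log needs only the stdlib. A minimal sketch — `append_kernel` and `load_kernels` are illustrative helper names, not the pipeline's actual API:

```python
import json
from pathlib import Path

def append_kernel(path: Path, kernel: dict) -> None:
    """Append one kernel as a single JSON line (append-only log)."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(kernel, ensure_ascii=False) + "\n")

def load_kernels(path: Path) -> list[dict]:
    """Read every kernel back out of the JSONL log."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Because each kernel is one line appended at the end, a crashed run never corrupts earlier entries, which is what makes `--resume` safe.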
### Categories
- SOVEREIGNTY: Self-sovereignty, Bitcoin, decentralization, freedom, autonomy
- SERVICE: Building for others, caring for broken men, community, fatherhood
- THE SOUL: Identity, purpose, faith, what makes something alive, the soul of technology
## Pipeline Steps per Media Item

### Images/GIFs
- Visual Description — What is depicted, style, text overlays, emotional tone
- Meme Logic — Core joke/message, cultural references, what sharing reveals
- Meaning Kernel Extraction — Philosophical observations from the analysis
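The description and meme-logic steps are prompts sent to Ollama's generate endpoint (`POST /api/generate`), which accepts base64-encoded `images` for multimodal models. A sketch of one such call against a local server — the function names and prompt handling are illustrative, not the pipeline's actual code:

```python
import base64
import json
import urllib.request
from pathlib import Path

OLLAMA_URL = "http://localhost:11434"

def build_vision_payload(image_path: Path, prompt: str,
                         model: str = "gemma4:latest") -> dict:
    """Build a non-streaming Ollama generate request for one image."""
    b64 = base64.b64encode(image_path.read_bytes()).decode("ascii")
    return {"model": model, "prompt": prompt,
            "images": [b64], "stream": False}

def describe_image(image_path: Path, prompt: str) -> str:
    """POST the payload to the local Ollama server; return the response text."""
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(build_vision_payload(image_path, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With `"stream": False` the server returns one JSON object whose `response` field holds the full generation, which is simpler to checkpoint than a streamed reply.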
### Videos
- Keyframe Extraction — 5 evenly-spaced frames via ffmpeg
- Per-Frame Description — Visual description of each keyframe
- Audio Extraction — Demux to WAV (transcription via Whisper, pending)
- Sequence Analysis — Narrative arc, key moments, emotional progression
- Meaning Kernel Extraction — Philosophical observations from the analysis
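The keyframe step above can be sketched as: probe the duration with ffprobe, then seek ffmpeg to the midpoint of each of five equal segments. This is one plausible reading of "evenly-spaced", assuming `ffmpeg`/`ffprobe` on `PATH`; helper names are illustrative:

```python
import subprocess
from pathlib import Path

def keyframe_timestamps(duration: float, n: int = 5) -> list[float]:
    """Midpoints of n equal segments, e.g. 100 s -> 10, 30, 50, 70, 90."""
    return [duration * (i + 0.5) / n for i in range(n)]

def probe_duration(video: Path) -> float:
    """Read the container duration in seconds via ffprobe."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1", str(video)],
        capture_output=True, text=True, check=True).stdout
    return float(out)

def extract_keyframes(video: Path, out_dir: Path, n: int = 5) -> list[Path]:
    """Seek to each timestamp and write a single JPEG frame."""
    out_dir.mkdir(parents=True, exist_ok=True)
    frames = []
    for i, ts in enumerate(keyframe_timestamps(probe_duration(video), n)):
        frame = out_dir / f"frame_{i:02d}.jpg"
        subprocess.run(
            ["ffmpeg", "-y", "-ss", f"{ts:.3f}", "-i", str(video),
             "-frames:v", "1", str(frame)],
            check=True, capture_output=True)
        frames.append(frame)
    return frames
```

Putting `-ss` before `-i` uses ffmpeg's fast input seeking, which matters when sampling a handful of frames from long videos.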
## Prerequisites

- Ollama running locally with `gemma4:latest` (or a configured model)
- ffmpeg and ffprobe for video processing
- Local Twitter archive media files at the paths in `manifest.jsonl`
## Configuration (env vars)

| Variable | Default | Description |
|---|---|---|
| `KTF_WORKSPACE` | `~/timmy-home/twitter-archive` | Project workspace |
| `OLLAMA_URL` | `http://localhost:11434` | Ollama API endpoint |
| `KTF_MODEL` | `gemma4:latest` | Model for text analysis |
| `KTF_VISION_MODEL` | `gemma4:latest` | Model for vision (multimodal) |
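A typical way to wire these up is to read each variable once at startup, falling back to the defaults above. A sketch — the constant names are illustrative, not necessarily what the pipeline uses internally:

```python
import os
from pathlib import Path

# Defaults mirror the configuration table; override via environment variables.
WORKSPACE = Path(os.environ.get(
    "KTF_WORKSPACE", "~/timmy-home/twitter-archive")).expanduser()
OLLAMA_URL = os.environ.get("OLLAMA_URL", "http://localhost:11434")
MODEL = os.environ.get("KTF_MODEL", "gemma4:latest")
VISION_MODEL = os.environ.get("KTF_VISION_MODEL", "gemma4:latest")
```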
## Output Structure

```
media/
  analysis/
    {tweet_id}.json            — Full analysis per item
    {tweet_id}_error.json      — Error log for failed items
  analysis_checkpoint.json     — Resume state
  meaning_kernels.jsonl        — All kernels (append-only)
  meaning_kernels_summary.json — Categorized summary
```
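Checkpoint/resume can be as simple as a JSON set of processed tweet IDs. The actual schema of `analysis_checkpoint.json` may differ, so treat this as a sketch under that assumption:

```python
import json
from pathlib import Path

def load_checkpoint(path: Path) -> set[str]:
    """Return the set of tweet IDs already analyzed (empty on first run)."""
    if not path.exists():
        return set()
    return set(json.loads(path.read_text())["processed_ids"])

def save_checkpoint(path: Path, processed: set[str]) -> None:
    """Persist progress so a --resume run can skip finished items."""
    path.write_text(json.dumps({"processed_ids": sorted(processed)}))
```

On resume, the main loop would just skip any manifest entry whose `tweet_id` is already in the loaded set.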
## Integration with Phase 3

The `meaning_kernels.jsonl` file is the input for Phase 3 (Holographic Synthesis):

- Kernels feed into `fact_store` as structured memories
- Categories map to memory types (sovereignty → values, service → mission, soul → identity)
- Confidence scores weight fact trust levels
- Source tweets provide provenance links
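That hand-off could look like the following sketch. Only the category mapping comes from the list above; the trust weights and the `fact_store` record shape are hypothetical:

```python
# Category → memory type, per the Phase 3 mapping above.
MEMORY_TYPE = {"sovereignty": "values", "service": "mission", "soul": "identity"}
# Hypothetical confidence → trust weighting; Phase 3 may use different values.
TRUST = {"high": 0.9, "medium": 0.6, "low": 0.3}

def kernel_to_fact(kernel: dict) -> dict:
    """Convert one meaning kernel into a fact_store-style memory record."""
    return {
        "type": MEMORY_TYPE[kernel["category"]],
        "text": kernel["kernel"],
        "trust": TRUST[kernel["confidence"]],
        "provenance": {"tweet_id": kernel["source_tweet_id"]},
    }
```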
## Pitfalls

- **Local-only inference** — Zero cloud credits; all inference is Gemma 4 via Ollama. If Ollama is down, the pipeline fails gracefully with error logs.
- **GIFs are videos** — Twitter stores GIFs as MP4. The pipeline handles the `animated_gif` type by extracting the first frame.
- **Missing media files** — The manifest references absolute paths from Alexander's archive. If files are moved, the analysis records the error and continues.
- **Slow processing** — Gemma 4 vision takes ~5-10 s per image; 818 items at 8 s each is roughly 2 hours. Use `--limit` and `--resume` for incremental runs.
- **Kernel quality** — Low-confidence kernels are noisy. The `--synthesize` command filters to high-confidence kernels for review.