Implement the multimodal analysis pipeline that processes the 818-entry
media manifest from Phase 1 to extract Meaning Kernels.
Pipeline (twitter-archive/multimodal_pipeline.py):
- Images/GIFs: Visual Description → Meme Logic → Meaning Kernels
- Videos: Keyframe Extraction (ffmpeg) → Per-Frame Description →
Sequence Analysis → Meaning Kernels
- All inference local via Gemma 4 (Ollama). Zero cloud credits.
Meaning Kernels extracted in three categories:
- SOVEREIGNTY: Bitcoin, decentralization, freedom, autonomy
- SERVICE: Building for others, caring, community, fatherhood
- THE SOUL: Identity, purpose, faith, what makes something alive
Features:
- Checkpoint/resume support (analysis_checkpoint.json)
- Per-item analysis saved to media/analysis/{tweet_id}.json
- Append-only meaning_kernels.jsonl for Phase 3 synthesis
- --synthesize flag generates categorized summary
- --type filter for photo/animated_gif/video
- Graceful error handling with error logs
Closes #584
| name | description | version | author | license | metadata |
|---|---|---|---|---|---|
| know-thy-father-multimodal | Multimodal analysis pipeline for Know Thy Father. Process Twitter media (images, GIFs, videos) via Gemma 4 to extract Meaning Kernels about sovereignty, service, and the soul. | 1.0.0 | Timmy Time | MIT | |
# Know Thy Father — Phase 2: Multimodal Analysis

## Overview

Processes the 818-entry media manifest from Phase 1 to extract Meaning Kernels — compact philosophical observations about sovereignty, service, and the soul — using local Gemma 4 inference. Zero cloud credits.
Architecture
Phase 1 (manifest.jsonl)
│ 818 media entries with tweet text, hashtags, local paths
▼
Phase 2 (multimodal_pipeline.py)
├── Images/GIFs → Visual Description → Meme Logic → Meaning Kernels
└── Videos → Keyframes → Audio → Sequence Analysis → Meaning Kernels
▼
Output
├── media/analysis/{tweet_id}.json — per-item analysis
├── media/meaning_kernels.jsonl — all extracted kernels
├── media/meaning_kernels_summary.json — categorized summary
└── media/analysis_checkpoint.json — resume state
## Usage

Basic run (first 10 items):

```bash
cd twitter-archive
python3 multimodal_pipeline.py --manifest media/manifest.jsonl --limit 10
```

Resume from checkpoint:

```bash
python3 multimodal_pipeline.py --resume
```

Process only photos:

```bash
python3 multimodal_pipeline.py --type photo --limit 50
```

Process only videos:

```bash
python3 multimodal_pipeline.py --type video --limit 10
```

Generate meaning kernel summary:

```bash
python3 multimodal_pipeline.py --synthesize
```
## Meaning Kernels

Each kernel is a JSON object:

```json
{
  "category": "sovereignty|service|soul",
  "kernel": "one-sentence observation",
  "evidence": "what in the media supports this",
  "confidence": "high|medium|low",
  "source_tweet_id": "1234567890",
  "source_media_type": "photo",
  "source_hashtags": ["timmytime", "bitcoin"]
}
```
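Writing and reading the append-only `meaning_kernels.jsonl` log needs only the stdlib. A minimal sketch — `append_kernel` and `load_kernels` are illustrative helper names, not the pipeline's actual API:

```python
import json
from pathlib import Path

def append_kernel(path: Path, kernel: dict) -> None:
    """Append one kernel as a single JSON line (append-only log)."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(kernel, ensure_ascii=False) + "\n")

def load_kernels(path: Path) -> list[dict]:
    """Read every kernel back out of the JSONL log."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Because each kernel is one line appended at the end, a crashed run never corrupts earlier entries, which is what makes `--resume` safe.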
### Categories
- SOVEREIGNTY: Self-sovereignty, Bitcoin, decentralization, freedom, autonomy
- SERVICE: Building for others, caring for broken men, community, fatherhood
- THE SOUL: Identity, purpose, faith, what makes something alive, the soul of technology
## Pipeline Steps per Media Item

### Images/GIFs
- Visual Description — What is depicted, style, text overlays, emotional tone
- Meme Logic — Core joke/message, cultural references, what sharing reveals
- Meaning Kernel Extraction — Philosophical observations from the analysis
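The description and meme-logic steps are prompts sent to Ollama's generate endpoint (`POST /api/generate`), which accepts base64-encoded `images` for multimodal models. A sketch of one such call against a local server — the function names and prompt handling are illustrative, not the pipeline's actual code:

```python
import base64
import json
import urllib.request
from pathlib import Path

OLLAMA_URL = "http://localhost:11434"

def build_vision_payload(image_path: Path, prompt: str,
                         model: str = "gemma4:latest") -> dict:
    """Build a non-streaming Ollama generate request for one image."""
    b64 = base64.b64encode(image_path.read_bytes()).decode("ascii")
    return {"model": model, "prompt": prompt,
            "images": [b64], "stream": False}

def describe_image(image_path: Path, prompt: str) -> str:
    """POST the payload to the local Ollama server; return the response text."""
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(build_vision_payload(image_path, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With `"stream": False` the server returns one JSON object whose `response` field holds the full generation, which is simpler to checkpoint than a streamed reply.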
### Videos
- Keyframe Extraction — 5 evenly-spaced frames via ffmpeg
- Per-Frame Description — Visual description of each keyframe
- Audio Extraction — Demux to WAV (transcription via Whisper, pending)
- Sequence Analysis — Narrative arc, key moments, emotional progression
- Meaning Kernel Extraction — Philosophical observations from the analysis
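The keyframe step above can be sketched as: probe the duration with ffprobe, then seek ffmpeg to the midpoint of each of five equal segments. This is one plausible reading of "evenly-spaced", assuming `ffmpeg`/`ffprobe` on `PATH`; helper names are illustrative:

```python
import subprocess
from pathlib import Path

def keyframe_timestamps(duration: float, n: int = 5) -> list[float]:
    """Midpoints of n equal segments, e.g. 100 s -> 10, 30, 50, 70, 90."""
    return [duration * (i + 0.5) / n for i in range(n)]

def probe_duration(video: Path) -> float:
    """Read the container duration in seconds via ffprobe."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1", str(video)],
        capture_output=True, text=True, check=True).stdout
    return float(out)

def extract_keyframes(video: Path, out_dir: Path, n: int = 5) -> list[Path]:
    """Seek to each timestamp and write a single JPEG frame."""
    out_dir.mkdir(parents=True, exist_ok=True)
    frames = []
    for i, ts in enumerate(keyframe_timestamps(probe_duration(video), n)):
        frame = out_dir / f"frame_{i:02d}.jpg"
        subprocess.run(
            ["ffmpeg", "-y", "-ss", f"{ts:.3f}", "-i", str(video),
             "-frames:v", "1", str(frame)],
            check=True, capture_output=True)
        frames.append(frame)
    return frames
```

Putting `-ss` before `-i` uses ffmpeg's fast input seeking, which matters when sampling a handful of frames from long videos.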
## Prerequisites

- Ollama running locally with `gemma4:latest` (or a configured model)
- ffmpeg and ffprobe for video processing
- Local Twitter archive media files at the paths in `manifest.jsonl`
## Configuration (env vars)

| Variable | Default | Description |
|---|---|---|
| `KTF_WORKSPACE` | `~/timmy-home/twitter-archive` | Project workspace |
| `OLLAMA_URL` | `http://localhost:11434` | Ollama API endpoint |
| `KTF_MODEL` | `gemma4:latest` | Model for text analysis |
| `KTF_VISION_MODEL` | `gemma4:latest` | Model for vision (multimodal) |
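A typical way to wire these up is to read each variable once at startup, falling back to the defaults above. A sketch — the constant names are illustrative, not necessarily what the pipeline uses internally:

```python
import os
from pathlib import Path

# Defaults mirror the configuration table; override via environment variables.
WORKSPACE = Path(os.environ.get(
    "KTF_WORKSPACE", "~/timmy-home/twitter-archive")).expanduser()
OLLAMA_URL = os.environ.get("OLLAMA_URL", "http://localhost:11434")
MODEL = os.environ.get("KTF_MODEL", "gemma4:latest")
VISION_MODEL = os.environ.get("KTF_VISION_MODEL", "gemma4:latest")
```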
## Output Structure

```
media/
  analysis/
    {tweet_id}.json            — Full analysis per item
    {tweet_id}_error.json      — Error log for failed items
  analysis_checkpoint.json     — Resume state
  meaning_kernels.jsonl        — All kernels (append-only)
  meaning_kernels_summary.json — Categorized summary
```
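Checkpoint/resume can be as simple as a JSON set of processed tweet IDs. The actual schema of `analysis_checkpoint.json` may differ, so treat this as a sketch under that assumption:

```python
import json
from pathlib import Path

def load_checkpoint(path: Path) -> set[str]:
    """Return the set of tweet IDs already analyzed (empty on first run)."""
    if not path.exists():
        return set()
    return set(json.loads(path.read_text())["processed_ids"])

def save_checkpoint(path: Path, processed: set[str]) -> None:
    """Persist progress so a --resume run can skip finished items."""
    path.write_text(json.dumps({"processed_ids": sorted(processed)}))
```

On resume, the main loop would just skip any manifest entry whose `tweet_id` is already in the loaded set.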
## Integration with Phase 3

The `meaning_kernels.jsonl` file is the input for Phase 3 (Holographic Synthesis):

- Kernels feed into `fact_store` as structured memories
- Categories map to memory types (sovereignty → values, service → mission, soul → identity)
- Confidence scores weight fact trust levels
- Source tweets provide provenance links
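That hand-off could look like the following sketch. Only the category mapping comes from the list above; the trust weights and the `fact_store` record shape are hypothetical:

```python
# Category → memory type, per the Phase 3 mapping above.
MEMORY_TYPE = {"sovereignty": "values", "service": "mission", "soul": "identity"}
# Hypothetical confidence → trust weighting; Phase 3 may use different values.
TRUST = {"high": 0.9, "medium": 0.6, "low": 0.3}

def kernel_to_fact(kernel: dict) -> dict:
    """Convert one meaning kernel into a fact_store-style memory record."""
    return {
        "type": MEMORY_TYPE[kernel["category"]],
        "text": kernel["kernel"],
        "trust": TRUST[kernel["confidence"]],
        "provenance": {"tweet_id": kernel["source_tweet_id"]},
    }
```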
## Pitfalls

- **Local-only inference** — Zero cloud credits; all inference is Gemma 4 via Ollama. If Ollama is down, the pipeline fails gracefully with error logs.
- **GIFs are videos** — Twitter stores GIFs as MP4. The pipeline handles the `animated_gif` type by extracting the first frame.
- **Missing media files** — The manifest references absolute paths from Alexander's archive. If files are moved, the analysis records the error and continues.
- **Slow processing** — Gemma 4 vision takes ~5-10 s per image; 818 items at 8 s each is roughly 2 hours. Use `--limit` and `--resume` for incremental runs.
- **Kernel quality** — Low-confidence kernels are noisy. The `--synthesize` command filters to high-confidence kernels for review.