timmy-home/skills/autonomous-ai-agents/know-thy-father-multimodal/SKILL.md
Timmy (AI Agent) 726b867edd
feat(know-thy-father): Phase 2 Multimodal Analysis Pipeline (#584)
Implement the multimodal analysis pipeline that processes the 818-entry
media manifest from Phase 1 to extract Meaning Kernels.

Pipeline (twitter-archive/multimodal_pipeline.py):
- Images/GIFs: Visual Description → Meme Logic → Meaning Kernels
- Videos: Keyframe Extraction (ffmpeg) → Per-Frame Description →
  Sequence Analysis → Meaning Kernels
- All inference local via Gemma 4 (Ollama). Zero cloud credits.

Meaning Kernels extracted in three categories:
- SOVEREIGNTY: Bitcoin, decentralization, freedom, autonomy
- SERVICE: Building for others, caring, community, fatherhood
- THE SOUL: Identity, purpose, faith, what makes something alive

Features:
- Checkpoint/resume support (analysis_checkpoint.json)
- Per-item analysis saved to media/analysis/{tweet_id}.json
- Append-only meaning_kernels.jsonl for Phase 3 synthesis
- --synthesize flag generates categorized summary
- --type filter for photo/animated_gif/video
- Graceful error handling with error logs

Closes #584
2026-04-13 20:32:56 -04:00


---
name: know-thy-father-multimodal
description: >
  Multimodal analysis pipeline for Know Thy Father. Process Twitter media
  (images, GIFs, videos) via Gemma 4 to extract Meaning Kernels about
  sovereignty, service, and the soul.
version: 1.0.0
author: Timmy Time
license: MIT
metadata: hermes
tags:
  - multimodal
  - vision
  - analysis
  - meaning-kernels
  - twitter
  - sovereign
related_skills:
  - know-thy-father-pipeline
  - sovereign-meaning-synthesis
---

Know Thy Father — Phase 2: Multimodal Analysis

Overview

Processes the 818-entry media manifest from Phase 1 to extract Meaning Kernels — compact philosophical observations about sovereignty, service, and the soul — using local Gemma 4 inference. Zero cloud credits.

Architecture

Phase 1 (manifest.jsonl)
    │  818 media entries with tweet text, hashtags, local paths
    ▼
Phase 2 (multimodal_pipeline.py)
    ├── Images/GIFs → Visual Description → Meme Logic → Meaning Kernels
    └── Videos → Keyframes → Audio → Sequence Analysis → Meaning Kernels
    ▼
Output
    ├── media/analysis/{tweet_id}.json     — per-item analysis
    ├── media/meaning_kernels.jsonl        — all extracted kernels
    ├── media/meaning_kernels_summary.json — categorized summary
    └── media/analysis_checkpoint.json     — resume state
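The manifest side of this flow can be sketched in a few lines. The entry fields used below (tweet_id, media_type, local_path, text, hashtags) are assumed from the Phase 1 description above, not confirmed against the actual schema:

```python
import json

def load_manifest(path, media_type=None, limit=None):
    """Load Phase 1 manifest entries, optionally filtered by media type.

    Assumed entry fields: tweet_id, media_type, local_path, text, hashtags.
    """
    entries = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines in the JSONL
            entry = json.loads(line)
            if media_type and entry.get("media_type") != media_type:
                continue
            entries.append(entry)
            if limit and len(entries) >= limit:
                break
    return entries
```

This mirrors the `--type` and `--limit` CLI flags: filtering happens at load time, so a limited run still sees the first N entries of the requested type.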

Usage

Basic run (first 10 items)

cd twitter-archive
python3 multimodal_pipeline.py --manifest media/manifest.jsonl --limit 10

Resume from checkpoint

python3 multimodal_pipeline.py --resume

Process only photos

python3 multimodal_pipeline.py --type photo --limit 50

Process only videos

python3 multimodal_pipeline.py --type video --limit 10

Generate meaning kernel summary

python3 multimodal_pipeline.py --synthesize

Meaning Kernels

Each kernel is a JSON object:

{
  "category": "sovereignty|service|soul",
  "kernel": "one-sentence observation",
  "evidence": "what in the media supports this",
  "confidence": "high|medium|low",
  "source_tweet_id": "1234567890",
  "source_media_type": "photo",
  "source_hashtags": ["timmytime", "bitcoin"]
}
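A minimal validator for this shape — field names and allowed values are taken from the example above; `validate_kernel` itself is a hypothetical helper, not part of the pipeline:

```python
VALID_CATEGORIES = {"sovereignty", "service", "soul"}
VALID_CONFIDENCE = {"high", "medium", "low"}
REQUIRED_FIELDS = {
    "category", "kernel", "evidence", "confidence",
    "source_tweet_id", "source_media_type", "source_hashtags",
}

def validate_kernel(obj):
    """Return a list of problems with a kernel dict; empty means valid."""
    problems = [f"missing field: {f}" for f in sorted(REQUIRED_FIELDS - obj.keys())]
    if obj.get("category") not in VALID_CATEGORIES:
        problems.append("category must be one of sovereignty|service|soul")
    if obj.get("confidence") not in VALID_CONFIDENCE:
        problems.append("confidence must be one of high|medium|low")
    return problems
```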

Categories

  • SOVEREIGNTY: Self-sovereignty, Bitcoin, decentralization, freedom, autonomy
  • SERVICE: Building for others, caring for broken men, community, fatherhood
  • THE SOUL: Identity, purpose, faith, what makes something alive, the soul of technology

Pipeline Steps per Media Item

Images/GIFs

  1. Visual Description — What is depicted, style, text overlays, emotional tone
  2. Meme Logic — Core joke/message, cultural references, what sharing reveals
  3. Meaning Kernel Extraction — Philosophical observations from the analysis
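The three stages above chain naturally: each stage's output becomes context for the next prompt. A sketch against Ollama's `/api/generate` endpoint (the `images` field and `stream` flag are real parts of that API; the prompts and the helper names `build_payload`, `ollama_generate`, `analyze_image` are illustrative, not the pipeline's actual code):

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"

def build_payload(model, prompt, image_b64=None):
    """Assemble a non-streaming /api/generate request body."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    if image_b64:
        payload["images"] = [image_b64]  # Ollama accepts base64-encoded images
    return payload

def ollama_generate(model, prompt, image_path=None):
    """One blocking call to Ollama; returns the generated text."""
    image_b64 = None
    if image_path:
        with open(image_path, "rb") as fh:
            image_b64 = base64.b64encode(fh.read()).decode()
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(build_payload(model, prompt, image_b64)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

def analyze_image(path, model="gemma4:latest"):
    """Chain the three stages; each output feeds the next prompt."""
    description = ollama_generate(
        model, "Describe this image: content, style, text overlays, "
        "emotional tone.", image_path=path)
    meme_logic = ollama_generate(
        model, "Given this description, explain the core joke or message "
        f"and any cultural references:\n\n{description}")
    kernels = ollama_generate(
        model, "Extract compact philosophical observations (meaning kernels) "
        f"from this analysis:\n\n{meme_logic}")
    return {"description": description, "meme_logic": meme_logic,
            "kernels": kernels}
```

Only the first stage sends the image; the later stages are text-only, so they can run on the `KTF_MODEL` rather than the vision model.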

Videos

  1. Keyframe Extraction — 5 evenly-spaced frames via ffmpeg
  2. Per-Frame Description — Visual description of each keyframe
  3. Audio Extraction — Demux to WAV (transcription via Whisper, pending)
  4. Sequence Analysis — Narrative arc, key moments, emotional progression
  5. Meaning Kernel Extraction — Philosophical observations from the analysis
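Step 1 can be sketched with ffprobe (to get the duration) plus one ffmpeg seek per frame. The pipeline's exact frame-selection rule isn't documented; this sketch takes the midpoint of five equal segments, which is one reasonable reading of "evenly-spaced":

```python
import json
import subprocess
from pathlib import Path

def keyframe_timestamps(duration, n_frames=5):
    """Midpoints of n equal segments across the clip."""
    return [duration * (i + 0.5) / n_frames for i in range(n_frames)]

def extract_keyframes(video_path, out_dir, n_frames=5):
    """Probe the duration, then grab one JPEG per timestamp with ffmpeg.

    Requires ffmpeg and ffprobe on PATH, per the Prerequisites below.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    probe = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "json", str(video_path)],
        capture_output=True, text=True, check=True)
    duration = float(json.loads(probe.stdout)["format"]["duration"])
    frames = []
    for i, ts in enumerate(keyframe_timestamps(duration, n_frames)):
        out = out_dir / f"frame_{i:02d}.jpg"
        subprocess.run(
            ["ffmpeg", "-y", "-ss", f"{ts:.3f}", "-i", str(video_path),
             "-frames:v", "1", "-q:v", "2", str(out)],
            capture_output=True, check=True)
        frames.append(out)
    return frames
```

Putting `-ss` before `-i` makes ffmpeg seek before decoding, which keeps per-frame extraction fast even on long clips.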

Prerequisites

  • Ollama running locally with gemma4:latest (or configured model)
  • ffmpeg and ffprobe for video processing
  • Local Twitter archive media files at the paths in manifest.jsonl

Configuration (env vars)

Variable           Default                        Description
KTF_WORKSPACE      ~/timmy-home/twitter-archive   Project workspace
OLLAMA_URL         http://localhost:11434         Ollama API endpoint
KTF_MODEL          gemma4:latest                  Model for text analysis
KTF_VISION_MODEL   gemma4:latest                  Model for vision (multimodal)
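Resolving these settings is a one-function job; a sketch assuming the defaults listed above (the dict keys are illustrative, not the pipeline's actual config object):

```python
import os
from pathlib import Path

def load_config():
    """Read pipeline settings from env vars, falling back to the defaults."""
    return {
        "workspace": Path(os.environ.get(
            "KTF_WORKSPACE", str(Path.home() / "timmy-home" / "twitter-archive"))),
        "ollama_url": os.environ.get("OLLAMA_URL", "http://localhost:11434"),
        "model": os.environ.get("KTF_MODEL", "gemma4:latest"),
        "vision_model": os.environ.get("KTF_VISION_MODEL", "gemma4:latest"),
    }
```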

Output Structure

media/
  analysis/
    {tweet_id}.json       — Full analysis per item
    {tweet_id}_error.json — Error log for failed items
  analysis_checkpoint.json — Resume state
  meaning_kernels.jsonl    — All kernels (append-only)
  meaning_kernels_summary.json — Categorized summary
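The append-only JSONL plus checkpoint pattern keeps interrupted runs safe: kernels are flushed per item, and the checkpoint records which tweet IDs are done. A sketch — the checkpoint file's actual schema is not documented, so the `{"processed": [...]}` shape here is an assumption:

```python
import json
from pathlib import Path

def append_kernels(kernels, jsonl_path):
    """Append kernels to the append-only meaning_kernels.jsonl."""
    with open(jsonl_path, "a", encoding="utf-8") as fh:
        for kernel in kernels:
            fh.write(json.dumps(kernel, ensure_ascii=False) + "\n")

def save_checkpoint(processed_ids, path):
    """Persist the set of processed tweet IDs so --resume can skip them."""
    Path(path).write_text(
        json.dumps({"processed": sorted(processed_ids)}, indent=2))

def load_checkpoint(path):
    """Return the processed-ID set, or an empty set on a first run."""
    path = Path(path)
    if not path.exists():
        return set()
    return set(json.loads(path.read_text()).get("processed", []))
```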

Integration with Phase 3

The meaning_kernels.jsonl file is the input for Phase 3 (Holographic Synthesis):

  • Kernels feed into fact_store as structured memories
  • Categories map to memory types (sovereignty→values, service→mission, soul→identity)
  • Confidence scores weight fact trust levels
  • Source tweets provide provenance links
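The mapping described above can be made concrete. Everything in this sketch beyond the category-to-memory-type mapping itself is hypothetical: the fact_store record shape and the numeric trust weights are illustrative placeholders, not Phase 3's actual schema:

```python
CATEGORY_TO_MEMORY = {
    "sovereignty": "values",
    "service": "mission",
    "soul": "identity",
}
CONFIDENCE_TO_TRUST = {"high": 0.9, "medium": 0.6, "low": 0.3}  # illustrative

def kernel_to_fact(kernel):
    """Reshape a meaning kernel into a fact_store-style record."""
    return {
        "memory_type": CATEGORY_TO_MEMORY[kernel["category"]],
        "content": kernel["kernel"],
        "trust": CONFIDENCE_TO_TRUST[kernel["confidence"]],
        "provenance": {
            "tweet_id": kernel["source_tweet_id"],
            "media_type": kernel["source_media_type"],
        },
    }
```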

Pitfalls

  1. Local-only inference — Zero cloud credits. Gemma 4 via Ollama. If Ollama is down, pipeline fails gracefully with error logs.

  2. GIFs are videos — Twitter stores GIFs as MP4. Pipeline handles animated_gif type by extracting first frame.

  3. Missing media files — The manifest references absolute paths from Alexander's archive. If files are moved, analysis records the error and continues.

  4. Slow processing — Gemma 4 vision is ~5-10s per image. 818 items at 8s each = ~2 hours. Use --limit and --resume for incremental runs.

  5. Kernel quality — Low-confidence kernels are noisy. The --synthesize command filters to high-confidence for review.
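The confidence filtering in pitfall 5 amounts to ranking the three levels and dropping anything below a threshold. A sketch of that grouping step (`synthesize` here is a standalone illustration, not the pipeline's actual `--synthesize` implementation):

```python
import json
from collections import defaultdict

CONFIDENCE_RANK = {"low": 0, "medium": 1, "high": 2}

def synthesize(jsonl_path, min_confidence="high"):
    """Group kernel sentences by category, dropping low-confidence noise."""
    threshold = CONFIDENCE_RANK[min_confidence]
    by_category = defaultdict(list)
    with open(jsonl_path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            kernel = json.loads(line)
            if CONFIDENCE_RANK.get(kernel.get("confidence"), 0) >= threshold:
                by_category[kernel["category"]].append(kernel["kernel"])
    return dict(by_category)
```

Lowering `min_confidence` to "medium" widens the review set without touching the append-only source file.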