From 55beaf241f90e2baceb28a671e478ebd0473fe14 Mon Sep 17 00:00:00 2001
From: "Claude (Opus 4.6)"
Date: Tue, 24 Mar 2026 01:46:28 +0000
Subject: [PATCH] [claude] Research summary: Kimi creative blueprint (#891) (#1286)

---
 docs/research/kimi-creative-blueprint-891.md | 290 +++++++++++++++++++
 1 file changed, 290 insertions(+)
 create mode 100644 docs/research/kimi-creative-blueprint-891.md

diff --git a/docs/research/kimi-creative-blueprint-891.md b/docs/research/kimi-creative-blueprint-891.md
new file mode 100644
index 0000000..faaa1fc
--- /dev/null
+++ b/docs/research/kimi-creative-blueprint-891.md
@@ -0,0 +1,290 @@
+# Building Timmy: Technical Blueprint for Sovereign Creative AI
+
+> **Source:** PDF attached to issue #891, "Building Timmy: a technical blueprint for sovereign
+> creative AI" — generated by Kimi.ai, 16 pages, filed by Perplexity for Timmy's review.
+> **Filed:** 2026-03-22 · **Reviewed:** 2026-03-23
+
+---
+
+## Executive Summary
+
+The blueprint establishes that a sovereign creative AI capable of coding, composing music,
+generating art, building worlds, publishing narratives, and managing its own economy is
+**technically feasible today** — but only through orchestration of dozens of tools operating
+at different maturity levels. The core insight: *the integration is the invention*. No single
+component is new; the missing piece is a coherent identity operating across all domains
+simultaneously with persistent memory, autonomous economics, and cross-domain creative
+reactions.
+
+Three non-negotiable architectural decisions:
+1. **Human oversight for all public-facing content** — every successful creative AI has this;
+   every one that removed it failed.
+2. **Legal entity before economic activity** — AI agents are not legal persons; establish
+   structure before wealth accumulates (Truth Terminal cautionary tale: $20M acquired before
+   a foundation was retroactively created).
+3. **Hybrid memory: vector search + knowledge graph** — neither alone is sufficient for
+   multi-domain context breadth.
+
+---
+
+## Domain-by-Domain Assessment
+
+### Software Development (immediately deployable)
+
+| Component | Recommendation | Notes |
+|-----------|----------------|-------|
+| Primary agent | Claude Code (Opus 4.6, 77.2% SWE-bench) | Already in use |
+| Self-hosted forge | Forgejo (MIT, 170–200MB RAM) | Project uses Gitea/Forgejo now |
+| CI/CD | GitHub Actions-compatible via `act_runner` | — |
+| Tool-making | LATM pattern: frontier model creates tools, cheaper model applies them | New — see ADR opportunity |
+| Open-source fallback | OpenHands (~65% SWE-bench, Docker sandboxed) | Backup to Claude Code |
+| Self-improvement | Darwin Gödel Machine / SICA patterns | 3–6 month investment |
+
+**Development estimate:** 2–3 weeks for Forgejo + Claude Code integration with automated
+PR workflows; 1–2 months for self-improving tool-making pipeline.
+
+**Cross-reference:** This project already runs Claude Code agents on Forgejo. The LATM
+pattern (tool registry) and self-improvement loop are the actionable gaps.
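The LATM gap named above (a frontier model writes a tool once, a cheaper model only applies it) can be sketched as a small registry. This is a minimal illustration under stated assumptions, not the blueprint's or this repo's implementation: `ToolRegistry`, `register`, `call`, and the hard-coded `GENERATED_SOURCE` standing in for frontier-model output are all hypothetical names.

```python
import hashlib
from typing import Callable, Dict, List, Tuple


class ToolRegistry:
    """Hypothetical cache of generated tools, keyed by function name."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable] = {}
        self.digests: Dict[str, str] = {}

    def register(self, name: str, source: str,
                 tests: List[Tuple[tuple, object]]) -> None:
        """Compile `source`, run every test case, and cache the tool only if all pass."""
        namespace: dict = {}
        # Assumption: in a real pipeline, model-written code runs in a sandbox.
        exec(source, namespace)
        func = namespace[name]
        for args, expected in tests:
            actual = func(*args)
            if actual != expected:
                raise ValueError(
                    f"{name}{args!r} returned {actual!r}, expected {expected!r}"
                )
        self._tools[name] = func
        # The digest records exactly which source text was approved.
        self.digests[name] = hashlib.sha256(source.encode("utf-8")).hexdigest()

    def call(self, name: str, *args):
        """The 'tool user' path: no generation, just a cached function call."""
        return self._tools[name](*args)


# Stand-in for source text that a frontier model would have produced.
GENERATED_SOURCE = '''
def slugify(title):
    cleaned = "".join(ch.lower() if ch.isalnum() else "-" for ch in title)
    return "-".join(part for part in cleaned.split("-") if part)
'''

registry = ToolRegistry()
registry.register("slugify", GENERATED_SOURCE,
                  tests=[(("Hello, World!",), "hello-world")])
print(registry.call("slugify", "Kimi Creative Blueprint #891"))
```

Gating registration on test cases keeps a failed generation out of the cache, which is the property the self-improvement loop would depend on.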
+
+---
+
+### Music (1–4 weeks)
+
+| Component | Recommendation | Notes |
+|-----------|----------------|-------|
+| Commercial vocals | Suno v5 API (~$0.03/song, $30/month Premier) | No official API; third-party: sunoapi.org, AIMLAPI, EvoLink |
+| Local instrumental | MusicGen 1.5B (CC-BY-NC — monetization blocker) | On M2 Max: ~60s for 5s clip |
+| Voice cloning | GPT-SoVITS v4 (MIT) | Works on Apple Silicon CPU, RTF 0.526 on M4 |
+| Voice conversion | RVC (MIT, 5–10 min training audio) | — |
+| Apple Silicon TTS | MLX-Audio: Kokoro 82M + Qwen3-TTS 0.6B | 4–5x faster via Metal |
+| Publishing | Wavlake (90/10 split, Lightning micropayments) | Auto-syndicates to Fountain.fm |
+| Nostr | NIP-94 (kind:1063) audio events → NIP-96 servers | — |
+
+**Copyright reality:** US Copyright Office (Jan 2025) and US Court of Appeals (Mar 2025):
+purely AI-generated music cannot be copyrighted and enters public domain. Wavlake's
+Value4Value model works around this — fans pay for relationship, not exclusive rights.
+
+**Avoid:** Udio (download disabled since Oct 2025, 2.4/5 Trustpilot).
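The NIP-94 row in the table above can be made concrete with a sketch of an unsigned kind:1063 event for an uploaded track. It assumes the NIP-01 rule that the event id is the SHA-256 of the compact JSON array `[0, pubkey, created_at, kind, tags, content]`, and the NIP-94 tags `url`, `m`, `x`, and `size`. The function name, the placeholder pubkey, and the example URL are hypothetical; Schnorr signing and relay publishing are deliberately left out.

```python
import hashlib
import json
import time
from typing import Optional


def build_audio_metadata_event(pubkey_hex: str, url: str, audio_bytes: bytes,
                               mime_type: str, description: str,
                               created_at: Optional[int] = None) -> dict:
    """Build an unsigned NIP-94 (kind:1063) file-metadata event."""
    tags = [
        ["url", url],                                    # where the file is hosted
        ["m", mime_type],                                # MIME type of the file
        ["x", hashlib.sha256(audio_bytes).hexdigest()],  # SHA-256 of the file bytes
        ["size", str(len(audio_bytes))],                 # file size in bytes
    ]
    created = int(time.time()) if created_at is None else created_at
    # NIP-01 id: hash of the compact JSON array. json.dumps with these options
    # matches the required escaping for ordinary text; unusual control
    # characters would need a closer reading of the spec.
    serialized = json.dumps(
        [0, pubkey_hex, created, 1063, tags, description],
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return {
        "id": hashlib.sha256(serialized.encode("utf-8")).hexdigest(),
        "pubkey": pubkey_hex,
        "created_at": created,
        "kind": 1063,
        "tags": tags,
        "content": description,
        # "sig" is intentionally absent: signing needs a Schnorr library.
    }


event = build_audio_metadata_event(
    pubkey_hex="00" * 32,                       # placeholder, not a real key
    url="https://example.com/track.mp3",        # hypothetical upload location
    audio_bytes=b"not really audio",
    mime_type="audio/mpeg",
    description="Demo track",
    created_at=1700000000,
)
print(event["id"])
```

The `x` tag is what lets a client check that the bytes fetched from the URL are the bytes the event describes.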
+
+---
+
+### Visual Art (1–3 weeks)
+
+| Component | Recommendation | Notes |
+|-----------|----------------|-------|
+| Local generation | ComfyUI API at `127.0.0.1:8188` (programmatic control via WebSocket) | MLX extension: 50–70% faster |
+| Speed | Draw Things (free, Mac App Store) | 3× faster than ComfyUI via Metal shaders |
+| Quality frontier | Flux 2 (Nov 2025, 4MP, multi-reference) | SDXL needs 16GB+, Flux Dev 32GB+ |
+| Character consistency | LoRA training (30 min, 15–30 references) + Flux.1 Kontext | Solved problem |
+| Face consistency | IP-Adapter + FaceID (ComfyUI-IP-Adapter-Plus) | Training-free |
+| Comics | Jenova AI ($20/month, 200+ page consistency) or LlamaGen AI (free) | — |
+| Publishing | Blossom protocol (SHA-256 addressed, kind:10063) + Nostr NIP-94 | — |
+| Physical | Printful REST API (200+ products, automated fulfillment) | — |
+
+---
+
+### Writing / Narrative (1–4 weeks for pipeline; ongoing for quality)
+
+| Component | Recommendation | Notes |
+|-----------|----------------|-------|
+| LLM | Claude Opus 4.5/4.6 (leads Mazur Writing Benchmark at 8.561) | Already in use |
+| Context | 500K tokens (1M in beta) — entire novels fit | — |
+| Architecture | Outline-first → RAG lore bible → chapter-by-chapter generation | Without outline: novels meander |
+| Lore management | WorldAnvil Pro or custom LoreScribe (local RAG) | No tool achieves 100% consistency |
+| Publishing (ebooks) | Pandoc → EPUB / KDP PDF | pandoc-novel template on GitHub |
+| Publishing (print) | Lulu Press REST API (80% profit, global print network) | KDP: no official API, 3-book/day limit |
+| Publishing (Nostr) | NIP-23 kind:30023 long-form events | Habla.news, YakiHonne, Stacker News |
+| Podcasts | LLM script → TTS (ElevenLabs or local Kokoro/MLX-Audio) → feedgen RSS → Fountain.fm | Value4Value sats-per-minute |
+
+**Key constraint:** AI-assisted (human directs, AI drafts) = 40% faster. Fully autonomous
+without editing = "generic, soulless prose" and character drift by chapter 3 without explicit
+memory.
+
+---
+
+### World Building / Games (2 weeks–3 months depending on target)
+
+| Component | Recommendation | Notes |
+|-----------|----------------|-------|
+| Algorithms | Wave Function Collapse, Perlin noise (FastNoiseLite in Godot 4), L-systems | All mature |
+| Platform | Godot Engine + gd-agentic-skills (82+ skills, 26 genre blueprints) | Strong LLM/GDScript knowledge |
+| Narrative design | Knowledge graph (world state) + LLM + quest template grammar | CHI 2023 validated |
+| Quick win | Luanti/Minetest (Lua API, 2,800+ open mods for reference) | Immediately feasible |
+| Medium effort | OpenMW content creation (omwaddon format engineering required) | 2–3 months |
+| Future | Unity MCP (AI direct Unity Editor interaction) | Early-stage |
+
+---
+
+### Identity Architecture (2 months)
+
+The blueprint formalizes the **SOUL.md standard** (GitHub: aaronjmars/soul.md):
+
+| File | Purpose |
+|------|---------|
+| `SOUL.md` | Who you are — identity, worldview, opinions |
+| `STYLE.md` | How you write — voice, syntax, patterns |
+| `SKILL.md` | Operating modes |
+| `MEMORY.md` | Session continuity |
+
+**Critical decision — static vs self-modifying identity:**
+- Static Core Truths (version-controlled, human-approved changes only) ✓
+- Self-modifying Learned Preferences (logged with rollback, monitored by guardian) ✓
+- **Warning:** OpenClaw's "Soul Evolution" creates a security attack surface — Zenity Labs
+  demonstrated a complete zero-click attack chain targeting SOUL.md files.
+
+**Relevance to this repo:** Claude Code agents already use a `MEMORY.md` pattern in
+this project. The SOUL.md stack is a natural extension.
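The split between static Core Truths and self-modifying Learned Preferences can be sketched as two mechanisms: a pinned digest that detects any unapproved change to the core text, and an append-only preference log that can be rolled back. This is an illustrative sketch only; `IdentityStore` and its methods are hypothetical and are not part of the SOUL.md standard or of this repo.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


def digest(text: str) -> str:
    """SHA-256 of the identity text, used as the human-approved fingerprint."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass
class IdentityStore:
    # Digest of the core text that a human reviewer last approved.
    approved_core_digest: str
    # Append-only log of (key, value) preference updates, oldest first.
    history: List[Tuple[str, str]] = field(default_factory=list)

    def verify_core(self, soul_md_text: str) -> bool:
        """Return False if the core text differs from the approved version."""
        return digest(soul_md_text) == self.approved_core_digest

    def learn(self, key: str, value: str) -> None:
        """Record a learned preference; earlier values stay in the log."""
        self.history.append((key, value))

    def rollback(self, steps: int = 1) -> None:
        """Undo the most recent `steps` preference updates."""
        if steps > 0:
            del self.history[-steps:]

    def preferences(self) -> Dict[str, str]:
        """Replay the log; the latest value for each key wins."""
        return dict(self.history)


core = "I am Timmy. I do not publish without human review."
store = IdentityStore(approved_core_digest=digest(core))
store.learn("tone", "dry")
store.learn("tone", "playful")
store.rollback()
print(store.verify_core(core), store.preferences())
```

A guardian process could call `verify_core` before each session, so a tampered core file is caught before the agent reads it.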
+
+---
+
+### Memory Architecture (2 months)
+
+Hybrid vector + knowledge graph is the recommendation:
+
+| Component | Tool | Notes |
+|-----------|------|-------|
+| Vector + KG combined | Mem0 (mem0.ai) | 26% accuracy improvement over OpenAI memory, 91% lower p95 latency, 90% token savings |
+| Vector store | Qdrant (Rust, open-source) | High-throughput with metadata filtering |
+| Temporal KG | Neo4j + Graphiti (Zep AI) | P95 retrieval: 300ms, hybrid semantic + BM25 + graph |
+| Backup/migration | AgentKeeper (95% critical fact recovery across model migrations) | — |
+
+**Journal pattern (Stanford Generative Agents):** Agent writes about experiences, generates
+high-level reflections 2–3x/day when importance scores exceed threshold. Ablation studies:
+removing any component (observation, planning, reflection) significantly reduces behavioral
+believability.
+
+**Cross-reference:** The existing `brain/` package is the memory system. Qdrant and
+Mem0 are the recommended upgrade targets.
+
+---
+
+### Multi-Agent Sub-System (3–6 months)
+
+The blueprint describes a named sub-agent hierarchy:
+
+| Agent | Role |
+|-------|------|
+| Oracle | Top-level planner / supervisor |
+| Sentinel | Safety / moderation |
+| Scout | Research / information gathering |
+| Scribe | Writing / narrative |
+| Ledger | Economic management |
+| Weaver | Visual art generation |
+| Composer | Music generation |
+| Social | Platform publishing |
+
+**Orchestration options:**
+- **Agno** (already in use) — microsecond instantiation, 50× less memory than LangGraph
+- **CrewAI Flows** — event-driven with fine-grained control
+- **LangGraph** — DAG-based with stateful workflows and time-travel debugging
+
+**Scheduling pattern (Stanford Generative Agents):** Top-down recursive daily → hourly →
+5-minute planning. Event interrupts for reactive tasks. Re-planning triggers when accumulated
+importance scores exceed threshold.
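The importance-threshold trigger that drives both the journal pattern and re-planning in this section can be sketched in a few lines. This is a minimal illustration, assuming each observation arrives with a numeric importance score that something else (typically an LLM call) has already assigned. The `Journal` class and its string-joining "reflection" are hypothetical stand-ins for a real summarization step.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Journal:
    # Accumulated importance must exceed this value to trigger a reflection.
    threshold: float
    observations: List[str] = field(default_factory=list)
    reflections: List[str] = field(default_factory=list)
    accumulated: float = 0.0

    def observe(self, text: str, importance: float) -> bool:
        """Log an observation; return True if it triggered a reflection."""
        self.observations.append(text)
        self.accumulated += importance
        if self.accumulated <= self.threshold:
            return False
        # Placeholder reflection: a real agent would ask an LLM to
        # synthesize higher-level insights from recent observations.
        recent = self.observations[-3:]
        self.reflections.append("Reflection on: " + "; ".join(recent))
        # Reset so the next reflection needs fresh accumulated importance.
        self.accumulated = 0.0
        return True


journal = Journal(threshold=10)
for text, score in [("merged a PR", 4), ("song published", 5), ("wallet drained", 9)]:
    if journal.observe(text, score):
        print(journal.reflections[-1])
```

Resetting the accumulator after each reflection is what spaces reflections out over the day instead of firing on every event after the first trigger.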
+
+**Cross-reference:** The existing `spark/` package (event capture, advisory engine) aligns
+with this architecture. `infrastructure/event_bus` is the choreography backbone.
+
+---
+
+### Economic Engine (1–4 weeks)
+
+Lightning Labs released `lightning-agent-tools` (open-source) in February 2026:
+- `lnget` — CLI HTTP client for L402 payments
+- Remote signer architecture (private keys on separate machine from agent)
+- Scoped macaroon credentials (pay-only, invoice-only, read-only roles)
+- **Aperture** — converts any API to pay-per-use via L402 (HTTP 402)
+
+| Option | Effort | Notes |
+|--------|--------|-------|
+| ln.bot | 1 week | "Bitcoin for AI Agents" — 3 commands create a wallet; CLI + MCP + REST |
+| LND via gRPC | 2–3 weeks | Full programmatic node management for production |
+| Coinbase Agentic Wallets | — | Fiat-adjacent; less aligned with sovereignty ethos |
+
+**Revenue channels:** Wavlake (music, 90/10 Lightning), Nostr zaps (articles), Stacker News
+(earn sats from engagement), Printful (physical goods), L402-gated API access (pay-per-use
+services), Geyser.fund (Lightning crowdfunding, better initial runway than micropayments).
+
+**Cross-reference:** The existing `lightning/` package in this repo is the foundation.
+L402 paywall endpoints for Timmy's own services are the actionable gap.
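The L402 handshake behind the paywall gap can be sketched from the client's side. This assumes the header layout of the published L402 scheme as I understand it: the server answers HTTP 402 with `WWW-Authenticate: L402 macaroon="...", invoice="..."`, the client pays the invoice, and it retries with `Authorization: L402 <macaroon>:<preimage>`. The function names and sample values are hypothetical, and no real payment or network call happens here.

```python
import hashlib
import re
from typing import Dict

# Matches key="value" pairs inside the challenge header.
_PARAM = re.compile(r'(\w+)="([^"]*)"')


def parse_l402_challenge(header_value: str) -> Dict[str, str]:
    """Extract the macaroon and invoice from a 402 challenge header."""
    scheme, _, params = header_value.partition(" ")
    if scheme != "L402":
        raise ValueError(f"unexpected auth scheme: {scheme!r}")
    fields = dict(_PARAM.findall(params))
    if "macaroon" not in fields or "invoice" not in fields:
        raise ValueError("challenge is missing macaroon or invoice")
    return fields


def build_authorization(macaroon: str, preimage_hex: str) -> str:
    """Build the retry header once the invoice has been paid."""
    return f"L402 {macaroon}:{preimage_hex}"


def preimage_matches(preimage_hex: str, payment_hash_hex: str) -> bool:
    """Server-side check: a Lightning payment hash is SHA-256 of the preimage."""
    computed = hashlib.sha256(bytes.fromhex(preimage_hex)).hexdigest()
    return computed == payment_hash_hex


# Sample values are made up; a real invoice is a BOLT 11 string.
challenge = 'L402 macaroon="AGIAJEemVQ", invoice="lnbc1500n1example"'
fields = parse_l402_challenge(challenge)
preimage = "00" * 32  # would come from the wallet after paying fields["invoice"]
print(build_authorization(fields["macaroon"], preimage))
```

Because the preimage is revealed only when the invoice settles, presenting it alongside the macaroon proves payment without the server keeping per-client session state.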
+
+---
+
+## Pioneer Case Studies
+
+| Agent | Active | Revenue | Key Lesson |
+|-------|--------|---------|-----------|
+| Botto | Since Oct 2021 | $5M+ (art auctions) | Community governance via DAO sustains engagement; "taste model" (humans guide, not direct) preserves autonomous authorship |
+| Neuro-sama | Since Dec 2022 | $400K+/month (subscriptions) | 3+ years of iteration; errors became entertainment features; 24/7 capability is an insurmountable advantage |
+| Truth Terminal | Since Jun 2024 | $20M accumulated | Memetic fitness > planned monetization; human gatekeeper approved tweets while selecting AI-intent responses; **establish legal entity first** |
+| Holly+ | Since 2021 | Conceptual | DAO of stewards for voice governance; "identity play" as alternative to defensive IP |
+| AI Sponge | 2023 | Banned | Unmoderated content → TOS violations + copyright |
+| Nothing Forever | 2022–present | 8 viewers | Unmoderated content → ban → audience collapse; novelty-only propositions fail |
+
+**Universal pattern:** Human oversight + economic incentive alignment + multi-year personality
+development + platform-native economics = success.
+
+---
+
+## Recommended Implementation Sequence
+
+From the blueprint, mapped against Timmy's existing architecture:
+
+### Phase 1: Immediate (weeks)
+1. **Code sovereignty** — Forgejo + Claude Code automated PR workflows (already substantially done)
+2. **Music pipeline** — Suno API → Wavlake/Nostr NIP-94 publishing
+3. **Visual art pipeline** — ComfyUI API → Blossom/Nostr with LoRA character consistency
+4. **Basic Lightning wallet** — ln.bot integration for receiving micropayments
+5. **Long-form publishing** — Nostr NIP-23 + RSS feed generation
+
+### Phase 2: Moderate effort (1–3 months)
+6. **LATM tool registry** — frontier model creates Python utilities, caches them, lighter model applies
+7. **Event-driven cross-domain reactions** — game event → blog + artwork + music (CrewAI/LangGraph)
+8. **Podcast generation** — TTS + feedgen → Fountain.fm
+9. **Self-improving pipeline** — agent creates, tests, caches own Python utilities
+10. **Comic generation** — character-consistent panels with Jenova AI or local LoRA
+
+### Phase 3: Significant investment (3–6 months)
+11. **Full sub-agent hierarchy** — Oracle/Sentinel/Scout/Scribe/Ledger/Weaver with Agno
+12. **SOUL.md identity system** — bounded evolution + guardian monitoring
+13. **Hybrid memory upgrade** — Qdrant + Mem0/Graphiti replacing or extending `brain/`
+14. **Procedural world generation** — Godot + AI-driven narrative (quests, NPCs, lore)
+15. **Self-sustaining economic loop** — earned revenue covers compute costs
+
+### Remains aspirational (12+ months)
+- Fully autonomous novel-length fiction without editorial intervention
+- YouTube monetization for AI-generated content (tightening platform policies)
+- Copyright protection for AI-generated works (current US law denies this)
+- True artistic identity evolution (genuine creative voice vs pattern remixing)
+- Self-modifying architecture without regression or identity drift
+
+---
+
+## Gap Analysis: Blueprint vs Current Codebase
+
+| Blueprint Capability | Current Status | Gap |
+|---------------------|----------------|-----|
+| Code sovereignty | Done (Claude Code + Forgejo) | LATM tool registry |
+| Music generation | Not started | Suno API integration + Wavlake publishing |
+| Visual art | Not started | ComfyUI API client + Blossom publishing |
+| Writing/publishing | Not started | Nostr NIP-23 + Pandoc pipeline |
+| World building | Bannerlord work (different scope) | Luanti mods as quick win |
+| Identity (SOUL.md) | Partial (CLAUDE.md + MEMORY.md) | Full SOUL.md stack |
+| Memory (hybrid) | `brain/` package (SQLite-based) | Qdrant + knowledge graph |
+| Multi-agent | Agno in use | Named hierarchy + event choreography |
+| Lightning payments | `lightning/` package | ln.bot wallet + L402 endpoints |
+| Nostr identity | Referenced in roadmap, not built | NIP-05, NIP-89 capability cards |
+| Legal entity | Unknown | **Must be resolved before economic activity** |
+
+---
+
+## ADR Candidates
+
+Issues that warrant Architecture Decision Records based on this review:
+
+1. **LATM tool registry pattern** — How Timmy creates, tests, and caches self-made tools
+2. **Music generation strategy** — Suno (cloud, commercial quality) vs MusicGen (local, CC-BY-NC)
+3. **Memory upgrade path** — When/how to migrate `brain/` from SQLite to Qdrant + KG
+4. **SOUL.md adoption** — Extending existing CLAUDE.md/MEMORY.md to full SOUL.md stack
+5. **Lightning L402 strategy** — Which services Timmy gates behind micropayments
+6. **Sub-agent naming and contracts** — Formalizing Oracle/Sentinel/Scout/Scribe/Ledger/Weaver