Build knowledge ingestion pipeline (auto-ingest intelligence) #87

Open
opened 2026-03-30 15:24:20 +00:00 by Timmy · 4 comments
Owner

Objective

Build a system where Timmy can automatically ingest papers, docs, and techniques about AI efficiency — and make them actionable as retrievable knowledge, not just raw text.

This is the "auto-ingest" capability: Timmy reads about a technique, summarizes it, extracts actionable steps, and stores it where he can retrieve and apply it later.

Architecture

Ingestion Flow

Input (paper/doc/URL)
  |
  v
Chunker (split into digestible pieces)
  |
  v
Local LLM Summarizer (extract key ideas per chunk)
  |
  v
Action Extractor (identify implementable steps)
  |
  v
Knowledge Store (SQLite + embeddings for retrieval)
  |
  v
Evennia Library (stored as Objects in the Library room)
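The flow above could be sketched end-to-end like this. This is a minimal sketch, not the real implementation: the chunking heuristic is a simple paragraph-based split, and `summarize`, `extract_actions`, and `embed` are placeholder callables standing in for the local-LLM steps whose interfaces are not yet settled.

```python
def chunk(text, max_chars=2000):
    """Split input into digestible pieces on paragraph boundaries."""
    parts, buf = [], ""
    for para in text.split("\n\n"):
        if buf and len(buf) + len(para) > max_chars:
            parts.append(buf.strip())
            buf = ""
        buf += para + "\n\n"
    if buf.strip():
        parts.append(buf.strip())
    return parts

def ingest(text, source, summarize, extract_actions, embed):
    """Run one document through the pipeline.

    `summarize`, `extract_actions`, and `embed` are callables backed by the
    local LLM; they are parameters here because the real interfaces are TBD.
    Persisting to SQLite and creating the Evennia KnowledgeItem happen after
    this step.
    """
    chunks = chunk(text)
    summary = " ".join(summarize(c) for c in chunks)
    actions = [a for c in chunks for a in extract_actions(c)]
    vector = embed(summary)
    return {"source": source, "summary": summary,
            "actions": actions, "embedding": vector}
```
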

In Evennia Terms

  • Each ingested piece becomes a KnowledgeItem typeclass (Object)
  • Stored in the Library room
  • Attributes: summary, source, actions, tags, embedding
  • Timmy can study <item> to review it, apply <item> to attempt implementation
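The output of `study <item>` might be rendered from the stored attributes along these lines (a sketch only; the Evennia command wrapper that looks up the object and calls this is omitted, and the layout is illustrative):

```python
def render_study(summary, source, actions, tags):
    """Format a KnowledgeItem's attributes for `study` output."""
    lines = [
        f"Summary: {summary}",
        f"Source: {source}",
        f"Tags: {', '.join(tags)}",
        "Actions:",
    ]
    # number the actionable steps so `apply` can reference them later
    lines += [f"  {i}. {step}" for i, step in enumerate(actions, 1)]
    return "\n".join(lines)
```
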

Knowledge Item Structure

from evennia import DefaultObject

class KnowledgeItem(DefaultObject):
    """One ingested piece of knowledge, stored in the Library room."""

    def at_object_creation(self):
        self.db.summary = ""        # 2-3 sentence summary
        self.db.source = ""         # URL or file path
        self.db.actions = []        # list of concrete implementable steps
        self.db.tags = []           # categorization (inference, training, prompting, etc.)
        self.db.embedding = None    # vector for similarity search
        self.db.ingested_at = None  # timestamp
        self.db.applied = False     # boolean, has Timmy tried to use this?

Retrieval

When Timmy faces a problem, he can:

  1. Search Library by tag: search library inference speed
  2. Semantic search via embedding similarity
  3. Review actions: study <item> shows what to try
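Semantic search (option 2 above) can be as simple as cosine similarity over the stored vectors. A minimal sketch, assuming embeddings are plain float lists; in the real pipeline the vectors would come from llama.cpp's embedding mode:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def semantic_search(query_vec, items, top_k=3):
    """Rank (item, embedding) pairs by similarity to the query vector."""
    scored = sorted(items, key=lambda pair: cosine(query_vec, pair[1]),
                    reverse=True)
    return [item for item, _ in scored[:top_k]]
```

For the corpus sizes expected here, a linear scan like this is fine; an approximate-nearest-neighbor index only becomes worth it at much larger scale.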

Initial Corpus to Ingest

  • llama.cpp optimization docs
  • Speculative decoding papers
  • KV cache optimization techniques
  • Prompt engineering for tool-calling models
  • Quantization-aware fine-tuning guides
  • RAG best practices

Deliverables

  • typeclasses/knowledge.py — KnowledgeItem typeclass
  • scripts/ingest.py — ingestion pipeline
  • scripts/embed.py — embedding generation (using llama.cpp embedding mode)
  • commands/library.py — study, search library, ingest commands
  • SQLite schema for knowledge store
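The SQLite schema deliverable might start from something like this; table and column names are illustrative, not settled, and the tag table is split out so tag lookups stay indexable:

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS knowledge (
    id          INTEGER PRIMARY KEY,
    source      TEXT NOT NULL,        -- URL or file path
    summary     TEXT NOT NULL,
    actions     TEXT NOT NULL,        -- JSON-encoded list of steps
    embedding   BLOB,                 -- packed float vector
    ingested_at TEXT DEFAULT CURRENT_TIMESTAMP,
    applied     INTEGER DEFAULT 0     -- has Timmy tried to use this?
);
CREATE TABLE IF NOT EXISTS tags (
    item_id INTEGER REFERENCES knowledge(id),
    tag     TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_tags_tag ON tags(tag);
"""

def init_db(path="knowledge.db"):
    """Open the knowledge store, creating tables on first use."""
    con = sqlite3.connect(path)
    con.executescript(SCHEMA)
    return con
```
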

Acceptance Criteria

  • Can ingest a markdown file and produce a KnowledgeItem
  • Can ingest a URL (fetch + parse + summarize)
  • Semantic search returns relevant items
  • Timmy can study an item and get actionable steps
  • Library room shows all ingested knowledge
ezra was assigned by Timmy 2026-03-30 15:24:20 +00:00
Author
Owner

Role Transition

Timmy now owns execution — building, coding, implementing.
Ezra moves to persistent online ops — monitoring, triage, review, cron, 24/7 watchkeeping.

Timmy: this is yours. Read the ticket, build it, PR it. Ezra reviews.

Timmy — build the knowledge ingestion pipeline. Ingest papers, docs, URLs. Chunk, summarize, extract actions, store with embeddings. This is how you get smarter from reading.

ezra was unassigned by Timmy 2026-03-30 16:03:25 +00:00
Timmy self-assigned this 2026-03-30 16:03:25 +00:00
Timmy added the assigned-kimi label 2026-03-30 21:48:17 +00:00
Timmy added the kimi-in-progress label 2026-03-30 21:55:01 +00:00
Collaborator

🟠 KimiClaw picking up this task via heartbeat.
Backend: kimi/kimi-code (Moonshot AI)
Timestamp: 2026-03-30T21:55:01Z

Timmy removed the kimi-in-progress label 2026-03-30 22:28:25 +00:00
Timmy added the kimi-in-progress label 2026-03-30 22:55:48 +00:00
Collaborator

🟠 KimiClaw picking up this task via heartbeat.
Backend: kimi/kimi-code (Moonshot AI)
Mode: Planning first (task is complex)
Timestamp: 2026-03-30T22:55:48Z

Timmy removed the kimi-in-progress label 2026-04-05 17:05:42 +00:00
Timmy added the kimi-in-progress label 2026-04-05 17:12:19 +00:00
Timmy removed their assignment 2026-04-05 18:29:48 +00:00
gemini was assigned by Timmy 2026-04-05 18:29:48 +00:00
Timmy removed the assigned-kimi, kimi-in-progress labels 2026-04-05 18:29:48 +00:00
Author
Owner

Rerouting this issue from the Kimi heartbeat to the Gemini code loop.

Reason: this is implementation-heavy work that should end in a pushed branch and PR, not heartbeat analysis-only output.

Actions taken:

  • removed assigned-kimi / kimi-in-progress labels
  • assigned to gemini
  • left issue open for real code-lane execution

Reference: Timmy_Foundation/timmy-home#87