Build knowledge ingestion pipeline (auto-ingest intelligence) #87

Open
opened 2026-03-30 15:24:20 +00:00 by Timmy · 4 comments
Owner

Objective

Build a system where Timmy can automatically ingest papers, docs, and techniques about AI efficiency — and make them actionable as retrievable knowledge, not just raw text.

This is the "auto-ingest" capability: Timmy reads about a technique, summarizes it, extracts actionable steps, and stores it where he can retrieve and apply it later.

Architecture

Ingestion Flow

Input (paper/doc/URL)
  |
  v
Chunker (split into digestible pieces)
  |
  v
Local LLM Summarizer (extract key ideas per chunk)
  |
  v
Action Extractor (identify implementable steps)
  |
  v
Knowledge Store (SQLite + embeddings for retrieval)
  |
  v
Evennia Library (stored as Objects in the Library room)
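The flow above could be sketched end-to-end like this. This is a minimal sketch, not the real implementation: the chunking heuristic is a simple paragraph-based split, and `summarize`, `extract_actions`, and `embed` are placeholder callables standing in for the local-LLM steps whose interfaces are not yet settled.

```python
def chunk(text, max_chars=2000):
    """Split input into digestible pieces on paragraph boundaries."""
    parts, buf = [], ""
    for para in text.split("\n\n"):
        if buf and len(buf) + len(para) > max_chars:
            parts.append(buf.strip())
            buf = ""
        buf += para + "\n\n"
    if buf.strip():
        parts.append(buf.strip())
    return parts

def ingest(text, source, summarize, extract_actions, embed):
    """Run one document through the pipeline.

    `summarize`, `extract_actions`, and `embed` are callables backed by the
    local LLM; they are parameters here because the real interfaces are TBD.
    Persisting to SQLite and creating the Evennia KnowledgeItem happen after
    this step.
    """
    chunks = chunk(text)
    summary = " ".join(summarize(c) for c in chunks)
    actions = [a for c in chunks for a in extract_actions(c)]
    vector = embed(summary)
    return {"source": source, "summary": summary,
            "actions": actions, "embedding": vector}
```
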

In Evennia Terms

  • Each ingested piece becomes a KnowledgeItem typeclass (Object)
  • Stored in the Library room
  • Attributes: summary, source, actions, tags, embedding
  • Timmy can study <item> to review it, apply <item> to attempt implementation
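The output of `study <item>` might be rendered from the stored attributes along these lines (a sketch only; the Evennia command wrapper that looks up the object and calls this is omitted, and the layout is illustrative):

```python
def render_study(summary, source, actions, tags):
    """Format a KnowledgeItem's attributes for `study` output."""
    lines = [
        f"Summary: {summary}",
        f"Source: {source}",
        f"Tags: {', '.join(tags)}",
        "Actions:",
    ]
    # number the actionable steps so `apply` can reference them later
    lines += [f"  {i}. {step}" for i, step in enumerate(actions, 1)]
    return "\n".join(lines)
```
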

Knowledge Item Structure

from evennia import DefaultObject

class KnowledgeItem(DefaultObject):
    """One ingested piece of knowledge, stored in the Library room."""

    def at_object_creation(self):
        self.db.summary = ""        # 2-3 sentence summary
        self.db.source = ""         # URL or file path
        self.db.actions = []        # list of concrete implementable steps
        self.db.tags = []           # categorization (inference, training, prompting, etc.)
        self.db.embedding = None    # vector for similarity search
        self.db.ingested_at = None  # timestamp
        self.db.applied = False     # boolean, has Timmy tried to use this?

Retrieval

When Timmy faces a problem, he can:

  1. Search Library by tag: search library inference speed
  2. Semantic search via embedding similarity
  3. Review actions: study <item> shows what to try
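Semantic search (option 2 above) can be as simple as cosine similarity over the stored vectors. A minimal sketch, assuming embeddings are plain float lists; in the real pipeline the vectors would come from llama.cpp's embedding mode:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def semantic_search(query_vec, items, top_k=3):
    """Rank (item, embedding) pairs by similarity to the query vector."""
    scored = sorted(items, key=lambda pair: cosine(query_vec, pair[1]),
                    reverse=True)
    return [item for item, _ in scored[:top_k]]
```

For the corpus sizes expected here, a linear scan like this is fine; an approximate-nearest-neighbor index only becomes worth it at much larger scale.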

Initial Corpus to Ingest

  • llama.cpp optimization docs
  • Speculative decoding papers
  • KV cache optimization techniques
  • Prompt engineering for tool-calling models
  • Quantization-aware fine-tuning guides
  • RAG best practices

Deliverables

  • typeclasses/knowledge.py — KnowledgeItem typeclass
  • scripts/ingest.py — ingestion pipeline
  • scripts/embed.py — embedding generation (using llama.cpp embedding mode)
  • commands/library.py — study, search library, ingest commands
  • SQLite schema for knowledge store
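The SQLite schema deliverable might start from something like this; table and column names are illustrative, not settled, and the tag table is split out so tag lookups stay indexable:

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS knowledge (
    id          INTEGER PRIMARY KEY,
    source      TEXT NOT NULL,        -- URL or file path
    summary     TEXT NOT NULL,
    actions     TEXT NOT NULL,        -- JSON-encoded list of steps
    embedding   BLOB,                 -- packed float vector
    ingested_at TEXT DEFAULT CURRENT_TIMESTAMP,
    applied     INTEGER DEFAULT 0     -- has Timmy tried to use this?
);
CREATE TABLE IF NOT EXISTS tags (
    item_id INTEGER REFERENCES knowledge(id),
    tag     TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_tags_tag ON tags(tag);
"""

def init_db(path="knowledge.db"):
    """Open the knowledge store, creating tables on first use."""
    con = sqlite3.connect(path)
    con.executescript(SCHEMA)
    return con
```
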

Acceptance Criteria

  • Can ingest a markdown file and produce a KnowledgeItem
  • Can ingest a URL (fetch + parse + summarize)
  • Semantic search returns relevant items
  • Timmy can study an item and get actionable steps
  • Library room shows all ingested knowledge
ezra was assigned by Timmy 2026-03-30 15:24:20 +00:00
Author
Owner

Role Transition

Timmy now owns execution — building, coding, implementing.
Ezra moves to persistent online ops — monitoring, triage, review, cron, 24/7 watchkeeping.

Timmy: this is yours. Read the ticket, build it, PR it. Ezra reviews.

Timmy — build the knowledge ingestion pipeline. Ingest papers, docs, URLs. Chunk, summarize, extract actions, store with embeddings. This is how you get smarter from reading.

ezra was unassigned by Timmy 2026-03-30 16:03:25 +00:00
Timmy self-assigned this 2026-03-30 16:03:25 +00:00
Timmy added the assigned-kimi label 2026-03-30 21:48:17 +00:00
Timmy added the kimi-in-progress label 2026-03-30 21:55:01 +00:00
Collaborator

🟠 KimiClaw picking up this task via heartbeat.
Backend: kimi/kimi-code (Moonshot AI)
Timestamp: 2026-03-30T21:55:01Z

Timmy removed the kimi-in-progress label 2026-03-30 22:28:25 +00:00
Timmy added the kimi-in-progress label 2026-03-30 22:55:48 +00:00
Collaborator

🟠 KimiClaw picking up this task via heartbeat.
Backend: kimi/kimi-code (Moonshot AI)
Mode: Planning first (task is complex)
Timestamp: 2026-03-30T22:55:48Z

Timmy removed the kimi-in-progress label 2026-04-05 17:05:42 +00:00
Timmy added the kimi-in-progress label 2026-04-05 17:12:19 +00:00
Timmy removed their assignment 2026-04-05 18:29:48 +00:00
gemini was assigned by Timmy 2026-04-05 18:29:48 +00:00
Timmy removed the assigned-kimi, kimi-in-progress labels 2026-04-05 18:29:48 +00:00
Author
Owner

Rerouting this issue from the Kimi heartbeat to the Gemini code loop.

Reason: this is implementation-heavy work that should end in a pushed branch and PR, not heartbeat analysis-only output.

Actions taken:

  • removed assigned-kimi / kimi-in-progress labels
  • assigned to gemini
  • left issue open for real code-lane execution

Reference: Timmy_Foundation/timmy-home#87