Build RAG pipeline with local embeddings for grounded responses #93

Open
opened 2026-03-30 15:24:23 +00:00 by Timmy · 4 comments
Owner

Objective

Give Timmy retrieval-augmented generation so he can ground answers in actual documents instead of relying on parametric memory.

Architecture

Query
  |
  v
Embed query (llama.cpp embedding mode)
  |
  v
Search vector store (FAISS or SQLite + cosine sim)
  |
  v
Retrieve top-K chunks
  |
  v
Inject into prompt as context
  |
  v
Generate grounded response

Implementation

1. Embedding Generation

llama.cpp can run in embedding mode:

llama-embedding -m model.gguf --embd-normalize 2 -p "text to embed"

Or use the /embedding endpoint on llama-server.
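A minimal sketch of step 1 as a Python helper (the likely shape of `agent/embeddings.py`). Assumptions: llama-server was started with `--embedding` and an embedding-capable GGUF, and is reachable at `localhost:8080`; the JSON shape returned by `/embedding` has changed across llama.cpp versions, so the parsing below hedges for both forms.

```python
import json
import math
import urllib.request

# Assumption: llama-server is running locally with --embedding enabled.
# The URL and response parsing are illustrative, not confirmed by the ticket.
SERVER_URL = "http://localhost:8080/embedding"

def l2_normalize(vec):
    """Scale a vector to unit length so dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec

def embed(text: str) -> list[float]:
    """Request an embedding for `text` from llama-server and normalize it."""
    payload = json.dumps({"content": text}).encode()
    req = urllib.request.Request(
        SERVER_URL, data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    # Some llama.cpp versions return {"embedding": [...]}, others a list of
    # {"index": ..., "embedding": [...]} objects -- handle both shapes.
    if isinstance(data, list):
        data = data[0]
    vec = data["embedding"]
    if vec and isinstance(vec[0], list):  # nested per-token form
        vec = vec[0]
    return l2_normalize(vec)
```

Normalizing here mirrors `--embd-normalize 2` on the CLI: with unit vectors, retrieval can use a plain dot product instead of full cosine similarity.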

2. Document Indexing

  • Chunk documents (soul files, configs, scripts, knowledge items) into ~500 token pieces
  • Generate embeddings for each chunk
  • Store in SQLite with vector column (or FAISS index)
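The indexing steps above can be sketched as follows (a rough cut of `agent/vector_store.py`). Assumptions: tokens are approximated by whitespace-split words rather than a real tokenizer, and since stock SQLite has no native vector column, embeddings are packed into BLOBs with `struct`; the table and column names are illustrative.

```python
import sqlite3
import struct

def chunk_text(text: str, max_tokens: int = 500) -> list[str]:
    """Split text into ~max_tokens pieces. Whitespace words stand in for
    tokens here; a real tokenizer would give tighter chunk boundaries."""
    words = text.split()
    return [" ".join(words[i:i + max_tokens])
            for i in range(0, len(words), max_tokens)]

def pack(vec: list[float]) -> bytes:
    """Serialize a float vector into a BLOB (4 bytes per float)."""
    return struct.pack(f"{len(vec)}f", *vec)

def unpack(blob: bytes) -> list[float]:
    """Inverse of pack()."""
    return list(struct.unpack(f"{len(blob) // 4}f", blob))

def init_store(path: str = "index.db") -> sqlite3.Connection:
    con = sqlite3.connect(path)
    con.execute("""CREATE TABLE IF NOT EXISTS chunks (
        id INTEGER PRIMARY KEY,
        source TEXT, body TEXT, embedding BLOB)""")
    return con

def add_chunk(con, source: str, body: str, vec: list[float]) -> None:
    con.execute("INSERT INTO chunks (source, body, embedding) VALUES (?, ?, ?)",
                (source, body, pack(vec)))
    con.commit()
```

Swapping this BLOB scheme for a FAISS index would only change `pack`/`unpack` and the search path; the chunking stays the same.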

3. Retrieval at Query Time

  • Embed the incoming query
  • Find top 3 most similar chunks
  • Inject as context: "Reference material: [chunk1] [chunk2] [chunk3]"
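The retrieval step above, as a brute-force sketch (toward `agent/rag.py`). Assumption: `store` is a list of `(chunk_text, embedding)` pairs pulled from the index; at the scale of `~/timmy/`, a linear cosine scan is plenty fast, so no ANN structure is assumed.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 for a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec, store, k=3):
    """Return the k chunk texts most similar to the query embedding."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(query, chunks):
    """Inject retrieved chunks as context, per the format in the ticket."""
    refs = " ".join(f"[{c}]" for c in chunks)
    return f"Reference material: {refs}\n\nQuestion: {query}"
```

If the stored embeddings are already unit-normalized, `cosine` can be replaced by a plain dot product.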

4. Index Maintenance

  • Re-index when files change (watch ~/timmy/ for modifications)
  • Incremental indexing (only re-embed changed files)

In Evennia

  • Index stored as a Script (daemon)
  • index <path> command to manually index a file
  • retrieve <query> command to test retrieval
  • Automatic injection into think command

Deliverables

  • agent/embeddings.py — embedding generation via llama-server
  • agent/vector_store.py — SQLite-based vector storage + search
  • agent/rag.py — retrieval + prompt injection
  • scripts/index_documents.py — batch indexer
  • Benchmark: grounded vs. ungrounded accuracy on factual questions

Acceptance Criteria

  • Can embed and index all files in ~/timmy/
  • Retrieval returns relevant chunks for test queries
  • Grounded responses cite actual file content
  • Index updates within 60s of file changes
ezra was assigned by Timmy 2026-03-30 15:24:23 +00:00
Author
Owner

Role Transition

Timmy now owns execution — building, coding, implementing.
Ezra moves to persistent online ops — monitoring, triage, review, cron, 24/7 watchkeeping.

Timmy: this is yours. Read the ticket, build it, PR it. Ezra reviews.

Timmy — build RAG with local embeddings. Index your soul files, configs, scripts. Retrieve relevant chunks at query time. Ground your responses in actual documents.

ezra was unassigned by Timmy 2026-03-30 16:03:28 +00:00
Timmy self-assigned this 2026-03-30 16:03:28 +00:00
Timmy added the assigned-kimi label 2026-03-30 21:48:17 +00:00
Timmy added the kimi-in-progress label 2026-03-30 21:54:57 +00:00
Collaborator

🟠 KimiClaw picking up this task via heartbeat.
Backend: kimi/kimi-code (Moonshot AI)
Timestamp: 2026-03-30T21:54:57Z

Timmy removed the kimi-in-progress label 2026-03-30 22:28:25 +00:00
Timmy added the kimi-in-progress label 2026-03-30 22:41:26 +00:00
Collaborator

🟠 KimiClaw picking up this task via heartbeat.
Backend: kimi/kimi-code (Moonshot AI)
Mode: Planning first (task is complex)
Timestamp: 2026-03-30T22:41:26Z

Timmy removed their assignment 2026-04-05 18:29:47 +00:00
gemini was assigned by Timmy 2026-04-05 18:29:47 +00:00
Timmy removed the assigned-kimi and kimi-in-progress labels 2026-04-05 18:29:47 +00:00
Author
Owner

Rerouting this issue from the Kimi heartbeat to the Gemini code loop.

Reason: this is implementation-heavy work that should end in a pushed branch and PR, not heartbeat analysis-only output.

Actions taken:

  • removed assigned-kimi / kimi-in-progress labels
  • assigned to gemini
  • left issue open for real code-lane execution
Reference: Timmy_Foundation/timmy-home#93