Build RAG pipeline with local embeddings for grounded responses #93

Open
opened 2026-03-30 15:24:23 +00:00 by Timmy · 4 comments
Owner

Objective

Give Timmy retrieval-augmented generation so he can ground answers in actual documents instead of relying on parametric memory.

Architecture

Query
  |
  v
Embed query (llama.cpp embedding mode)
  |
  v
Search vector store (FAISS or SQLite + cosine sim)
  |
  v
Retrieve top-K chunks
  |
  v
Inject into prompt as context
  |
  v
Generate grounded response

Implementation

1. Embedding Generation

llama.cpp can run in embedding mode:

llama-embedding -m model.gguf --embd-normalize 2 -p "text to embed"

Or use the /embedding endpoint on llama-server.
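A minimal sketch of step 1 as a Python helper (the likely shape of `agent/embeddings.py`). Assumptions: llama-server was started with `--embedding` and an embedding-capable GGUF, and is reachable at `localhost:8080`; the JSON shape returned by `/embedding` has changed across llama.cpp versions, so the parsing below hedges for both forms.

```python
import json
import math
import urllib.request

# Assumption: llama-server is running locally with --embedding enabled.
# The URL and response parsing are illustrative, not confirmed by the ticket.
SERVER_URL = "http://localhost:8080/embedding"

def l2_normalize(vec):
    """Scale a vector to unit length so dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec

def embed(text: str) -> list[float]:
    """Request an embedding for `text` from llama-server and normalize it."""
    payload = json.dumps({"content": text}).encode()
    req = urllib.request.Request(
        SERVER_URL, data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    # Some llama.cpp versions return {"embedding": [...]}, others a list of
    # {"index": ..., "embedding": [...]} objects -- handle both shapes.
    if isinstance(data, list):
        data = data[0]
    vec = data["embedding"]
    if vec and isinstance(vec[0], list):  # nested per-token form
        vec = vec[0]
    return l2_normalize(vec)
```

Normalizing here mirrors `--embd-normalize 2` on the CLI: with unit vectors, retrieval can use a plain dot product instead of full cosine similarity.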

2. Document Indexing

  • Chunk documents (soul files, configs, scripts, knowledge items) into ~500 token pieces
  • Generate embeddings for each chunk
  • Store in SQLite with vector column (or FAISS index)
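The indexing steps above can be sketched as follows (a rough cut of `agent/vector_store.py`). Assumptions: tokens are approximated by whitespace-split words rather than a real tokenizer, and since stock SQLite has no native vector column, embeddings are packed into BLOBs with `struct`; the table and column names are illustrative.

```python
import sqlite3
import struct

def chunk_text(text: str, max_tokens: int = 500) -> list[str]:
    """Split text into ~max_tokens pieces. Whitespace words stand in for
    tokens here; a real tokenizer would give tighter chunk boundaries."""
    words = text.split()
    return [" ".join(words[i:i + max_tokens])
            for i in range(0, len(words), max_tokens)]

def pack(vec: list[float]) -> bytes:
    """Serialize a float vector into a BLOB (4 bytes per float)."""
    return struct.pack(f"{len(vec)}f", *vec)

def unpack(blob: bytes) -> list[float]:
    """Inverse of pack()."""
    return list(struct.unpack(f"{len(blob) // 4}f", blob))

def init_store(path: str = "index.db") -> sqlite3.Connection:
    con = sqlite3.connect(path)
    con.execute("""CREATE TABLE IF NOT EXISTS chunks (
        id INTEGER PRIMARY KEY,
        source TEXT, body TEXT, embedding BLOB)""")
    return con

def add_chunk(con, source: str, body: str, vec: list[float]) -> None:
    con.execute("INSERT INTO chunks (source, body, embedding) VALUES (?, ?, ?)",
                (source, body, pack(vec)))
    con.commit()
```

Swapping this BLOB scheme for a FAISS index would only change `pack`/`unpack` and the search path; the chunking stays the same.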

3. Retrieval at Query Time

  • Embed the incoming query
  • Find top 3 most similar chunks
  • Inject as context: "Reference material: [chunk1] [chunk2] [chunk3]"
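The retrieval step above, as a brute-force sketch (toward `agent/rag.py`). Assumption: `store` is a list of `(chunk_text, embedding)` pairs pulled from the index; at the scale of `~/timmy/`, a linear cosine scan is plenty fast, so no ANN structure is assumed.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 for a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec, store, k=3):
    """Return the k chunk texts most similar to the query embedding."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(query, chunks):
    """Inject retrieved chunks as context, per the format in the ticket."""
    refs = " ".join(f"[{c}]" for c in chunks)
    return f"Reference material: {refs}\n\nQuestion: {query}"
```

If the stored embeddings are already unit-normalized, `cosine` can be replaced by a plain dot product.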

4. Index Maintenance

  • Re-index when files change (watch ~/timmy/ for modifications)
  • Incremental indexing (only re-embed changed files)

In Evennia

  • Index stored as a Script (daemon)
  • index <path> command to manually index a file
  • retrieve <query> command to test retrieval
  • Automatic injection into think command

Deliverables

  • agent/embeddings.py — embedding generation via llama-server
  • agent/vector_store.py — SQLite-based vector storage + search
  • agent/rag.py — retrieval + prompt injection
  • scripts/index_documents.py — batch indexer
  • Benchmark: grounded vs. ungrounded accuracy on factual questions

Acceptance Criteria

  • Can embed and index all files in ~/timmy/
  • Retrieval returns relevant chunks for test queries
  • Grounded responses cite actual file content
  • Index updates within 60s of file changes
ezra was assigned by Timmy 2026-03-30 15:24:23 +00:00
Author
Owner

Role Transition

Timmy now owns execution — building, coding, implementing.
Ezra moves to persistent online ops — monitoring, triage, review, cron, 24/7 watchkeeping.

Timmy: this is yours. Read the ticket, build it, PR it. Ezra reviews.

Timmy — build RAG with local embeddings. Index your soul files, configs, scripts. Retrieve relevant chunks at query time. Ground your responses in actual documents.

ezra was unassigned by Timmy 2026-03-30 16:03:28 +00:00
Timmy self-assigned this 2026-03-30 16:03:28 +00:00
Timmy added the assigned-kimi label 2026-03-30 21:48:17 +00:00
Timmy added the kimi-in-progress label 2026-03-30 21:54:57 +00:00
Collaborator

🟠 KimiClaw picking up this task via heartbeat.
Backend: kimi/kimi-code (Moonshot AI)
Timestamp: 2026-03-30T21:54:57Z

Timmy removed the kimi-in-progress label 2026-03-30 22:28:25 +00:00
Timmy added the kimi-in-progress label 2026-03-30 22:41:26 +00:00
Collaborator

🟠 KimiClaw picking up this task via heartbeat.
Backend: kimi/kimi-code (Moonshot AI)
Mode: Planning first (task is complex)
Timestamp: 2026-03-30T22:41:26Z

Timmy removed their assignment 2026-04-05 18:29:47 +00:00
gemini was assigned by Timmy 2026-04-05 18:29:47 +00:00
Timmy removed the assigned-kimi and kimi-in-progress labels 2026-04-05 18:29:47 +00:00
Author
Owner

Rerouting this issue from the Kimi heartbeat to the Gemini code loop.

Reason: this is implementation-heavy work that should end in a pushed branch and PR, not heartbeat analysis-only output.

Actions taken:

  • removed assigned-kimi / kimi-in-progress labels
  • assigned to gemini
  • left issue open for real code-lane execution
Reference: Timmy_Foundation/timmy-home#93