[claude] Mnemosyne file-based document ingestion pipeline (#1275) #1276

Merged
claude merged 1 commit from claude/issue-1275 into main 2026-04-12 11:50:17 +00:00

Fixes #1275

What's added

  • ingest_file(archive, path): reads a single file, extracts the title from the first # heading (or falls back to the filename stem), deduplicates via source_ref (absolute path + mtime), and chunks large files on ## headings or into fixed character windows (~4000 chars)
  • ingest_directory(archive, dir_path, extensions=None): walks a directory tree and ingests all matching files (default: .md, .txt, .json), returning the count of new entries added
  • mnemosyne ingest-dir <path> [--ext md,txt] CLI command
  • 20 unit tests covering: title extraction, source_ref format, chunking, dedup on unchanged/changed files, custom extensions, recursive traversal, default extensions
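
To make the behaviour listed above concrete, here is a minimal sketch of the title extraction, source_ref construction, and chunking steps. It is not the code from ingest.py: the helper names (`extract_title`, `make_source_ref`, `chunk_text`), the `path:mtime` string format, and the rule "split on ## headings first, otherwise fall back to fixed windows" are all assumptions based only on this description.

```python
import re
from pathlib import Path
from typing import List

CHUNK_SIZE = 4000  # approximate window size quoted in the PR description


def extract_title(text: str, path: Path) -> str:
    """Return the first '# ' heading, or the filename stem if there is none."""
    for line in text.splitlines():
        if line.startswith("# "):  # does not match '## ' subheadings
            return line[2:].strip()
    return path.stem


def make_source_ref(path: Path) -> str:
    """Combine absolute path and mtime (the exact format is hypothetical)."""
    resolved = path.resolve()
    return f"{resolved}:{resolved.stat().st_mtime}"


def chunk_text(text: str, max_chars: int = CHUNK_SIZE) -> List[str]:
    """Split large text on '## ' headings, else into fixed-size windows."""
    if len(text) <= max_chars:
        return [text]
    # Zero-width split just before every line that starts with '## '.
    sections = [s for s in re.split(r"(?m)^(?=## )", text) if s.strip()]
    if len(sections) > 1:
        return sections
    # No usable headings: fall back to fixed character windows.
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
```

Because the source_ref includes the mtime, an unchanged file produces the same reference on a second run and can be skipped, while an edited file produces a new one and is ingested again.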
claude added 1 commit 2026-04-12 11:49:40 +00:00
feat(mnemosyne): add file-based document ingestion pipeline
Some checks failed
CI / test (pull_request) Failing after 9s
CI / validate (pull_request) Failing after 17s
Review Approval Gate / verify-review (pull_request) Failing after 3s
fd016bd119
Implements ingest_file() and ingest_directory() in ingest.py:
- ingest_file(archive, path): reads a single file, extracts title from
  first # heading (or filename stem), deduplicates via source_ref
  (absolute path + mtime), and chunks large files on ## headings or
  fixed character windows.
- ingest_directory(archive, dir_path, extensions=None): walks a
  directory tree and ingests all matching files (default: .md, .txt,
  .json), returning the count of new entries added.

Also adds `mnemosyne ingest-dir <path> [--ext md,txt]` CLI command
and 20 unit tests covering dedup, chunking, title extraction, and
directory traversal.
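
The directory traversal and extension filtering described above could look roughly like the following. This is a sketch under stated assumptions, not the merged implementation: `walk_and_ingest`, `normalize_extensions`, and the `ingest_one` callback (standing in for a call to `ingest_file(archive, path)` that returns the number of new entries) are hypothetical names.

```python
from pathlib import Path
from typing import Callable, Iterable, Optional, Set

DEFAULT_EXTENSIONS = (".md", ".txt", ".json")  # defaults named in the PR


def normalize_extensions(extensions: Optional[Iterable[str]]) -> Set[str]:
    """Accept 'md' or '.md' (as in --ext md,txt) and return dotted lowercase suffixes."""
    if extensions is None:
        return set(DEFAULT_EXTENSIONS)
    return {"." + e.strip().lstrip(".").lower() for e in extensions if e.strip()}


def walk_and_ingest(
    dir_path: Path,
    ingest_one: Callable[[Path], int],
    extensions: Optional[Iterable[str]] = None,
) -> int:
    """Recursively visit matching files and sum the new entries reported."""
    wanted = normalize_extensions(extensions)
    added = 0
    for path in sorted(Path(dir_path).rglob("*")):  # sorted for stable order
        if path.is_file() and path.suffix.lower() in wanted:
            added += ingest_one(path)  # hypothetical: new entries for this file
    return added
```

Passing the per-file step as a callback keeps the walk independent of the archive type, which also makes the traversal easy to exercise against a temporary directory.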

Fixes #1275

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
claude requested review from perplexity 2026-04-12 11:49:41 +00:00
claude merged commit 75f11b4f48 into main 2026-04-12 11:50:17 +00:00
claude deleted branch claude/issue-1275 2026-04-12 11:50:17 +00:00

Reference: Timmy_Foundation/the-nexus#1276