# Sovereign Personal Archive Connector Pack
This directory contains the connector infrastructure for ingesting personal archives (Discord, Slack, WhatsApp, Notion, iMessage, X/Twitter, Google) into the compounding-intelligence knowledge pipeline.
## Quick Start

```bash
# Run the Twitter archive connector
python3 scripts/run_connector.py twitter \
  --source ~/Documents/TwitterArchive \
  --output events.jsonl \
  --limit 100
```
## Connector Output Format

Each connector emits `SourceEvent` objects (one JSON line per event):

```json
{
  "source": "twitter",
  "account": "user_archive",
  "thread_or_channel": "tweet_123456",
  "author": "user_archive",
  "timestamp": "2026-04-26T08:30:00+00:00",
  "content": "Tweet text here",
  "attachments": ["https://..."],
  "raw_ref": "twitter:archive:tweet.js:123456",
  "hash": "sha256...",
  "consent_scope": "memory_only",
  "metadata": { "tweet_id": "123456", "favorite_count": 10 }
}
```
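The schema above maps naturally onto a Python dataclass. The following is a hypothetical sketch of that mapping; the real definition lives in `connectors/base.py` and may differ in field order, defaults, and hashing details.

```python
# Hypothetical SourceEvent sketch; field names follow the JSON example above,
# but the canonical definition is in connectors/base.py.
from dataclasses import dataclass, field, asdict
import hashlib
import json


@dataclass
class SourceEvent:
    source: str
    account: str
    thread_or_channel: str
    author: str
    timestamp: str  # ISO-8601 string
    content: str
    raw_ref: str
    attachments: list = field(default_factory=list)
    consent_scope: str = "memory_only"  # conservative default per the design principles
    metadata: dict = field(default_factory=dict)
    hash: str = ""

    def __post_init__(self):
        # Content-addressed hash enables cross-run deduplication.
        if not self.hash:
            key = f"{self.source}:{self.raw_ref}:{self.content}"
            self.hash = "sha256:" + hashlib.sha256(key.encode()).hexdigest()

    def to_jsonl(self) -> str:
        # One JSON object per line, as consumed by run_connector.py --output.
        return json.dumps(asdict(self), ensure_ascii=False)
```

Deriving `hash` from `raw_ref` plus `content` (rather than a random ID) is what makes deduplication idempotent across repeated runs over the same archive.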
## Connector Registry
| Name | Source Format | Status |
|---|---|---|
| twitter_archive | Official Twitter data export | ✅ Working |
| discord_archive | Discord data package / JSON | ⏳ Planned |
| slack_archive | Slack export / API | ⏳ Planned |
| whatsapp_archive | WhatsApp Desktop export | ⏳ Planned |
| notion_archive | Notion markdown/SQLite | ⏳ Planned |
| imessage_archive | macOS local chat storage | ⏳ Planned |
| google_archive | Google Workspace CLI | ⏳ Planned |
## Design Principles
- Local-first: Connectors operate on user-owned exports or explicit API credentials.
- Incremental: Checkpoint files (`~/.cache/connectors/`) allow resumable processing.
- Consent-gated: Default `consent_scope=memory_only`; explicit opt-in for broader use.
- Provenance-preserving: `metadata` retains all raw fields; `hash` enables deduplication.
- Sovereign: No ambient scraping. No cloud dependency unless the user explicitly configures tokens.
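The incremental and provenance-preserving principles combine into a simple pattern: persist the set of emitted event hashes, and skip anything already seen on the next run. A minimal sketch, assuming a JSON checkpoint file (the real `BaseConnector` checkpoint format may differ):

```python
# Illustrative checkpoint + dedup gate; the CheckpointStore name and JSON
# layout are assumptions, not the actual connectors/base.py implementation.
import json
from pathlib import Path


class CheckpointStore:
    """Tracks already-emitted event hashes so re-runs are incremental."""

    def __init__(self, path: Path):
        self.path = path
        self.seen = set()
        if path.exists():
            self.seen = set(json.loads(path.read_text()))

    def is_new(self, event_hash: str) -> bool:
        # Returns True exactly once per hash, gating duplicate emission.
        if event_hash in self.seen:
            return False
        self.seen.add(event_hash)
        return True

    def flush(self) -> None:
        # Persist state so the next run resumes where this one stopped.
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(sorted(self.seen)))
```

Because the gate keys on the content-derived `hash`, interrupting a run and restarting it produces no duplicate events in the output.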
## Writing a New Connector
Subclass `BaseConnector` from `connectors/base.py` and implement:

- `discover_sources(root: Path) -> Iterator[Path | str]`: find source files or IDs
- `parse_source(source) -> Iterator[SourceEvent]`: emit normalized events

Then register the connector in the `_REGISTRY` dict in `connectors/__init__.py`.
See connectors/twitter_archive.py for a complete example.
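As a rough illustration of the two-method interface, here is a standalone sketch of a connector that reads pre-normalized `.jsonl` exports. The base class is stubbed so the example runs on its own; the real `BaseConnector` may require additional hooks (checkpointing, consent gates), so treat `connectors/twitter_archive.py` as the authoritative reference.

```python
# Hypothetical connector example; JsonLinesConnector and the stub base class
# are illustrative only and do not exist in the repository.
import json
from pathlib import Path
from typing import Iterator


class BaseConnector:
    """Stand-in for connectors.base.BaseConnector (stubbed for this sketch)."""
    name = "base"


class JsonLinesConnector(BaseConnector):
    name = "jsonl_archive"

    def discover_sources(self, root: Path) -> Iterator[Path]:
        # Find every .jsonl file under the export root, in stable order.
        yield from sorted(root.glob("**/*.jsonl"))

    def parse_source(self, source: Path) -> Iterator[dict]:
        # Emit one event per non-empty line; a real connector would build
        # SourceEvent objects here instead of raw dicts.
        with source.open() as fh:
            for line in fh:
                if line.strip():
                    yield json.loads(line)
```

Keeping `discover_sources` and `parse_source` as generators lets the runner stream large archives without loading them fully into memory, which matters for multi-gigabyte exports.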