# Sovereign Personal Archive Connector Pack

This directory contains the connector infrastructure for ingesting personal archives (Discord, Slack, WhatsApp, Notion, iMessage, X/Twitter, Google) into the compounding-intelligence knowledge pipeline.

## Quick Start

```bash
# Run the Twitter archive connector
python3 scripts/run_connector.py twitter \
  --source ~/Documents/TwitterArchive \
  --output events.jsonl \
  --limit 100
```

## Connector Output Format

Each connector emits `SourceEvent` objects (one JSON line per event):

```json
{
  "source": "twitter",
  "account": "user_archive",
  "thread_or_channel": "tweet_123456",
  "author": "user_archive",
  "timestamp": "2026-04-26T08:30:00+00:00",
  "content": "Tweet text here",
  "attachments": ["https://..."],
  "raw_ref": "twitter:archive:tweet.js:123456",
  "hash": "sha256...",
  "consent_scope": "memory_only",
  "metadata": { "tweet_id": "123456", "favorite_count": 10 }
}
```
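The schema above can be sketched as a Python dataclass. Field names follow the JSON example; the `content_hash` helper is a hypothetical illustration of how the `hash` field might be derived, not the project's actual implementation:

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class SourceEvent:
    """Unified event schema emitted by every connector (sketch)."""
    source: str
    account: str
    thread_or_channel: str
    author: str
    timestamp: str                  # ISO-8601 with timezone offset
    content: str
    attachments: list[str] = field(default_factory=list)
    raw_ref: str = ""               # pointer back to the raw record
    hash: str = ""                  # content hash, used for deduplication
    consent_scope: str = "memory_only"
    metadata: dict = field(default_factory=dict)


def content_hash(event: SourceEvent) -> str:
    """Hypothetical helper: stable SHA-256 over the identifying fields."""
    key = json.dumps([event.source, event.raw_ref, event.content], sort_keys=True)
    return hashlib.sha256(key.encode("utf-8")).hexdigest()
```

Hashing only the identifying fields (rather than the whole object) keeps the hash stable even when mutable `metadata` is enriched later.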

## Connector Registry

| Name | Source Format | Status |
| --- | --- | --- |
| `twitter_archive` | Official Twitter data export | Working |
| `discord_archive` | Discord data package / JSON | Planned |
| `slack_archive` | Slack export / API | Planned |
| `whatsapp_archive` | WhatsApp Desktop export | Planned |
| `notion_archive` | Notion markdown/SQLite | Planned |
| `imessage_archive` | macOS local chat storage | Planned |
| `google_archive` | Google Workspace CLI | Planned |

## Design Principles

1. **Local-first:** Connectors operate on user-owned exports or explicit API credentials.
2. **Incremental:** Checkpoint files (`~/.cache/connectors/`) allow resumable processing.
3. **Consent-gated:** The default is `consent_scope=memory_only`; any broader use requires explicit opt-in.
4. **Provenance-preserving:** `metadata` retains all raw fields; `hash` enables deduplication.
5. **Sovereign:** No ambient scraping, and no cloud dependency unless the user explicitly configures tokens.
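Principles 2 and 4 work together: a checkpoint records which event hashes have already been emitted, so re-running a connector resumes where it left off and skips duplicates. A minimal sketch of that idea, assuming a simple JSON checkpoint file (the actual checkpoint format in `connectors/base.py` may differ):

```python
import json
from pathlib import Path


def load_seen_hashes(checkpoint: Path) -> set[str]:
    """Load the set of event hashes recorded by a previous run."""
    if checkpoint.exists():
        return set(json.loads(checkpoint.read_text()))
    return set()


def emit_new_events(events, checkpoint: Path):
    """Yield only events whose hash has not been seen; persist the checkpoint."""
    seen = load_seen_hashes(checkpoint)
    for event in events:
        if event["hash"] in seen:
            continue  # already processed in an earlier run or earlier in this run
        seen.add(event["hash"])
        yield event
    # Persist after a full pass so the next run starts from here.
    checkpoint.parent.mkdir(parents=True, exist_ok=True)
    checkpoint.write_text(json.dumps(sorted(seen)))
```

Because deduplication keys on the content hash rather than file position, the same event reappearing in a re-exported archive is dropped even if the export layout changed.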

## Writing a New Connector

Subclass `BaseConnector` from `connectors/base.py` and implement:

- `discover_sources(root: Path) -> Iterator[Path | str]`: find source files or IDs
- `parse_source(source) -> Iterator[SourceEvent]`: emit normalized events

Then register the connector in the `_REGISTRY` dict in `connectors/__init__.py`.
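A skeleton showing the shape of those two methods (the stand-in `BaseConnector` below is reduced to just the abstract hooks described above, and `TextDumpConnector` is a hypothetical example, not a real connector in the pack):

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Iterator


class BaseConnector(ABC):
    """Stand-in for connectors/base.py, reduced to the two required hooks."""

    @abstractmethod
    def discover_sources(self, root: Path) -> Iterator[Path]:
        """Find source files (or IDs) under the archive root."""

    @abstractmethod
    def parse_source(self, source: Path) -> Iterator[dict]:
        """Emit normalized events for one discovered source."""


class TextDumpConnector(BaseConnector):
    """Hypothetical connector: each *.txt file under root becomes one event."""

    def discover_sources(self, root: Path) -> Iterator[Path]:
        yield from sorted(root.glob("*.txt"))

    def parse_source(self, source: Path) -> Iterator[dict]:
        yield {
            "source": "text_dump",
            "content": source.read_text(),
            "raw_ref": f"text_dump:{source.name}",
        }
```

Keeping discovery and parsing separate lets the base class checkpoint per source, so an interrupted run resumes at the next undiscovered file.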

See `connectors/twitter_archive.py` for a complete example.