Compare commits
3 Commits
fix/674...fix/692-so

| Author | SHA1 | Date |
|---|---|---|
| | ce3da2dbc4 | |
| | 601c5fe267 | |
| | 6222b18a38 | |

21 dns-records.yaml (Normal file)
@@ -0,0 +1,21 @@
# DNS Records — Fleet Domain Configuration
# Sync with: python3 scripts/dns-manager.py sync --zone alexanderwhitestone.com --config dns-records.yaml
# Part of #692

zone: alexanderwhitestone.com

records:
  - name: forge.alexanderwhitestone.com
    ip: 143.198.27.163
    ttl: 300
    note: Gitea forge (Ezra VPS)

  - name: bezalel.alexanderwhitestone.com
    ip: 167.99.126.228
    ttl: 300
    note: Bezalel VPS

  - name: allegro.alexanderwhitestone.com
    ip: 167.99.126.228
    ttl: 300
    note: Allegro VPS (shared with Bezalel)
102 research/long-context-vs-rag-decision-framework.md (Normal file)
@@ -0,0 +1,102 @@
# Long Context vs RAG Decision Framework

**Research Backlog Item #4.3** | Impact: 4 | Effort: 1 | Ratio: 4.0
**Date**: 2026-04-15
**Status**: RESEARCHED

## Executive Summary

Modern LLMs have 128K-200K+ context windows, but we still treat them like 4K models by default. This document provides a decision framework for when to stuff context vs. use RAG, based on empirical findings and our stack constraints.

## The Core Insight

**Long context ≠ better answers.** Research shows:

- "Lost in the Middle" effect: models attend poorly to information in the middle of long contexts (Liu et al., 2023)
- RAG with reranking outperforms full-context stuffing for document QA when docs exceed 50K tokens
- Attention computation scales quadratically with context length, so compute cost grows steeply
- Latency increases roughly linearly with input length

**RAG ≠ always better.** Retrieval introduces:

- Recall errors (missing relevant chunks)
- Precision errors (retrieving irrelevant chunks)
- Chunking artifacts (splitting mid-sentence)
- Additional latency for embedding + search
## Decision Matrix

| Scenario | Context Size | Recommendation | Why |
|----------|--------------|----------------|-----|
| Single conversation (< 32K) | Small | **Stuff everything** | No retrieval overhead, full context available |
| 5-20 documents, focused query | 32K-128K | **Hybrid** | Key docs in context, rest via RAG |
| Large corpus search | > 128K | **Pure RAG + reranking** | Full context impossible, must retrieve |
| Code review (< 5 files) | < 32K | **Stuff everything** | Code needs full context for understanding |
| Code review (repo-wide) | > 128K | **RAG with file-level chunks** | Files are natural chunk boundaries |
| Multi-turn conversation | Growing | **Hybrid + compression** | Keep recent turns in full, compress older ones |
| Fact retrieval | Any | **RAG** | Always faster to search than read everything |
| Complex reasoning across docs | 32K-128K | **Stuff + chain-of-thought** | Models need all context for cross-doc reasoning |
## Our Stack Constraints

### What We Have

- **Cloud models**: 128K-200K context (OpenRouter providers)
- **Local Ollama**: 8K-32K context (Gemma-4 default 8192)
- **Hermes fact_store**: SQLite FTS5 full-text search
- **Memory**: MemPalace holographic embeddings
- **Session context**: growing conversation history

### What This Means

1. **Cloud sessions**: we CAN stuff up to 128K, but SHOULD we? Cost and latency matter.
2. **Local sessions**: MUST use RAG for anything beyond 8K; long context is not available.
3. **Mixed fleet**: we need a routing layer that decides per session.
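The per-session routing decision fits in a few lines. A minimal sketch of such a routing layer, assuming hypothetical names (`Session`, `route`) and illustrative thresholds, not an existing hermes API:

```python
from dataclasses import dataclass


@dataclass
class Session:
    backend: str         # "cloud" or "local"
    context_limit: int   # model's usable context window, in tokens
    corpus_tokens: int   # total tokens of candidate material


def route(session: Session) -> str:
    """Return a retrieval strategy: 'stuff', 'hybrid', or 'rag'."""
    if session.corpus_tokens <= min(session.context_limit, 32_000):
        return "stuff"   # everything fits comfortably: no retrieval overhead
    if session.backend == "local":
        return "rag"     # local models can't absorb large contexts
    if session.corpus_tokens <= session.context_limit:
        return "hybrid"  # fits, but cost/latency favor partial retrieval
    return "rag"         # larger than the window: must retrieve


print(route(Session("local", 8_000, 50_000)))    # rag
print(route(Session("cloud", 128_000, 20_000)))  # stuff
```

The thresholds would come from the context-size logging proposed in the Recommendations, not from guesswork.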
## Advanced Patterns

### 1. Progressive Context Loading

Don't load everything at once. Start with RAG, then stuff additional docs as needed:

```
Turn 1: RAG search → top 3 chunks
Turn 2: Model asks "I need more context about X" → stuff X
Turn 3: Model has enough → continue
```
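The flow above can be sketched as a bounded escalation loop. `rag_search`, `ask_model`, and `load_doc` are hypothetical helpers, stubbed here so the sketch runs; a real implementation would call the actual retrieval and model layers:

```python
DOCS = {"deploy": "Deploy with `make deploy` after CI passes."}


def rag_search(query: str, k: int = 3) -> list[str]:
    return []  # stub: pretend retrieval found nothing useful


def ask_model(query: str, context: list[str]) -> str:
    # Stub: a real model would answer or signal "NEED:<doc>" for more context.
    return "NEED:deploy" if not context else f"Answer using: {context[0]}"


def answer(query: str) -> str:
    context = rag_search(query)               # Turn 1: cheap RAG pass
    for _ in range(3):                        # bounded escalation
        reply = ask_model(query, context)
        if not reply.startswith("NEED:"):     # model has enough context
            return reply
        doc_id = reply.removeprefix("NEED:").strip()
        context += [DOCS[doc_id]]             # Turn 2: stuff the requested doc
    return ask_model(query, context)          # stop escalating after 3 rounds
```

The bound on the loop matters: without it, a confused model can request documents forever.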
### 2. Context Budgeting

Allocate the context budget across components:

```
System prompt:     2,000 tokens (always)
Recent messages:  10,000 tokens (last 5 turns)
RAG results:       8,000 tokens (top chunks)
Stuffed docs:     12,000 tokens (key docs)
---------------------------
Total:            32,000 tokens (fits 32K model)
```
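Enforcing that budget before prompt assembly is a one-pass trim. A sketch using the allocation above (numbers are the illustrative ones from this section; tokens are modeled as list items for simplicity):

```python
BUDGET = {
    "system": 2_000,
    "recent_messages": 10_000,
    "rag_results": 8_000,
    "stuffed_docs": 12_000,
}


def trim(tokens: list[str], limit: int) -> list[str]:
    """Keep only the most recent `limit` tokens of a component."""
    return tokens[-limit:]


def assemble(components: dict[str, list[str]]) -> list[str]:
    """Concatenate components, each clipped to its budgeted share."""
    prompt: list[str] = []
    for name, limit in BUDGET.items():
        prompt += trim(components.get(name, []), limit)
    return prompt
```

Trimming from the front (keeping the tail) suits conversational components, where recency matters most; RAG results would instead be clipped by rank.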
### 3. Smart Compression

Before stuffing, compress older context:

- Summarize turns older than 10
- Remove tool call results (keep only final outputs)
- Deduplicate repeated information
- Use structured representations (JSON) instead of prose
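The first three passes above can be sketched in one function. `summarize` is a hypothetical model call, stubbed here as truncation:

```python
def summarize(text: str, max_chars: int = 80) -> str:
    # Stub: a real implementation would call a cheap model to summarize.
    return text[:max_chars] + ("…" if len(text) > max_chars else "")


def compress_history(turns: list[dict], keep_recent: int = 10) -> list[dict]:
    """Drop tool results, deduplicate, and summarize turns older than `keep_recent`."""
    out: list[dict] = []
    seen: set[str] = set()
    for i, turn in enumerate(turns):
        if turn.get("role") == "tool":      # drop raw tool-call results
            continue
        content = turn["content"]
        if content in seen:                 # deduplicate repeated information
            continue
        seen.add(content)
        if i < len(turns) - keep_recent:    # summarize older turns only
            content = summarize(content)
        out.append({"role": turn["role"], "content": content})
    return out
```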
## Empirical Benchmarks Needed

1. **Stuffing vs RAG accuracy** on our fact_store queries
2. **Latency comparison** at 32K, 64K, and 128K context
3. **Cost per query** for cloud models at various context sizes
4. **Local model behavior** when pushed beyond rated context
## Recommendations

1. **Audit current context usage**: how many sessions hit > 32K? (Low effort, high value.)
2. **Implement ContextRouter**: ~50 LOC; adds routing decisions to hermes
3. **Add context-size logging**: track input tokens per session for data gathering
## References

- Liu et al., "Lost in the Middle: How Language Models Use Long Contexts" (2023) — https://arxiv.org/abs/2307.03172
- Shi et al., "Large Language Models Can Be Easily Distracted by Irrelevant Context" (2023)
- Xu et al., "Retrieval Meets Long Context Large Language Models" (2023) — hybrid approaches outperform either alone
- Anthropic's Claude 3.5 context caching — built-in prefix caching reduces cost for repeated system prompts

---

*Sovereignty and service always.*
262 scripts/dns-manager.py (Executable file)
@@ -0,0 +1,262 @@
#!/usr/bin/env python3
"""
dns-manager.py — Manage DNS records via Cloudflare API.

Provides add/update/delete/list operations for DNS A records.
Designed for fleet VPS nodes that need API-driven DNS management.

Usage:
    python3 scripts/dns-manager.py list --zone alexanderwhitestone.com
    python3 scripts/dns-manager.py add --zone alexanderwhitestone.com --name forge --ip 143.198.27.163
    python3 scripts/dns-manager.py update --zone alexanderwhitestone.com --name forge --ip 167.99.126.228
    python3 scripts/dns-manager.py delete --zone alexanderwhitestone.com --name forge
    python3 scripts/dns-manager.py sync --zone alexanderwhitestone.com --config dns-records.yaml

Config via env:
    CLOUDFLARE_API_TOKEN — API token with DNS:Edit permission
    CLOUDFLARE_ZONE_ID — Zone ID (auto-resolved if not set)

Part of #692: Sovereign DNS management.
"""

import argparse
import json
import os
import sys
import urllib.error
import urllib.request
from pathlib import Path
from typing import List, Optional
CF_API = "https://api.cloudflare.com/client/v4"


# ── Auth ──────────────────────────────────────────────────────────────────

def get_token() -> str:
    """Get Cloudflare API token from env or config."""
    token = os.environ.get("CLOUDFLARE_API_TOKEN", "")
    if not token:
        token_path = Path.home() / ".config" / "cloudflare" / "token"
        if token_path.exists():
            token = token_path.read_text().strip()
    if not token:
        print("ERROR: No Cloudflare API token found.", file=sys.stderr)
        print("Set CLOUDFLARE_API_TOKEN env var or create ~/.config/cloudflare/token", file=sys.stderr)
        sys.exit(1)
    return token


def cf_request(method: str, path: str, token: str, data: Optional[dict] = None) -> dict:
    """Make a Cloudflare API request and return the parsed JSON response."""
    url = f"{CF_API}{path}"
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }

    body = json.dumps(data).encode() if data else None
    req = urllib.request.Request(url, data=body, headers=headers, method=method)

    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            result = json.loads(resp.read().decode())
            if not result.get("success", True):
                errors = result.get("errors", [])
                print(f"API error: {errors}", file=sys.stderr)
                sys.exit(1)
            return result
    except urllib.error.HTTPError as e:
        body = e.read().decode() if e.fp else ""
        print(f"HTTP {e.code}: {body[:500]}", file=sys.stderr)
        sys.exit(1)
# ── Zone Resolution ──────────────────────────────────────────────────────

def resolve_zone_id(zone_name: str, token: str) -> str:
    """Resolve zone name to zone ID."""
    cached = os.environ.get("CLOUDFLARE_ZONE_ID", "")
    if cached:
        return cached

    result = cf_request("GET", f"/zones?name={zone_name}", token)
    zones = result.get("result", [])
    if not zones:
        print(f"ERROR: Zone '{zone_name}' not found", file=sys.stderr)
        sys.exit(1)
    return zones[0]["id"]
# ── DNS Operations ───────────────────────────────────────────────────────

def list_records(zone_id: str, token: str, name_filter: str = "") -> List[dict]:
    """List DNS records in a zone."""
    path = f"/zones/{zone_id}/dns_records?per_page=100"
    if name_filter:
        path += f"&name={name_filter}"
    result = cf_request("GET", path, token)
    return result.get("result", [])


def find_record(zone_id: str, token: str, name: str, record_type: str = "A") -> Optional[dict]:
    """Find a specific DNS record."""
    records = list_records(zone_id, token, name)
    for r in records:
        if r["name"] == name and r["type"] == record_type:
            return r
    return None


def add_record(zone_id: str, token: str, name: str, ip: str, ttl: int = 300, proxied: bool = False) -> dict:
    """Add a new DNS A record."""
    # Check if the record already exists
    existing = find_record(zone_id, token, name)
    if existing:
        print(f"Record {name} already exists (IP: {existing['content']}). Use 'update' to change.")
        return existing

    data = {
        "type": "A",
        "name": name,
        "content": ip,
        "ttl": ttl,
        "proxied": proxied,
    }
    result = cf_request("POST", f"/zones/{zone_id}/dns_records", token, data)
    record = result["result"]
    print(f"Added: {record['name']} -> {record['content']} (ID: {record['id']})")
    return record


def update_record(zone_id: str, token: str, name: str, ip: str, ttl: int = 300) -> dict:
    """Update an existing DNS A record."""
    existing = find_record(zone_id, token, name)
    if not existing:
        print(f"Record {name} not found. Use 'add' to create it.")
        sys.exit(1)

    data = {
        "type": "A",
        "name": name,
        "content": ip,
        "ttl": ttl,
        "proxied": existing.get("proxied", False),
    }
    result = cf_request("PUT", f"/zones/{zone_id}/dns_records/{existing['id']}", token, data)
    record = result["result"]
    print(f"Updated: {record['name']} {existing['content']} -> {record['content']}")
    return record


def delete_record(zone_id: str, token: str, name: str) -> bool:
    """Delete a DNS A record."""
    existing = find_record(zone_id, token, name)
    if not existing:
        print(f"Record {name} not found.")
        return False

    cf_request("DELETE", f"/zones/{zone_id}/dns_records/{existing['id']}", token)
    print(f"Deleted: {name} ({existing['content']})")
    return True
def sync_records(zone_id: str, token: str, config_path: str):
    """Sync DNS records from a YAML config file."""
    try:
        import yaml
    except ImportError:
        print("ERROR: PyYAML required for sync. Install: pip install pyyaml", file=sys.stderr)
        sys.exit(1)

    with open(config_path) as f:
        config = yaml.safe_load(f)

    desired = config.get("records", [])
    current = {r["name"]: r for r in list_records(zone_id, token)}

    added = 0
    updated = 0
    unchanged = 0

    for rec in desired:
        name = rec["name"]
        ip = rec["ip"]
        ttl = rec.get("ttl", 300)

        if name in current:
            if current[name]["content"] == ip:
                unchanged += 1
            else:
                update_record(zone_id, token, name, ip, ttl)
                updated += 1
        else:
            add_record(zone_id, token, name, ip, ttl)
            added += 1

    print(f"\nSync complete: {added} added, {updated} updated, {unchanged} unchanged")
# ── CLI ──────────────────────────────────────────────────────────────────

def main():
    parser = argparse.ArgumentParser(description="Manage DNS records via Cloudflare API")
    sub = parser.add_subparsers(dest="command")

    # list
    p_list = sub.add_parser("list", help="List DNS records")
    p_list.add_argument("--zone", required=True, help="Zone name (e.g., example.com)")
    p_list.add_argument("--name", default="", help="Filter by record name")

    # add
    p_add = sub.add_parser("add", help="Add DNS A record")
    p_add.add_argument("--zone", required=True)
    p_add.add_argument("--name", required=True, help="Record name (e.g., forge.example.com)")
    p_add.add_argument("--ip", required=True, help="IPv4 address")
    p_add.add_argument("--ttl", type=int, default=300)

    # update
    p_update = sub.add_parser("update", help="Update DNS A record")
    p_update.add_argument("--zone", required=True)
    p_update.add_argument("--name", required=True)
    p_update.add_argument("--ip", required=True)
    p_update.add_argument("--ttl", type=int, default=300)

    # delete
    p_delete = sub.add_parser("delete", help="Delete DNS A record")
    p_delete.add_argument("--zone", required=True)
    p_delete.add_argument("--name", required=True)

    # sync
    p_sync = sub.add_parser("sync", help="Sync records from YAML config")
    p_sync.add_argument("--zone", required=True)
    p_sync.add_argument("--config", required=True, help="Path to YAML config")

    args = parser.parse_args()
    if not args.command:
        parser.print_help()
        sys.exit(1)

    token = get_token()
    zone_id = resolve_zone_id(args.zone, token)

    if args.command == "list":
        records = list_records(zone_id, token, args.name)
        for r in sorted(records, key=lambda x: x["name"]):
            print(f"  {r['type']:5s} {r['name']:40s} -> {r['content']:20s} TTL:{r['ttl']}")
        print(f"\n{len(records)} records")

    elif args.command == "add":
        add_record(zone_id, token, args.name, args.ip, args.ttl)

    elif args.command == "update":
        update_record(zone_id, token, args.name, args.ip, args.ttl)

    elif args.command == "delete":
        delete_record(zone_id, token, args.name)

    elif args.command == "sync":
        sync_records(zone_id, token, args.config)


if __name__ == "__main__":
    main()