Compare commits
3 Commits
fix/674...fix/692-so

| Author | SHA1 | Date |
|---|---|---|
| | ce3da2dbc4 | |
| | 601c5fe267 | |
| | 6222b18a38 | |

21 dns-records.yaml (Normal file)
@@ -0,0 +1,21 @@
# DNS Records — Fleet Domain Configuration
# Sync with: python3 scripts/dns-manager.py sync --zone alexanderwhitestone.com --config dns-records.yaml
# Part of #692

zone: alexanderwhitestone.com

records:
  - name: forge.alexanderwhitestone.com
    ip: 143.198.27.163
    ttl: 300
    note: Gitea forge (Ezra VPS)

  - name: bezalel.alexanderwhitestone.com
    ip: 167.99.126.228
    ttl: 300
    note: Bezalel VPS

  - name: allegro.alexanderwhitestone.com
    ip: 167.99.126.228
    ttl: 300
    note: Allegro VPS (shared with Bezalel)
102 research/long-context-vs-rag-decision-framework.md (Normal file)
@@ -0,0 +1,102 @@
# Long Context vs RAG Decision Framework

**Research Backlog Item #4.3** | Impact: 4 | Effort: 1 | Ratio: 4.0
**Date**: 2026-04-15
**Status**: RESEARCHED

## Executive Summary

Modern LLMs have 128K-200K+ context windows, but we still treat them like 4K models by default. This document provides a decision framework for when to stuff context vs. use RAG, based on empirical findings and our stack constraints.

## The Core Insight

**Long context ≠ better answers.** Research shows:

- "Lost in the Middle" effect: models attend poorly to information in the middle of long contexts (Liu et al., 2023)
- RAG with reranking outperforms full-context stuffing for document QA when docs exceed 50K tokens
- Attention computation scales quadratically with context length, so compute cost grows steeply
- Latency increases roughly linearly with input length

**RAG ≠ always better.** Retrieval introduces:

- Recall errors (missing relevant chunks)
- Precision errors (retrieving irrelevant chunks)
- Chunking artifacts (splitting mid-sentence)
- Additional latency for embedding + search
## Decision Matrix

| Scenario | Context Size | Recommendation | Why |
|----------|--------------|----------------|-----|
| Single conversation (< 32K) | Small | **Stuff everything** | No retrieval overhead, full context available |
| 5-20 documents, focused query | 32K-128K | **Hybrid** | Key docs in context, rest via RAG |
| Large corpus search | > 128K | **Pure RAG + reranking** | Full context impossible, must retrieve |
| Code review (< 5 files) | < 32K | **Stuff everything** | Code needs full context for understanding |
| Code review (repo-wide) | > 128K | **RAG with file-level chunks** | Files are natural chunk boundaries |
| Multi-turn conversation | Growing | **Hybrid + compression** | Keep recent turns in full, compress older ones |
| Fact retrieval | Any | **RAG** | Always faster to search than read everything |
| Complex reasoning across docs | 32K-128K | **Stuff + chain-of-thought** | Models need all context for cross-doc reasoning |
## Our Stack Constraints

### What We Have

- **Cloud models**: 128K-200K context (OpenRouter providers)
- **Local Ollama**: 8K-32K context (Gemma-4 default 8192)
- **Hermes fact_store**: SQLite FTS5 full-text search
- **Memory**: MemPalace holographic embeddings
- **Session context**: growing conversation history

### What This Means

1. **Cloud sessions**: we CAN stuff up to 128K, but SHOULD we? Cost and latency matter.
2. **Local sessions**: MUST use RAG for anything beyond 8K; long context is not available.
3. **Mixed fleet**: we need a routing layer that decides per session.
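The per-session routing decision fits in a few lines. A minimal sketch of such a routing layer, assuming hypothetical names (`Session`, `route`) and illustrative thresholds, not an existing hermes API:

```python
from dataclasses import dataclass


@dataclass
class Session:
    backend: str         # "cloud" or "local"
    context_limit: int   # model's usable context window, in tokens
    corpus_tokens: int   # total tokens of candidate material


def route(session: Session) -> str:
    """Return a retrieval strategy: 'stuff', 'hybrid', or 'rag'."""
    if session.corpus_tokens <= min(session.context_limit, 32_000):
        return "stuff"   # everything fits comfortably: no retrieval overhead
    if session.backend == "local":
        return "rag"     # local models can't absorb large contexts
    if session.corpus_tokens <= session.context_limit:
        return "hybrid"  # fits, but cost/latency favor partial retrieval
    return "rag"         # larger than the window: must retrieve


print(route(Session("local", 8_000, 50_000)))    # rag
print(route(Session("cloud", 128_000, 20_000)))  # stuff
```

The thresholds would come from the context-size logging proposed in the Recommendations, not from guesswork.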
## Advanced Patterns

### 1. Progressive Context Loading

Don't load everything at once. Start with RAG, then stuff additional docs as needed:

```
Turn 1: RAG search → top 3 chunks
Turn 2: Model asks "I need more context about X" → stuff X
Turn 3: Model has enough → continue
```
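The flow above can be sketched as a bounded escalation loop. `rag_search`, `ask_model`, and `load_doc` are hypothetical helpers, stubbed here so the sketch runs; a real implementation would call the actual retrieval and model layers:

```python
DOCS = {"deploy": "Deploy with `make deploy` after CI passes."}


def rag_search(query: str, k: int = 3) -> list[str]:
    return []  # stub: pretend retrieval found nothing useful


def ask_model(query: str, context: list[str]) -> str:
    # Stub: a real model would answer or signal "NEED:<doc>" for more context.
    return "NEED:deploy" if not context else f"Answer using: {context[0]}"


def answer(query: str) -> str:
    context = rag_search(query)               # Turn 1: cheap RAG pass
    for _ in range(3):                        # bounded escalation
        reply = ask_model(query, context)
        if not reply.startswith("NEED:"):     # model has enough context
            return reply
        doc_id = reply.removeprefix("NEED:").strip()
        context += [DOCS[doc_id]]             # Turn 2: stuff the requested doc
    return ask_model(query, context)          # stop escalating after 3 rounds
```

The bound on the loop matters: without it, a confused model can request documents forever.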
### 2. Context Budgeting

Allocate the context budget across components:

```
System prompt:     2,000 tokens (always)
Recent messages:  10,000 tokens (last 5 turns)
RAG results:       8,000 tokens (top chunks)
Stuffed docs:     12,000 tokens (key docs)
---------------------------
Total:            32,000 tokens (fits 32K model)
```
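Enforcing that budget before prompt assembly is a one-pass trim. A sketch using the allocation above (numbers are the illustrative ones from this section; tokens are modeled as list items for simplicity):

```python
BUDGET = {
    "system": 2_000,
    "recent_messages": 10_000,
    "rag_results": 8_000,
    "stuffed_docs": 12_000,
}


def trim(tokens: list[str], limit: int) -> list[str]:
    """Keep only the most recent `limit` tokens of a component."""
    return tokens[-limit:]


def assemble(components: dict[str, list[str]]) -> list[str]:
    """Concatenate components, each clipped to its budgeted share."""
    prompt: list[str] = []
    for name, limit in BUDGET.items():
        prompt += trim(components.get(name, []), limit)
    return prompt
```

Trimming from the front (keeping the tail) suits conversational components, where recency matters most; RAG results would instead be clipped by rank.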
### 3. Smart Compression

Before stuffing, compress older context:

- Summarize turns older than 10
- Remove tool call results (keep only final outputs)
- Deduplicate repeated information
- Use structured representations (JSON) instead of prose
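The first three passes above can be sketched in one function. `summarize` is a hypothetical model call, stubbed here as truncation:

```python
def summarize(text: str, max_chars: int = 80) -> str:
    # Stub: a real implementation would call a cheap model to summarize.
    return text[:max_chars] + ("…" if len(text) > max_chars else "")


def compress_history(turns: list[dict], keep_recent: int = 10) -> list[dict]:
    """Drop tool results, deduplicate, and summarize turns older than `keep_recent`."""
    out: list[dict] = []
    seen: set[str] = set()
    for i, turn in enumerate(turns):
        if turn.get("role") == "tool":      # drop raw tool-call results
            continue
        content = turn["content"]
        if content in seen:                 # deduplicate repeated information
            continue
        seen.add(content)
        if i < len(turns) - keep_recent:    # summarize older turns only
            content = summarize(content)
        out.append({"role": turn["role"], "content": content})
    return out
```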
## Empirical Benchmarks Needed

1. **Stuffing vs RAG accuracy** on our fact_store queries
2. **Latency comparison** at 32K, 64K, and 128K context
3. **Cost per query** for cloud models at various context sizes
4. **Local model behavior** when pushed beyond rated context
## Recommendations

1. **Audit current context usage**: how many sessions hit > 32K? (Low effort, high value.)
2. **Implement ContextRouter**: ~50 LOC; adds routing decisions to hermes
3. **Add context-size logging**: track input tokens per session for data gathering
## References

- Liu et al., "Lost in the Middle: How Language Models Use Long Contexts" (2023) — https://arxiv.org/abs/2307.03172
- Shi et al., "Large Language Models Can Be Easily Distracted by Irrelevant Context" (2023)
- Xu et al., "Retrieval Meets Long Context Large Language Models" (2023) — hybrid approaches outperform either alone
- Anthropic's Claude 3.5 context caching — built-in prefix caching reduces cost for repeated system prompts

---

*Sovereignty and service always.*
262 scripts/dns-manager.py (Executable file)
@@ -0,0 +1,262 @@
#!/usr/bin/env python3
"""
dns-manager.py — Manage DNS records via Cloudflare API.

Provides add/update/delete/list operations for DNS A records.
Designed for fleet VPS nodes that need API-driven DNS management.

Usage:
    python3 scripts/dns-manager.py list --zone alexanderwhitestone.com
    python3 scripts/dns-manager.py add --zone alexanderwhitestone.com --name forge --ip 143.198.27.163
    python3 scripts/dns-manager.py update --zone alexanderwhitestone.com --name forge --ip 167.99.126.228
    python3 scripts/dns-manager.py delete --zone alexanderwhitestone.com --name forge
    python3 scripts/dns-manager.py sync --zone alexanderwhitestone.com --config dns-records.yaml

Config via env:
    CLOUDFLARE_API_TOKEN — API token with DNS:Edit permission
    CLOUDFLARE_ZONE_ID — Zone ID (auto-resolved if not set)

Part of #692: Sovereign DNS management.
"""

import argparse
import json
import os
import sys
import urllib.error
import urllib.request
from pathlib import Path
from typing import List, Optional
CF_API = "https://api.cloudflare.com/client/v4"


# ── Auth ──────────────────────────────────────────────────────────────────

def get_token() -> str:
    """Get Cloudflare API token from env or config."""
    token = os.environ.get("CLOUDFLARE_API_TOKEN", "")
    if not token:
        token_path = Path.home() / ".config" / "cloudflare" / "token"
        if token_path.exists():
            token = token_path.read_text().strip()
    if not token:
        print("ERROR: No Cloudflare API token found.", file=sys.stderr)
        print("Set CLOUDFLARE_API_TOKEN env var or create ~/.config/cloudflare/token", file=sys.stderr)
        sys.exit(1)
    return token


def cf_request(method: str, path: str, token: str, data: Optional[dict] = None) -> dict:
    """Make a Cloudflare API request and return the parsed JSON response."""
    url = f"{CF_API}{path}"
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }

    body = json.dumps(data).encode() if data else None
    req = urllib.request.Request(url, data=body, headers=headers, method=method)

    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            result = json.loads(resp.read().decode())
            if not result.get("success", True):
                errors = result.get("errors", [])
                print(f"API error: {errors}", file=sys.stderr)
                sys.exit(1)
            return result
    except urllib.error.HTTPError as e:
        body = e.read().decode() if e.fp else ""
        print(f"HTTP {e.code}: {body[:500]}", file=sys.stderr)
        sys.exit(1)
# ── Zone Resolution ──────────────────────────────────────────────────────

def resolve_zone_id(zone_name: str, token: str) -> str:
    """Resolve zone name to zone ID."""
    cached = os.environ.get("CLOUDFLARE_ZONE_ID", "")
    if cached:
        return cached

    result = cf_request("GET", f"/zones?name={zone_name}", token)
    zones = result.get("result", [])
    if not zones:
        print(f"ERROR: Zone '{zone_name}' not found", file=sys.stderr)
        sys.exit(1)
    return zones[0]["id"]
# ── DNS Operations ───────────────────────────────────────────────────────

def list_records(zone_id: str, token: str, name_filter: str = "") -> List[dict]:
    """List DNS records in a zone."""
    path = f"/zones/{zone_id}/dns_records?per_page=100"
    if name_filter:
        path += f"&name={name_filter}"
    result = cf_request("GET", path, token)
    return result.get("result", [])


def find_record(zone_id: str, token: str, name: str, record_type: str = "A") -> Optional[dict]:
    """Find a specific DNS record."""
    records = list_records(zone_id, token, name)
    for r in records:
        if r["name"] == name and r["type"] == record_type:
            return r
    return None


def add_record(zone_id: str, token: str, name: str, ip: str, ttl: int = 300, proxied: bool = False) -> dict:
    """Add a new DNS A record."""
    # Check if the record already exists
    existing = find_record(zone_id, token, name)
    if existing:
        print(f"Record {name} already exists (IP: {existing['content']}). Use 'update' to change.")
        return existing

    data = {
        "type": "A",
        "name": name,
        "content": ip,
        "ttl": ttl,
        "proxied": proxied,
    }
    result = cf_request("POST", f"/zones/{zone_id}/dns_records", token, data)
    record = result["result"]
    print(f"Added: {record['name']} -> {record['content']} (ID: {record['id']})")
    return record


def update_record(zone_id: str, token: str, name: str, ip: str, ttl: int = 300) -> dict:
    """Update an existing DNS A record."""
    existing = find_record(zone_id, token, name)
    if not existing:
        print(f"Record {name} not found. Use 'add' to create it.")
        sys.exit(1)

    data = {
        "type": "A",
        "name": name,
        "content": ip,
        "ttl": ttl,
        "proxied": existing.get("proxied", False),
    }
    result = cf_request("PUT", f"/zones/{zone_id}/dns_records/{existing['id']}", token, data)
    record = result["result"]
    print(f"Updated: {record['name']} {existing['content']} -> {record['content']}")
    return record


def delete_record(zone_id: str, token: str, name: str) -> bool:
    """Delete a DNS A record."""
    existing = find_record(zone_id, token, name)
    if not existing:
        print(f"Record {name} not found.")
        return False

    cf_request("DELETE", f"/zones/{zone_id}/dns_records/{existing['id']}", token)
    print(f"Deleted: {name} ({existing['content']})")
    return True
def sync_records(zone_id: str, token: str, config_path: str):
    """Sync DNS records from a YAML config file."""
    try:
        import yaml
    except ImportError:
        print("ERROR: PyYAML required for sync. Install: pip install pyyaml", file=sys.stderr)
        sys.exit(1)

    with open(config_path) as f:
        config = yaml.safe_load(f)

    desired = config.get("records", [])
    current = {r["name"]: r for r in list_records(zone_id, token)}

    added = 0
    updated = 0
    unchanged = 0

    for rec in desired:
        name = rec["name"]
        ip = rec["ip"]
        ttl = rec.get("ttl", 300)

        if name in current:
            if current[name]["content"] == ip:
                unchanged += 1
            else:
                update_record(zone_id, token, name, ip, ttl)
                updated += 1
        else:
            add_record(zone_id, token, name, ip, ttl)
            added += 1

    print(f"\nSync complete: {added} added, {updated} updated, {unchanged} unchanged")
# ── CLI ──────────────────────────────────────────────────────────────────

def main():
    parser = argparse.ArgumentParser(description="Manage DNS records via Cloudflare API")
    sub = parser.add_subparsers(dest="command")

    # list
    p_list = sub.add_parser("list", help="List DNS records")
    p_list.add_argument("--zone", required=True, help="Zone name (e.g., example.com)")
    p_list.add_argument("--name", default="", help="Filter by record name")

    # add
    p_add = sub.add_parser("add", help="Add DNS A record")
    p_add.add_argument("--zone", required=True)
    p_add.add_argument("--name", required=True, help="Record name (e.g., forge.example.com)")
    p_add.add_argument("--ip", required=True, help="IPv4 address")
    p_add.add_argument("--ttl", type=int, default=300)

    # update
    p_update = sub.add_parser("update", help="Update DNS A record")
    p_update.add_argument("--zone", required=True)
    p_update.add_argument("--name", required=True)
    p_update.add_argument("--ip", required=True)
    p_update.add_argument("--ttl", type=int, default=300)

    # delete
    p_delete = sub.add_parser("delete", help="Delete DNS A record")
    p_delete.add_argument("--zone", required=True)
    p_delete.add_argument("--name", required=True)

    # sync
    p_sync = sub.add_parser("sync", help="Sync records from YAML config")
    p_sync.add_argument("--zone", required=True)
    p_sync.add_argument("--config", required=True, help="Path to YAML config")

    args = parser.parse_args()
    if not args.command:
        parser.print_help()
        sys.exit(1)

    token = get_token()
    zone_id = resolve_zone_id(args.zone, token)

    if args.command == "list":
        records = list_records(zone_id, token, args.name)
        for r in sorted(records, key=lambda x: x["name"]):
            print(f"  {r['type']:5s} {r['name']:40s} -> {r['content']:20s} TTL:{r['ttl']}")
        print(f"\n{len(records)} records")

    elif args.command == "add":
        add_record(zone_id, token, args.name, args.ip, args.ttl)

    elif args.command == "update":
        update_record(zone_id, token, args.name, args.ip, args.ttl)

    elif args.command == "delete":
        delete_record(zone_id, token, args.name)

    elif args.command == "sync":
        sync_records(zone_id, token, args.config)


if __name__ == "__main__":
    main()