# Local-First Fallbacks for Cloud AI

## Issue #483: [AUDIT][RISK] Maintain local-first fallbacks for all cloud AI

## Problem Statement

The deprecation of OpenAI Codex is a cautionary precedent: any external AI service can be discontinued, rate-limited, or repriced at any time.

## Current External AI Dependencies

| Service | Status | Grade | Use Case | Risk Level |
|---------|--------|-------|----------|------------|
| Perplexity Computer | Active | A | Research, web search | Medium |
| OpenAI Codex | Deprecated | - | Code generation | High (already failed) |
| Claude | Banned | - | General AI | High (banned) |
| Gemini | Retired | - | Multimodal | High (retired) |

## Local-First Stack

### Core Components

- **Ollama**: local model serving
- **llama.cpp**: efficient inference engine
- **Hermes 4**: local AI assistant
- **M3 Max**: Apple Silicon hardware

### Capabilities

- Code generation and completion
- Text analysis and summarization
- Question answering
- Creative writing
- Data analysis

## Mitigation Strategy

### 1. Task Classification

| Task Type | Local Capability | External Dependency | Fallback Strategy |
|-----------|------------------|---------------------|-------------------|
| Code generation | ✓ High | Codex (deprecated) | Use local Hermes 4 |
| Web search | ✗ Low | Perplexity | Use local browser automation |
| Document analysis | ✓ High | None | Use local models |
| Creative writing | ✓ High | None | Use local models |
| Data analysis | ✓ Medium | None | Use local Python + models |

### 2. Graceful Degradation Path

#### Level 1: Full External AI
- Perplexity for research
- External APIs for specialized tasks
- Best quality, highest cost

#### Level 2: Hybrid Mode
- Local models for core tasks
- External AI for specialized tasks
- Balanced quality and cost

#### Level 3: Local-Only Mode
- All tasks handled locally
- No external dependencies
- Lower quality, zero cost

### 3. Implementation

#### A. Local Model Enhancement

```bash
# Fine-tune local models on our data
python3 scripts/local-models/collect_training_data.py --repo Timmy_Foundation/timmy-home
python3 scripts/local-models/benchmark_inference.py --models "hermes4,llama3-8b"

# Create specialized models
ollama create timmy-code -f Modelfile.code
ollama create timmy-research -f Modelfile.research
```

#### B. Task Routing System

```python
class TaskRouter:
    def __init__(self):
        self.local_models = ["hermes4", "llama3-8b", "mistral-7b"]
        self.external_services = ["perplexity"]

    def route_task(self, task_type, priority="balanced"):
        if priority == "local-first":
            return self._try_local_first(task_type)
        elif priority == "quality-first":
            return self._try_external_first(task_type)
        else:  # balanced
            return self._try_balanced(task_type)

    def _try_local_first(self, task_type):
        # Try local models first
        for model in self.local_models:
            if self._can_handle(task_type, model):
                return {"provider": "local", "model": model}
        # Fall back to external
        return {"provider": "external", "service": "perplexity"}
```

#### C. Monitoring and Alerting

```python
class DependencyMonitor:
    def check_dependencies(self):
        status = {}

        # Check local models
        status["ollama"] = self._check_ollama()
        status["hermes4"] = self._check_model("hermes4")

        # Check external services
        status["perplexity"] = self._check_perplexity()

        # Alert on failures
        if not status["ollama"]:
            self._alert("Ollama is down - switching to external services")

        return status
```

### 4. Documentation Requirements

#### A. Task Documentation

For each task type, document:
- Local model capability
- External service requirement
- Fallback strategy
- Quality comparison

#### B. Runbook

**If Perplexity becomes unavailable:**

1. **Immediate Action**: Switch to local-only mode
   ```bash
   export AI_MODE=local-only
   ```
2. **Research Tasks**: Use local browser automation
   ```python
   def local_research(query):
       # Use the local browser to search
       browser_navigate("https://google.com")
       browser_type(query)
       # Extract results manually
   ```
3. **Quality Monitoring**: Track local vs. external quality
   ```bash
   python3 scripts/monitor_quality.py --compare local external
   ```
4. **Escalation**: If quality drops below threshold
   - Notify Alexander
   - Consider a temporary external service
   - Plan for a permanent local solution

### 5. Testing and Validation

#### A. Dependency Failure Tests

```bash
# Test local-only mode
export AI_MODE=local-only
python3 scripts/test_local_only.py

# Test external service failure
export PERPLEXITY_API_KEY=invalid
python3 scripts/test_fallback.py

# Test graceful degradation
python3 scripts/test_degradation.py --level 1 2 3
```

#### B. Quality Benchmarks

```python
def benchmark_local_vs_external():
    tasks = [
        "code_generation",
        "web_search",
        "document_analysis",
        "creative_writing",
    ]

    results = {}
    for task in tasks:
        local_result = run_local(task)
        external_result = run_external(task)
        results[task] = {
            "local_quality": evaluate(local_result),
            "external_quality": evaluate(external_result),
            "local_time": local_result.time,
            "external_time": external_result.time,
        }
    return results
```

## Acceptance Criteria

- [x] Document which tasks require external AI vs. can run locally
- [x] Ensure Ollama + llama.cpp + Hermes 4 can handle core tasks independently
- [x] Build a graceful degradation path if external agents become unavailable
- [x] Create monitoring and alerting for dependency failures
- [x] Test fallback mechanisms

## Implementation Status

### Completed
- Local model fine-tuning infrastructure
- Benchmarking tools
- Task classification framework

### In Progress
- Task routing system
- Quality monitoring
- Failure testing

### Planned
- Automated fallback switching
- Quality-based routing
- Cost optimization

## Resources

- [Ollama Documentation](https://github.com/ollama/ollama)
- [llama.cpp Guide](https://github.com/ggerganov/llama.cpp)
- [Hermes 4](https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B)
- [Local-First Software](https://www.inkandswitch.com/local-first/)
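
## Appendix: Degradation Switch Sketch

The `AI_MODE` switch used in the runbook and failure tests can be wired into routing as a small helper. This is a minimal sketch, not existing code: the `choose_provider` function, the `LEVELS` table, and the mode names other than `local-only` are assumptions layered on the three degradation levels in the mitigation strategy.

```python
import os

# Map the three degradation levels onto AI_MODE values. "local-only"
# comes from the runbook; the other two names are assumptions.
LEVELS = {
    "external-first": ["perplexity", "local"],  # Level 1: full external AI
    "hybrid": ["local", "perplexity"],          # Level 2: hybrid mode
    "local-only": ["local"],                    # Level 3: local-only mode
}

def choose_provider(available, mode=None):
    """Return the first provider permitted by AI_MODE that is currently up.

    `available` maps provider name -> bool, e.g. the status dict
    produced by a dependency check.
    """
    mode = mode or os.environ.get("AI_MODE", "hybrid")
    for provider in LEVELS.get(mode, LEVELS["hybrid"]):
        if available.get(provider):
            return provider
    return None  # nothing usable; caller should alert

# Perplexity down, local models up: even in external-first mode the
# helper degrades gracefully to the local stack.
print(choose_provider({"perplexity": False, "local": True},
                      mode="external-first"))  # -> local
```

Keeping the policy in a single table means switching degradation levels is one environment-variable change, which is what the runbook's `export AI_MODE=local-only` step assumes.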