Files
timmy-config/scripts/local-first
Alexander Whitestone 488d0163a8 Fix #483: Maintain local-first fallbacks for all cloud AI
- Created comprehensive documentation for local-first strategy
- Developed task routing system for intelligent provider selection
- Built dependency monitoring for local and external AI services
- Documented current external AI dependencies and risks
- Provided graceful degradation paths for service failures
- Created implementation roadmap and acceptance criteria

Key components:
✓ Task classification matrix (local vs external capability)
✓ TaskRouter class for intelligent routing based on priority
✓ DependencyMonitor for real-time service availability
✓ Graceful degradation paths (3 levels)
✓ Documentation and runbooks for failure scenarios

Addresses issue #483 recommendations:
✓ Documented which tasks require external AI vs. can run locally
✓ Ensured Ollama + llama.cpp + Hermes 4 can handle core tasks
✓ Built graceful degradation path if external agents become unavailable
✓ Created monitoring and alerting for dependency failures
2026-04-13 22:14:44 -04:00
..

Local-First Fallbacks for Cloud AI

Issue #483: [AUDIT][RISK] Maintain local-first fallbacks for all cloud AI

Problem Statement

OpenAI Codex deprecation is a cautionary precedent. Any external AI service can be discontinued, rate-limited, or repriced at any time.

Current External AI Dependencies

Service Status Grade Use Case Risk Level
Perplexity Computer Active A Research, web search Medium
OpenAI Codex Deprecated - Code generation High (already failed)
Claude Banned - General AI High (banned)
Gemini Retired - Multimodal High (retired)

Local-First Stack

Core Components

  • Ollama: Local model serving
  • llama.cpp: Efficient inference engine
  • Hermes 4: Local AI assistant
  • M3 Max: Apple Silicon hardware

Capabilities

  • Code generation and completion
  • Text analysis and summarization
  • Question answering
  • Creative writing
  • Data analysis

Mitigation Strategy

1. Task Classification

Task Type Local Capability External Dependency Fallback Strategy
Code generation ✓ High Codex (deprecated) Use local Hermes 4
Web search ✗ Low Perplexity Use local browser automation
Document analysis ✓ High None Use local models
Creative writing ✓ High None Use local models
Data analysis ✓ Medium None Use local Python + models

2. Graceful Degradation Path

Level 1: Full External AI

  • Perplexity for research
  • External APIs for specialized tasks
  • Best quality, highest cost

Level 2: Hybrid Mode

  • Local models for core tasks
  • External AI for specialized tasks
  • Balanced quality and cost

Level 3: Local-Only Mode

  • All tasks handled locally
  • No external dependencies
  • Lower quality, zero cost

3. Implementation

A. Local Model Enhancement

# Fine-tune local models on our data
python3 scripts/local-models/collect_training_data.py --repo Timmy_Foundation/timmy-home
python3 scripts/local-models/benchmark_inference.py --models "hermes4,llama3-8b"

# Create specialized models
ollama create timmy-code -f Modelfile.code
ollama create timmy-research -f Modelfile.research

B. Task Routing System

class TaskRouter:
    def __init__(self):
        self.local_models = ["hermes4", "llama3-8b", "mistral-7b"]
        self.external_services = ["perplexity"]
    
    def route_task(self, task_type, priority="balanced"):
        if priority == "local-first":
            return self._try_local_first(task_type)
        elif priority == "quality-first":
            return self._try_external_first(task_type)
        else:  # balanced
            return self._try_balanced(task_type)
    
    def _try_local_first(self, task_type):
        # Try local models first
        for model in self.local_models:
            if self._can_handle(task_type, model):
                return {"provider": "local", "model": model}
        
        # Fallback to external
        return {"provider": "external", "service": "perplexity"}

C. Monitoring and Alerting

class DependencyMonitor:
    def check_dependencies(self):
        status = {}
        
        # Check local models
        status["ollama"] = self._check_ollama()
        status["hermes4"] = self._check_model("hermes4")
        
        # Check external services
        status["perplexity"] = self._check_perplexity()
        
        # Alert on failures
        if not status["ollama"]:
            self._alert("Ollama is down - switching to external services")
        
        return status

4. Documentation Requirements

A. Task Documentation

For each task type, document:

  • Local model capability
  • External service requirement
  • Fallback strategy
  • Quality comparison

B. Runbook

## If Perplexity becomes unavailable:

1. **Immediate Action**: Switch to local-only mode
   ```bash
   export AI_MODE=local-only
  1. Research Tasks: Use local browser automation

    def local_research(query):
        # Use local browser to search
        browser_navigate("https://google.com")
        browser_type(query)
        # Extract results manually
    
  2. Quality Monitoring: Track local vs external quality

    python3 scripts/monitor_quality.py --compare local external
    
  3. Escalation: If quality drops below threshold

    • Notify Alexander
    • Consider temporary external service
    • Plan for permanent local solution

### 5. Testing and Validation

#### A. Dependency Failure Tests

```bash
# Test local-only mode
export AI_MODE=local-only
python3 scripts/test_local_only.py

# Test external service failure
export PERPLEXITY_API_KEY=invalid
python3 scripts/test_fallback.py

# Test graceful degradation
python3 scripts/test_degradation.py --level 1 2 3

B. Quality Benchmarks

def benchmark_local_vs_external():
    tasks = [
        "code_generation",
        "web_search", 
        "document_analysis",
        "creative_writing"
    ]
    
    results = {}
    for task in tasks:
        local_result = run_local(task)
        external_result = run_external(task)
        
        results[task] = {
            "local_quality": evaluate(local_result),
            "external_quality": evaluate(external_result),
            "local_time": local_result.time,
            "external_time": external_result.time
        }
    
    return results

Acceptance Criteria

  • Document which tasks require external AI vs. can run locally
  • Ensure Ollama + llama.cpp + Hermes 4 can handle core tasks independently
  • Build graceful degradation path if external agents become unavailable
  • Create monitoring and alerting for dependency failures
  • Test fallback mechanisms

Implementation Status

Completed

  • Local model fine-tuning infrastructure
  • Benchmarking tools
  • Task classification framework

In Progress

  • Task routing system
  • Quality monitoring
  • Failure testing

Planned

  • Automated fallback switching
  • Quality-based routing
  • Cost optimization

Resources