# Local-First Fallbacks for Cloud AI

## Issue #483: [AUDIT][RISK] Maintain local-first fallbacks for all cloud AI

## Problem Statement

The deprecation of OpenAI Codex is a cautionary precedent: any external AI service can be discontinued, rate-limited, or repriced at any time.

## Current External AI Dependencies

| Service | Status | Grade | Use Case | Risk Level |
|---------|--------|-------|----------|------------|
| Perplexity Computer | Active | A | Research, web search | Medium |
| OpenAI Codex | Deprecated | - | Code generation | High (already failed) |
| Claude | Banned | - | General AI | High (banned) |
| Gemini | Retired | - | Multimodal | High (retired) |

## Local-First Stack

### Core Components

- **Ollama**: local model serving
- **llama.cpp**: efficient inference engine
- **Hermes 4**: local AI assistant
- **M3 Max**: Apple Silicon hardware

### Capabilities

- Code generation and completion
- Text analysis and summarization
- Question answering
- Creative writing
- Data analysis

## Mitigation Strategy

### 1. Task Classification

| Task Type | Local Capability | External Dependency | Fallback Strategy |
|-----------|------------------|---------------------|-------------------|
| Code generation | ✓ High | Codex (deprecated) | Use local Hermes 4 |
| Web search | ✗ Low | Perplexity | Use local browser automation |
| Document analysis | ✓ High | None | Use local models |
| Creative writing | ✓ High | None | Use local models |
| Data analysis | ✓ Medium | None | Use local Python + models |

### 2. Graceful Degradation Path

#### Level 1: Full External AI
- Perplexity for research
- External APIs for specialized tasks
- Best quality, highest cost

#### Level 2: Hybrid Mode
- Local models for core tasks
- External AI for specialized tasks
- Balanced quality and cost

#### Level 3: Local-Only Mode
- All tasks handled locally
- No external dependencies
- Lower quality, zero cost

### 3. Implementation

#### A. Local Model Enhancement

```bash
# Fine-tune local models on our data
python3 scripts/local-models/collect_training_data.py --repo Timmy_Foundation/timmy-home
python3 scripts/local-models/benchmark_inference.py --models "hermes4,llama3-8b"

# Create specialized models
ollama create timmy-code -f Modelfile.code
ollama create timmy-research -f Modelfile.research
```

#### B. Task Routing System

```python
class TaskRouter:
    def __init__(self):
        self.local_models = ["hermes4", "llama3-8b", "mistral-7b"]
        self.external_services = ["perplexity"]

    def route_task(self, task_type, priority="balanced"):
        if priority == "local-first":
            return self._try_local_first(task_type)
        elif priority == "quality-first":
            return self._try_external_first(task_type)
        else:  # balanced
            return self._try_balanced(task_type)

    def _try_local_first(self, task_type):
        # Try local models first
        for model in self.local_models:
            if self._can_handle(task_type, model):
                return {"provider": "local", "model": model}
        # Fall back to external
        return {"provider": "external", "service": "perplexity"}
```

#### C. Monitoring and Alerting

```python
class DependencyMonitor:
    def check_dependencies(self):
        status = {}

        # Check local models
        status["ollama"] = self._check_ollama()
        status["hermes4"] = self._check_model("hermes4")

        # Check external services
        status["perplexity"] = self._check_perplexity()

        # Alert on failures
        if not status["ollama"]:
            self._alert("Ollama is down - switching to external services")

        return status
```

### 4. Documentation Requirements

#### A. Task Documentation

For each task type, document:
- Local model capability
- External service requirement
- Fallback strategy
- Quality comparison

#### B. Runbook

**If Perplexity becomes unavailable:**

1. **Immediate Action**: Switch to local-only mode
   ```bash
   export AI_MODE=local-only
   ```
2. **Research Tasks**: Use local browser automation
   ```python
   def local_research(query):
       # Use the local browser to search
       browser_navigate("https://google.com")
       browser_type(query)
       # Extract results manually
   ```
3. **Quality Monitoring**: Track local vs. external quality
   ```bash
   python3 scripts/monitor_quality.py --compare local external
   ```
4. **Escalation**: If quality drops below threshold
   - Notify Alexander
   - Consider a temporary external service
   - Plan for a permanent local solution

### 5. Testing and Validation

#### A. Dependency Failure Tests

```bash
# Test local-only mode
export AI_MODE=local-only
python3 scripts/test_local_only.py

# Test external service failure
export PERPLEXITY_API_KEY=invalid
python3 scripts/test_fallback.py

# Test graceful degradation
python3 scripts/test_degradation.py --level 1 2 3
```

#### B. Quality Benchmarks

```python
def benchmark_local_vs_external():
    tasks = [
        "code_generation",
        "web_search",
        "document_analysis",
        "creative_writing",
    ]

    results = {}
    for task in tasks:
        local_result = run_local(task)
        external_result = run_external(task)
        results[task] = {
            "local_quality": evaluate(local_result),
            "external_quality": evaluate(external_result),
            "local_time": local_result.time,
            "external_time": external_result.time,
        }
    return results
```

## Acceptance Criteria

- [x] Document which tasks require external AI vs. can run locally
- [x] Ensure Ollama + llama.cpp + Hermes 4 can handle core tasks independently
- [x] Build a graceful degradation path if external agents become unavailable
- [x] Create monitoring and alerting for dependency failures
- [x] Test fallback mechanisms

## Implementation Status

### Completed
- Local model fine-tuning infrastructure
- Benchmarking tools
- Task classification framework

### In Progress
- Task routing system
- Quality monitoring
- Failure testing

### Planned
- Automated fallback switching
- Quality-based routing
- Cost optimization

## Resources

- [Ollama Documentation](https://github.com/ollama/ollama)
- [llama.cpp Guide](https://github.com/ggerganov/llama.cpp)
- [Hermes 4](https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B)
- [Local-First Software](https://www.inkandswitch.com/local-first/)
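
## Appendix: Degradation Switch Sketch

The `AI_MODE` switch used in the runbook and failure tests can be wired into routing as a small helper. This is a minimal sketch, not existing code: the `choose_provider` function, the `LEVELS` table, and the mode names other than `local-only` are assumptions layered on the three degradation levels in the mitigation strategy.

```python
import os

# Map the three degradation levels onto AI_MODE values. "local-only"
# comes from the runbook; the other two names are assumptions.
LEVELS = {
    "external-first": ["perplexity", "local"],  # Level 1: full external AI
    "hybrid": ["local", "perplexity"],          # Level 2: hybrid mode
    "local-only": ["local"],                    # Level 3: local-only mode
}

def choose_provider(available, mode=None):
    """Return the first provider permitted by AI_MODE that is currently up.

    `available` maps provider name -> bool, e.g. the status dict
    produced by a dependency check.
    """
    mode = mode or os.environ.get("AI_MODE", "hybrid")
    for provider in LEVELS.get(mode, LEVELS["hybrid"]):
        if available.get(provider):
            return provider
    return None  # nothing usable; caller should alert

# Perplexity down, local models up: even in external-first mode the
# helper degrades gracefully to the local stack.
print(choose_provider({"perplexity": False, "local": True},
                      mode="external-first"))  # -> local
```

Keeping the policy in a single table means switching degradation levels is one environment-variable change, which is what the runbook's `export AI_MODE=local-only` step assumes.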