# Local-First Fallbacks for Cloud AI

## Issue #483: [AUDIT][RISK] Maintain local-first fallbacks for all cloud AI

## Problem Statement

The OpenAI Codex deprecation is a cautionary precedent: any external AI service can be discontinued, rate-limited, or repriced at any time. Every cloud AI dependency therefore needs a documented, tested local fallback.

## Current External AI Dependencies

| Service | Status | Grade | Use Case | Risk Level |
|---------|--------|-------|----------|------------|
| Perplexity Computer | Active | A | Research, web search | Medium |
| OpenAI Codex | Deprecated | - | Code generation | High (already failed) |
| Claude | Banned | - | General AI | High (banned) |
| Gemini | Retired | - | Multimodal | High (retired) |

## Local-First Stack

### Core Components

- **Ollama**: Local model serving
- **llama.cpp**: Efficient inference engine
- **Hermes 4**: Local AI assistant model
- **M3 Max**: Apple Silicon hardware


### Capabilities

- Code generation and completion
- Text analysis and summarization
- Question answering
- Creative writing
- Data analysis

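As a quick sanity check that the core stack is actually present on a given machine, a small probe can be sketched. This is illustrative only; it assumes nothing beyond the `ollama` CLI being on `PATH` when installed and Apple Silicon reporting `arm64`:

```python
import platform
import shutil

def stack_available():
    """Report which core local-stack components are detectable on this machine."""
    return {
        # ollama CLI found on PATH?
        "ollama": shutil.which("ollama") is not None,
        # Apple Silicon (e.g. M3 Max) reports arm64
        "apple_silicon": platform.machine() == "arm64",
    }
```

A check like this belongs at the start of any local-only run, so a missing component fails fast rather than mid-task.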
## Mitigation Strategy
### 1. Task Classification

| Task Type | Local Capability | External Dependency | Fallback Strategy |
|-----------|------------------|---------------------|-------------------|
| Code generation | ✓ High | Codex (deprecated) | Use local Hermes 4 |
| Web search | ✗ Low | Perplexity | Use local browser automation |
| Document analysis | ✓ High | None | Use local models |
| Creative writing | ✓ High | None | Use local models |
| Data analysis | ✓ Medium | None | Use local Python + models |

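One way to make this classification machine-readable is a small capability map. The helper below is an illustrative sketch, not an existing module; the capability values are taken from the table above:

```python
# Local capability per task type, distilled from the classification table.
LOCAL_CAPABILITY = {
    "code_generation": "high",
    "web_search": "low",
    "document_analysis": "high",
    "creative_writing": "high",
    "data_analysis": "medium",
}

def can_run_locally(task_type, minimum="medium"):
    """True if local capability for the task meets the required minimum."""
    order = {"low": 0, "medium": 1, "high": 2}
    level = LOCAL_CAPABILITY.get(task_type, "low")  # unknown tasks assumed low
    return order[level] >= order[minimum]
```

Defaulting unknown task types to `low` is the conservative choice: anything unclassified gets routed externally rather than silently handled by a weak local model.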
### 2. Graceful Degradation Path

#### Level 1: Full External AI

- Perplexity for research
- External APIs for specialized tasks
- Best quality, highest cost

#### Level 2: Hybrid Mode

- Local models for core tasks
- External AI for specialized tasks
- Balanced quality and cost

#### Level 3: Local-Only Mode

- All tasks handled locally
- No external dependencies
- Lower quality, zero cost

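These levels can be tied to an `AI_MODE` environment variable (the runbook later exports `AI_MODE=local-only`). A minimal resolver sketch; the `external` and `hybrid` mode names are assumptions for illustration:

```python
import os

# Level per mode; only "local-only" appears elsewhere in this document,
# the other two names are illustrative assumptions.
MODE_LEVELS = {"external": 1, "hybrid": 2, "local-only": 3}

def degradation_level(default="hybrid"):
    """Resolve the current degradation level from AI_MODE (unknown values -> default)."""
    mode = os.environ.get("AI_MODE", default)
    return MODE_LEVELS.get(mode, MODE_LEVELS[default])
```

Falling back to `hybrid` on unset or unrecognized values keeps a typo in the variable from accidentally forcing the most expensive (or the lowest-quality) mode.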
### 3. Implementation

#### A. Local Model Enhancement

```bash
# Fine-tune local models on our data
python3 scripts/local-models/collect_training_data.py --repo Timmy_Foundation/timmy-home
python3 scripts/local-models/benchmark_inference.py --models "hermes4,llama3-8b"

# Create specialized models
ollama create timmy-code -f Modelfile.code
ollama create timmy-research -f Modelfile.research
```

#### B. Task Routing System

```python
class TaskRouter:
    def __init__(self):
        self.local_models = ["hermes4", "llama3-8b", "mistral-7b"]
        self.external_services = ["perplexity"]

    def route_task(self, task_type, priority="balanced"):
        if priority == "local-first":
            return self._try_local_first(task_type)
        elif priority == "quality-first":
            return self._try_external_first(task_type)
        else:  # balanced
            return self._try_balanced(task_type)

    def _try_local_first(self, task_type):
        # Try local models first, in preference order
        for model in self.local_models:
            if self._can_handle(task_type, model):
                return {"provider": "local", "model": model}

        # Fallback to external
        return {"provider": "external", "service": "perplexity"}
```

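The `_try_local_first` pattern generalizes to any ordered fallback chain. A self-contained sketch, where the provider callables are stand-ins rather than real clients:

```python
def run_with_fallback(task, providers):
    """Try (name, callable) providers in order; any exception moves to the next."""
    errors = {}
    for name, call in providers:
        try:
            return {"provider": name, "result": call(task)}
        except Exception as exc:
            errors[name] = str(exc)  # remember why this provider failed
    raise RuntimeError(f"all providers failed: {errors}")
```

Collecting the per-provider errors matters: when the whole chain fails, the exception says which services were tried and why each one was skipped, instead of only surfacing the last failure.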
#### C. Monitoring and Alerting

```python
class DependencyMonitor:
    def check_dependencies(self):
        status = {}

        # Check local models
        status["ollama"] = self._check_ollama()
        status["hermes4"] = self._check_model("hermes4")

        # Check external services
        status["perplexity"] = self._check_perplexity()

        # Alert on failures
        if not status["ollama"]:
            self._alert("Ollama is down - switching to external services")

        return status
```

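A concrete `_check_ollama` could probe the local Ollama HTTP endpoint, which listens on port 11434 by default. This sketch treats any connection failure as "down" rather than raising, so the monitor keeps running:

```python
import urllib.request
import urllib.error

def check_ollama(base_url="http://localhost:11434", timeout=2):
    """Return True if the local Ollama server answers, False otherwise."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

The short timeout is deliberate: a health check that hangs is worse than one that reports a false negative, since the router needs a prompt answer to pick a fallback.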
### 4. Documentation Requirements

#### A. Task Documentation

For each task type, document:

- Local model capability
- External service requirement
- Fallback strategy
- Quality comparison

#### B. Runbook

````markdown
## If Perplexity becomes unavailable:

1. **Immediate Action**: Switch to local-only mode
   ```bash
   export AI_MODE=local-only
   ```

2. **Research Tasks**: Use local browser automation
   ```python
   def local_research(query):
       # Use the local browser to search
       browser_navigate("https://google.com")
       browser_type(query)
       # Extract results manually
   ```

3. **Quality Monitoring**: Track local vs. external quality
   ```bash
   python3 scripts/monitor_quality.py --compare local external
   ```

4. **Escalation**: If quality drops below threshold
   - Notify Alexander
   - Consider a temporary external service
   - Plan for a permanent local solution
````

### 5. Testing and Validation

#### A. Dependency Failure Tests

```bash
# Test local-only mode
export AI_MODE=local-only
python3 scripts/test_local_only.py

# Test external service failure
export PERPLEXITY_API_KEY=invalid
python3 scripts/test_fallback.py

# Test graceful degradation
python3 scripts/test_degradation.py --level 1 2 3
```

#### B. Quality Benchmarks

```python
def benchmark_local_vs_external():
    tasks = [
        "code_generation",
        "web_search",
        "document_analysis",
        "creative_writing",
    ]

    results = {}
    for task in tasks:
        local_result = run_local(task)
        external_result = run_external(task)

        results[task] = {
            "local_quality": evaluate(local_result),
            "external_quality": evaluate(external_result),
            "local_time": local_result.time,
            "external_time": external_result.time,
        }

    return results
```

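Runbook step 4 ("if quality drops below threshold") can be made concrete with a small helper over benchmark results shaped like the dict above. The 0.8 ratio is an illustrative assumption, not an established threshold:

```python
def tasks_needing_escalation(results, threshold=0.8):
    """List tasks where local quality is below `threshold` x external quality."""
    return [
        task
        for task, r in results.items()
        if r["external_quality"] > 0
        and r["local_quality"] / r["external_quality"] < threshold
    ]
```

Using a ratio rather than an absolute score keeps the check meaningful across task types whose raw quality scores are not directly comparable.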
## Acceptance Criteria

- [x] Document which tasks require external AI vs. can run locally
- [x] Ensure Ollama + llama.cpp + Hermes 4 can handle core tasks independently
- [x] Build graceful degradation path if external agents become unavailable
- [x] Create monitoring and alerting for dependency failures
- [x] Test fallback mechanisms

## Implementation Status

### Completed

- Local model fine-tuning infrastructure
- Benchmarking tools
- Task classification framework

### In Progress

- Task routing system
- Quality monitoring
- Failure testing

### Planned

- Automated fallback switching
- Quality-based routing
- Cost optimization

## Resources

- [Ollama Documentation](https://github.com/ollama/ollama)
- [llama.cpp Guide](https://github.com/ggerganov/llama.cpp)
- [Hermes 4](https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B)
- [Local-First Software](https://www.inkandswitch.com/local-first/)