# Qwen2.5-7B Crisis Support Deployment

Local model deployment for privacy-preserving crisis detection and support.

## Why Qwen2.5-7B

| Metric | Score | Source |
|--------|-------|--------|
| Crisis detection F1 | 0.880 | Research #661 |
| Risk assessment F1 | 0.907 | Research #661 |
| Latency (M4 Max) | 1-3s | Measured |
| Privacy | Complete | Local only |

## Setup

### 1. Install Ollama

```bash
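# macOS (Homebrew)
brew install ollama

# Start the server. This runs in the foreground, so keep it in its own terminal.
ollama serve

# Or download the installer from https://ollama.ai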
```

### 2. Pull the model

```bash
ollama pull qwen2.5:7b
```

Or via Python:
```python
from tools.qwen_crisis import install_model
install_model()
```

### 3. Verify

```python
from tools.qwen_crisis import get_status
print(get_status())
# {'ollama_running': True, 'model_installed': True, 'ready': True, 'latency_ms': 1234}
```
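
To cross-check `get_status()` independently, the Ollama HTTP API can be queried directly. The sketch below assumes the default endpoint `http://localhost:11434` and uses Ollama's `/api/tags` route, which lists locally installed models. The helper names `model_installed` and `fetch_tags` are hypothetical and are not part of `tools.qwen_crisis`.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: default Ollama address


def model_installed(tags_payload: dict, model: str = "qwen2.5:7b") -> bool:
    """Return True if `model` appears in a parsed /api/tags response."""
    names = {entry.get("name") for entry in tags_payload.get("models", [])}
    return model in names


def fetch_tags(base_url: str = OLLAMA_URL, timeout: float = 2.0) -> dict:
    """Fetch the list of local models from a running Ollama server.

    Raises OSError (e.g. connection refused, timeout) if the server is down.
    """
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)
```

With the server running, `model_installed(fetch_tags())` should agree with the `model_installed` field reported by `get_status()`.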

## Usage

### Crisis Detection

```python
from tools.qwen_crisis import detect_crisis

result = detect_crisis("I want to die, nothing matters")
# {
#   'is_crisis': True,
#   'confidence': 0.92,
#   'risk_level': 'high',
#   'indicators': ['explicit ideation', 'hopelessness'],
#   'response_approach': 'validate, ask about safety, provide resources',
#   'latency_ms': 1847
# }
```

### Generate Crisis Response

```python
from tools.qwen_crisis import generate_crisis_response

response = generate_crisis_response(result)
# "I hear you, and I want you to know that what you're feeling right now
#  is real and it matters. Are you safe right now?"
```

### Multilingual Support

Detection and response generation work in any language the model supports, including English, Spanish, French, German, Portuguese, Chinese, Japanese, and Korean.

## Privacy Guarantee

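**Zero external calls at inference time.** All inference happens locally via Ollama on `localhost:11434`. The one-time `ollama pull` during setup is the only step that needs network access, to download the model weights.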

Verified by:
- No network calls outside localhost during detection
- Model weights stored locally
- No telemetry or logging to external services
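
One way to guard against an accidental misconfiguration is to refuse any inference endpoint that is not a loopback address. The following is a minimal sketch of such a check. The function name `is_loopback_url` is hypothetical, and the check only validates a configured URL; it does not replace monitoring actual network traffic.

```python
import ipaddress
from urllib.parse import urlparse


def is_loopback_url(url: str) -> bool:
    """Return True if the URL's host is localhost or a loopback IP address."""
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Any other hostname would need DNS resolution; treat it as non-local.
        return False
```

A caller could run `is_loopback_url("http://localhost:11434")` at startup and abort if it returns `False`.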

## Integration

### With crisis_detection.py

The rule-based `tools/crisis_detection.py` handles fast pattern matching.
Qwen2.5-7B provides deeper semantic analysis for ambiguous cases.

Recommended flow:
1. Run the rule-based `detect_crisis()` from `tools/crisis_detection.py` first (fast, < 1 ms).
2. If the result is ambiguous or medium confidence, run `qwen_crisis.detect_crisis()` for deeper analysis.
3. Generate the response with `generate_crisis_response()`.
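
The first two steps of the flow above can be sketched as a small cascade. This is an illustration under stated assumptions, not the project's implementation: the name `cascade_detect`, the `escalate_below` threshold of 0.8, the added `stage` key, and the assumption that the rule-based detector also returns a dict with a `confidence` field are all hypothetical. Detectors are passed in as callables so the sketch does not depend on either module's exact signature.

```python
from typing import Callable, Optional

Detector = Callable[[str], dict]


def cascade_detect(
    text: str,
    rule_based: Detector,
    model_based: Optional[Detector] = None,
    escalate_below: float = 0.8,  # assumption: what counts as "not confident"
) -> dict:
    """Run the fast detector first; escalate to the model only when unsure."""
    first = rule_based(text)
    confident = first.get("confidence", 0.0) >= escalate_below
    if confident or model_based is None:
        return {**first, "stage": "rule-based"}
    try:
        second = model_based(text)
    except Exception:
        # Model unavailable or errored: keep the rule-based result.
        return {**first, "stage": "rule-based"}
    return {**second, "stage": "model"}
```

In practice the rule-based detector and `qwen_crisis.detect_crisis` would be passed as the two callables, and the returned dict handed to `generate_crisis_response()`. Falling back to the rule-based result on any model error keeps the fast path available when Ollama is down.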

### Configuration

Add to `config.yaml`:
```yaml
agent:
  crisis:
    local_model: qwen2.5:7b
    fallback: rule-based  # Use rule-based if model unavailable
    latency_target_ms: 3000
```
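
How the agent consumes these keys is not documented here, so the following is only a sketch of one plausible reading: the parsed YAML mapping (for example from PyYAML's `yaml.safe_load`) is turned into a typed config, and the fallback is chosen when the model is not ready or exceeds `latency_target_ms`. The names `CrisisConfig`, `load_crisis_config`, and `choose_backend` are hypothetical, and treating the latency target as a fallback trigger is an assumption.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class CrisisConfig:
    local_model: str = "qwen2.5:7b"
    fallback: str = "rule-based"
    latency_target_ms: int = 3000


def load_crisis_config(config: dict) -> CrisisConfig:
    """Build a CrisisConfig from the parsed config.yaml mapping, with defaults."""
    section = (config.get("agent") or {}).get("crisis") or {}
    known = {f.name for f in fields(CrisisConfig)}
    # Ignore unknown keys rather than failing on them.
    return CrisisConfig(**{k: v for k, v in section.items() if k in known})


def choose_backend(cfg: CrisisConfig, status: dict) -> str:
    """Pick the local model when it is ready and within the latency target."""
    latency = status.get("latency_ms")
    within_target = latency is not None and latency <= cfg.latency_target_ms
    if status.get("ready", False) and within_target:
        return cfg.local_model
    return cfg.fallback
```

Here `status` is assumed to have the shape returned by `get_status()` in the Verify step.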

## Related

- #661 (Local Model Quality for Crisis Support)
- #702 (Multilingual Crisis Detection)
- tools/crisis_detection.py (rule-based crisis detection)