Local model deployment via Ollama for privacy-preserving crisis detection. Performance (research #661): Crisis F1 = 0.880, Risk F1 = 0.907, 1-3 s latency.

- `tools/qwen_crisis.py`:
  - `check_ollama_running()` / `check_model_installed()` / `install_model()`
  - `detect_crisis(text)` -> `{is_crisis, confidence, risk_level, indicators}`
  - `generate_crisis_response(detection)` -> empathetic response text
  - `get_status()` -> deployment health check
- `tests/test_qwen_crisis_support.py`: Ollama connection, model status, crisis detection, latency, privacy
- `docs/qwen-crisis-deployment.md`: setup, usage, privacy guarantee, integration guide

3 files, 450 insertions.
# Qwen2.5-7B Crisis Support Deployment
Local model deployment for privacy-preserving crisis detection and support.
## Why Qwen2.5-7B

| Metric | Score | Source |
|--------|-------|--------|
| Crisis detection F1 | 0.880 | Research #661 |
| Risk assessment F1 | 0.907 | Research #661 |
| Latency (M4 Max) | 1-3 s | Measured |
| Privacy | Complete | Local only |

## Setup
### 1. Install Ollama

```bash
# macOS
brew install ollama
ollama serve

# Or download from https://ollama.ai
```
### 2. Pull the model

```bash
ollama pull qwen2.5:7b
```
Or via Python:

```python
from tools.qwen_crisis import install_model

install_model()
```
### 3. Verify

```python
from tools.qwen_crisis import get_status

print(get_status())
# {'ollama_running': True, 'model_installed': True, 'ready': True, 'latency_ms': 1234}
```
## Usage
### Crisis Detection

```python
from tools.qwen_crisis import detect_crisis

result = detect_crisis("I want to die, nothing matters")
# {
#     'is_crisis': True,
#     'confidence': 0.92,
#     'risk_level': 'high',
#     'indicators': ['explicit ideation', 'hopelessness'],
#     'response_approach': 'validate, ask about safety, provide resources',
#     'latency_ms': 1847
# }
```
### Generate Crisis Response

```python
from tools.qwen_crisis import generate_crisis_response

response = generate_crisis_response(result)
# "I hear you, and I want you to know that what you're feeling right now
# is real and it matters. Are you safe right now?"
```
### Multilingual Support
Detection and response generation work in any language the model supports (see the example after this list):

- English, Spanish, French, German, Portuguese, Chinese, Japanese, Korean, etc.
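
For instance, the same `detect_crisis()` call documented above accepts non-English input directly. A minimal usage sketch; the Spanish text is illustrative and the returned scores are not claimed values:

```python
from tools.qwen_crisis import detect_crisis

# Same API as the English example; the model handles the language itself.
result = detect_crisis("Ya no puedo más, nada tiene sentido")
print(result["is_crisis"], result["risk_level"], result["indicators"])
```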
## Privacy Guarantee
**Zero external calls.** All inference happens locally via Ollama on localhost:11434.
Verified by (see the test sketch after this list):

- No network calls outside localhost during detection
- Model weights stored locally
- No telemetry or logging to external services
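
A minimal sketch of how the no-external-calls property can be checked. The guard below monkeypatches `socket.socket.connect` so any non-local connection fails the test; this approach is illustrative and is not necessarily how `tests/test_qwen_crisis_support.py` implements it:

```python
import socket

import pytest

from tools.qwen_crisis import detect_crisis

@pytest.fixture
def localhost_only(monkeypatch):
    """Fail the test if any TCP connection targets a non-local host."""
    real_connect = socket.socket.connect

    def guarded_connect(self, address):
        # Assumes (host, port) tuples; Ollama listens on localhost:11434.
        if isinstance(address, tuple):
            host = address[0]
            assert host in ("127.0.0.1", "localhost", "::1"), (
                f"unexpected external call to {host}"
            )
        return real_connect(self, address)

    monkeypatch.setattr(socket.socket, "connect", guarded_connect)

def test_detection_stays_local(localhost_only):
    detect_crisis("I feel hopeless")
```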
## Integration
### With crisis_detection.py
The rule-based `tools/crisis_detection.py` handles fast pattern matching; Qwen2.5-7B provides deeper semantic analysis for ambiguous cases.
Recommended flow (sketched below):

1. Run `detect_crisis()` (rule-based): fast, < 1 ms
2. If the result is ambiguous or medium confidence, run `qwen_crisis.detect_crisis()` for deeper analysis
3. Generate a response with `generate_crisis_response()`
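
A minimal sketch of that flow, assuming the rule-based module also exposes a `detect_crisis()` that returns a `confidence` field (the field names and the escalation band are illustrative, not tuned values):

```python
from tools import crisis_detection  # rule-based, < 1 ms
from tools import qwen_crisis       # local Qwen2.5-7B, 1-3 s

def assess(text: str) -> dict:
    # Stage 1: fast rule-based pattern matching.
    quick = crisis_detection.detect_crisis(text)

    # Stage 2: escalate ambiguous, medium-confidence cases to the model.
    if 0.4 <= quick.get("confidence", 0.0) <= 0.8:
        return qwen_crisis.detect_crisis(text)
    return quick

detection = assess("I don't see the point anymore")
if detection["is_crisis"]:
    print(qwen_crisis.generate_crisis_response(detection))
```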
### Configuration
Add to `config.yaml`:
```yaml
agent:
  crisis:
    local_model: qwen2.5:7b
    fallback: rule-based        # use rule-based detection if the model is unavailable
    latency_target_ms: 3000
```
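
One way this config might be consumed at startup; a sketch assuming PyYAML and a hypothetical `choose_detector()` helper (neither is confirmed by this repo):

```python
import yaml

from tools import crisis_detection, qwen_crisis

def choose_detector(path: str = "config.yaml"):
    """Return the local-model detector when ready, else the configured fallback."""
    with open(path) as f:
        cfg = yaml.safe_load(f)["agent"]["crisis"]

    status = qwen_crisis.get_status()
    if status["ready"] and status["latency_ms"] <= cfg["latency_target_ms"]:
        return qwen_crisis.detect_crisis
    if cfg["fallback"] == "rule-based":
        return crisis_detection.detect_crisis
    raise RuntimeError("no crisis detector available")
```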
## Related
- #661 (Local Model Quality for Crisis Support)
- #702 (Multilingual Crisis Detection)
- tools/crisis_detection.py (rule-based crisis detection)