# Qwen2.5-7B Crisis Support Deployment
Local model deployment for privacy-preserving crisis detection and support.
## Why Qwen2.5-7B
| Metric | Value | Source |
|---|---|---|
| Crisis detection F1 | 0.880 | Research #661 |
| Risk assessment F1 | 0.907 | Research #661 |
| Latency (M4 Max) | 1-3s | Measured |
| Privacy | Complete | Local only |
## Setup

### 1. Install Ollama

```bash
# macOS
brew install ollama
ollama serve

# Or download from https://ollama.ai
```
### 2. Pull the model

```bash
ollama pull qwen2.5:7b
```

Or via Python:

```python
from tools.qwen_crisis import install_model

install_model()
```
### 3. Verify

```python
from tools.qwen_crisis import get_status

print(get_status())
# {'ollama_running': True, 'model_installed': True, 'ready': True, 'latency_ms': 1234}
```
## Usage

### Crisis Detection

```python
from tools.qwen_crisis import detect_crisis

result = detect_crisis("I want to die, nothing matters")
# {
#     'is_crisis': True,
#     'confidence': 0.92,
#     'risk_level': 'high',
#     'indicators': ['explicit ideation', 'hopelessness'],
#     'response_approach': 'validate, ask about safety, provide resources',
#     'latency_ms': 1847
# }
```
### Generate Crisis Response

```python
from tools.qwen_crisis import generate_crisis_response

response = generate_crisis_response(result)
# "I hear you, and I want you to know that what you're feeling right now
#  is real and it matters. Are you safe right now?"
```
## Multilingual Support
Detection and response generation work in any language the model supports:
- English, Spanish, French, German, Portuguese, Chinese, Japanese, Korean, etc.
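
As an illustration, the same call works on non-English input. The Spanish text and the printed values below are hypothetical; the actual fields depend on the model's output:

```python
from tools.qwen_crisis import detect_crisis

# Hypothetical Spanish input; no translation step is needed, the model
# analyzes the text directly and returns the same result structure.
result = detect_crisis("Ya no puedo más, nada tiene sentido")
print(result["is_crisis"], result["risk_level"])  # e.g. True high
```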
## Privacy Guarantee
Zero external calls. All inference happens locally via Ollama on localhost:11434.
Verified by:
- No network calls outside localhost during detection
- Model weights stored locally
- No telemetry or logging to external services
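
A quick local check is to query Ollama's API directly. By default it serves only on `localhost:11434`, so a successful response confirms inference is local (a minimal sketch using Ollama's standard `/api/tags` endpoint):

```python
import urllib.request

# Ollama's HTTP API listens on localhost:11434 by default. /api/tags lists
# the locally installed models; a 200 response means the server answering
# detection requests is this machine, not an external endpoint.
with urllib.request.urlopen("http://localhost:11434/api/tags", timeout=5) as resp:
    print(resp.status)  # 200
```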
## Integration

### With crisis_detection.py

The rule-based `tools/crisis_detection.py` handles fast pattern matching.
Qwen2.5-7B provides deeper semantic analysis for ambiguous cases.
Recommended flow (sketched below):

1. Run `detect_crisis()` (rule-based): fast, < 1 ms.
2. If the result is ambiguous or medium confidence, run `qwen_crisis.detect_crisis()` for deeper analysis.
3. Generate a response with `generate_crisis_response()`.
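
A minimal sketch of this flow, assuming the rule-based detector also returns a dict with a `confidence` field and that the 0.4-0.7 ambiguity band is illustrative rather than prescribed:

```python
from tools import crisis_detection, qwen_crisis

def assess(text: str) -> dict:
    """Two-tier detection: rule-based first, Qwen2.5-7B for ambiguous cases."""
    quick = crisis_detection.detect_crisis(text)  # < 1 ms pattern matching
    # Illustrative ambiguity band: escalate to the local model when the
    # rule-based confidence is neither clearly low nor clearly high.
    if 0.4 <= quick.get("confidence", 0.0) <= 0.7:
        return qwen_crisis.detect_crisis(text)  # 1-3 s semantic analysis
    return quick

detection = assess("I want to die, nothing matters")
if detection["is_crisis"]:
    print(qwen_crisis.generate_crisis_response(detection))
```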
## Configuration

Add to `config.yaml`:

```yaml
agent:
  crisis:
    local_model: qwen2.5:7b
    fallback: rule-based  # Use rule-based detection if the model is unavailable
    latency_target_ms: 3000
```
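
A sketch of how the `fallback` setting might be honored at call time, assuming PyYAML is installed and `get_status()` returns the shape shown under Setup:

```python
import yaml  # pip install pyyaml

from tools import crisis_detection, qwen_crisis

with open("config.yaml") as f:
    cfg = yaml.safe_load(f)["agent"]["crisis"]

def detect(text: str) -> dict:
    # Prefer the local model; fall back to rule-based detection when
    # Ollama or the model is unavailable, per the config above.
    if qwen_crisis.get_status().get("ready"):
        return qwen_crisis.detect_crisis(text)
    if cfg.get("fallback") == "rule-based":
        return crisis_detection.detect_crisis(text)
    raise RuntimeError("Local model unavailable and no fallback configured")
```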
## Related

- #661 (Local Model Quality for Crisis Support)
- #702 (Multilingual Crisis Detection)
- `tools/crisis_detection.py` (rule-based crisis detection)