Files

Alexander Whitestone fc381211c8 fix: deploy Qwen2.5-7B for local crisis support (closes #668 )

Local model deployment via Ollama for privacy-preserving crisis detection.
Performance (research #661): Crisis F1=0.880, Risk F1=0.907, 1-3s latency.

tools/qwen_crisis.py:
- check_ollama_running() / check_model_installed() / install_model()
- detect_crisis(text) -> {is_crisis, confidence, risk_level, indicators}
- generate_crisis_response(detection) -> empathetic response text
- get_status() -> deployment health check

tests/test_qwen_crisis_support.py:
- Ollama connection, model status, crisis detection, latency, privacy

docs/qwen-crisis-deployment.md:
- Setup, usage, privacy guarantee, integration guide

3 files, 450 insertions.

2026-04-14 23:04:15 -04:00

2.6 KiB

Raw Blame History

Qwen2.5-7B Crisis Support Deployment

Local model deployment for privacy-preserving crisis detection and support.

Why Qwen2.5-7B

Metric	Score	Source
Crisis detection F1	0.880	Research #661
Risk assessment F1	0.907	Research #661
Latency (M4 Max)	1-3s	Measured
Privacy	Complete	Local only

Setup

1. Install Ollama

# macOS
brew install ollama
ollama serve

# Or download from https://ollama.ai

2. Pull the model

ollama pull qwen2.5:7b

Or via Python:

from tools.qwen_crisis import install_model
install_model()

3. Verify

from tools.qwen_crisis import get_status
print(get_status())
# {'ollama_running': True, 'model_installed': True, 'ready': True, 'latency_ms': 1234}

Usage

Crisis Detection

from tools.qwen_crisis import detect_crisis

result = detect_crisis("I want to die, nothing matters")
# {
#   'is_crisis': True,
#   'confidence': 0.92,
#   'risk_level': 'high',
#   'indicators': ['explicit ideation', 'hopelessness'],
#   'response_approach': 'validate, ask about safety, provide resources',
#   'latency_ms': 1847
# }

Generate Crisis Response

from tools.qwen_crisis import generate_crisis_response

response = generate_crisis_response(result)
# "I hear you, and I want you to know that what you're feeling right now 
#  is real and it matters. Are you safe right now?"

Multilingual Support

Detection and response generation work in any language the model supports:

English, Spanish, French, German, Portuguese, Chinese, Japanese, Korean, etc.

Privacy Guarantee

Zero external calls. All inference happens locally via Ollama on localhost:11434.

Verified by:

No network calls outside localhost during detection
Model weights stored locally
No telemetry or logging to external services

Integration

With crisis_detection.py

The rule-based tools/crisis_detection.py handles fast pattern matching. Qwen2.5-7B provides deeper semantic analysis for ambiguous cases.

Recommended flow:

Run detect_crisis() (rule-based) — fast, < 1ms
If ambiguous or medium confidence, run qwen_crisis.detect_crisis() — deeper analysis
Generate response with generate_crisis_response()

Configuration

Add to config.yaml:

agent:
  crisis:
    local_model: qwen2.5:7b
    fallback: rule-based  # Use rule-based if model unavailable
    latency_target_ms: 3000

#661 (Local Model Quality for Crisis Support)
#702 (Multilingual Crisis Detection)
tools/crisis_detection.py (rule-based crisis detection)

2.6 KiB Raw Blame History