hermes-agent/docs/qwen-crisis-deployment.md
Alexander Whitestone fc381211c8 fix: deploy Qwen2.5-7B for local crisis support (closes #668)
Local model deployment via Ollama for privacy-preserving crisis detection.
Performance (research #661): Crisis F1=0.880, Risk F1=0.907, 1-3s latency.

tools/qwen_crisis.py:
- check_ollama_running() / check_model_installed() / install_model()
- detect_crisis(text) -> {is_crisis, confidence, risk_level, indicators}
- generate_crisis_response(detection) -> empathetic response text
- get_status() -> deployment health check

tests/test_qwen_crisis_support.py:
- Ollama connection, model status, crisis detection, latency, privacy

docs/qwen-crisis-deployment.md:
- Setup, usage, privacy guarantee, integration guide

3 files, 450 insertions.
2026-04-14 23:04:15 -04:00


Qwen2.5-7B Crisis Support Deployment

Local model deployment for privacy-preserving crisis detection and support.

Why Qwen2.5-7B

Metric               Score     Source
Crisis detection F1  0.880     Research #661
Risk assessment F1   0.907     Research #661
Latency (M4 Max)     1-3s      Measured
Privacy              Complete  Local only

Setup

1. Install Ollama

# macOS
brew install ollama
ollama serve

# Or download from https://ollama.ai

2. Pull the model

ollama pull qwen2.5:7b

Or via Python:

from tools.qwen_crisis import install_model
install_model()

3. Verify

from tools.qwen_crisis import get_status
print(get_status())
# {'ollama_running': True, 'model_installed': True, 'ready': True, 'latency_ms': 1234}
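
A health check of this kind can be approximated directly against Ollama's HTTP API. The sketch below is an assumption about how such a probe might look, not the code in tools/qwen_crisis.py. The helper names model_in_tags and probe_ollama are hypothetical. It uses Ollama's GET /api/tags endpoint, which lists the models installed locally.

```python
import json
import urllib.error
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default Ollama address used in this guide
MODEL = "qwen2.5:7b"


def model_in_tags(tags_payload: dict, model: str = MODEL) -> bool:
    """Return True if `model` appears in a parsed /api/tags response."""
    names = [m.get("name", "") for m in tags_payload.get("models", [])]
    return model in names


def probe_ollama(base_url: str = OLLAMA_URL, timeout: float = 2.0) -> dict:
    """Hypothetical status probe: is the server up, and is the model pulled?"""
    status = {"ollama_running": False, "model_installed": False, "ready": False}
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        return status  # server unreachable, timed out, or reply was not JSON
    status["ollama_running"] = True
    status["model_installed"] = model_in_tags(payload)
    status["ready"] = status["model_installed"]
    return status
```

If probe_ollama() reports ollama_running as False, start the server with ollama serve. If only model_installed is False, pull the model as in step 2.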

Usage

Crisis Detection

from tools.qwen_crisis import detect_crisis

result = detect_crisis("I want to die, nothing matters")
# {
#   'is_crisis': True,
#   'confidence': 0.92,
#   'risk_level': 'high',
#   'indicators': ['explicit ideation', 'hopelessness'],
#   'response_approach': 'validate, ask about safety, provide resources',
#   'latency_ms': 1847
# }
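
One way a result dict like this could be produced is to ask the model for a JSON reply through Ollama's POST /api/generate endpoint (with "format": "json" and "stream": false) and then validate the reply before trusting it. The following is a minimal sketch under that assumption. The prompt wording, the label set in RISK_LEVELS, the parse_ok flag, and the function names build_request and parse_detection are all hypothetical and are not taken from tools/qwen_crisis.py.

```python
import json

RISK_LEVELS = ("none", "low", "medium", "high")  # assumed label set


def build_request(text: str, model: str = "qwen2.5:7b") -> dict:
    """Body for POST http://localhost:11434/api/generate (prompt is illustrative)."""
    prompt = (
        "Assess the message for crisis risk. Reply with JSON containing "
        "is_crisis (bool), confidence (0-1), risk_level, indicators (list).\n"
        f"Message: {text}"
    )
    return {"model": model, "prompt": prompt, "format": "json", "stream": False}


def parse_detection(raw: str) -> dict:
    """Normalize the model's JSON reply; flag malformed replies via parse_ok."""
    failed = {"is_crisis": False, "confidence": 0.0, "risk_level": "unknown",
              "indicators": [], "parse_ok": False}
    try:
        data = json.loads(raw)
    except ValueError:
        return failed
    if not isinstance(data, dict):
        return failed
    try:
        confidence = float(data.get("confidence", 0.0))
    except (TypeError, ValueError):
        confidence = 0.0
    confidence = min(1.0, max(0.0, confidence))  # clamp to [0, 1]
    risk = str(data.get("risk_level", "")).lower()
    indicators = data.get("indicators")
    return {
        "is_crisis": data.get("is_crisis") is True,  # only a real JSON true counts
        "confidence": confidence,
        "risk_level": risk if risk in RISK_LEVELS else "unknown",
        "indicators": indicators if isinstance(indicators, list) else [],
        "parse_ok": True,
    }
```

A caller would treat parse_ok == False as "model output unusable" and fall back to the rule-based detector rather than read it as "no crisis".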

Generate Crisis Response

from tools.qwen_crisis import generate_crisis_response

response = generate_crisis_response(result)
# "I hear you, and I want you to know that what you're feeling right now 
#  is real and it matters. Are you safe right now?"

Multilingual Support

Detection and response generation work in any language Qwen2.5-7B supports, including:

  • English, Spanish, French, German, Portuguese, Chinese, Japanese, and Korean

Privacy Guarantee

Zero external calls. All inference happens locally via Ollama on localhost:11434.

Verified by:

  • No network calls outside localhost during detection
  • Model weights stored locally
  • No telemetry or logging to external services
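
The first check can be exercised in a test by refusing any socket connection that leaves the machine while detection runs. The sketch below shows one possible guard. It is not the repository's test code, and the names localhost_only and ExternalCallBlocked are hypothetical. It patches socket.socket.connect, which is the call urllib and most HTTP clients ultimately go through.

```python
import socket
from contextlib import contextmanager

LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


class ExternalCallBlocked(RuntimeError):
    """Raised when code under test tries to connect outside localhost."""


@contextmanager
def localhost_only():
    """Temporarily make socket.connect() refuse any non-local address."""
    original_connect = socket.socket.connect

    def guarded_connect(self, address):
        # AF_INET/AF_INET6 addresses are tuples whose first item is the host.
        # Other address types (e.g. Unix socket paths) never leave the machine.
        host = address[0] if isinstance(address, tuple) else None
        if host is not None and host not in LOCAL_HOSTS:
            raise ExternalCallBlocked(f"blocked connection to {host}")
        return original_connect(self, address)

    socket.socket.connect = guarded_connect
    try:
        yield
    finally:
        socket.socket.connect = original_connect  # always restore
```

Wrapping a detect_crisis() call in with localhost_only(): makes the test fail loudly on any outbound connection. The guard does not cover connect_ex(), DNS lookups, or subprocesses, so it complements rather than replaces a packet-level check.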

Integration

With crisis_detection.py

The rule-based tools/crisis_detection.py handles fast pattern matching. Qwen2.5-7B provides deeper semantic analysis for ambiguous cases.

Recommended flow:

  1. Run the rule-based detect_crisis() from tools/crisis_detection.py first (fast, < 1 ms).
  2. If the result is ambiguous or medium confidence, run qwen_crisis.detect_crisis() for deeper analysis.
  3. Generate the response with generate_crisis_response().
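
The first two steps of the flow above can be sketched as a small dispatcher. This is an illustration only: the function name detect_two_stage, the thresholds low and high, the added stage key, and the assumption that the rule-based detector returns a dict with a confidence field are all hypothetical. The detectors are passed in as callables so the sketch does not depend on either module's real signature.

```python
from typing import Callable

Detector = Callable[[str], dict]


def detect_two_stage(
    text: str,
    rule_based: Detector,
    model_based: Detector,
    low: float = 0.3,   # assumed lower bound of the "ambiguous" band
    high: float = 0.8,  # assumed upper bound of the "ambiguous" band
) -> dict:
    """Run the fast detector first; escalate to the model only for ambiguous scores."""
    first = rule_based(text)
    confidence = first.get("confidence", 0.0)
    if confidence < low or confidence >= high:
        # Clear-cut result: keep the sub-millisecond path.
        return {**first, "stage": "rule-based"}
    try:
        second = model_based(text)
    except Exception:
        # Model unavailable or failed: keep the rule-based answer.
        return {**first, "stage": "rule-based"}
    return {**second, "stage": "model"}
```

The result of detect_two_stage() would then be handed to generate_crisis_response() for step 3. The thresholds should be tuned against labeled data rather than taken from this sketch.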

Configuration

Add to config.yaml:

agent:
  crisis:
    local_model: qwen2.5:7b
    fallback: rule-based  # Use rule-based if model unavailable
    latency_target_ms: 3000
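
How these keys are consumed is not shown here, so the following is only a sketch of one plausible reading: use the local model when it is ready and within the latency target, otherwise use the fallback. The function names crisis_settings and choose_backend are hypothetical, and treating latency_target_ms as a switch-over threshold is an assumption. The config argument is the dict obtained from parsing config.yaml with any YAML loader.

```python
DEFAULTS = {
    "local_model": "qwen2.5:7b",
    "fallback": "rule-based",
    "latency_target_ms": 3000,
}


def crisis_settings(config: dict) -> dict:
    """Merge the agent.crisis section of a parsed config over the defaults."""
    section = (config.get("agent") or {}).get("crisis") or {}
    return {**DEFAULTS, **section}


def choose_backend(settings: dict, model_ready: bool, last_latency_ms=None) -> str:
    """Pick the local model when healthy and within target, else the fallback."""
    if not model_ready:
        return settings["fallback"]
    if last_latency_ms is not None and last_latency_ms > settings["latency_target_ms"]:
        return settings["fallback"]  # assumed meaning of the latency target
    return settings["local_model"]
```

The model_ready and last_latency_ms arguments could come from the ready and latency_ms fields that get_status() reports.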

Related

  • #661 (Local Model Quality for Crisis Support)
  • #702 (Multilingual Crisis Detection)
  • tools/crisis_detection.py (rule-based crisis detection)