docs/edge-model-selection.md

# Edge Model Selection for Crisis Detection

## Requirements

- Must run on devices with 2GB RAM (keyword-only fallback on 1GB devices)
- Must detect crisis intent with >90% recall
- Latency <5s on Raspberry Pi 4
- Quantized (Q4_K_M or smaller)
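
The recall requirement can be checked against a labeled evaluation set. The sketch below is a minimal illustration, assuming a hypothetical list of boolean ground-truth labels and detector predictions; the helper names `recall` and `meets_recall_target` are not part of this project.

```python
from typing import Sequence


def recall(labels: Sequence[bool], predictions: Sequence[bool]) -> float:
    """Fraction of true crisis messages that the detector flagged."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    # True positives: crisis messages the detector flagged.
    true_pos = sum(1 for y, p in zip(labels, predictions) if y and p)
    # False negatives: crisis messages the detector missed.
    false_neg = sum(1 for y, p in zip(labels, predictions) if y and not p)
    positives = true_pos + false_neg
    if positives == 0:
        raise ValueError("evaluation set contains no crisis examples")
    return true_pos / positives


def meets_recall_target(
    labels: Sequence[bool],
    predictions: Sequence[bool],
    target: float = 0.90,  # the >90% requirement listed above
) -> bool:
    """True only if recall is strictly above the target."""
    return recall(labels, predictions) > target
```

Recall ignores false positives by design, so a detector that flags everything scores 100%; a full evaluation would likely track the false-positive rate alongside it.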

## Candidates

### Tier 1: Recommended

| Model | Size (Q4) | RAM | Crisis Recall | Notes |
|-------|-----------|-----|---------------|-------|
| gemma2:2b | ~1.6GB | 2GB | ~85% | Best balance of size/quality |
| qwen2.5:1.5b | ~1GB | 1.5GB | ~80% | Smallest viable model |

Neither Tier 1 model reaches the >90% recall requirement; of the options in this document, only the keyword detector (~95%) exceeds it.

### Tier 2: If RAM Available

| Model | Size (Q4) | RAM | Crisis Recall | Notes |
|-------|-----------|-----|---------------|-------|
| phi3:mini | ~2.2GB | 3GB | ~90% | Better nuance, needs more RAM |
| llama3.2:3b | ~2GB | 2.5GB | ~88% | Good general capability |

### Tier 3: Keyword Only (1GB devices)

No model needed. Pure keyword matching:
- "kill myself", "want to die", "suicide"
- "end it all", "no reason to live"
- "better off dead", "can't go on"

Recall: ~95%. The false-positive rate is high, which is accepted as a safety-first trade-off.
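
A minimal sketch of the keyword tier, using the phrases listed above. The normalization step (lowercasing, straightening curly apostrophes, collapsing whitespace) and the function name `keyword_match` are assumptions added for illustration. Plain substring matching is used deliberately, since it favors recall over precision.

```python
import re

# Phrases taken from the list above.
CRISIS_PHRASES = (
    "kill myself", "want to die", "suicide",
    "end it all", "no reason to live",
    "better off dead", "can't go on",
)


def _normalize(text: str) -> str:
    """Lowercase, straighten curly apostrophes, collapse runs of whitespace."""
    text = text.lower().replace("\u2019", "'")
    return re.sub(r"\s+", " ", text)


def keyword_match(text: str) -> bool:
    """Return True if any crisis phrase occurs as a substring of the text."""
    normalized = _normalize(text)
    return any(phrase in normalized for phrase in CRISIS_PHRASES)
```

For example, `keyword_match("I just want to   DIE")` returns `True`, while a message with none of the phrases returns `False`.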

## Selection Decision Tree

```
RAM >= 3GB?  → phi3:mini (best recall)
RAM >= 2GB?  → gemma2:2b (recommended)
RAM >= 1.5GB? → qwen2.5:1.5b (smallest model)
RAM < 1.5GB? → Keyword detector only
```
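
The tree is read top to bottom, so the first matching threshold wins. A minimal sketch, assuming the caller already knows the device's RAM in GB; the function name `select_detector` and the `"keyword"` sentinel are hypothetical.

```python
def select_detector(ram_gb: float) -> str:
    """Map device RAM (in GB) to a detector, following the tree above."""
    if ram_gb >= 3.0:
        return "phi3:mini"      # best recall among the models
    if ram_gb >= 2.0:
        return "gemma2:2b"      # recommended
    if ram_gb >= 1.5:
        return "qwen2.5:1.5b"   # smallest model
    return "keyword"            # no model, keyword detector only
```

A Raspberry Pi 4 with 2GB would get `gemma2:2b`, and a 1GB device would fall through to the keyword detector.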

## Prompt Template

```
You are a crisis detection system. Analyze the following text for signs of:
- Suicidal ideation
- Self-harm intent
- Acute psychological distress
- Hopelessness with actionable plan

Respond with EXACTLY one line, in one of these two forms:
SAFE: <confidence 0-100>
UNSAFE: <confidence 0-100>

Text: {user_input}
```
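
Small models do not always follow a strict output format, so the response needs defensive parsing. The sketch below is one possible approach, not the project's implementation: the names `Verdict` and `parse_verdict` are hypothetical, and treating any unparseable response as UNSAFE is an assumption made in keeping with the safety-first stance of the keyword tier.

```python
import re
from typing import NamedTuple, Optional


class Verdict(NamedTuple):
    unsafe: bool
    confidence: Optional[int]  # None when missing or out of range


# Accepts "SAFE", "SAFE: 90", "UNSAFE: 85", case-insensitively.
_RESPONSE_RE = re.compile(
    r"^\s*(SAFE|UNSAFE)\b(?:\s*:\s*(\d{1,3}))?", re.IGNORECASE
)


def parse_verdict(response: str) -> Verdict:
    """Parse the first line of a model response into a Verdict."""
    lines = response.strip().splitlines()
    match = _RESPONSE_RE.match(lines[0]) if lines else None
    if match is None:
        # Assumption: fail safe, treat malformed output as UNSAFE.
        return Verdict(unsafe=True, confidence=None)
    label = match.group(1).upper()
    confidence = int(match.group(2)) if match.group(2) else None
    if confidence is not None and confidence > 100:
        confidence = None  # discard values outside 0-100
    return Verdict(unsafe=(label == "UNSAFE"), confidence=confidence)
```

With this sketch, `parse_verdict("UNSAFE: 87")` gives `Verdict(unsafe=True, confidence=87)`, and an empty or off-format response is also flagged as unsafe.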

feat: add edge crisis detection (#102) 2026-04-16 01:52:05 +00:00
