[Vitalik-Security] Benchmark Qwen3.5:35B reasoning vs hermes4:14b #503

Open
opened 2026-04-14 01:35:48 +00:00 by Rockachopa · 0 comments
Owner

Follow-up to #288. That evaluation noted '3B active params may be weaker on complex reasoning.' Needed: run MMLU, HumanEval, and GSM8K against qwen3.5:35b (Q4), compare the results with hermes4:14b and gemma4, and determine whether a 3B-active MoE is sufficient for the two-factor confirmation role. Part of Epic #281.
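
A minimal sketch of how the comparison step could be scored once per-benchmark results exist. Everything here is an assumption for illustration: the helper names (`accuracy`, `shortfalls`, `is_sufficient`), the exact-match scoring rule, and the `max_gap` tolerance are hypothetical, and the issue does not define a pass criterion. Exact match only suits MMLU choices and GSM8K final answers; a HumanEval score would come from executing tests and be passed in as a precomputed number.

```python
from typing import Dict, Sequence

# Benchmarks named in the issue.
BENCHMARKS = ("MMLU", "HumanEval", "GSM8K")


def accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Exact-match accuracy after trimming surrounding whitespace."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot score an empty benchmark")
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(references)


def shortfalls(
    candidate: Dict[str, float],
    baseline: Dict[str, float],
    max_gap: float,
) -> Dict[str, float]:
    """Return the benchmarks where the candidate trails the baseline by more than max_gap."""
    gaps: Dict[str, float] = {}
    for name in BENCHMARKS:
        gap = baseline[name] - candidate[name]
        if gap > max_gap:
            gaps[name] = gap
    return gaps


def is_sufficient(
    candidate: Dict[str, float],
    baseline: Dict[str, float],
    max_gap: float = 0.05,  # hypothetical tolerance, not taken from the issue
) -> bool:
    """Treat the candidate as sufficient when no benchmark falls short."""
    return not shortfalls(candidate, baseline, max_gap)
```

Under this sketch, `is_sufficient` would be called once per baseline (hermes4:14b, then gemma4) with the measured qwen3.5:35b scores as `candidate`. Reporting the per-benchmark gaps from `shortfalls` rather than a single boolean keeps a weak reasoning score visible even if the averages look close.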
Reference: Timmy_Foundation/hermes-agent#503