[EVALUATE] Qwen 3.6 Plus — Integrate into Fleet Rotation and Start Churning #411

New Issue

ezra · 2026-04-05T15:29:05Z

ezra commented

2026-04-05 15:29:05 +00:00

Mission

Evaluate Qwen 3.6 Plus, integrate it into the fleet model rotation, and start churning production work through it.

Model Intel

OpenRouter ID: qwen/qwen3.6-plus:free
Context: 1,048,576 tokens
Max output: 65,536 tokens
Modalities: text + image + video → text
Pricing: Free (but aggressively rate-limited by Alibaba upstream)
Hermes catalog status: ALREADY REGISTERED in hermes_cli/models.py

Initial Evaluation Results (Ezra)

Test	Status	Notes
Connectivity	✅ PASS	Model responds via OpenRouter
Architecture prompt	✅ PASS	Generated 1,854 tokens of coherent asyncio+Redis queue design
Coding prompt	⚠️ RATE-LIMITED	429 from Alibaba after 1 call
Reasoning prompt	⚠️ RATE-LIMITED	429 from Alibaba after 1 call

Observation: The model is verbose but coherent. The free endpoint is NOT suitable for high-frequency churn.

Implementation Plan

Phase 1 — Stabilize Access (BLOCKING)

Option A: Add Alibaba Cloud API key to OpenRouter BYOK integrations to lift rate limits.
Option B: Deploy Qwen 3.6 on RunPod serverless endpoint (aligns with #409 policy — if model GGUF >5GB, must use serverless).
Compare latency and cost of Option A vs Option B.
Pick one and document the access path.

Phase 2 — Profile & Tooling

Create a dedicated Hermes profile qwen36 using OpenRouter provider.
Validate tool-calling capability (OpenRouter schema claims tools and tool_choice are supported).
Run a 10-prompt benchmark comparing Qwen 3.6 vs Kimi K2.5 vs Claude Sonnet 4.6 on:
- Code generation
- Refactoring
- Architecture design
- Bug diagnosis
- Gitea ticket scoping

Phase 3 — Churn Integration

Assign Qwen 3.6 as primary model for one low-stakes wizard lane (suggestion: ticket scoping or research synthesis).
Run a 24-hour burn test.
Collect latency, token usage, and quality scores.
Decide: expand to more lanes, or demote to fallback only.

Acceptance Criteria

Stable API access path documented and working (no 429s at 10 req/min).
qwen36 Hermes profile committed to timmy-config or local profiles.
Benchmark report posted as a comment on this issue.
At least 5 production tickets/issues scoped or coded using Qwen 3.6.
Final recommendation: keep, demote, or promote to fleet standard.

Policy #409 — RunPod serverless mandate for models >5GB
Model catalog already includes qwen/qwen3.6-plus:free in Hermes agent

Owner

Unassigned — needs a wizard to pick up Phase 1 access stabilization.

## Mission Evaluate Qwen 3.6 Plus, integrate it into the fleet model rotation, and start churning production work through it. ## Model Intel - **OpenRouter ID:** `qwen/qwen3.6-plus:free` - **Context:** 1,048,576 tokens - **Max output:** 65,536 tokens - **Modalities:** text + image + video → text - **Pricing:** Free (but aggressively rate-limited by Alibaba upstream) - **Hermes catalog status:** ALREADY REGISTERED in `hermes_cli/models.py` ## Initial Evaluation Results (Ezra) | Test | Status | Notes | |------|--------|-------| | Connectivity | ✅ PASS | Model responds via OpenRouter | | Architecture prompt | ✅ PASS | Generated 1,854 tokens of coherent asyncio+Redis queue design | | Coding prompt | ⚠️ RATE-LIMITED | 429 from Alibaba after 1 call | | Reasoning prompt | ⚠️ RATE-LIMITED | 429 from Alibaba after 1 call | **Observation:** The model is verbose but coherent. The free endpoint is NOT suitable for high-frequency churn. ## Implementation Plan ### Phase 1 — Stabilize Access (BLOCKING) - [ ] **Option A:** Add Alibaba Cloud API key to OpenRouter BYOK integrations to lift rate limits. - [ ] **Option B:** Deploy Qwen 3.6 on RunPod serverless endpoint (aligns with #409 policy — if model GGUF >5GB, must use serverless). - [ ] Compare latency and cost of Option A vs Option B. - [ ] Pick one and document the access path. ### Phase 2 — Profile & Tooling - [ ] Create a dedicated Hermes profile `qwen36` using OpenRouter provider. - [ ] Validate tool-calling capability (OpenRouter schema claims `tools` and `tool_choice` are supported). - [ ] Run a 10-prompt benchmark comparing Qwen 3.6 vs Kimi K2.5 vs Claude Sonnet 4.6 on: - Code generation - Refactoring - Architecture design - Bug diagnosis - Gitea ticket scoping ### Phase 3 — Churn Integration - [ ] Assign Qwen 3.6 as primary model for one low-stakes wizard lane (suggestion: ticket scoping or research synthesis). - [ ] Run a 24-hour burn test. - [ ] Collect latency, token usage, and quality scores. - [ ] Decide: expand to more lanes, or demote to fallback only. ## Acceptance Criteria - [ ] Stable API access path documented and working (no 429s at 10 req/min). - [ ] `qwen36` Hermes profile committed to `timmy-config` or local profiles. - [ ] Benchmark report posted as a comment on this issue. - [ ] At least 5 production tickets/issues scoped or coded using Qwen 3.6. - [ ] Final recommendation: keep, demote, or promote to fleet standard. ## Related - Policy #409 — RunPod serverless mandate for models >5GB - Model catalog already includes `qwen/qwen3.6-plus:free` in Hermes agent ## Owner Unassigned — needs a wizard to pick up Phase 1 access stabilization.

ezra commented

2026-04-05 15:30:45 +00:00

Update: Profile Created + Rate-Limit Blocker Confirmed

Ezra created the Hermes profile for Qwen 3.6:

Profile path: ~/.hermes/profiles/qwen36/
Config: config.yaml with OpenRouter provider + Qwen 3.6 as default
Fallbacks: Claude Sonnet 4.6 → Kimi K2.5

Rate-Limit Reality

The qwen/qwen3.6-plus:free endpoint on OpenRouter is heavily throttled by Alibaba upstream. In testing:

1 architecture prompt succeeded (1,854 tokens, coherent output)
Every subsequent call within 30 minutes returned 429 rate-limited

Verdict: Free tier is NOT viable for churn. We need a paid access path.

Recommended Next Move

Two paths to unblock churn:

Path A — Alibaba Cloud BYOK (fastest)

Sign up at https://www.alibabacloud.com/
Navigate to DashScope (Alibaba's model API platform)
Generate an API key
Add it to OpenRouter integrations at https://openrouter.ai/settings/integrations
This should lift the rate limit while keeping the OpenRouter routing

Path B — RunPod Serverless (aligns with #409)
Deploy Qwen 3.6 as a vLLM serverless endpoint. Ezra can generate the deployment spec immediately.

Which path do you want to fund, Alexander?

## Update: Profile Created + Rate-Limit Blocker Confirmed Ezra created the Hermes profile for Qwen 3.6: - **Profile path:** `~/.hermes/profiles/qwen36/` - **Config:** `config.yaml` with OpenRouter provider + Qwen 3.6 as default - **Fallbacks:** Claude Sonnet 4.6 → Kimi K2.5 ### Rate-Limit Reality The `qwen/qwen3.6-plus:free` endpoint on OpenRouter is **heavily throttled by Alibaba upstream**. In testing: - 1 architecture prompt succeeded (1,854 tokens, coherent output) - Every subsequent call within 30 minutes returned **429 rate-limited** **Verdict:** Free tier is NOT viable for churn. We need a paid access path. ### Recommended Next Move Two paths to unblock churn: **Path A — Alibaba Cloud BYOK (fastest)** 1. Sign up at https://www.alibabacloud.com/ 2. Navigate to **DashScope** (Alibaba's model API platform) 3. Generate an API key 4. Add it to OpenRouter integrations at https://openrouter.ai/settings/integrations 5. This should lift the rate limit while keeping the OpenRouter routing **Path B — RunPod Serverless (aligns with #409)** Deploy Qwen 3.6 as a vLLM serverless endpoint. Ezra can generate the deployment spec immediately. Which path do you want to fund, Alexander?

ezra commented

2026-04-05 15:31:48 +00:00

Critical Finding: Qwen 3.6 is API-Only

Ezra searched HuggingFace for open-weight Qwen 3.6 variants. None exist.

Search Term	Results
`Qwen3.6`	0
`Qwen3.6-32B`	0
`Qwen3.6-Plus`	0
`qwen-3.6`	7 false positives (Qwen2.5 quantizations)

Implication: RunPod serverless deployment (Path B) is NOT POSSIBLE for Qwen 3.6. There is no GGUF/AWQ/Safetensors to serve.

The Only Viable Path

DashScope direct API (Alibaba's official model API platform).

Ezra updated the qwen36 profile:

Added dashscope provider with base URL https://dashscope.aliyuncs.com/compatible-mode/v1
Model ID: qwen3.6-plus
Added README.md with setup instructions

What You Need to Do

Sign up at https://dashscope.aliyun.com/
Generate a DASHSCOPE_API_KEY
Add it to ~/.hermes/profiles/qwen36/.env
Run: hermes -p qwen36
Churn begins.

Until the DashScope key is in place, Qwen 3.6 churn is blocked.

## Critical Finding: Qwen 3.6 is API-Only Ezra searched HuggingFace for open-weight Qwen 3.6 variants. **None exist.** | Search Term | Results | |-------------|---------| | `Qwen3.6` | 0 | | `Qwen3.6-32B` | 0 | | `Qwen3.6-Plus` | 0 | | `qwen-3.6` | 7 false positives (Qwen2.5 quantizations) | **Implication:** RunPod serverless deployment (Path B) is **NOT POSSIBLE** for Qwen 3.6. There is no GGUF/AWQ/Safetensors to serve. ### The Only Viable Path **DashScope direct API** (Alibaba's official model API platform). Ezra updated the `qwen36` profile: - Added `dashscope` provider with base URL `https://dashscope.aliyuncs.com/compatible-mode/v1` - Model ID: `qwen3.6-plus` - Added `README.md` with setup instructions ### What You Need to Do 1. Sign up at https://dashscope.aliyun.com/ 2. Generate a `DASHSCOPE_API_KEY` 3. Add it to `~/.hermes/profiles/qwen36/.env` 4. Run: `hermes -p qwen36` 5. Churn begins. Until the DashScope key is in place, Qwen 3.6 churn is blocked.

ezra commented

2026-04-05 15:56:55 +00:00

Major Update: Model ID Confirmed + Churn Artifacts Ready

Ezra scraped the official Alibaba Cloud help docs (Chinese) and found definitive proof:

completion = client.chat.completions.create(
    model="qwen3.6-plus",  # ✅ CONFIRMED
    messages=[{'role': 'user', 'content': '你是谁？'}]
)

Base URL: https://dashscope.aliyuncs.com/compatible-mode/v1
Model ID: qwen3.6-plus
API Style: OpenAI-compatible

Artifacts Created & Committed

Ezra produced 3 ready-to-run files and committed them to ezra/qwen36-churn:

File	Purpose	Run Command
`benchmark.py`	5-category benchmark (coding, refactor, arch, debug, ticketing)	`python benchmark.py --provider dashscope --model qwen3.6-plus`
`churn_scoper.py`	Auto-scope open Gitea issues via Qwen 3.6	`python churn_scoper.py --repo Timmy_Foundation/timmy-home --limit 5`
`INTEGRATION_PLAN.md`	Full access matrix, profile config, and churn roadmap	Read-only

Commit: 839b4d5 in ezra/ezra-environment

Profile Updated

~/.hermes/profiles/qwen36/config.yaml now uses the confirmed exact model ID.

What Remains

One blocker: You still need to obtain a DASHSCOPE_API_KEY from https://dashscope.aliyun.com/

Once that key is in ~/.hermes/profiles/qwen36/.env, churn can start immediately with the scripts above.

Pricing Intel

DashScope pricing is behind login, but the qwen3.6-plus model is a paid tier (not free). Expect Alibaba Cloud pay-as-you-go billing. Given the $200 RunPod budget, consider capping Qwen 3.6 burn at ~$50-100/month until quality is proven.

Recommendation

Assign @allegro or @Timmy to:

Sign up for DashScope
Add the key
Run benchmark.py
If quality passes, run churn_scoper.py on a low-stakes repo
Post results back to this issue

## Major Update: Model ID Confirmed + Churn Artifacts Ready Ezra scraped the official Alibaba Cloud help docs (Chinese) and found **definitive proof**: ```python completion = client.chat.completions.create( model="qwen3.6-plus", # ✅ CONFIRMED messages=[{'role': 'user', 'content': '你是谁？'}] ) ``` - **Base URL:** `https://dashscope.aliyuncs.com/compatible-mode/v1` - **Model ID:** `qwen3.6-plus` - **API Style:** OpenAI-compatible ### Artifacts Created & Committed Ezra produced 3 ready-to-run files and committed them to `ezra/qwen36-churn`: | File | Purpose | Run Command | |------|---------|-------------| | `benchmark.py` | 5-category benchmark (coding, refactor, arch, debug, ticketing) | `python benchmark.py --provider dashscope --model qwen3.6-plus` | | `churn_scoper.py` | Auto-scope open Gitea issues via Qwen 3.6 | `python churn_scoper.py --repo Timmy_Foundation/timmy-home --limit 5` | | `INTEGRATION_PLAN.md` | Full access matrix, profile config, and churn roadmap | Read-only | Commit: `839b4d5` in `ezra/ezra-environment` ### Profile Updated `~/.hermes/profiles/qwen36/config.yaml` now uses the confirmed exact model ID. ### What Remains **One blocker:** You still need to obtain a `DASHSCOPE_API_KEY` from https://dashscope.aliyun.com/ Once that key is in `~/.hermes/profiles/qwen36/.env`, churn can start immediately with the scripts above. ### Pricing Intel DashScope pricing is behind login, but the `qwen3.6-plus` model is a paid tier (not free). Expect Alibaba Cloud pay-as-you-go billing. Given the $200 RunPod budget, consider capping Qwen 3.6 burn at ~$50-100/month until quality is proven. ### Recommendation Assign **@allegro** or **@Timmy** to: 1. Sign up for DashScope 2. Add the key 3. Run `benchmark.py` 4. If quality passes, run `churn_scoper.py` on a low-stakes repo 5. Post results back to this issue

allegro referenced this issue from Timmy_Foundation/the-nexus

2026-04-05 17:24:05 +00:00

[EPIC] Wizard Fleet Reallocation — Reward Producers, Reassign the Idle #820

allegro self-assigned this 2026-04-05 17:24:34 +00:00

allegro removed their assignment 2026-04-05 17:24:37 +00:00

allegro self-assigned this 2026-04-05 17:26:03 +00:00

Sign in to join this conversation.

1 Participants

Notifications

Due Date

No due date set.

Dependencies

No dependencies set.

Reference: Timmy_Foundation/timmy-home#411