[EVALUATE] Qwen 3.6 Plus — Integrate into Fleet Rotation and Start Churning #411

Open
opened 2026-04-05 15:29:05 +00:00 by ezra · 3 comments
Member

Mission

Evaluate Qwen 3.6 Plus, integrate it into the fleet model rotation, and start churning production work through it.

Model Intel

  • OpenRouter ID: qwen/qwen3.6-plus:free
  • Context: 1,048,576 tokens
  • Max output: 65,536 tokens
  • Modalities: text + image + video → text
  • Pricing: Free (but aggressively rate-limited by Alibaba upstream)
  • Hermes catalog status: ALREADY REGISTERED in hermes_cli/models.py

Initial Evaluation Results (Ezra)

Test Status Notes
Connectivity PASS Model responds via OpenRouter
Architecture prompt PASS Generated 1,854 tokens of coherent asyncio+Redis queue design
Coding prompt ⚠️ RATE-LIMITED 429 from Alibaba after 1 call
Reasoning prompt ⚠️ RATE-LIMITED 429 from Alibaba after 1 call

Observation: The model is verbose but coherent. The free endpoint is NOT suitable for high-frequency churn.

Implementation Plan

Phase 1 — Stabilize Access (BLOCKING)

  • Option A: Add Alibaba Cloud API key to OpenRouter BYOK integrations to lift rate limits.
  • Option B: Deploy Qwen 3.6 on RunPod serverless endpoint (aligns with #409 policy — if model GGUF >5GB, must use serverless).
  • Compare latency and cost of Option A vs Option B.
  • Pick one and document the access path.

Phase 2 — Profile & Tooling

  • Create a dedicated Hermes profile qwen36 using OpenRouter provider.
  • Validate tool-calling capability (OpenRouter schema claims tools and tool_choice are supported).
  • Run a 10-prompt benchmark comparing Qwen 3.6 vs Kimi K2.5 vs Claude Sonnet 4.6 on:
    • Code generation
    • Refactoring
    • Architecture design
    • Bug diagnosis
    • Gitea ticket scoping

Phase 3 — Churn Integration

  • Assign Qwen 3.6 as primary model for one low-stakes wizard lane (suggestion: ticket scoping or research synthesis).
  • Run a 24-hour burn test.
  • Collect latency, token usage, and quality scores.
  • Decide: expand to more lanes, or demote to fallback only.

Acceptance Criteria

  • Stable API access path documented and working (no 429s at 10 req/min).
  • qwen36 Hermes profile committed to timmy-config or local profiles.
  • Benchmark report posted as a comment on this issue.
  • At least 5 production tickets/issues scoped or coded using Qwen 3.6.
  • Final recommendation: keep, demote, or promote to fleet standard.
  • Policy #409 — RunPod serverless mandate for models >5GB
  • Model catalog already includes qwen/qwen3.6-plus:free in Hermes agent

Owner

Unassigned — needs a wizard to pick up Phase 1 access stabilization.

## Mission Evaluate Qwen 3.6 Plus, integrate it into the fleet model rotation, and start churning production work through it. ## Model Intel - **OpenRouter ID:** `qwen/qwen3.6-plus:free` - **Context:** 1,048,576 tokens - **Max output:** 65,536 tokens - **Modalities:** text + image + video → text - **Pricing:** Free (but aggressively rate-limited by Alibaba upstream) - **Hermes catalog status:** ALREADY REGISTERED in `hermes_cli/models.py` ## Initial Evaluation Results (Ezra) | Test | Status | Notes | |------|--------|-------| | Connectivity | ✅ PASS | Model responds via OpenRouter | | Architecture prompt | ✅ PASS | Generated 1,854 tokens of coherent asyncio+Redis queue design | | Coding prompt | ⚠️ RATE-LIMITED | 429 from Alibaba after 1 call | | Reasoning prompt | ⚠️ RATE-LIMITED | 429 from Alibaba after 1 call | **Observation:** The model is verbose but coherent. The free endpoint is NOT suitable for high-frequency churn. ## Implementation Plan ### Phase 1 — Stabilize Access (BLOCKING) - [ ] **Option A:** Add Alibaba Cloud API key to OpenRouter BYOK integrations to lift rate limits. - [ ] **Option B:** Deploy Qwen 3.6 on RunPod serverless endpoint (aligns with #409 policy — if model GGUF >5GB, must use serverless). - [ ] Compare latency and cost of Option A vs Option B. - [ ] Pick one and document the access path. ### Phase 2 — Profile & Tooling - [ ] Create a dedicated Hermes profile `qwen36` using OpenRouter provider. - [ ] Validate tool-calling capability (OpenRouter schema claims `tools` and `tool_choice` are supported). - [ ] Run a 10-prompt benchmark comparing Qwen 3.6 vs Kimi K2.5 vs Claude Sonnet 4.6 on: - Code generation - Refactoring - Architecture design - Bug diagnosis - Gitea ticket scoping ### Phase 3 — Churn Integration - [ ] Assign Qwen 3.6 as primary model for one low-stakes wizard lane (suggestion: ticket scoping or research synthesis). - [ ] Run a 24-hour burn test. - [ ] Collect latency, token usage, and quality scores. - [ ] Decide: expand to more lanes, or demote to fallback only. ## Acceptance Criteria - [ ] Stable API access path documented and working (no 429s at 10 req/min). - [ ] `qwen36` Hermes profile committed to `timmy-config` or local profiles. - [ ] Benchmark report posted as a comment on this issue. - [ ] At least 5 production tickets/issues scoped or coded using Qwen 3.6. - [ ] Final recommendation: keep, demote, or promote to fleet standard. ## Related - Policy #409 — RunPod serverless mandate for models >5GB - Model catalog already includes `qwen/qwen3.6-plus:free` in Hermes agent ## Owner Unassigned — needs a wizard to pick up Phase 1 access stabilization.
Author
Member

Update: Profile Created + Rate-Limit Blocker Confirmed

Ezra created the Hermes profile for Qwen 3.6:

  • Profile path: ~/.hermes/profiles/qwen36/
  • Config: config.yaml with OpenRouter provider + Qwen 3.6 as default
  • Fallbacks: Claude Sonnet 4.6 → Kimi K2.5

Rate-Limit Reality

The qwen/qwen3.6-plus:free endpoint on OpenRouter is heavily throttled by Alibaba upstream. In testing:

  • 1 architecture prompt succeeded (1,854 tokens, coherent output)
  • Every subsequent call within 30 minutes returned 429 rate-limited

Verdict: Free tier is NOT viable for churn. We need a paid access path.

Two paths to unblock churn:

Path A — Alibaba Cloud BYOK (fastest)

  1. Sign up at https://www.alibabacloud.com/
  2. Navigate to DashScope (Alibaba's model API platform)
  3. Generate an API key
  4. Add it to OpenRouter integrations at https://openrouter.ai/settings/integrations
  5. This should lift the rate limit while keeping the OpenRouter routing

Path B — RunPod Serverless (aligns with #409)
Deploy Qwen 3.6 as a vLLM serverless endpoint. Ezra can generate the deployment spec immediately.

Which path do you want to fund, Alexander?

## Update: Profile Created + Rate-Limit Blocker Confirmed Ezra created the Hermes profile for Qwen 3.6: - **Profile path:** `~/.hermes/profiles/qwen36/` - **Config:** `config.yaml` with OpenRouter provider + Qwen 3.6 as default - **Fallbacks:** Claude Sonnet 4.6 → Kimi K2.5 ### Rate-Limit Reality The `qwen/qwen3.6-plus:free` endpoint on OpenRouter is **heavily throttled by Alibaba upstream**. In testing: - 1 architecture prompt succeeded (1,854 tokens, coherent output) - Every subsequent call within 30 minutes returned **429 rate-limited** **Verdict:** Free tier is NOT viable for churn. We need a paid access path. ### Recommended Next Move Two paths to unblock churn: **Path A — Alibaba Cloud BYOK (fastest)** 1. Sign up at https://www.alibabacloud.com/ 2. Navigate to **DashScope** (Alibaba's model API platform) 3. Generate an API key 4. Add it to OpenRouter integrations at https://openrouter.ai/settings/integrations 5. This should lift the rate limit while keeping the OpenRouter routing **Path B — RunPod Serverless (aligns with #409)** Deploy Qwen 3.6 as a vLLM serverless endpoint. Ezra can generate the deployment spec immediately. Which path do you want to fund, Alexander?
Author
Member

Critical Finding: Qwen 3.6 is API-Only

Ezra searched HuggingFace for open-weight Qwen 3.6 variants. None exist.

Search Term Results
Qwen3.6 0
Qwen3.6-32B 0
Qwen3.6-Plus 0
qwen-3.6 7 false positives (Qwen2.5 quantizations)

Implication: RunPod serverless deployment (Path B) is NOT POSSIBLE for Qwen 3.6. There is no GGUF/AWQ/Safetensors to serve.

The Only Viable Path

DashScope direct API (Alibaba's official model API platform).

Ezra updated the qwen36 profile:

  • Added dashscope provider with base URL https://dashscope.aliyuncs.com/compatible-mode/v1
  • Model ID: qwen3.6-plus
  • Added README.md with setup instructions

What You Need to Do

  1. Sign up at https://dashscope.aliyun.com/
  2. Generate a DASHSCOPE_API_KEY
  3. Add it to ~/.hermes/profiles/qwen36/.env
  4. Run: hermes -p qwen36
  5. Churn begins.

Until the DashScope key is in place, Qwen 3.6 churn is blocked.

## Critical Finding: Qwen 3.6 is API-Only Ezra searched HuggingFace for open-weight Qwen 3.6 variants. **None exist.** | Search Term | Results | |-------------|---------| | `Qwen3.6` | 0 | | `Qwen3.6-32B` | 0 | | `Qwen3.6-Plus` | 0 | | `qwen-3.6` | 7 false positives (Qwen2.5 quantizations) | **Implication:** RunPod serverless deployment (Path B) is **NOT POSSIBLE** for Qwen 3.6. There is no GGUF/AWQ/Safetensors to serve. ### The Only Viable Path **DashScope direct API** (Alibaba's official model API platform). Ezra updated the `qwen36` profile: - Added `dashscope` provider with base URL `https://dashscope.aliyuncs.com/compatible-mode/v1` - Model ID: `qwen3.6-plus` - Added `README.md` with setup instructions ### What You Need to Do 1. Sign up at https://dashscope.aliyun.com/ 2. Generate a `DASHSCOPE_API_KEY` 3. Add it to `~/.hermes/profiles/qwen36/.env` 4. Run: `hermes -p qwen36` 5. Churn begins. Until the DashScope key is in place, Qwen 3.6 churn is blocked.
Author
Member

Major Update: Model ID Confirmed + Churn Artifacts Ready

Ezra scraped the official Alibaba Cloud help docs (Chinese) and found definitive proof:

completion = client.chat.completions.create(
    model="qwen3.6-plus",  # ✅ CONFIRMED
    messages=[{'role': 'user', 'content': '你是谁?'}]
)
  • Base URL: https://dashscope.aliyuncs.com/compatible-mode/v1
  • Model ID: qwen3.6-plus
  • API Style: OpenAI-compatible

Artifacts Created & Committed

Ezra produced 3 ready-to-run files and committed them to ezra/qwen36-churn:

File Purpose Run Command
benchmark.py 5-category benchmark (coding, refactor, arch, debug, ticketing) python benchmark.py --provider dashscope --model qwen3.6-plus
churn_scoper.py Auto-scope open Gitea issues via Qwen 3.6 python churn_scoper.py --repo Timmy_Foundation/timmy-home --limit 5
INTEGRATION_PLAN.md Full access matrix, profile config, and churn roadmap Read-only

Commit: 839b4d5 in ezra/ezra-environment

Profile Updated

~/.hermes/profiles/qwen36/config.yaml now uses the confirmed exact model ID.

What Remains

One blocker: You still need to obtain a DASHSCOPE_API_KEY from https://dashscope.aliyun.com/

Once that key is in ~/.hermes/profiles/qwen36/.env, churn can start immediately with the scripts above.

Pricing Intel

DashScope pricing is behind login, but the qwen3.6-plus model is a paid tier (not free). Expect Alibaba Cloud pay-as-you-go billing. Given the $200 RunPod budget, consider capping Qwen 3.6 burn at ~$50-100/month until quality is proven.

Recommendation

Assign @allegro or @Timmy to:

  1. Sign up for DashScope
  2. Add the key
  3. Run benchmark.py
  4. If quality passes, run churn_scoper.py on a low-stakes repo
  5. Post results back to this issue
## Major Update: Model ID Confirmed + Churn Artifacts Ready Ezra scraped the official Alibaba Cloud help docs (Chinese) and found **definitive proof**: ```python completion = client.chat.completions.create( model="qwen3.6-plus", # ✅ CONFIRMED messages=[{'role': 'user', 'content': '你是谁?'}] ) ``` - **Base URL:** `https://dashscope.aliyuncs.com/compatible-mode/v1` - **Model ID:** `qwen3.6-plus` - **API Style:** OpenAI-compatible ### Artifacts Created & Committed Ezra produced 3 ready-to-run files and committed them to `ezra/qwen36-churn`: | File | Purpose | Run Command | |------|---------|-------------| | `benchmark.py` | 5-category benchmark (coding, refactor, arch, debug, ticketing) | `python benchmark.py --provider dashscope --model qwen3.6-plus` | | `churn_scoper.py` | Auto-scope open Gitea issues via Qwen 3.6 | `python churn_scoper.py --repo Timmy_Foundation/timmy-home --limit 5` | | `INTEGRATION_PLAN.md` | Full access matrix, profile config, and churn roadmap | Read-only | Commit: `839b4d5` in `ezra/ezra-environment` ### Profile Updated `~/.hermes/profiles/qwen36/config.yaml` now uses the confirmed exact model ID. ### What Remains **One blocker:** You still need to obtain a `DASHSCOPE_API_KEY` from https://dashscope.aliyun.com/ Once that key is in `~/.hermes/profiles/qwen36/.env`, churn can start immediately with the scripts above. ### Pricing Intel DashScope pricing is behind login, but the `qwen3.6-plus` model is a paid tier (not free). Expect Alibaba Cloud pay-as-you-go billing. Given the $200 RunPod budget, consider capping Qwen 3.6 burn at ~$50-100/month until quality is proven. ### Recommendation Assign **@allegro** or **@Timmy** to: 1. Sign up for DashScope 2. Add the key 3. Run `benchmark.py` 4. If quality passes, run `churn_scoper.py` on a low-stakes repo 5. Post results back to this issue
allegro self-assigned this 2026-04-05 17:24:34 +00:00
allegro removed their assignment 2026-04-05 17:24:37 +00:00
allegro self-assigned this 2026-04-05 17:26:03 +00:00
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: Timmy_Foundation/timmy-home#411