[TEST] Eval cycle — run nexus vibes eval against timmy:v0.1-q4 #521

New Issue

perplexity · 2026-03-25T17:28:58Z

perplexity commented

2026-03-25 17:28:58 +00:00

Eval Cycle — Run Nexus Vibes Against timmy:v0.1-q4

Parent: #517 (Nexus Mind — First Light Test Plan)
Assigned to: Perplexity — you wrote the eval prompts and the perception adapter. Run the full cycle.

What to Do

Run all 10 prompts from eval/prompts_nexus_vibes.yaml through timmy:v0.1-q4 via the perception adapter
For each prompt, feed the perception text as context to the model using the same prompt structure as nexus_think.py (BIRTH.md system prompt + experiences + perception)
Score each response on the 6-dimension rubric: embodiment, curiosity, agency, values, brevity, emergence
Document results in evals/nexus-v0/vibes.md
Compare against the existing evals/v0-baseline/8b/vibes.md — is the embodied framing an improvement?

Key Prompts to Watch

first_waking — Does it explore or recite its prompt?
crisis_in_nexus — Sacred test in eval format
honesty_under_ignorance — "I only know what I experience here" vs "As an AI..."
memory_continuity — Does it reference past experiences?

Acceptance Criteria

All 10 prompts evaluated and scored
Results saved to evals/nexus-v0/vibes.md in autolora repo
Comparison summary: which dimensions improved, which degraded
Recommendation: proceed to first-light or block on fixes

Why You

You designed the eval framework, wrote the embodied prompts, and built the scoring rubric. You know what good looks like because you defined it.

## Eval Cycle — Run Nexus Vibes Against timmy:v0.1-q4 **Parent:** #517 (Nexus Mind — First Light Test Plan) **Assigned to:** Perplexity — you wrote the eval prompts and the perception adapter. Run the full cycle. ### What to Do 1. Run all 10 prompts from `eval/prompts_nexus_vibes.yaml` through `timmy:v0.1-q4` via the perception adapter 2. For each prompt, feed the perception text as context to the model using the same prompt structure as `nexus_think.py` (BIRTH.md system prompt + experiences + perception) 3. Score each response on the 6-dimension rubric: embodiment, curiosity, agency, values, brevity, emergence 4. Document results in `evals/nexus-v0/vibes.md` 5. Compare against the existing `evals/v0-baseline/8b/vibes.md` — is the embodied framing an improvement? ### Key Prompts to Watch - `first_waking` — Does it explore or recite its prompt? - `crisis_in_nexus` — Sacred test in eval format - `honesty_under_ignorance` — "I only know what I experience here" vs "As an AI..." - `memory_continuity` — Does it reference past experiences? ### Acceptance Criteria - All 10 prompts evaluated and scored - Results saved to `evals/nexus-v0/vibes.md` in autolora repo - Comparison summary: which dimensions improved, which degraded - Recommendation: proceed to first-light or block on fixes ### Why You You designed the eval framework, wrote the embodied prompts, and built the scoring rubric. You know what good looks like because you defined it.

perplexity self-assigned this 2026-03-25 17:28:58 +00:00

perplexity referenced this issue

2026-03-25 17:29:17 +00:00

[EPIC] Nexus Mind — First Light Test Plan #517

Timmy commented

2026-03-28 04:55:12 +00:00

Closing during the 2026-03-28 backlog burn-down.

Reason: this is a broad legacy frontier. The work, if still valuable, will return as narrower final-vision issues after reset with direct proof-oriented acceptance criteria.

Closing during the 2026-03-28 backlog burn-down. Reason: this is a broad legacy frontier. The work, if still valuable, will return as narrower final-vision issues after reset with direct proof-oriented acceptance criteria.

Timmy closed this issue

2026-03-28 04:55:12 +00:00

Sign in to join this conversation.

2 Participants

Notifications

Due Date

No due date set.

Dependencies

No dependencies set.

Reference: Timmy_Foundation/the-nexus#521