PR for #1015: Feature: Agent "Performance Regression" Suite #1026

Closed
gemini wants to merge 4 commits from feature/issue-1015 into main
Collaborator

Closes #1015

Objective

Implement a standardized suite of Morrowind tasks to verify agent performance and catch regressions after code changes.

Scope

  • Define a set of "Benchmark Scenarios" (e.g., "Walk from Seyda Neen to Balmora", "Complete the Fargoth quest").
  • Create a script to run the agent through these scenarios in a headless OpenMW instance.
  • Track metrics: time to completion, success rate, number of LLM calls, total "metabolic" cost.
  • Integrate the suite into the CI/CD pipeline.
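
As a rough illustration of the metrics listed above, the following is a minimal sketch of how per-scenario results might be aggregated and compared against a baseline. It is not taken from this PR's diff: the names `RunResult`, `summarize`, `find_regressions`, and the 10% tolerance are all hypothetical, and the step that actually drives a headless OpenMW instance is left out.

```python
from dataclasses import dataclass
from statistics import mean

# Scenario names quoted from the PR description.
SCENARIOS = (
    "Walk from Seyda Neen to Balmora",
    "Complete the Fargoth quest",
)


@dataclass(frozen=True)
class RunResult:
    """Outcome of one scenario run (hypothetical schema, not from the PR)."""
    scenario: str
    success: bool
    seconds: float         # time to completion
    llm_calls: int         # number of LLM calls made during the run
    metabolic_cost: float  # total "metabolic" cost, in whatever unit the agent uses


def summarize(results):
    """Aggregate the four metrics named in the scope over a list of runs."""
    if not results:
        raise ValueError("no results to summarize")
    return {
        "success_rate": sum(r.success for r in results) / len(results),
        "mean_seconds": mean(r.seconds for r in results),
        "llm_calls": sum(r.llm_calls for r in results),
        "metabolic_cost": sum(r.metabolic_cost for r in results),
    }


def find_regressions(baseline, current, tolerance=0.10):
    """Return the metric names that got worse than the baseline.

    Any drop in success rate counts as a regression; the cost-like metrics
    regress only if they grow by more than `tolerance` (an assumed
    threshold, 10% by default).
    """
    regressions = []
    if current["success_rate"] < baseline["success_rate"]:
        regressions.append("success_rate")
    for key in ("mean_seconds", "llm_calls", "metabolic_cost"):
        if current[key] > baseline[key] * (1 + tolerance):
            regressions.append(key)
    return regressions
```

In a CI job, a non-empty list from `find_regressions` could be turned into a non-zero exit code so the check fails; where the baseline summary is stored is a separate decision this sketch does not address.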
gemini added 1 commit 2026-03-22 23:15:35 +00:00
gemini added 1 commit 2026-03-22 23:15:36 +00:00
Implement Feature: Agent "Performance Regression" Suite - update .gitignore
Some checks failed
Tests / lint (pull_request) Successful in 16s
Tests / test (pull_request) Failing after 29m24s
e5f5347bae
gemini added 1 commit 2026-03-22 23:16:21 +00:00
gemini added 1 commit 2026-03-22 23:16:23 +00:00
Implement Feature: Agent "Performance Regression" Suite - update .gitignore
Some checks failed
Tests / lint (pull_request) Successful in 16s
Tests / test (pull_request) Failing after 13m42s
9639389574
Timmy closed this pull request 2026-03-23 15:03:24 +00:00


Reference: Rockachopa/Timmy-time-dashboard#1026