[kimi] Refactor autoresearch.py -> SystemExperiment class (#906) #1244

Merged
Rockachopa merged 1 commit from kimi/issue-906 into main 2026-03-23 23:28:55 +00:00

Fixes #906

Summary

Refactored src/timmy/autoresearch.py from ML-training-only scope to a generalized SystemExperiment class that supports arbitrary metrics.

Changes

SystemExperiment class enhancements:

  • New run() method: Encapsulates the full experiment loop (hypothesis → edit → tox → evaluate → commit/revert)
  • New create_branch() method: Git branch management for experiments
  • New metric_fn parameter: Support for custom metric extraction callables
  • New attributes: revert_on_failure, hypothesis, results, baseline
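
The loop described above can be illustrated with a minimal sketch. This is not the code from this PR: `ExperimentSketch`, `run_checks`, `max_iterations`, and `pass_rate` are hypothetical names, the tox and git steps are replaced by plain callables and comments, and "higher metric is better" is an assumption. Only `metric_fn`, `revert_on_failure`, `hypothesis`, `results`, and `baseline` are taken from the description.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ExperimentSketch:
    """Hypothetical stand-in for SystemExperiment; not the PR's implementation."""

    hypothesis: str
    metric_fn: Callable[[str], Optional[float]]  # extracts a metric from tool output
    run_checks: Callable[[], str]                # stands in for invoking tox
    revert_on_failure: bool = True
    baseline: Optional[float] = None
    results: list = field(default_factory=list)

    def run(self, max_iterations: int = 1) -> list:
        for _ in range(max_iterations):
            output = self.run_checks()       # "tox" step
            metric = self.metric_fn(output)  # "evaluate" step
            # Assumption: a higher metric is an improvement.
            improved = metric is not None and (
                self.baseline is None or metric > self.baseline
            )
            if improved:
                self.baseline = metric       # a real loop would commit here
                action = "commit"
            else:
                # a real loop would revert the working tree here
                action = "revert" if self.revert_on_failure else "keep"
            self.results.append({"metric": metric, "action": action})
        return self.results


def pass_rate(output: str) -> Optional[float]:
    """Example custom metric: fraction of passing tests in a summary line."""
    match = re.search(r"(\d+) passed, (\d+) failed", output)
    if match is None:
        return None
    passed, failed = int(match.group(1)), int(match.group(2))
    total = passed + failed
    return passed / total if total else None
```

Passing the metric extractor in as a callable keeps the loop independent of what is being measured, which is the point of the `metric_fn` parameter: the same loop could track pass rate, coverage, or any other number that can be parsed from the check output.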

CLI updates:

  • Updated timmy learn command to use SystemExperiment.run() method
  • Maintains the same user experience while simplifying the implementation

Tests added:

  • test_create_branch_success - Git branch creation
  • test_create_branch_failure - Error handling
  • test_run_dry_run_mode - Dry-run functionality
  • test_run_with_custom_metric_fn - Custom metric extraction
  • test_run_single_iteration_success - Single iteration flow
  • test_run_stores_baseline_on_first_success - Baseline tracking

Backward Compatibility

All standalone functions preserved:

  • prepare_experiment()
  • run_experiment()
  • evaluate_result()
  • get_experiment_history()
  • _extract_metric(), _extract_pass_rate(), _extract_coverage()

Testing

All 49 autoresearch tests pass.

kimi added 1 commit 2026-03-23 23:24:13 +00:00
refactor(autoresearch): refactor autoresearch.py -> SystemExperiment class
Some checks failed
Tests / lint (pull_request) Failing after 16s
Tests / test (pull_request) Has been skipped
b2f970b77c
- Add run() method to SystemExperiment class for full experiment loop
- Add create_branch() method for git branch management
- Add metric_fn parameter for custom metric extraction
- Add revert_on_failure and hypothesis attributes
- Update CLI learn command to use new SystemExperiment.run() method
- Add tests for new functionality (run, create_branch, custom metric_fn)
- Maintain backward compatibility with standalone functions

Refs #906
Rockachopa merged commit 261b7be468 into main 2026-03-23 23:28:55 +00:00
Rockachopa deleted branch kimi/issue-906 2026-03-23 23:28:56 +00:00
Reference: Rockachopa/Timmy-time-dashboard#1244