[kimi] Refactor autoresearch.py -> SystemExperiment class (#906) #1244

kimi · 2026-03-23T23:24:12Z

Fixes #906

Summary

Refactored src/timmy/autoresearch.py from an ML-training-only scope into a generalized SystemExperiment class that supports arbitrary metrics.

Changes

SystemExperiment class enhancements:

- New run() method: encapsulates the full experiment loop (hypothesis → edit → tox → evaluate → commit/revert)
- New create_branch() method: git branch management for experiments
- New metric_fn parameter: support for custom metric-extraction callables
- New attributes: revert_on_failure, hypothesis, results, baseline
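Below is a minimal, hypothetical sketch of the loop that run() encapsulates. It is not the actual SystemExperiment implementation. It assumes the git and tox steps are injected as callables, and apply_edit, run_tox, commit and revert are illustrative names rather than the real signature. The sketch shows how metric_fn, revert_on_failure, hypothesis, results and baseline could interact: the first successful measurement becomes the baseline, improvements are committed, and failed or regressed runs are reverted.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class SystemExperiment:
    """Illustrative sketch only; the real class drives git and tox itself."""

    # Hypothetical injected steps standing in for the git/tox calls.
    metric_fn: Callable[[str], Optional[float]]  # pulls a metric out of tox output
    apply_edit: Callable[[str], None]
    run_tox: Callable[[], str]
    commit: Callable[[str], None]
    revert: Callable[[], None]
    revert_on_failure: bool = True
    hypothesis: str = ""
    baseline: Optional[float] = None
    results: List[dict] = field(default_factory=list)

    def run(self, hypotheses: List[str], dry_run: bool = False,
            higher_is_better: bool = True) -> List[dict]:
        for hypothesis in hypotheses:
            self.hypothesis = hypothesis
            if dry_run:
                # Record the plan only: no edit, no tox run, no commit.
                self.results.append({"hypothesis": hypothesis, "status": "dry_run"})
                continue

            self.apply_edit(hypothesis)
            metric = self.metric_fn(self.run_tox())

            if metric is None:
                status = "failed"  # metric could not be extracted
            elif self.baseline is None:
                status = "baseline"  # first successful measurement
            else:
                better = metric > self.baseline if higher_is_better else metric < self.baseline
                status = "improved" if better else "regressed"

            if status in ("baseline", "improved"):
                # Keep the change and move the baseline forward.
                self.baseline = metric
                self.commit(f"autoresearch: {hypothesis}")
            elif self.revert_on_failure:
                # Undo the edit for failed or regressed runs.
                self.revert()

            self.results.append(
                {"hypothesis": hypothesis, "metric": metric, "status": status}
            )
        return self.results
```

With metric_fn=float and a run_tox stub that returns "70.0", "75.0" and "72.0", three hypotheses end up as baseline, improved and regressed, and only the last one is reverted.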

CLI updates:

- Updated the timmy learn command to use the SystemExperiment.run() method
- Maintains the same user experience while simplifying the implementation

Tests added:

- test_create_branch_success - Git branch creation
- test_create_branch_failure - Error handling
- test_run_dry_run_mode - Dry-run functionality
- test_run_with_custom_metric_fn - Custom metric extraction
- test_run_single_iteration_success - Single iteration flow
- test_run_stores_baseline_on_first_success - Baseline tracking
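test_create_branch_success and test_create_branch_failure exercise a create_branch() that could look roughly like the sketch below. The PR does not show the implementation. This sketch assumes that the method shells out to git checkout -b and that it reports failure by returning False instead of raising. The standalone function form and the repo parameter are illustrative assumptions.

```python
import subprocess
from pathlib import Path


def create_branch(name: str, repo: Path) -> bool:
    """Hypothetical sketch: create and check out a new git branch in repo.

    Returns True on success. Returns False if git rejects the command (for
    example, the branch already exists), if git is not installed, or if
    repo is not an existing directory.
    """
    try:
        result = subprocess.run(
            ["git", "checkout", "-b", name],
            cwd=repo,
            capture_output=True,
            text=True,
            check=False,  # inspect the return code instead of raising
        )
    except OSError:
        # git missing, or cwd does not exist / is not a directory
        return False
    return result.returncode == 0
```

Returning a boolean keeps the success and failure paths easy to assert in tests, in the same way the two tests listed above split them.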

Backward Compatibility

All standalone functions preserved:

- prepare_experiment()
- run_experiment()
- evaluate_result()
- get_experiment_history()
- _extract_metric(), _extract_pass_rate(), _extract_coverage()
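The sketch below is a hypothetical pass-rate extractor in the spirit of _extract_pass_rate(). It illustrates the kind of callable that metric_fn is meant to accept. It assumes pytest's usual summary line (for example "47 passed, 2 failed, 1 error in 3.21s") and ignores skipped, xfailed and xpassed tests. The real helper's parsing rules and return type are not shown in this PR.

```python
import re
from typing import Optional

# Matches counts such as "47 passed", "2 failed", "1 error" or "3 errors".
_COUNT_RE = re.compile(r"(\d+) (passed|failed|errors?)\b")


def extract_pass_rate(output: str) -> Optional[float]:
    """Hypothetical stand-in for _extract_pass_rate().

    Returns the percentage of passing tests in a pytest summary, or None
    when no pass/fail/error counts appear in the output.
    """
    counts = {"passed": 0, "failed": 0, "error": 0}
    for number, kind in _COUNT_RE.findall(output):
        # Fold "errors" into the same bucket as "error".
        key = "error" if kind.startswith("error") else kind
        counts[key] += int(number)
    total = sum(counts.values())
    if total == 0:
        return None
    return 100.0 * counts["passed"] / total
```

For "47 passed, 2 failed, 1 error in 3.21s" this returns 94.0, and for output without a summary it returns None. A None result is a natural signal for run() to treat the iteration as a failure.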

Testing

All 49 autoresearch tests pass.

kimi added 1 commit 2026-03-23 23:24:13 +00:00

refactor(autoresearch): refactor autoresearch.py -> SystemExperiment class

Tests / lint (pull_request) Failing after 16s

Tests / test (pull_request) Has been skipped

b2f970b77c

- Add run() method to SystemExperiment class for full experiment loop
- Add create_branch() method for git branch management
- Add metric_fn parameter for custom metric extraction
- Add revert_on_failure and hypothesis attributes
- Update CLI learn command to use new SystemExperiment.run() method
- Add tests for new functionality (run, create_branch, custom metric_fn)
- Maintain backward compatibility with standalone functions

Refs #906

kimi referenced this pull request

2026-03-23 23:24:26 +00:00

[Autoresearch H1] Refactor autoresearch.py → SystemExperiment Class #906

Rockachopa merged commit 261b7be468 into main

2026-03-23 23:28:55 +00:00

Rockachopa deleted branch kimi/issue-906

2026-03-23 23:28:56 +00:00

Rockachopa referenced this issue from a commit

2026-03-23 23:28:56 +00:00

[kimi] Refactor autoresearch.py -> SystemExperiment class (#906) (#1244)
