[Autoresearch H1] Implement python -m timmy.cli learn Entry Point #907

Closed
opened 2026-03-22 13:06:05 +00:00 by perplexity · 1 comment
Collaborator

Parent

Part of #904 (Autoresearch Integration Proposal v2) — Action Item #6

Goal

Create a CLI entry point that triggers the autoresearch loop. The agent reads program.md (research direction file written by rockachopa) and iterates through experiments autonomously.

Implementation

  1. Add learn subcommand to src/timmy/cli.py
  2. On invocation:
    • Read program.md from repo root for research direction
    • Initialize SystemExperiment with target module and metric
    • Enter autoresearch loop: hypothesis → edit → tox → evaluate → commit/revert
    • Respect budget constraints (default 5 min per experiment)
  3. CLI flags:
    • --target: module or file to optimize
    • --metric: which metric function to use
    • --budget: time limit per experiment (minutes)
    • --max-experiments: cap on total experiments per run
    • --dry-run: show hypothesis without executing
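The flags listed above could be wired up roughly as follows. This is a minimal sketch using the standard-library `argparse`; the actual `src/timmy/cli.py` may use a different CLI framework, and `build_parser` is a hypothetical helper name. Only the 5-minute budget default comes from this issue; the other defaults are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing a `learn` subcommand (illustrative only)."""
    parser = argparse.ArgumentParser(prog="timmy")
    sub = parser.add_subparsers(dest="command", required=True)

    learn = sub.add_parser("learn", help="run the autoresearch loop")
    learn.add_argument("--target", help="module or file to optimize")
    learn.add_argument("--metric", help="which metric function to use")
    # Default of 5 minutes per experiment is stated in the issue.
    learn.add_argument("--budget", type=float, default=5.0,
                       help="time limit per experiment (minutes)")
    # Assumption: no cap unless the flag is given.
    learn.add_argument("--max-experiments", type=int, default=None,
                       help="cap on total experiments per run")
    learn.add_argument("--dry-run", action="store_true",
                       help="show hypothesis without executing")
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(
    ["learn", "--target", "src/timmy/cli.py", "--dry-run"]
)
```

With this layout, `args.dry_run` can gate the edit step so that a dry run only prints the proposed hypothesis.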

Dependencies

  • SystemExperiment class must exist (sibling issue)
  • Aider integration already in toolkit (wire as execution engine)

Deliverable

python -m timmy.cli learn starts an autonomous improvement loop against the metric specified in program.md

Cross-references

  • #904 (Autoresearch Epic)
  • kimi-task backlog (~12 open refactoring tasks become initial experiment queue)

Owner

Kimi (account #5)

claude was assigned by Rockachopa 2026-03-22 23:32:22 +00:00
claude added the harness, heartbeat, and p0-critical labels 2026-03-23 13:52:46 +00:00
Collaborator

PR created: http://143.198.27.163:3000/rockachopa/Timmy-time-dashboard/pulls/1240

Implemented the timmy learn autoresearch entry point:

  • SystemExperiment class (autoresearch.py) — encapsulates hypothesis → Aider edit → tox → evaluate → commit/revert with configurable target, metric, and budget. Supports unit_pass_rate and coverage (higher-is-better) alongside val_bpb and custom metrics.
  • timmy learn command (cli.py) — flags: --target, --metric, --budget, --max-experiments, --dry-run, --tox-env, --model. Reads program.md for research direction; degrades gracefully when file is absent or Aider is not installed.
  • program.md — starter research direction template in repo root.
  • 21 new unit tests — all 400 unit tests pass.
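The commit/revert decision described in the comment can be sketched as below. This is not the code from the PR: `run_loop`, `is_improvement`, and `LoopResult` are hypothetical names, the Aider edit, tox run, and git operations are replaced by injected callables, and the budget is checked only after each experiment finishes. Treating `val_bpb` and any other metric as lower-is-better is an assumption; the comment only states that `unit_pass_rate` and `coverage` are higher-is-better.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List

# Stated in the comment as higher-is-better; everything else is
# assumed (not confirmed by the source) to be lower-is-better.
HIGHER_IS_BETTER = {"unit_pass_rate", "coverage"}


def is_improvement(metric: str, baseline: float, candidate: float) -> bool:
    """Return True when the candidate score beats the baseline."""
    if metric in HIGHER_IS_BETTER:
        return candidate > baseline
    return candidate < baseline


@dataclass
class LoopResult:
    kept: List[str] = field(default_factory=list)
    reverted: List[str] = field(default_factory=list)


def run_loop(
    hypotheses: List[str],
    metric: str,
    baseline: float,
    apply_edit: Callable[[str], None],   # stand-in for the Aider edit
    evaluate: Callable[[], float],       # stand-in for tox + metric
    commit: Callable[[str], None],
    revert: Callable[[str], None],
    max_experiments: int,
    budget_minutes: float = 5.0,
) -> LoopResult:
    """Run hypothesis -> edit -> evaluate -> commit/revert for each hypothesis."""
    result = LoopResult()
    for hypothesis in hypotheses[:max_experiments]:
        started = time.monotonic()
        apply_edit(hypothesis)
        score = evaluate()
        # Simplification: the budget is checked after the fact rather
        # than enforced with a timeout.
        over_budget = (time.monotonic() - started) > budget_minutes * 60
        if not over_budget and is_improvement(metric, baseline, score):
            commit(hypothesis)
            baseline = score  # later experiments must beat the new best
            result.kept.append(hypothesis)
        else:
            revert(hypothesis)
            result.reverted.append(hypothesis)
    return result


# Usage with fake scores: the second experiment regresses and is reverted.
scores = iter([0.90, 0.85, 0.95])
result = run_loop(
    hypotheses=["h1", "h2", "h3"],
    metric="coverage",
    baseline=0.88,
    apply_edit=lambda h: None,
    evaluate=lambda: next(scores),
    commit=lambda h: None,
    revert=lambda h: None,
    max_experiments=3,
)
```

Injecting the edit, evaluate, commit, and revert steps as callables keeps the decision logic testable without Aider, tox, or a git repository.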
Reference: Rockachopa/Timmy-time-dashboard#907