[Autoresearch H1] Implement python -m timmy.cli learn Entry Point #907

perplexity · 2026-03-22T13:06:05Z

Parent

Part of #904 (Autoresearch Integration Proposal v2) — Action Item #6

Goal

Create a CLI entry point that triggers the autoresearch loop. The agent reads program.md (research direction file written by rockachopa) and iterates through experiments autonomously.

Implementation

1. Add learn subcommand to src/timmy/cli.py
2. On invocation:
   - Read program.md from repo root for research direction
   - Initialize SystemExperiment with target module and metric
   - Enter autoresearch loop: hypothesis → edit → tox → evaluate → commit/revert
   - Respect budget constraints (default 5 min per experiment)
3. CLI flags:
   - --target: module or file to optimize
   - --metric: which metric function to use
   - --budget: time limit per experiment (minutes)
   - --max-experiments: cap on total experiments per run
   - --dry-run: show hypothesis without executing
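The flag list above could be wired up roughly as follows. This is a minimal sketch, not the code in src/timmy/cli.py: it assumes argparse (the real CLI may use a different framework), and the `build_parser` helper and the defaults for --target, --metric, and --max-experiments are hypothetical. Only the 5-minute budget default comes from this issue.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing a `learn` subcommand (argparse is an assumption)."""
    parser = argparse.ArgumentParser(prog="timmy")
    sub = parser.add_subparsers(dest="command", required=True)

    learn = sub.add_parser("learn", help="run the autoresearch loop")
    learn.add_argument("--target", help="module or file to optimize")
    learn.add_argument("--metric", help="which metric function to use")
    # Default of 5 minutes per experiment is taken from the issue text.
    learn.add_argument("--budget", type=float, default=5.0,
                       help="time limit per experiment, in minutes")
    # Assumption: no cap on experiments unless the flag is given.
    learn.add_argument("--max-experiments", type=int, default=None,
                       help="cap on total experiments per run")
    learn.add_argument("--dry-run", action="store_true",
                       help="show hypothesis without executing")
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(
    ["learn", "--target", "src/timmy/cli.py", "--budget", "2", "--dry-run"]
)
print(args.target, args.budget, args.max_experiments, args.dry_run)
```

With --dry-run set, a caller would presumably print the generated hypothesis and return before any edit or tox run takes place.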

Dependencies

- SystemExperiment class must exist (sibling issue)
- Aider integration already in toolkit (wire as execution engine)

Deliverable

python -m timmy.cli learn starts an autonomous improvement loop against the metric specified in program.md

Cross-references

- #904 (Autoresearch Epic)
- kimi-task backlog (~12 open refactoring tasks become initial experiment queue)

Owner

Kimi (account #5)

perplexity referenced this issue

2026-03-22 13:06:06 +00:00

[Study] Autoresearch Integration Proposal v2 — Karpathy's Self-Improvement Loop for Timmy Time #904

claude was assigned by Rockachopa

2026-03-22 23:32:22 +00:00

claude added the harness heartbeat p0-critical labels

2026-03-23 13:52:46 +00:00

claude referenced this issue from a commit

2026-03-23 23:06:45 +00:00

feat: add timmy learn CLI command and SystemExperiment class

claude referenced a pull request that will close this issue

2026-03-23 23:06:58 +00:00

[claude] Add timmy learn autoresearch entry point (#907) #1240

claude commented

2026-03-23 23:07:07 +00:00

PR created: http://143.198.27.163:3000/rockachopa/Timmy-time-dashboard/pulls/1240

Implemented the timmy learn autoresearch entry point:

- SystemExperiment class (autoresearch.py): encapsulates hypothesis → Aider edit → tox → evaluate → commit/revert with configurable target, metric, and budget. Supports unit_pass_rate and coverage (higher-is-better) alongside val_bpb and custom metrics.
- timmy learn command (cli.py): flags: --target, --metric, --budget, --max-experiments, --dry-run, --tox-env, --model. Reads program.md for research direction; degrades gracefully when file is absent or Aider is not installed.
- program.md: starter research direction template in repo root.
- 21 new unit tests: all 400 unit tests pass.
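The keep-or-revert decision described for SystemExperiment can be illustrated with a small sketch. It is not the code in autoresearch.py: `is_improvement`, `run_loop`, and `LoopResult` are hypothetical names, the `evaluate` callable stands in for the Aider edit, tox run, and metric evaluation, and treating val_bpb (and any unlisted custom metric) as lower-is-better is an assumption. The budget check here only rejects an experiment after it has overrun; real enforcement would need a timeout around the subprocess.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional

# From the comment above: these two metrics are higher-is-better.
# Assumption: everything else (e.g. val_bpb) is lower-is-better.
HIGHER_IS_BETTER = {"unit_pass_rate", "coverage"}


def is_improvement(metric: str, baseline: float, candidate: float) -> bool:
    """Return True if `candidate` beats `baseline` for the given metric."""
    if metric in HIGHER_IS_BETTER:
        return candidate > baseline
    return candidate < baseline


@dataclass
class LoopResult:
    best: float
    kept: List[str] = field(default_factory=list)
    reverted: List[str] = field(default_factory=list)


def run_loop(
    metric: str,
    baseline: float,
    hypotheses: Iterable[str],
    evaluate: Callable[[str], float],
    budget_minutes: float = 5.0,
    max_experiments: Optional[int] = None,
    clock: Callable[[], float] = time.monotonic,
) -> LoopResult:
    """Try each hypothesis once; keep it only if the metric improves in budget."""
    result = LoopResult(best=baseline)
    for i, hypothesis in enumerate(hypotheses):
        if max_experiments is not None and i >= max_experiments:
            break
        started = clock()
        score = evaluate(hypothesis)  # stand-in for: edit -> tox -> evaluate
        over_budget = (clock() - started) > budget_minutes * 60
        if not over_budget and is_improvement(metric, result.best, score):
            result.best = score
            result.kept.append(hypothesis)      # real code would commit here
        else:
            result.reverted.append(hypothesis)  # real code would revert here
    return result


# Usage with made-up scores for three hypothetical experiments.
scores = {"add tests": 0.92, "rename vars": 0.88, "fix flaky test": 0.95}
out = run_loop("unit_pass_rate", 0.90, list(scores), scores.__getitem__)
print(out.best, out.kept, out.reverted)
```

Passing `evaluate` and `clock` as parameters keeps the decision logic testable without invoking Aider, tox, or git.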

claude closed this issue

2026-03-23 23:09:07 +00:00

Rockachopa referenced this issue from a commit

2026-03-23 23:14:12 +00:00

[claude] Add timmy learn autoresearch entry point (#907) (#1240)


Reference: Rockachopa/Timmy-time-dashboard#907
