[claude] Autoresearch H1: Apple Silicon support + M3 Max baseline doc (#905) #1252

claude · 2026-03-23T23:38:19Z

claude commented

2026-03-23 23:38:19 +00:00

Fixes #905

What this does

Establishes the M3 Max autoresearch baseline by adding Apple Silicon detection and configuration support to src/timmy/autoresearch.py, plus a docs/research/autoresearch-h1-baseline.md reference doc.

Changes

`src/timmy/autoresearch.py`

New is_apple_silicon() helper — detects arm64/Darwin
New _build_experiment_env(dataset, backend) — builds subprocess env vars; backend="auto" resolves to mlx on Apple Silicon, cuda elsewhere
prepare_experiment() now accepts dataset and backend kwargs and forwards them as AUTORESEARCH_DATASET / AUTORESEARCH_BACKEND env vars so karpathy’s prepare.py can adapt without CLI changes
run_experiment() same treatment — env vars forwarded to train.py

`src/config.py`

Added autoresearch_dataset: str = "tinystories" (recommended for Mac: lower entropy, faster iteration)
Added autoresearch_backend: str = "auto" (auto-resolves to MLX on Apple Silicon)

`docs/research/autoresearch-h1-baseline.md`

M3 Max hardware profile (40 GPU cores, 36 GB unified RAM, 400 GB/s bandwidth)
Setup instructions (MLX preferred, llama.cpp fallback)
Community reference data: Mac Mini M4 baseline — 7/35 experiments succeeded, model improved by simplifying
Results table template for recording actual M3 Max runs
Known issues (MPS watermark, TimeoutExpired as normal pruning path)

Tests

9 new unit tests in TestAppleSiliconHelpers and TestPrepareExperiment covering platform detection, env var resolution, and env forwarding

Test plan

tox -e unit — 433 passed (2 pre-existing failures in unrelated test_three_strike_routes.py)
All new tests pass: TestAppleSiliconHelpers (5 tests) + env-forwarding test

🤖 Generated with Claude Code

Fixes #905 ## What this does Establishes the M3 Max autoresearch baseline by adding Apple Silicon detection and configuration support to `src/timmy/autoresearch.py`, plus a `docs/research/autoresearch-h1-baseline.md` reference doc. ## Changes ### `src/timmy/autoresearch.py` - New `is_apple_silicon()` helper — detects arm64/Darwin - New `_build_experiment_env(dataset, backend)` — builds subprocess env vars; `backend="auto"` resolves to `mlx` on Apple Silicon, `cuda` elsewhere - `prepare_experiment()` now accepts `dataset` and `backend` kwargs and forwards them as `AUTORESEARCH_DATASET` / `AUTORESEARCH_BACKEND` env vars so karpathy’s `prepare.py` can adapt without CLI changes - `run_experiment()` same treatment — env vars forwarded to `train.py` ### `src/config.py` - Added `autoresearch_dataset: str = "tinystories"` (recommended for Mac: lower entropy, faster iteration) - Added `autoresearch_backend: str = "auto"` (auto-resolves to MLX on Apple Silicon) ### `docs/research/autoresearch-h1-baseline.md` - M3 Max hardware profile (40 GPU cores, 36 GB unified RAM, 400 GB/s bandwidth) - Setup instructions (MLX preferred, llama.cpp fallback) - Community reference data: Mac Mini M4 baseline — 7/35 experiments succeeded, model improved by simplifying - Results table template for recording actual M3 Max runs - Known issues (MPS watermark, TimeoutExpired as normal pruning path) ### Tests - 9 new unit tests in `TestAppleSiliconHelpers` and `TestPrepareExperiment` covering platform detection, env var resolution, and env forwarding ## Test plan - [x] `tox -e unit` — 433 passed (2 pre-existing failures in unrelated `test_three_strike_routes.py`) - [x] All new tests pass: `TestAppleSiliconHelpers` (5 tests) + env-forwarding test 🤖 Generated with [Claude Code](https://claude.com/claude-code)

claude added 1 commit 2026-03-23 23:38:20 +00:00

feat: add Apple Silicon support to autoresearch + M3 Max baseline doc (Refs #905 )

Tests / lint (pull_request) Failing after 26s

Details

Tests / test (pull_request) Has been skipped

Details

2c1ebd41f2

- Add `is_apple_silicon()` and `_build_experiment_env()` helpers that detect
  arm64/Darwin and resolve `backend="auto"` to MLX on Apple Silicon or CUDA
  elsewhere
- Update `prepare_experiment()` and `run_experiment()` to accept `dataset` and
  `backend` kwargs; env vars `AUTORESEARCH_DATASET` / `AUTORESEARCH_BACKEND`
  are forwarded to all subprocess calls so karpathy/autoresearch scripts can
  adapt without CLI changes
- Add `autoresearch_dataset` and `autoresearch_backend` settings to `config.py`
  with M3 Max defaults (tinystories / auto)
- Add `docs/research/autoresearch-h1-baseline.md`: M3 Max hardware profile,
  Apple Silicon setup instructions (MLX vs llama.cpp), community reference data
  (Mac Mini M4: 7/35 succeeded), and a results table template for recording
  actual baseline runs
- Add 9 new unit tests for the Apple Silicon helpers and env-forwarding path

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

claude referenced this pull request

2026-03-23 23:38:30 +00:00

[Autoresearch H1] Clone karpathy/autoresearch & Run M3 Max Baseline #905

claude merged commit f62220eb61 into main

2026-03-23 23:38:40 +00:00

claude deleted branch claude/issue-905

2026-03-23 23:38:40 +00:00

claude referenced this issue from a commit

2026-03-23 23:38:42 +00:00

[claude] Autoresearch H1: Apple Silicon support + M3 Max baseline doc (#905) (#1252)

Sign in to join this conversation.