Commit Graph

5 Commits

Author SHA1 Message Date
Brooklyn Nicholson
0399d4b976 perf(tui): shave ~190ms off hermes --tui cold start
Two targeted fixes on the critical path from `hermes --tui` launch to
`gateway.ready`:

1. **Defer `@hermes/ink` import in memoryMonitor.ts.** The static top-level
   import dragged the full ~414KB Ink bundle (React + renderer + all
   components/hooks) onto the critical path *before* `gw.start()` could
   spawn the Python gateway — serialising ~155ms of Node work in front of
   it on every launch. `evictInkCaches` only runs inside the 10-second
   tick under heap pressure, so it moves to a lazy dynamic import. By the
   time the first tick fires, the app entry has long since imported
   `@hermes/ink`, so the dynamic import resolves from the ESM module cache.

2. **Gate `tools.mcp_tool` import on config in tui_gateway/entry.py.**
   Importing the module transitively pulls the MCP SDK + pydantic + httpx
   + jsonschema + starlette formparsers (~200ms). The overwhelming
   majority of users have no `mcp_servers` configured, so this runs for
   nothing. A cheap `load_config()` check (~25ms) skips the 200ms import
   when no servers are declared, with a conservative fallback to the old
   behaviour if the config probe itself fails.
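
The deferral described in item 1 can be sketched as a memoised dynamic import. This is a minimal illustration under stated assumptions, not the code in memoryMonitor.ts: `lazy`, `loadInk`, `InkLike`, and the `"half" | "all"` level names are hypothetical, and the stand-in loader takes the place of `() => import("@hermes/ink")`.

```typescript
// Hypothetical helper: wrap a loader so the module is fetched on first use
// and the same promise is reused on every later call.
function lazy<T>(load: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => (cached ??= load());
}

// Assumed shape of the part of the module the monitor needs.
interface InkLike {
  evictInkCaches(level: "half" | "all"): void;
}

let loads = 0;
const evicted: string[] = [];

// Stand-in for `lazy(() => import("@hermes/ink"))`.
const loadInk = lazy<InkLike>(async () => {
  loads += 1;
  return {
    evictInkCaches: (level: "half" | "all") => {
      evicted.push(level);
    },
  };
});

// Called from the periodic tick only under heap pressure, so nothing is
// loaded at startup on behalf of the monitor.
async function onHeapPressure(level: "half" | "all"): Promise<void> {
  const ink = await loadInk();
  ink.evictInkCaches(level);
}
```

Because the loader is memoised, repeated ticks under pressure pay for at most one module resolution, and nothing is loaded when the heap stays healthy.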

## Measurements (macOS Terminal.app, Apple Silicon, n=12)

| Metric                     | Before (p50) | After (p50) | Δ        |
|----------------------------|--------------|-------------|----------|
| Python gateway boot alone  | 252–365ms    | 105–151ms   | −180ms   |
| `hermes --tui` banner paint | 686ms        | 665ms       | −21ms    |
| `hermes --tui` → ready      | **1843ms**   | **1655ms**  | **−188ms (−10.2%)** |
| `hermes --tui` → ready p90  | 1932ms       | 1778ms      | −154ms   |
| stdev (ready)              | 126ms        | 83ms        | −43ms (more consistent) |

## Tests

- `scripts/run_tests.sh tests/tui_gateway/ tests/tools/test_mcp_tool.py`:
  195 passed. (The one pre-existing failure, in
  `test_session_resume_returns_hydrated_messages`, also reproduces on main
  and is unrelated: it is a mock-DB kwarg mismatch.)
- `ui-tui` vitest: 430 tests, all pass.
- `npm run type-check` in ui-tui: clean.

## Notes

- Node-side first paint ("banner") didn't move meaningfully because that
  latency is dominated by Ink's render pipeline + React mount, not by
  which imports load first.
- The win shows up entirely in the time from banner to `gateway.ready`
  — exactly where we expected it, since both fixes shorten the Python
  gateway's boot path or let it overlap more with Node startup.
- No user-visible behaviour change. Memory monitoring still fires every
  10s; MCP still works when `mcp_servers` is configured.
2026-04-28 19:42:31 -05:00
Brooklyn Nicholson
b1c49d5e73 chore(tui): /clean recent perf work — KISS/DRY pass
24 files, -319 LoC. Behaviour preserved, 369/369 tests green.

- hermes-ink caches: shared lruEvict helper for the four parallel LRU
  caches (stringWidth, wrapText, sliceAnsi, lineWidth); touch-on-read
  stays inlined per cache; tightened output.ts skip-slice fast path.
- wheelAccel: trimmed provenance header, collapsed env parsing, ternary
  dispatch in computeWheelStep.
- perfPane: folded ensureLogDir into once-flag, spread-with-overrides
  for fastPath/phases instead of full rebuilds.
- env: extracted truthy() (used 4×).
- virtualHeights: collapsed user/diff/slash height bumps; trail+todos
  estimate.
- useInputHandlers: scrollIdleTimer cleanup on unmount, ?? undefined
  shorthand.
- useMainApp: dropped dead liveTailVisible IIFE and liveProgress
  indirection.
- appLayout, markdown, messageLine, entry: vertical rhythm, dropped
  narration comments, inlined one-shot vars.
- fix: empty catch blocks → /* best-effort */ for no-empty lint.
2026-04-26 20:38:47 -05:00
Brooklyn Nicholson
25767513f2 perf(tui): unified Ink cache eviction on memory pressure + session reset
Adds an `evictInkCaches(level)` API that prunes the four hot module-level
caches (`widthCache`, `wrapCache`, `sliceCache`, `lineWidthCache`) with
either a half-keep LRU pass or a full clear. Wired into:

- memoryMonitor: half-prune on 'high', full drop on 'critical', before
  the heap dump / auto-restart path. Gives long sessions a shot at
  recovering RSS instead of hard-exiting.
- useSessionLifecycle.resetSession: half-prune so a /new session starts
  with a half-warm pool and the prior session can resume cheaply.

Also: lineWidthCache now uses LRU half-eviction on overflow instead of a
full `cache.clear()`, matching the other three caches.
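
A half-keep LRU pass of the kind described above might look like the following. This is a sketch only: the class name `HalfEvictLru`, the level names, and the use of a single `Map` per cache are assumptions, and the real caches in hermes-ink may be structured differently.

```typescript
// Hypothetical wrapper around a Map, which iterates in insertion order.
class HalfEvictLru<K, V> {
  private readonly map = new Map<K, V>();
  private readonly capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  get(key: K): V | undefined {
    const value = this.map.get(key);
    if (value === undefined) return undefined;
    // Touch-on-read: re-insert so the key becomes the most recently used.
    this.map.delete(key);
    this.map.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    this.map.delete(key);
    this.map.set(key, value);
    // On overflow, drop the older half instead of clearing everything.
    if (this.map.size > this.capacity) this.evict("half");
  }

  // "half" keeps the most recently used half; "all" is a full clear.
  evict(level: "half" | "all"): void {
    if (level === "all") {
      this.map.clear();
      return;
    }
    const dropCount = Math.ceil(this.map.size / 2);
    // The first keys in iteration order are the least recently used.
    for (const key of Array.from(this.map.keys()).slice(0, dropCount)) {
      this.map.delete(key);
    }
  }

  has(key: K): boolean {
    return this.map.has(key);
  }

  get size(): number {
    return this.map.size;
  }
}
```

Keeping half of the entries means the hottest strings stay cached across a prune, which is the "half-warm pool" effect the commit describes for session resets.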

Comparison vs claude-code: both forks now share the same `prevScreen`
blit + dirty-cascade machinery in render-node-to-output. Their smoothness
came from sibling-memo discipline (every chrome pane memo'd so dirty
cascade doesn't disable transcript blit) — already in place in our
appLayout.tsx (TranscriptPane / ComposerPane / StatusRulePane all memo'd).
Alt-screen is not the cause; both use it. The remaining gap was per-row
CPU on width/wrap/slice, which the previous commit closed.
2026-04-26 19:41:53 -05:00
Brooklyn Nicholson
82b927777c refactor(tui): /clean pass on memory + resize helpers
KISS/DRY sweep — drops ~90 LOC with no behavior change.

- circularBuffer: drop unused pushAll/toArray/size; fold toArray into drain
- gracefulExit: inline Cleanup type + failsafe const; signal→code as a
  record instead of nested ternary; drop dead .catch on Promise.allSettled;
  drop unused forceExit
- memory: inline heapDumpRoot() + writeSnapshot() (single-use); collapse
  the two fd/smaps try/catch blocks behind one `swallow` helper; build
  potentialLeaks functionally (array+filter) instead of imperative
  push-chain; UNITS at file bottom
- memoryMonitor: inline DEFAULTS; drop unused onSnapshot; collapse
  dumpedHigh/dumpedCritical bools to a single Set; single callback
  dispatch line instead of duplicated if-chains
- entry.tsx: factor `dumpNotice` formatter (used twice by onHigh +
  onCritical)
- useMainApp resize debounce: drop redundant `if (timer)` guards
  (clearTimeout(undefined) is a no-op); init as undefined not null
- useVirtualHistory: trim wall-of-text comment to one-line intent; hoist
  `const n = items.length`; split comma-declared lets; remove the
  `;[start, end] = frozenRange` destructure in favor of direct Math.min
  clamps; hoist `hi` init in upperBound for consistency

Validation: tsc clean (both configs), eslint clean on touched files,
vitest 102/102, build produces shebang-preserved dist/entry.js,
performHeapDump smoke-test still writes valid snapshot + diagnostics.
2026-04-20 18:58:44 -05:00
Brooklyn Nicholson
0785aec444 fix(tui): harden against Node V8 OOM + GatewayClient memory leaks
Long TUI sessions were crashing Node via V8 fatal-OOM once transcripts +
reasoning blobs crossed the default 1.5–4GB heap cap. This adds defense
in depth: a bigger heap, leak-proofing the RPC hot path, bounded
diagnostic buffers, automatic heap dumps at high-water marks, and
graceful signal / uncaught handlers.

## Changes

### Heap budget
- hermes_cli/main.py: `_launch_tui` now appends
  `--max-old-space-size=8192 --expose-gc` to `NODE_OPTIONS` (any
  user-supplied value is kept, not clobbered). Covers both the
  `node dist/entry.js` and `tsx src/entry.tsx` launch paths.
- ui-tui/src/entry.tsx: shebang rewritten to
  `#!/usr/bin/env -S node --max-old-space-size=8192 --expose-gc` as a
  fallback when the binary is invoked directly.

### GatewayClient (ui-tui/src/gatewayClient.ts)
- `setMaxListeners(0)` — silences spurious warnings from React hook
  subscribers.
- `logs` and `bufferedEvents` replaced with fixed-capacity
  CircularBuffer — O(1) push, no splice(0, …) copies under load.
- RPC timeout refactor: `setTimeout(this.onTimeout.bind(this), …, id)`
  replaces the inline arrow closure that captured `method`/`params`/
  `resolve`/`reject` for the full 120 s request timeout. Each Pending
  record now stores its own timeout handle, `.unref()`'d so stuck
  timers never keep the event loop alive, and `rejectPending()` clears
  them (previously leaked the timer itself).
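
The timeout refactor above can be illustrated roughly as follows, assuming a Node.js runtime. `RpcClientSketch`, its `send` callback, and `settle` are hypothetical names used for illustration; only the `setTimeout(bound, ms, id)` shape, the per-record timer handle, `.unref()`, and `rejectPending()` come from the commit message.

```typescript
interface Pending {
  resolve: (value: unknown) => void;
  reject: (reason: Error) => void;
  timer: ReturnType<typeof setTimeout>;
}

class RpcClientSketch {
  readonly pending = new Map<number, Pending>();
  private nextId = 1;
  private readonly send: (message: unknown) => void;
  private readonly timeoutMs: number;

  constructor(send: (message: unknown) => void, timeoutMs = 120_000) {
    this.send = send;
    this.timeoutMs = timeoutMs;
  }

  request(method: string, params?: unknown): Promise<unknown> {
    const id = this.nextId++;
    return new Promise((resolve, reject) => {
      // The id travels as a timer argument, so the timer callback holds no
      // reference to method, params, resolve, or reject.
      const timer = setTimeout(this.onTimeout.bind(this), this.timeoutMs, id);
      timer.unref(); // a stuck timer must not keep the event loop alive
      this.pending.set(id, { resolve, reject, timer });
      this.send({ id, method, params });
    });
  }

  // Hypothetical response path: clear the timer, then resolve.
  settle(id: number, value: unknown): void {
    const entry = this.pending.get(id);
    if (!entry) return;
    clearTimeout(entry.timer);
    this.pending.delete(id);
    entry.resolve(value);
  }

  rejectPending(reason: Error): void {
    for (const entry of Array.from(this.pending.values())) {
      clearTimeout(entry.timer); // without this the timer itself leaks
      entry.reject(reason);
    }
    this.pending.clear();
  }

  private onTimeout(id: number): void {
    const entry = this.pending.get(id);
    if (!entry) return;
    this.pending.delete(id);
    entry.reject(new Error(`request ${id} timed out`));
  }
}
```

Storing the handle on the record is what lets both the response path and `rejectPending()` cancel the timer deterministically.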

### Memory diagnostics (new)
- ui-tui/src/lib/memory.ts: `performHeapDump()` +
  `captureMemoryDiagnostics()`. Writes heap snapshot + JSON diag
  sidecar to `~/.hermes/heapdumps/` (override via
  `HERMES_HEAPDUMP_DIR`). Diagnostics are written first so we still get
  useful data if the snapshot crashes on very large heaps.
  Captures: detached V8 contexts (closure-leak signal), active
  handles/requests (`process._getActiveHandles/_getActiveRequests`),
  Linux `/proc/self/fd` count + `/proc/self/smaps_rollup`, heap growth
  rate (MB/hr), and auto-classifies likely leak sources.
- ui-tui/src/lib/memoryMonitor.ts: 10 s interval polling heapUsed. At
  1.5 GB writes an auto heap dump (trigger=`auto-high`); at 2.5 GB
  writes a final dump and exits 137 before V8 fatal-OOMs so the user
  can restart cleanly. Handle is `.unref()`'d so it never holds the
  process open.
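
The two-threshold policy above could be modelled as a small classifier plus a tick function. This is a sketch under assumptions: `classifyHeap`, `createTick`, and `MonitorHooks` are hypothetical names, 1 GB is taken as 1024³ bytes, and firing each hook only once per level is assumed rather than stated in this commit.

```typescript
type Level = "normal" | "high" | "critical";

const GB = 1024 ** 3; // assumption: binary gigabytes
const HIGH_BYTES = 1.5 * GB;
const CRITICAL_BYTES = 2.5 * GB;

function classifyHeap(heapUsed: number): Level {
  if (heapUsed >= CRITICAL_BYTES) return "critical";
  if (heapUsed >= HIGH_BYTES) return "high";
  return "normal";
}

interface MonitorHooks {
  onHigh: () => void; // e.g. write an auto heap dump
  onCritical: () => void; // e.g. write a final dump, then exit 137
}

// Returns the function to run on every polling interval. The heap reader is
// injected so the logic can be exercised without a large real heap.
function createTick(hooks: MonitorHooks, readHeapUsed: () => number): () => Level {
  const fired = new Set<Level>();
  return () => {
    const level = classifyHeap(readHeapUsed());
    if (level !== "normal" && !fired.has(level)) {
      fired.add(level);
      if (level === "high") hooks.onHigh();
      else hooks.onCritical();
    }
    return level;
  };
}
```

In a Node.js process the tick would presumably be driven by something like `setInterval(tick, 10_000).unref()` with `() => process.memoryUsage().heapUsed` as the reader.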

### Graceful exit (new)
- ui-tui/src/lib/gracefulExit.ts: SIGINT/SIGTERM/SIGHUP run registered
  cleanups through a 4 s failsafe `setTimeout` that hard-exits if
  cleanup hangs.
  `uncaughtException` / `unhandledRejection` are logged to stderr
  instead of crashing — a transient TUI render error should not kill
  an in-flight agent turn.
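
One way to picture the failsafe is a cleanup runner that races registered cleanups against a hard-exit timer. This is a hedged sketch, not the contents of gracefulExit.ts: `runCleanups`, the injected `exit` callback, and the conventional 128 + signal-number exit codes are all assumptions.

```typescript
type Cleanup = () => void | Promise<void>;

// Assumed mapping: the shell convention of 128 + signal number.
const SIGNAL_EXIT_CODES: Record<"SIGHUP" | "SIGINT" | "SIGTERM", number> = {
  SIGHUP: 129,
  SIGINT: 130,
  SIGTERM: 143,
};

async function runCleanups(
  cleanups: Cleanup[],
  code: number,
  exit: (code: number) => void,
  failsafeMs = 4000,
): Promise<void> {
  // If any cleanup hangs, this timer forces the exit after failsafeMs.
  const failsafe = setTimeout(exit, failsafeMs, code);
  try {
    // The async wrapper turns synchronous throws into rejections, and
    // allSettled never rejects, so one bad cleanup cannot block the rest.
    await Promise.allSettled(cleanups.map(async (fn) => fn()));
  } finally {
    clearTimeout(failsafe);
  }
  exit(code);
}
```

In a real process the handler for each signal would call this with `process.exit` as the `exit` argument; injecting it here keeps the sketch testable.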

### Slash commands (new)
- ui-tui/src/app/slash/commands/debug.ts:
  - `/heapdump` — manual snapshot + diagnostics.
  - `/mem` — live heap / rss / external / array-buffer / uptime panel.
- Registered in `ui-tui/src/app/slash/registry.ts`.

### Utility (new)
- ui-tui/src/lib/circularBuffer.ts: small fixed-capacity ring buffer
  with `push` / `tail(n)` / `drain()` / `clear()`. Replaces the ad-hoc
  `array.splice(0, len - MAX)` pattern.
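
A fixed-capacity ring buffer with the same four operations can be sketched as below. The class name `RingBuffer` and its internals are illustrative assumptions; the actual circularBuffer.ts may differ.

```typescript
class RingBuffer<T> {
  private readonly slots: (T | undefined)[];
  private readonly capacity: number;
  private start = 0; // index of the oldest item
  private count = 0; // number of live items

  constructor(capacity: number) {
    if (!Number.isInteger(capacity) || capacity <= 0) {
      throw new RangeError("capacity must be a positive integer");
    }
    this.capacity = capacity;
    this.slots = new Array<T | undefined>(capacity).fill(undefined);
  }

  // O(1): write at the logical end; when full, overwrite the oldest item.
  push(item: T): void {
    const end = (this.start + this.count) % this.capacity;
    this.slots[end] = item;
    if (this.count < this.capacity) this.count += 1;
    else this.start = (this.start + 1) % this.capacity;
  }

  // The newest n items, oldest first.
  tail(n: number): T[] {
    const take = Math.max(0, Math.min(n, this.count));
    const out: T[] = [];
    for (let i = this.count - take; i < this.count; i++) {
      out.push(this.slots[(this.start + i) % this.capacity] as T);
    }
    return out;
  }

  // Return everything in order and empty the buffer.
  drain(): T[] {
    const all = this.tail(this.count);
    this.clear();
    return all;
  }

  clear(): void {
    this.slots.fill(undefined); // release references for the GC
    this.start = 0;
    this.count = 0;
  }
}
```

Unlike `array.splice(0, len - MAX)`, a push here never shifts or copies existing elements, so the cost stays constant under load.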

## Validation

- tsc `--noEmit` clean
- `vitest run`: 15 files, 102 tests passing
- eslint clean on all touched/new files
- build produces executable `dist/entry.js` with preserved shebang
- smoke-tested: `HERMES_HEAPDUMP_DIR=… performHeapDump('manual')`
  writes both a valid `.heapsnapshot` and a `.diagnostics.json`
  containing detached-contexts, active-handles, smaps_rollup.

## Env knobs
- `HERMES_HEAPDUMP_DIR` — override snapshot output dir
- `HERMES_HEAPDUMP_ON_START=1` — dump once at boot
- an existing `NODE_OPTIONS` value is respected: the flags are appended to it, not substituted for it
2026-04-20 18:58:44 -05:00