feat: agent guardrails, headless smoke test, and guardrails CI #58

Closed
Timmy wants to merge 1 commit from feat/agent-guardrails-and-smoke-test into main
Owner

Summary

Three-layer defense against the class of silent correctness bugs surfaced in #54:

  1. AGENTS.md — documented rules for any contributor (human or AI) touching game.js. Lead rule: persistent multipliers (*Boost) must never be mutated inside functions that run per tick. Full rationale, war stories, and enforcement map in the file.
  2. scripts/smoke.mjs — headless smoke test. Boots game.js in a vm sandbox with a minimal DOM stub, no npm deps. Runs 30 assertions covering boot, tick loop, writeCode, building purchase, debuff safety, updateRates idempotency, and a save/load round-trip. <1s on a dev machine.
  3. scripts/guardrails.sh — static grep-based checks that encode the rules in AGENTS.md in a form CI can enforce at parse-time speed.

Layers 2 and 3 are run by a new Gitea Actions workflow (.gitea/workflows/guardrails.yml), together with a node -c syntax check of game.js, on every PR and push to main.

Included bug fix: community_drama no longer decays codeBoost

The smoke test caught a silent correctness bug in the debuff system. The original applyFn for community_drama did:

applyFn: () => { G.harmonyRate -= 0.5; G.codeBoost *= 0.7; }

applyFn is invoked from updateRates() on every tick (~10 Hz). G.codeBoost is a persistent multiplier (it isn't reset between ticks), so the *= 0.7 compounded. After ~100 ticks of the debuff being active (about 10 seconds), codeBoost ≈ 3e-16, and the player's code production was effectively zero until they started a new run. The game kept rendering and the click button kept working; the player just slowly and silently lost.

The fix applies the *= 0.7 to G.codeRate instead of G.codeBoost. G.codeRate is reset at the top of updateRates(), so the multiplier takes effect exactly once per tick and never accumulates. The observable behavior (code production at 70% while the debuff is active) matches the original design intent.
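
The difference between the two fields can be reproduced in isolation. The following is a minimal sketch, not the engine code: makeState, baseCodeRate, and the base rate of 10 are assumptions made up for illustration, and updateRates here is a stand-in that only mirrors the "reset, then apply debuffs" order described above.

```javascript
// Hypothetical miniature of the engine state. Field names follow the PR text;
// baseCodeRate and its value of 10 are invented for this sketch.
function makeState() {
  return { codeBoost: 1, codeRate: 0, baseCodeRate: 10 };
}

// Stand-in for updateRates(): recompute the per-tick rate, then apply the debuff.
function updateRates(G, applyFn) {
  G.codeRate = G.baseCodeRate * G.codeBoost; // reset at the top of every call
  applyFn(G);
}

const buggy = (G) => { G.codeBoost *= 0.7; }; // persistent field: compounds
const fixed = (G) => { G.codeRate *= 0.7; };  // per-tick field: applied once

const a = makeState();
const b = makeState();
for (let i = 0; i < 100; i++) {
  updateRates(a, buggy);
  updateRates(b, fixed);
}

console.log(a.codeBoost.toExponential(2)); // on the order of 1e-16
console.log(b.codeBoost, b.codeRate);      // codeBoost still 1, codeRate about 7
```

With the buggy variant the damage also outlives the debuff, because nothing ever multiplies codeBoost back up.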

The smoke test has a dedicated assertion for this: it fires every event's effect() and then calls updateRates() 200 times, asserting G.codeBoost has not moved. Before the fix, that assertion produced codeBoost = 1.046e-31 from a starting value of 1, which is what 200 compounded multiplications by 0.7 give (0.7^200 ≈ 1.05e-31).

What's in each file

AGENTS.md

Seven rules. The first four are about the interaction between G.*Boost, G.*Rate, and updateRates() — i.e. the exact thing community_drama got wrong. Rule 3 is the loadGame whitelist rule (now enforced since #55 landed the whitelist). Rules 5–7 cover event resolveCost duplication, G.flags conventions, and secrets/copyright hygiene.

scripts/smoke.mjs

  • Loads game.js via vm.runInContext with a stub document, window, localStorage, and timer functions.
  • Appends a small export block so const-declared engine values (G, BDEF, tick, ...) are reachable from the harness without patching game.js.
  • Runs 30 assertions in 8 sections.
  • Zero npm dependencies — pure Node built-ins, so the CI runner doesn't need a package.json.
  • Exit code is 1 if any assertion fails.
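
The loading approach in the bullets above can be sketched roughly as follows. This is an illustration under assumptions, not the contents of scripts/smoke.mjs: gameSource is a three-line stand-in for game.js, the stubs are far smaller than a real DOM stub would need to be, and __exports is a hypothetical name for the export block.

```javascript
const vm = require('node:vm'); // in an .mjs file: import vm from 'node:vm'

// Stand-in for game.js: const-declared state plus a DOM touch at load time.
const gameSource = `
  const G = { code: 0 };
  function tick() { G.code += 1; }
  document.getElementById('status').textContent = 'booted';
`;

// Minimal stubs: just enough surface for the script to load without a browser.
const store = new Map();
const sandbox = {
  document: { getElementById: () => ({ textContent: '' }) },
  localStorage: {
    getItem: (k) => (store.has(k) ? store.get(k) : null),
    setItem: (k, v) => { store.set(k, String(v)); },
  },
  setInterval: () => 0, // timers are inert; the harness drives tick() itself
  setTimeout: () => 0,
};
sandbox.window = sandbox;
vm.createContext(sandbox);

// Top-level const bindings do not become properties of the sandbox object,
// so an appended export block hands them to the harness explicitly.
const exportBlock = '\n;globalThis.__exports = { G, tick };';
vm.runInContext(gameSource + exportBlock, sandbox);

const { G, tick } = sandbox.__exports;
for (let i = 0; i < 3; i++) tick();
console.log(G.code);    // 3
console.log(sandbox.G); // undefined
```

Appending the export block to the source string leaves game.js itself unmodified, which is the property the second bullet relies on.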

scripts/guardrails.sh

  • Rule 1 (no *Boost mutation inside applyFn) — hard fail. Uses an awk brace tracker to restrict matches to inside applyFn: bodies. Verified to catch the exact community_drama pattern before the fix.
  • Rule 2 (click power single source) — hard fail. #55's getClickPower() consolidation is now on main, so any regression would re-introduce duplicate sites and fail here.
  • Rule 3 (no Object.assign(G, data) in loadGame) — hard fail. #55's whitelist loader is now on main; regressions fail here.
  • Rule 7 (secret scan) — hard fail. Scans for sk-ant-, sk-or-, ghp_, AKIA prefixes across .js/.json/.md/.html/.yml/.yaml/.py/.sh. Filters out the files that contain these literal strings as examples (this script, the workflow, AGENTS.md) so it doesn't self-match.
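
The Rule 1 check is implemented with awk in the script; the same brace-tracking idea is sketched here in JavaScript purely for illustration. findBoostMutations and the sample input are hypothetical, and the sketch shares the usual limits of a text-level scan: braces inside strings or comments are counted, and a match is reported per line rather than per statement.

```javascript
// Returns [{ line, text }] for every *Boost mutation found inside an applyFn body.
function findBoostMutations(source) {
  const hits = [];
  // Assignment, compound assignment, or ++/-- on G.<something>Boost (not == or ===).
  const mutation = /\bG\.\w*Boost\s*(?:[-+*\/]?=(?!=)|\+\+|--)/;
  let depth = 0;      // brace depth inside the current applyFn body
  let inside = false; // true while scanning an applyFn body

  source.split('\n').forEach((text, idx) => {
    let scan = text;
    if (!inside) {
      const start = text.indexOf('applyFn:');
      if (start === -1) return; // not in a body and none starts here
      inside = true;
      depth = 0;
      scan = text.slice(start);
    }
    if (mutation.test(scan)) hits.push({ line: idx + 1, text: text.trim() });

    let opened = depth > 0;
    for (const ch of scan) {
      if (ch === '{') { depth++; opened = true; }
      else if (ch === '}') depth--;
      // Body closed, or we ran past a brace-less arrow body.
      if ((opened && depth === 0) || depth < 0) { inside = false; break; }
    }
  });
  return hits;
}

const sample = [
  "{ id: 'community_drama',",
  "  applyFn: () => { G.harmonyRate -= 0.5; G.codeBoost *= 0.7; } },",
  "{ id: 'ok_debuff',",
  "  applyFn: () => {",
  "    G.codeRate *= 0.7;",
  "  } },",
  "function buyUpgrade() { G.codeBoost *= 2; }",
].join('\n');

console.log(findBoostMutations(sample)); // one hit, on line 2
```

Restricting the match to applyFn bodies is what lets legitimate one-time mutations, such as the buyUpgrade line in the sample, pass.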

.gitea/workflows/guardrails.yml

Three steps: node -c game.js, node scripts/smoke.mjs, bash scripts/guardrails.sh. Runs on pull_request and push: main.

Test plan

Run locally:

node scripts/smoke.mjs
# 30 passed, 0 failed

bash scripts/guardrails.sh
# Rule 1: PASS
# Rule 2: PASS (1 site)
# Rule 3: PASS
# Rule 7: PASS
# all guardrails passed

node -c game.js
# (exits 0)

To verify the community_drama fix is real, revert game.js:1537 and re-run node scripts/smoke.mjs — the "codeBoost stable under updateRates()" assertion should fail with a value near 1e-31.

Relationship to recent merges

While this PR was being written, #55, #53, and #52 all merged into main. Rebase was clean except for the G.flags = {} init which #55 also added (trivial dedupe). The community_drama fix was not touched by #55, so it remains the primary correctness win in this PR.

After #55 landed, rules 2 and 3 flipped from PENDING to hard failures — there's no longer anything on main that would violate them, so any future PR that introduces a second click-power formula or re-adds Object.assign(G, data) will be blocked at CI.
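
For context on why Rule 3 bans Object.assign(G, data), here is a minimal sketch of a whitelist-style loader. The actual loader landed in #55 and is not shown in this PR, so SAVE_FIELDS, loadGameWhitelisted, and the type check are all assumptions for illustration.

```javascript
// Hypothetical whitelist; the real list of saveable fields lives in game.js.
const SAVE_FIELDS = ['code', 'codeBoost', 'flags'];

function loadGameWhitelisted(G, json) {
  let data;
  try { data = JSON.parse(json); } catch { return false; }
  if (data === null || typeof data !== 'object') return false;
  for (const key of SAVE_FIELDS) {
    // Copy only known fields whose type matches the current default.
    if (Object.hasOwn(data, key) && typeof data[key] === typeof G[key]) {
      G[key] = data[key];
    }
  }
  return true;
}

const G = { code: 0, codeBoost: 1, flags: {}, codeRate: 0 };
const save = '{"code": 42, "codeBoost": "oops", "codeRate": 999, "evil": true}';
loadGameWhitelisted(G, save);
console.log(G); // code is 42; codeBoost and codeRate unchanged; no "evil" key
```

Object.assign(G, data) would have accepted all four keys from that save, including the string-typed codeBoost and the derived codeRate.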

Intentionally deferred

  • Sprint × offline-gains exploit (#54 review comment): the fix is slightly more involved (needs to unbake the sprint multiplier before saving, or cap offline credit at the remaining sprint duration). Not in this PR to keep the diff focused.
  • Event resolveCost single source of truth (#54 review comment): needs a refactor of EVENTS entries; AGENTS.md rule 5 documents the interim "copy literally" rule until then.
  • BDEF.find / PDEFS.find Map optimization: micro-optimization at ~24 entries; not worth the diff here.
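
For the first deferred item, the "cap offline credit at the remaining sprint duration" option could look roughly like the sketch below. Nothing here comes from game.js: offlineGain and every parameter name are hypothetical, with times in seconds and the rate in code per second.

```javascript
// All names are hypothetical. Only the part of the absence that overlaps the
// remaining sprint time earns the sprint multiplier; the rest earns the base rate.
function offlineGain({ baseRate, sprintMult, sprintRemaining, elapsed }) {
  const boosted = Math.min(elapsed, Math.max(0, sprintRemaining));
  const plain = elapsed - boosted;
  return baseRate * (sprintMult * boosted + plain);
}

// A 10x sprint with 10 seconds left on the clock, player away for an hour.
const capped = offlineGain({ baseRate: 2, sprintMult: 10, sprintRemaining: 10, elapsed: 3600 });
const naive = 2 * 10 * 3600; // sprint multiplier applied to the whole absence

console.log(capped, naive); // 7380 72000
```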

Happy to split any of the above into follow-up PRs.

Refs: #54

Timmy added 1 commit 2026-04-11 00:49:21 +00:00
feat: agent guardrails, headless smoke test, and CI workflow
All checks were successful
Guardrails / guardrails (pull_request) Successful in 6s
b41889650d
Adds a three-layer defense against the class of silent correctness bugs
surfaced in #54. The layers are: documented rules (AGENTS.md), dynamic
assertions (scripts/smoke.mjs), and static checks (scripts/guardrails.sh).
A new Gitea Actions workflow runs the dynamic + static layers on every PR.

Also fixes two of the bugs the smoke test immediately caught on main:

1. G.flags is now initialized to {} in the globals block. Previously it
   was created lazily by the p_creativity project effect, which forced
   every reader to write `G.flags && G.flags.x` — and left a window where
   a new writer could drop the defensive guard and crash.

2. The community_drama debuff no longer mutates G.codeBoost. Its applyFn
   was invoked from updateRates() on every tick (10 Hz), so the original
   `G.codeBoost *= 0.7` compounded: after ~100 ticks of the drama debuff,
   codeBoost was ~3e-16 instead of the intended 0.7. The fix targets
   G.codeRate instead, which is reset at the top of updateRates() and is
   therefore safe to multiplicatively reduce inside applyFn. AGENTS.md
   rule 1 explains the distinction between persistent multipliers and
   per-tick rate fields so future debuffs don't reintroduce the bug.

The smoke test (`scripts/smoke.mjs`) runs game.js in a vm sandbox with a
minimal DOM stub, no npm deps. It boots the engine, runs ticks, clicks,
buys a building, fires every debuff, checks codeBoost stability, checks
updateRates idempotency, and does a save/load round-trip. 30 assertions,
~0.1s on a dev machine.

The static guardrails (`scripts/guardrails.sh`) grep for the patterns
AGENTS.md forbids. Two rules (click power single-source, no Object.assign
in loadGame) are marked PENDING because PR #55 is landing the fix for
them — the workflow reports them but doesn't fail until #55 merges.

Refs: #54
Timmy force-pushed feat/agent-guardrails-and-smoke-test from b41889650d to 0e5538319b 2026-04-11 00:51:02 +00:00 Compare
gemini reviewed 2026-04-11 01:09:39 +00:00
gemini left a comment
Member

Crucial guardrails for the game engine. AGENTS.md provides necessary context for future contributions.

Author
Owner

⚠️ Blocked: CI failures. Two checks are failing:

  • Smoke Test: Failing after 3s
  • Accessibility Checks (a11y-audit): Failing after 2s

The Guardrails check passed. Please fix the failing checks before merging. The smoke test and a11y-audit workflows may need debugging.
gemini closed this pull request 2026-04-11 01:32:33 +00:00
Some checks failed
Accessibility Checks / a11y-audit (pull_request) Failing after 2s
Guardrails / guardrails (pull_request) Successful in 4s
Smoke Test / smoke (pull_request) Failing after 3s

Pull request closed


Reference: Timmy_Foundation/the-beacon#58