Files

Teknium e3f8347be3 feat(file_tools): harden read_file with size guard, dedup, and device blocking (#4315)

* feat(file_tools): harden read_file with size guard, dedup, and device blocking

Three improvements to read_file_tool to reduce wasted context tokens and
prevent process hangs:

1. Character-count guard: reads that produce more than 100K characters
   (≈25-35K tokens across tokenisers) are rejected with an error that
   tells the model to use offset+limit for a smaller range.  The
   effective cap is min(file_size, 100K) so small files that happen to
   have long lines aren't over-penalised.  Large truncated files also
   get a hint nudging toward targeted reads.

2. File-read deduplication: when the same (path, offset, limit) is read
   a second time and the file hasn't been modified (mtime unchanged),
   return a lightweight stub instead of re-sending the full content.
   Writes and patches naturally change mtime, so post-edit reads always
   return fresh content.  The dedup cache is cleared on context
   compression — after compression the original read content is
   summarised away, so the model needs the full content again.

3. Device path blocking: paths like /dev/zero, /dev/random, /dev/stdin
   etc. are rejected before any I/O to prevent process hangs from
   infinite-output or blocking-input devices.
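
The three checks above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: `read_file_sketch`, `reset_dedup_cache`, `BLOCKED_DEVICES`, and the shape of the returned dicts are hypothetical names chosen for the example, and the `min(file_size, 100K)` refinement and the truncation hint are left out.

```python
import os

MAX_CHARS = 100_000  # default cap described in the commit message

# Hypothetical blocklist: the commit names these three paths "etc.",
# so the real list is presumably longer.
BLOCKED_DEVICES = {"/dev/zero", "/dev/random", "/dev/stdin"}

# (absolute path, offset, limit) -> mtime (ns) seen at the last full read
_dedup_cache = {}


def read_file_sketch(path, offset=1, limit=500):
    """Read `limit` lines starting at 1-based line `offset`, with guards."""
    abs_path = os.path.abspath(path)  # simplistic: symlinks are not resolved

    # 3. Device blocking: reject before any I/O touches the path.
    if abs_path in BLOCKED_DEVICES:
        return {"error": f"refusing to read device path: {path}"}

    # 2. Dedup: same (path, offset, limit) and unchanged mtime -> stub.
    mtime = os.stat(abs_path).st_mtime_ns
    key = (abs_path, offset, limit)
    if _dedup_cache.get(key) == mtime:
        return {"dedup": True, "message": "file unchanged since last read"}

    with open(abs_path, "r", encoding="utf-8", errors="replace") as f:
        lines = f.readlines()
    content = "".join(lines[offset - 1 : offset - 1 + limit])

    # 1. Character-count guard: reject oversized reads with a pointer
    #    toward a narrower range.
    if len(content) > MAX_CHARS:
        return {
            "error": f"read produced {len(content)} characters "
                     f"(limit {MAX_CHARS}); use offset and limit "
                     "to read a smaller range"
        }

    _dedup_cache[key] = mtime
    return {"content": content}


def reset_dedup_cache():
    """Would be called on context compression so later reads return content."""
    _dedup_cache.clear()
```

Only successful reads are recorded in the cache here, so a rejected oversized read is never answered with a stub on retry.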

Tests: 17 new tests covering all three features plus the dedup-reset-
on-compression integration.  All 52 file-read tests pass (35 existing +
17 new).  Full tool suite (2124 tests) passes with 0 failures.

* feat: make file_read_max_chars configurable, add docs

Add file_read_max_chars to DEFAULT_CONFIG (default 100K).  read_file_tool
reads this on first call and caches for the process lifetime.  Users on
large-context models can raise it; users on small local models can lower it.
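
The read-once-and-cache behavior described above might look something like the sketch below. Only `DEFAULT_CONFIG` and `file_read_max_chars` come from the commit message; `load_user_config` and `get_max_chars` are hypothetical stand-ins, and the real tool may cache the value differently.

```python
from functools import lru_cache

DEFAULT_CONFIG = {"file_read_max_chars": 100_000}


def load_user_config():
    """Hypothetical loader; a real one would read the user's config file."""
    return {}


@lru_cache(maxsize=1)
def get_max_chars():
    # Computed on the first call, then served from the cache for the
    # rest of the process lifetime.
    merged = {**DEFAULT_CONFIG, **load_user_config()}
    return int(merged["file_read_max_chars"])
```

One consequence of caching for the process lifetime is that a changed config value only takes effect after a restart.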

Also adds a 'File Read Safety' section to the configuration docs
explaining the char limit, dedup behavior, and example values.

2026-03-31 12:53:19 -07:00

docs

feat(file_tools): harden read_file with size guard, dedup, and device blocking (#4315)

2026-03-31 12:53:19 -07:00

src/css

fix(docs): improve mobile sidebar navigation

2026-03-30 13:20:55 -07:00

static

docs: stabilize website diagrams

2026-03-14 22:49:57 -07:00

.gitignore

feat: add documentation website (Docusaurus)

2026-03-05 05:24:55 -08:00

docusaurus.config.ts

fix(docs): improve mobile sidebar navigation

2026-03-30 13:20:55 -07:00

package-lock.json

docs: stabilize website diagrams

2026-03-14 22:49:57 -07:00

package.json

docs: stabilize website diagrams

2026-03-14 22:49:57 -07:00

README.md

docs: replace ASCII diagrams with Mermaid/lists, add linting note

2026-03-21 17:58:30 -07:00

sidebars.ts

docs: restructure site navigation — promote features and platforms to top-level (#4116)

2026-03-30 18:39:51 -07:00

tsconfig.json

feat: add documentation website (Docusaurus)

2026-03-05 05:24:55 -08:00

README.md

Website

This website is built using Docusaurus, a modern static website generator.

Installation

yarn

Local Development

yarn start

This command starts a local development server and opens up a browser window. Most changes are reflected live without having to restart the server.

Build

yarn build

This command generates static content into the build directory, which can then be served using any static content hosting service.

Deployment

Using SSH:

USE_SSH=true yarn deploy

Not using SSH:

GIT_USER=<Your GitHub username> yarn deploy

If you are using GitHub Pages for hosting, this command is a convenient way to build the website and push to the gh-pages branch.

Diagram Linting

CI runs ascii-guard to lint docs for ASCII box diagrams. Use Mermaid (a ```mermaid fenced block) or plain lists/tables instead of ASCII boxes to avoid CI failures.