Files

Teknium 57625329a2 docs+feat: comprehensive local LLM provider guides and context length warning (#4294)

* docs: update llama.cpp section with --jinja flag and tool calling guide

The llama.cpp docs were missing the --jinja flag which is required for
tool calling to work. Without it, models output tool calls as raw JSON
text instead of structured API responses, making Hermes unable to
execute them.

Changes:
- Add --jinja and -fa flags to the server startup example
- Replace deprecated env vars (OPENAI_BASE_URL, LLM_MODEL) with
  hermes model interactive setup
- Add caution block explaining the --jinja requirement and symptoms
- List models with native tool calling support
- Add /props endpoint verification tip

* docs+feat: comprehensive local LLM provider guides and context length warning

Docs (providers.md):
- Rewrote Ollama section with context length warning (defaults to 4k on
  <24GB VRAM), three methods to increase it, and verification steps
- Rewrote vLLM section with --max-model-len, tool calling flags
  (--enable-auto-tool-choice, --tool-call-parser), and context guidance
- Rewrote SGLang section with --context-length, --tool-call-parser,
  and warning about 128-token default max output
- Added LM Studio section (port 1234, context length defaults to 2048,
  tool calling since 0.3.6)
- Added llama.cpp context length flag (-c) and GPU offload (-ngl)
- Added Troubleshooting Local Models section covering:
  - Tool calls appearing as text (with per-server fix table)
  - Silent context truncation and diagnosis commands
  - Low detected context at startup
  - Truncated responses
- Replaced all deprecated env vars (OPENAI_BASE_URL, LLM_MODEL) with
  hermes model interactive setup and config.yaml examples
- Added deprecation warning for legacy env vars in General Setup
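
As a quick reference, the per-server flags named in the list above can be collected into a small lookup. This is only an illustrative sketch: the dictionary layout and the describe_settings helper are hypothetical and not part of Hermes, and the flag names are copied from this commit message rather than checked against each server's documentation.

```python
# Settings named in this commit message, keyed by server.
# The structure and helper below are hypothetical, for illustration only.
LOCAL_SERVER_SETTINGS = {
    "llama.cpp": {"context": "-c", "tool_calling": ["--jinja"]},
    "vLLM": {
        "context": "--max-model-len",
        "tool_calling": ["--enable-auto-tool-choice", "--tool-call-parser"],
    },
    "SGLang": {"context": "--context-length", "tool_calling": ["--tool-call-parser"]},
    # The commit message names no Ollama tool-calling flag, so the list is empty.
    "Ollama": {"context": "OLLAMA_CONTEXT_LENGTH", "tool_calling": []},
}


def describe_settings(server: str) -> str:
    """Return a one-line summary of the settings recorded for a server."""
    settings = LOCAL_SERVER_SETTINGS.get(server)
    if settings is None:
        return f"{server}: no settings recorded"
    tools = ", ".join(settings["tool_calling"]) or "none listed"
    return f"{server}: context via {settings['context']}; tool calling via {tools}"


if __name__ == "__main__":
    for name in LOCAL_SERVER_SETTINGS:
        print(describe_settings(name))
```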

Code (cli.py):
- Added context length warning in show_banner() when detected context
  is <= 8192 tokens, with server-specific fix hints:
  - Ollama (port 11434): suggests OLLAMA_CONTEXT_LENGTH env var
  - LM Studio (port 1234): suggests model settings adjustment
  - Other servers: suggests config.yaml override
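
The warning logic described above might look roughly like the sketch below. It is not the actual cli.py code: the function name context_warning, the constant name, the message wording, and the choice to detect the server from the port in the base URL are all assumptions made for illustration. Only the 8192-token threshold, the two ports, and the three hints come from this commit message.

```python
from typing import Optional
from urllib.parse import urlparse

# Threshold from the commit message: warn when detected context is <= 8192.
LOW_CONTEXT_THRESHOLD = 8192


def context_warning(context_length: int, base_url: str) -> Optional[str]:
    """Return a warning string for small context windows, or None.

    Hypothetical helper; the real show_banner() may be structured differently.
    """
    if context_length > LOW_CONTEXT_THRESHOLD:
        return None

    # Assumption: the server type is inferred from the port in the base URL.
    port = urlparse(base_url).port
    if port == 11434:  # Ollama
        hint = "set the OLLAMA_CONTEXT_LENGTH environment variable"
    elif port == 1234:  # LM Studio
        hint = "raise the context length in the LM Studio model settings"
    else:  # any other server
        hint = "override the context length in config.yaml"

    return f"Warning: detected context length is {context_length} tokens; {hint}."


if __name__ == "__main__":
    print(context_warning(4096, "http://localhost:11434/v1"))
    print(context_warning(32768, "http://localhost:8000/v1"))
```

Returning a string instead of printing directly keeps the threshold and hint selection easy to unit-test, which fits the kind of cases listed under Tests below.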

Tests:
- 9 new tests covering warning thresholds, server-specific hints,
  and no-warning cases

2026-03-31 11:42:48 -07:00

docs

docs+feat: comprehensive local LLM provider guides and context length warning (#4294)

2026-03-31 11:42:48 -07:00

src/css

fix(docs): improve mobile sidebar navigation

2026-03-30 13:20:55 -07:00

static

docs: stabilize website diagrams

2026-03-14 22:49:57 -07:00

.gitignore

feat: add documentation website (Docusaurus)

2026-03-05 05:24:55 -08:00

docusaurus.config.ts

fix(docs): improve mobile sidebar navigation

2026-03-30 13:20:55 -07:00

package-lock.json

docs: stabilize website diagrams

2026-03-14 22:49:57 -07:00

package.json

docs: stabilize website diagrams

2026-03-14 22:49:57 -07:00

README.md

docs: replace ASCII diagrams with Mermaid/lists, add linting note

2026-03-21 17:58:30 -07:00

sidebars.ts

docs: restructure site navigation — promote features and platforms to top-level (#4116)

2026-03-30 18:39:51 -07:00

tsconfig.json

feat: add documentation website (Docusaurus)

2026-03-05 05:24:55 -08:00

README.md

Website

This website is built using Docusaurus, a modern static website generator.

Installation

yarn

Local Development

yarn start

This command starts a local development server and opens up a browser window. Most changes are reflected live without having to restart the server.

Build

yarn build

This command generates static content into the build directory, which can then be served by any static content hosting service.

Deployment

Using SSH:

USE_SSH=true yarn deploy

Not using SSH:

GIT_USER=<Your GitHub username> yarn deploy

If you are using GitHub Pages for hosting, this command is a convenient way to build the website and push to the gh-pages branch.

Diagram Linting

CI runs ascii-guard to lint docs for ASCII box diagrams. Use Mermaid (```mermaid code fences) or plain lists/tables instead of ASCII boxes to avoid CI failures.