Files
hermes-agent/tools/ansi_strip.py
Teknium 934fbe3c06 fix: strip ANSI at the source — clean terminal output before it reaches the model
Root cause: terminal_tool, execute_code, and process_registry returned raw
subprocess output with ANSI escape sequences intact. The model saw these
in tool results and copied them into file writes.

Previous fix (PR #2532) stripped ANSI at the write point in file_tools.py,
but this was a band-aid — regex on file content risks corrupting legitimate
content, and doesn't prevent ANSI from wasting tokens in the model context.

Source-level fix:
- New tools/ansi_strip.py with comprehensive ECMA-48 regex covering CSI
  (incl. private-mode, colon-separated, intermediate bytes), OSC (both
  terminators), DCS/SOS/PM/APC strings, Fp/Fe/Fs/nF escapes, 8-bit C1
- terminal_tool.py: strip output before returning to model
- code_execution_tool.py: strip stdout/stderr before returning
- process_registry.py: strip output in poll/read_log/wait
- file_tools.py: remove _strip_ansi band-aid (no longer needed)

Verified: `ls --color=always` output returned as clean text to model,
file written from that output contains zero ESC bytes.
2026-03-23 07:43:12 -07:00

45 lines
1.7 KiB
Python

"""Strip ANSI escape sequences from subprocess output.
Used by terminal_tool, code_execution_tool, and process_registry to clean
command output before returning it to the model. This prevents ANSI codes
from entering the model's context — which is the root cause of models
copying escape sequences into file writes.
Covers the full ECMA-48 spec: CSI (including private-mode ``?`` prefix,
colon-separated params, intermediate bytes), OSC (BEL and ST terminators),
DCS/SOS/PM/APC string sequences, nF multi-byte escapes, Fp/Fe/Fs
single-byte escapes, and 8-bit C1 control characters.
"""
import re
_ANSI_ESCAPE_RE = re.compile(
r"\x1b"
r"(?:"
r"\[[\x30-\x3f]*[\x20-\x2f]*[\x40-\x7e]" # CSI sequence
r"|\][\s\S]*?(?:\x07|\x1b\\)" # OSC (BEL or ST terminator)
r"|[PX^_][\s\S]*?(?:\x1b\\)" # DCS/SOS/PM/APC strings
r"|[\x20-\x2f]+[\x30-\x7e]" # nF escape sequences
r"|[\x30-\x7e]" # Fp/Fe/Fs single-byte
r")"
r"|\x9b[\x30-\x3f]*[\x20-\x2f]*[\x40-\x7e]" # 8-bit CSI
r"|\x9d[\s\S]*?(?:\x07|\x9c)" # 8-bit OSC
r"|[\x80-\x9f]", # Other 8-bit C1 controls
re.DOTALL,
)
# Fast-path check — skip full regex when no escape-like bytes are present.
_HAS_ESCAPE = re.compile(r"[\x1b\x80-\x9f]")
def strip_ansi(text: str) -> str:
"""Remove ANSI escape sequences from text.
Returns the input unchanged (fast path) when no ESC or C1 bytes are
present. Safe to call on any string — clean text passes through
with negligible overhead.
"""
if not text or not _HAS_ESCAPE.search(text):
return text
return _ANSI_ESCAPE_RE.sub("", text)