Python's json.dumps() defaults to ensure_ascii=True, escaping every non-ASCII character to a \uXXXX sequence. For CJK text this inflates token count roughly 3-4x: a single Chinese character like '中' becomes '\u4e2d' (6 ASCII chars vs. 3 UTF-8 bytes, ~6 tokens vs. ~1 token). Since MCP tool results feed directly into the model's conversation context, this silently multiplies API costs for Chinese, Japanese, and Korean users. Fix: pass ensure_ascii=False to all 20 json.dumps() calls in mcp_tool.py. Raw UTF-8 is valid JSON per RFC 8259, and all downstream consumers (LLM APIs, display) handle it correctly. Closes #10234
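
A minimal standalone sketch of the before/after behavior (illustrative only, not the actual mcp_tool.py diff; the payload dict is a made-up example):

```python
import json

# Hypothetical tool result containing CJK text.
payload = {"text": "中文"}

# Default (ensure_ascii=True): each CJK character escapes to a
# 6-character \uXXXX sequence, bloating the serialized output.
print(json.dumps(payload))
# {"text": "\u4e2d\u6587"}

# With ensure_ascii=False: raw UTF-8 passes through, which is
# valid JSON per RFC 8259 and far cheaper to tokenize.
print(json.dumps(payload, ensure_ascii=False))
# {"text": "中文"}
```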