fix(gateway): replace os.environ session state with contextvars for concurrency safety

When two gateway messages arrived concurrently, _set_session_env wrote
HERMES_SESSION_PLATFORM/CHAT_ID/CHAT_NAME/THREAD_ID into the process-global
os.environ. Because asyncio tasks share the same process, Message B would
overwrite Message A's values mid-flight, causing background-task notifications
and tool calls to route to the wrong thread/chat.

Replace os.environ with Python's contextvars.ContextVar. Each asyncio task
(and any run_in_executor thread it spawns) gets its own copy, so concurrent
messages never interfere.

Changes:
- New gateway/session_context.py with ContextVar definitions, set/clear/get
  helpers, and os.environ fallback for CLI/cron/test backward compatibility
- gateway/run.py: _set_session_env returns reset tokens, _clear_session_env
  accepts them for proper cleanup in finally blocks
- All tool consumers updated: cronjob_tools, send_message_tool, skills_tool,
  terminal_tool (both notify_on_complete AND check_interval blocks), tts_tool,
  agent/skill_utils, agent/prompt_builder
- Tests updated for new contextvar-based API

Fixes #7358
Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>
2026-04-10 16:50:56 -07:00
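The race described in this commit message can be reproduced in a few lines. The following is a minimal sketch rather than the gateway's code: `handle_with_environ`, `handle_with_contextvar`, the `DEMO_THREAD_ID` name, and the `asyncio.sleep(0)` stand-in for agent work are all illustrative assumptions.

```python
import asyncio
import os
from contextvars import ContextVar

# Hypothetical stand-in for one of the HERMES_SESSION_* variables.
_THREAD_ID: ContextVar[str] = ContextVar("DEMO_THREAD_ID", default="")


async def handle_with_environ(thread_id: str) -> str:
    """Old approach: write to the process-global environment."""
    os.environ["DEMO_THREAD_ID"] = thread_id
    await asyncio.sleep(0)  # yield to the event loop, as real agent work would
    return os.environ["DEMO_THREAD_ID"]


async def handle_with_contextvar(thread_id: str) -> str:
    """New approach: write to a context variable owned by the current task."""
    _THREAD_ID.set(thread_id)
    await asyncio.sleep(0)
    return _THREAD_ID.get()


async def main() -> tuple:
    # gather() wraps each coroutine in its own task, so the handlers interleave.
    env_seen = await asyncio.gather(handle_with_environ("A"), handle_with_environ("B"))
    ctx_seen = await asyncio.gather(handle_with_contextvar("A"), handle_with_contextvar("B"))
    return list(env_seen), list(ctx_seen)


env_seen, ctx_seen = asyncio.run(main())
print(env_seen)  # both handlers read the value written last
print(ctx_seen)  # each handler reads the value it wrote
```

With the environment, handler A reads the value handler B wrote while A was suspended. With a `ContextVar`, each task works on the copy of the context it received when it was created.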
"""
Session-scoped context variables for the Hermes gateway.

Replaces the previous ``os.environ``-based session state
(``HERMES_SESSION_PLATFORM``, ``HERMES_SESSION_CHAT_ID``, etc.) with
Python's ``contextvars.ContextVar``.

**Why this matters**

The gateway processes messages concurrently via ``asyncio``. When two
messages arrive at the same time the old code did:

    os.environ["HERMES_SESSION_THREAD_ID"] = str(context.source.thread_id)

Because ``os.environ`` is *process-global*, Message A's value was
silently overwritten by Message B before Message A's agent finished
running. Background-task notifications and tool calls therefore routed
to the wrong thread.

``contextvars.ContextVar`` values are *task-local*: each ``asyncio``
task (and any ``run_in_executor`` thread it spawns) gets its own copy,
so concurrent messages never interfere.

**Backward compatibility**

The public helper ``get_session_env(name, default="")`` mirrors the old
``os.getenv("HERMES_SESSION_*", ...)`` calls. Existing tool code only
needs to replace the import + call site:

    # before
    import os
    platform = os.getenv("HERMES_SESSION_PLATFORM", "")

    # after
    from gateway.session_context import get_session_env
    platform = get_session_env("HERMES_SESSION_PLATFORM", "")
"""

from contextvars import ContextVar

# ---------------------------------------------------------------------------
# Per-task session variables
# ---------------------------------------------------------------------------

_SESSION_PLATFORM: ContextVar[str] = ContextVar("HERMES_SESSION_PLATFORM", default="")
_SESSION_CHAT_ID: ContextVar[str] = ContextVar("HERMES_SESSION_CHAT_ID", default="")
_SESSION_CHAT_NAME: ContextVar[str] = ContextVar("HERMES_SESSION_CHAT_NAME", default="")
_SESSION_THREAD_ID: ContextVar[str] = ContextVar("HERMES_SESSION_THREAD_ID", default="")
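A worker thread only sees these per-task values if the task's context travels with the call. As far as the standard library documents it, `asyncio.to_thread` propagates the current context, while `loop.run_in_executor` makes no such promise, so the sketch below copies the context explicitly for that path. `_CHAT_ID`, `read_chat_id`, and `handler` are illustrative names; whether the gateway wraps its executor calls this way is not shown in this file.

```python
import asyncio
import contextvars
from contextvars import ContextVar

# Hypothetical stand-in for a session variable such as HERMES_SESSION_CHAT_ID.
_CHAT_ID: ContextVar[str] = ContextVar("DEMO_CHAT_ID", default="")


def read_chat_id() -> str:
    """Blocking helper that runs in a worker thread."""
    return _CHAT_ID.get()


async def handler(chat_id: str) -> tuple:
    _CHAT_ID.set(chat_id)

    # asyncio.to_thread runs the callable inside a copy of the current context.
    via_to_thread = await asyncio.to_thread(read_chat_id)

    # For run_in_executor, copy the context and run the callable inside it.
    loop = asyncio.get_running_loop()
    ctx = contextvars.copy_context()
    via_executor = await loop.run_in_executor(None, ctx.run, read_chat_id)

    return via_to_thread, via_executor


results = asyncio.run(handler("chat-42"))
print(results)
```

Both reads return the value set by the handler because each call carries a copy of the handler's context into the worker thread.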
fix(gateway): propagate user identity through process watcher pipeline

Background process watchers (notify_on_complete, check_interval) created
synthetic SessionSource objects without user_id/user_name. While the
internal=True bypass (1d8d4f28) prevented false pairing for
agent-generated notifications, the missing identity caused:
- Garbage entries in pairing rate limiters (discord:None, telegram:None)
- 'User None' in approval messages and logs
- No user identity available for future code paths that need it

Additionally, platform messages arriving without from_user (Telegram
service messages, channel forwards, anonymous admin actions) could still
trigger false pairing because they are not internal events.

Fix:
1. Propagate user_id/user_name through the full watcher chain:
   session_context.py → gateway/run.py → terminal_tool.py →
   process_registry.py (including checkpoint persistence/recovery)
2. Add None user_id guard in _handle_message(): silently drop
   non-internal messages with no user identity instead of triggering
   the pairing flow.

Salvaged from PRs #7664 (kagura-agent, ContextVar approach),
#6540 (MestreY0d4-Uninter, tests), and #7709 (guang384, None guard).

Closes #6341, #6485, #7643
Relates to #6516, #7392
2026-04-11 12:09:01 -07:00
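The second part of the fix, the `None` user_id guard, can be pictured roughly as follows. This is only a sketch under stated assumptions: `DemoSource`, `DemoEvent`, and `should_drop` are hypothetical names, and the real `_handle_message()` and `SessionSource` are not shown in this file.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DemoSource:
    """Hypothetical, simplified stand-in for the gateway's SessionSource."""
    platform: str
    user_id: Optional[str] = None
    user_name: Optional[str] = None


@dataclass
class DemoEvent:
    """Hypothetical message event; ``internal`` marks agent-generated events."""
    source: DemoSource
    internal: bool = False


def should_drop(event: DemoEvent) -> bool:
    """Return True for non-internal events that carry no user identity."""
    return not event.internal and event.source.user_id is None


# A platform service message without a sender, a watcher notification that
# now carries the propagated identity, and an ordinary user message.
service_msg = DemoEvent(DemoSource("telegram"))
watcher_msg = DemoEvent(DemoSource("telegram", user_id="123", user_name="alice"), internal=True)
user_msg = DemoEvent(DemoSource("discord", user_id="456", user_name="bob"))

decisions = [should_drop(e) for e in (service_msg, watcher_msg, user_msg)]
print(decisions)
```

Only the sender-less, non-internal message is dropped, which matches the commit's description of skipping the pairing flow for such messages.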
_SESSION_USER_ID: ContextVar[str] = ContextVar("HERMES_SESSION_USER_ID", default="")
_SESSION_USER_NAME: ContextVar[str] = ContextVar("HERMES_SESSION_USER_NAME", default="")
2026-04-11 15:28:41 -07:00
_SESSION_KEY: ContextVar[str] = ContextVar("HERMES_SESSION_KEY", default="")

_VAR_MAP = {
    "HERMES_SESSION_PLATFORM": _SESSION_PLATFORM,
    "HERMES_SESSION_CHAT_ID": _SESSION_CHAT_ID,
    "HERMES_SESSION_CHAT_NAME": _SESSION_CHAT_NAME,
    "HERMES_SESSION_THREAD_ID": _SESSION_THREAD_ID,
    "HERMES_SESSION_USER_ID": _SESSION_USER_ID,
    "HERMES_SESSION_USER_NAME": _SESSION_USER_NAME,
    "HERMES_SESSION_KEY": _SESSION_KEY,
}


def set_session_vars(
    platform: str = "",
    chat_id: str = "",
    chat_name: str = "",
    thread_id: str = "",
    user_id: str = "",
    user_name: str = "",
    session_key: str = "",
) -> list:
    """Set all session context variables and return reset tokens.

    Call ``clear_session_vars(tokens)`` in a ``finally`` block to restore
    the previous values when the handler exits.

    Returns a list of ``Token`` objects (one per variable) that can be
    passed to ``clear_session_vars``.
    """
    tokens = [
        _SESSION_PLATFORM.set(platform),
        _SESSION_CHAT_ID.set(chat_id),
        _SESSION_CHAT_NAME.set(chat_name),
        _SESSION_THREAD_ID.set(thread_id),
        _SESSION_USER_ID.set(user_id),
        _SESSION_USER_NAME.set(user_name),
        _SESSION_KEY.set(session_key),
    ]
    return tokens


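The set-then-restore pattern that the docstring asks callers to follow can be tried in isolation. The sketch below uses a single hypothetical variable (`_PLATFORM`) and a hypothetical `handle` function instead of the module's helpers, so it runs on its own.

```python
from contextvars import ContextVar

# Hypothetical stand-in for one session variable.
_PLATFORM: ContextVar[str] = ContextVar("DEMO_PLATFORM", default="")


def handle(platform: str, seen: list) -> None:
    """Set the variable for the duration of one handler, then restore it."""
    token = _PLATFORM.set(platform)
    try:
        seen.append(_PLATFORM.get())
    finally:
        # reset() restores whatever value was current before set().
        _PLATFORM.reset(token)


seen: list = []
outer = _PLATFORM.set("cli")   # a value that exists before the handler runs
handle("telegram", seen)
after = _PLATFORM.get()        # the pre-handler value is back
_PLATFORM.reset(outer)
final = _PLATFORM.get()        # and now the default

print(seen, after, final)
```

Resetting with the token, rather than setting the variable back to an empty string, restores the previous value even when the handler raises or when an outer caller had already set one.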
def clear_session_vars(tokens: list) -> None:
    """Restore session context variables to their pre-handler values."""
    if not tokens:
        return
    # Must stay in the same order as the tokens built by set_session_vars().
    vars_in_order = [
        _SESSION_PLATFORM,
        _SESSION_CHAT_ID,
        _SESSION_CHAT_NAME,
        _SESSION_THREAD_ID,
        _SESSION_USER_ID,
        _SESSION_USER_NAME,
        _SESSION_KEY,
    ]
    for var, token in zip(vars_in_order, tokens):
        var.reset(token)


def get_session_env(name: str, default: str = "") -> str:
    """Read a session context variable by its legacy ``HERMES_SESSION_*`` name.

    Drop-in replacement for ``os.getenv("HERMES_SESSION_*", default)``.

    Resolution order:
        1. Context variable (set by the gateway for concurrency-safe access)
        2. ``os.environ`` (used by CLI, cron scheduler, and tests)
        3. *default*
    """
    import os

    var = _VAR_MAP.get(name)
    if var is not None:
        value = var.get()
        if value:
            return value
    # Fall back to os.environ for CLI, cron, and test compatibility
    return os.getenv(name, default)
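The resolution order can be checked step by step with a standalone copy of the lookup logic. In this sketch `demo_get`, `_DEMO_VAR`, `_DEMO_MAP`, and the `DEMO_SESSION_PLATFORM` name are hypothetical stand-ins, so the example does not depend on the gateway package.

```python
import os
from contextvars import ContextVar

_DEMO_VAR: ContextVar[str] = ContextVar("DEMO_SESSION_PLATFORM", default="")
_DEMO_MAP = {"DEMO_SESSION_PLATFORM": _DEMO_VAR}


def demo_get(name: str, default: str = "") -> str:
    """Context variable first, then os.environ, then the default."""
    var = _DEMO_MAP.get(name)
    if var is not None:
        value = var.get()
        if value:
            return value
    return os.getenv(name, default)


os.environ.pop("DEMO_SESSION_PLATFORM", None)
step_default = demo_get("DEMO_SESSION_PLATFORM", "local")   # nothing is set

os.environ["DEMO_SESSION_PLATFORM"] = "cron"
step_environ = demo_get("DEMO_SESSION_PLATFORM", "local")   # environment fallback

token = _DEMO_VAR.set("telegram")
step_context = demo_get("DEMO_SESSION_PLATFORM", "local")   # context variable wins

# Clean up so the demo leaves no state behind.
_DEMO_VAR.reset(token)
del os.environ["DEMO_SESSION_PLATFORM"]

print(step_default, step_environ, step_context)
```

Because the check is `if value:`, a context variable explicitly set to an empty string counts as unset and the lookup falls through to the environment.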