Salvaged from PR #1563 by @kshitijk4poor. Cherry-picked with authorship preserved. - Route-aware pricing architecture replacing static MODEL_PRICING + heuristics - Canonical usage normalization (Anthropic/OpenAI/Codex API shapes) - Cache-aware billing (separate cache_read/cache_write rates) - Cost status tracking (estimated/included/unknown/actual) - OpenRouter live pricing via models API - Schema migration v4→v5 with billing metadata columns - Removed speculative forward-looking entries - Removed cost display from CLI status bar - Threaded OpenRouter metadata pre-warm Co-authored-by: kshitij <82637225+kshitijk4poor@users.noreply.github.com>
14 KiB
Pricing Accuracy Architecture
Date: 2026-03-16
Goal
Hermes should only show dollar costs when they are backed by an official source for the user's actual billing path.
This design replaces the current static, heuristic pricing flow in:
run_agent.pyagent/usage_pricing.pyagent/insights.pycli.py
with a provider-aware pricing system that:
- handles cache billing correctly
- distinguishes
actualvsestimatedvsincludedvsunknown - reconciles post-hoc costs when providers expose authoritative billing data
- supports direct providers, OpenRouter, subscriptions, enterprise pricing, and custom endpoints
Problems In The Current Design
Current Hermes behavior has four structural issues:
- It stores only
prompt_tokensandcompletion_tokens, which is insufficient for providers that bill cache reads and cache writes separately. - It uses a static model price table and fuzzy heuristics, which can drift from current official pricing.
- It assumes public API list pricing matches the user's real billing path.
- It has no distinction between live estimates and reconciled billed cost.
Design Principles
- Normalize usage before pricing.
- Never fold cached tokens into plain input cost.
- Track certainty explicitly.
- Treat the billing path as part of the model identity.
- Prefer official machine-readable sources over scraped docs.
- Use post-hoc provider cost APIs when available.
- Show
n/arather than inventing precision.
High-Level Architecture
The new system has four layers:
usage_normalizationConverts raw provider usage into a canonical usage record.pricing_source_resolutionDetermines the billing path, source of truth, and applicable pricing source.cost_estimation_and_reconciliationProduces an immediate estimate when possible, then replaces or annotates it with actual billed cost later.presentation/usage,/insights, and the status bar display cost with certainty metadata.
Canonical Usage Record
Add a canonical usage model that every provider path maps into before any pricing math happens.
Suggested structure:
@dataclass
class CanonicalUsage:
provider: str
billing_provider: str
model: str
billing_route: str
input_tokens: int = 0
output_tokens: int = 0
cache_read_tokens: int = 0
cache_write_tokens: int = 0
reasoning_tokens: int = 0
request_count: int = 1
raw_usage: dict[str, Any] | None = None
raw_usage_fields: dict[str, str] | None = None
computed_fields: set[str] | None = None
provider_request_id: str | None = None
provider_generation_id: str | None = None
provider_response_id: str | None = None
Rules:
input_tokensmeans non-cached input only.cache_read_tokensandcache_write_tokensare never merged intoinput_tokens.output_tokensexcludes cache metrics.reasoning_tokensis telemetry unless a provider officially bills it separately.
This is the same normalization pattern used by opencode, extended with provenance and reconciliation ids.
Provider Normalization Rules
OpenAI Direct
Source usage fields:
prompt_tokenscompletion_tokensprompt_tokens_details.cached_tokens
Normalization:
cache_read_tokens = cached_tokensinput_tokens = prompt_tokens - cached_tokenscache_write_tokens = 0unless OpenAI exposes it in the relevant routeoutput_tokens = completion_tokens
Anthropic Direct
Source usage fields:
input_tokensoutput_tokenscache_read_input_tokenscache_creation_input_tokens
Normalization:
input_tokens = input_tokensoutput_tokens = output_tokenscache_read_tokens = cache_read_input_tokenscache_write_tokens = cache_creation_input_tokens
OpenRouter
Estimate-time usage normalization should use the response usage payload with the same rules as the underlying provider when possible.
Reconciliation-time records should also store:
- OpenRouter generation id
- native token fields when available
total_costcache_discountupstream_inference_costis_byok
Gemini / Vertex
Use official Gemini or Vertex usage fields where available.
If cached content tokens are exposed:
- map them to
cache_read_tokens
If a route exposes no cache creation metric:
- store
cache_write_tokens = 0 - preserve the raw usage payload for later extension
DeepSeek And Other Direct Providers
Normalize only the fields that are officially exposed.
If a provider does not expose cache buckets:
- do not infer them unless the provider explicitly documents how to derive them
Subscription / Included-Cost Routes
These still use the canonical usage model.
Tokens are tracked normally. Cost depends on billing mode, not on whether usage exists.
Billing Route Model
Hermes must stop keying pricing solely by model.
Introduce a billing route descriptor:
@dataclass
class BillingRoute:
provider: str
base_url: str | None
model: str
billing_mode: str
organization_hint: str | None = None
billing_mode values:
official_cost_apiofficial_generation_apiofficial_models_apiofficial_docs_snapshotsubscription_includeduser_overridecustom_contractunknown
Examples:
- OpenAI direct API with Costs API access:
official_cost_api - Anthropic direct API with Usage & Cost API access:
official_cost_api - OpenRouter request before reconciliation:
official_models_api - OpenRouter request after generation lookup:
official_generation_api - GitHub Copilot style subscription route:
subscription_included - local OpenAI-compatible server:
unknown - enterprise contract with configured rates:
custom_contract
Cost Status Model
Every displayed cost should have:
@dataclass
class CostResult:
amount_usd: Decimal | None
status: Literal["actual", "estimated", "included", "unknown"]
source: Literal[
"provider_cost_api",
"provider_generation_api",
"provider_models_api",
"official_docs_snapshot",
"user_override",
"custom_contract",
"none",
]
label: str
fetched_at: datetime | None
pricing_version: str | None
notes: list[str]
Presentation rules:
actual: show dollar amount as finalestimated: show dollar amount with estimate labelingincluded: showincludedor$0.00 (included)depending on UX choiceunknown: shown/a
Official Source Hierarchy
Resolve cost using this order:
- Request-level or account-level official billed cost
- Official machine-readable model pricing
- Official docs snapshot
- User override or custom contract
- Unknown
The system must never skip to a lower level if a higher-confidence source exists for the current billing route.
Provider-Specific Truth Rules
OpenAI Direct
Preferred truth:
- Costs API for reconciled spend
- Official pricing page for live estimate
Anthropic Direct
Preferred truth:
- Usage & Cost API for reconciled spend
- Official pricing docs for live estimate
OpenRouter
Preferred truth:
GET /api/v1/generationfor reconciledtotal_costGET /api/v1/modelspricing for live estimate
Do not use underlying provider public pricing as the source of truth for OpenRouter billing.
Gemini / Vertex
Preferred truth:
- official billing export or billing API for reconciled spend when available for the route
- official pricing docs for estimate
DeepSeek
Preferred truth:
- official machine-readable cost source if available in the future
- official pricing docs snapshot today
Subscription-Included Routes
Preferred truth:
- explicit route config marking the model as included in subscription
These should display included, not an API list-price estimate.
Custom Endpoint / Local Model
Preferred truth:
- user override
- custom contract config
- unknown
These should default to unknown.
Pricing Catalog
Replace the current MODEL_PRICING dict with a richer pricing catalog.
Suggested record:
@dataclass
class PricingEntry:
provider: str
route_pattern: str
model_pattern: str
input_cost_per_million: Decimal | None = None
output_cost_per_million: Decimal | None = None
cache_read_cost_per_million: Decimal | None = None
cache_write_cost_per_million: Decimal | None = None
request_cost: Decimal | None = None
image_cost: Decimal | None = None
source: str = "official_docs_snapshot"
source_url: str | None = None
fetched_at: datetime | None = None
pricing_version: str | None = None
The catalog should be route-aware:
openai:gpt-5anthropic:claude-opus-4-6openrouter:anthropic/claude-opus-4.6copilot:gpt-4o
This avoids conflating direct-provider billing with aggregator billing.
Pricing Sync Architecture
Introduce a pricing sync subsystem instead of manually maintaining a single hardcoded table.
Suggested modules:
agent/pricing/catalog.pyagent/pricing/sources.pyagent/pricing/sync.pyagent/pricing/reconcile.pyagent/pricing/types.py
Sync Sources
- OpenRouter models API
- official provider docs snapshots where no API exists
- user overrides from config
Sync Output
Cache pricing entries locally with:
- source URL
- fetch timestamp
- version/hash
- confidence/source type
Sync Frequency
- startup warm cache
- background refresh every 6 to 24 hours depending on source
- manual
hermes pricing sync
Reconciliation Architecture
Live requests may produce only an estimate initially. Hermes should reconcile them later when a provider exposes actual billed cost.
Suggested flow:
- Agent call completes.
- Hermes stores canonical usage plus reconciliation ids.
- Hermes computes an immediate estimate if a pricing source exists.
- A reconciliation worker fetches actual cost when supported.
- Session and message records are updated with
actualcost.
This can run:
- inline for cheap lookups
- asynchronously for delayed provider accounting
Persistence Changes
Session storage should stop storing only aggregate prompt/completion totals.
Add fields for both usage and cost certainty:
input_tokensoutput_tokenscache_read_tokenscache_write_tokensreasoning_tokensestimated_cost_usdactual_cost_usdcost_statuscost_sourcepricing_versionbilling_providerbilling_mode
If schema expansion is too large for one PR, add a new pricing events table:
session_cost_events
id
session_id
request_id
provider
model
billing_mode
input_tokens
output_tokens
cache_read_tokens
cache_write_tokens
estimated_cost_usd
actual_cost_usd
cost_status
cost_source
pricing_version
created_at
updated_at
Hermes Touchpoints
run_agent.py
Current responsibility:
- parse raw provider usage
- update session token counters
New responsibility:
- build
CanonicalUsage - update canonical counters
- store reconciliation ids
- emit usage event to pricing subsystem
agent/usage_pricing.py
Current responsibility:
- static lookup table
- direct cost arithmetic
New responsibility:
- move or replace with pricing catalog facade
- no fuzzy model-family heuristics
- no direct pricing without billing-route context
cli.py
Current responsibility:
- compute session cost directly from prompt/completion totals
New responsibility:
- display
CostResult - show status badges:
actualestimatedincludedn/a
agent/insights.py
Current responsibility:
- recompute historical estimates from static pricing
New responsibility:
- aggregate stored pricing events
- prefer actual cost over estimate
- surface estimates only when reconciliation is unavailable
UX Rules
Status Bar
Show one of:
$1.42~$1.42includedcost n/a
Where:
$1.42meansactual~$1.42meansestimatedincludedmeans subscription-backed or explicitly zero-cost routecost n/ameans unknown
/usage
Show:
- token buckets
- estimated cost
- actual cost if available
- cost status
- pricing source
/insights
Aggregate:
- actual cost totals
- estimated-only totals
- unknown-cost sessions count
- included-cost sessions count
Config And Overrides
Add user-configurable pricing overrides in config:
pricing:
mode: hybrid
sync_on_startup: true
sync_interval_hours: 12
overrides:
- provider: openrouter
model: anthropic/claude-opus-4.6
billing_mode: custom_contract
input_cost_per_million: 4.25
output_cost_per_million: 22.0
cache_read_cost_per_million: 0.5
cache_write_cost_per_million: 6.0
included_routes:
- provider: copilot
model: "*"
- provider: codex-subscription
model: "*"
Overrides must win over catalog defaults for the matching billing route.
Rollout Plan
Phase 1
- add canonical usage model
- split cache token buckets in
run_agent.py - stop pricing cache-inflated prompt totals
- preserve current UI with improved backend math
Phase 2
- add route-aware pricing catalog
- integrate OpenRouter models API sync
- add
estimatedvsincludedvsunknown
Phase 3
- add reconciliation for OpenRouter generation cost
- add actual cost persistence
- update
/insightsto prefer actual cost
Phase 4
- add direct OpenAI and Anthropic reconciliation paths
- add user overrides and contract pricing
- add pricing sync CLI command
Testing Strategy
Add tests for:
- OpenAI cached token subtraction
- Anthropic cache read/write separation
- OpenRouter estimated vs actual reconciliation
- subscription-backed models showing
included - custom endpoints showing
n/a - override precedence
- stale catalog fallback behavior
Current tests that assume heuristic pricing should be replaced with route-aware expectations.
Non-Goals
- exact enterprise billing reconstruction without an official source or user override
- backfilling perfect historical cost for old sessions that lack cache bucket data
- scraping arbitrary provider web pages at request time
Recommendation
Do not expand the existing MODEL_PRICING dict.
That path cannot satisfy the product requirement. Hermes should instead migrate to:
- canonical usage normalization
- route-aware pricing sources
- estimate-then-reconcile cost lifecycle
- explicit certainty states in the UI
This is the minimum architecture that makes the statement "Hermes pricing is backed by official sources where possible, and otherwise clearly labeled" defensible.