Alexander Whitestone 11cc14d707 init: Hermes config, skills, memories, cron

2026-03-14 14:42:33 -04:00

Loss Functions

Complete guide to SimPO loss functions and mathematical formulations.

Overview

SimPO supports two loss types:

Sigmoid (default) - Smooth, differentiable loss
Hinge - Margin-based, sparse loss

Both are reference-free (no reference model needed).

SimPO Loss Formula

Core Calculation

Step 1: Log probability ratio:

pi_logratios = log P_θ(y_chosen|x) - log P_θ(y_rejected|x)

Step 2: Apply target margin:

logits = pi_logratios - γ/β

Where:

γ/β = gamma_beta_ratio (target margin)

Step 3: Compute loss (depends on loss type)
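Steps 1 and 2 can be sketched in plain Python for a single preference pair. The helper name `simpo_logits` and its scalar inputs are illustrative assumptions; an actual trainer would work on batched tensors.

```python
def simpo_logits(chosen_logp: float, rejected_logp: float, gamma_beta_ratio: float) -> float:
    """Steps 1-2 for one preference pair (hypothetical helper, scalar inputs)."""
    # Step 1: log probability ratio between the chosen and rejected responses
    pi_logratios = chosen_logp - rejected_logp
    # Step 2: subtract the target margin gamma/beta
    return pi_logratios - gamma_beta_ratio


if __name__ == "__main__":
    # Positive logits mean the chosen response already beats the margin
    print(simpo_logits(chosen_logp=-1.2, rejected_logp=-2.5, gamma_beta_ratio=0.5))
```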

Sigmoid Loss (Default)

Formula:

L = -log σ(β * logits) * (1 - ε) - log σ(-β * logits) * ε

Where:

β = beta (reward scaling)
σ = sigmoid function
ε = label_smoothing (default 0.0)

Implementation:

losses = (
    -F.logsigmoid(self.beta * logits) * (1 - self.label_smoothing)
    - F.logsigmoid(-self.beta * logits) * self.label_smoothing
)

Characteristics:

Smooth, continuous gradients
Probabilistic interpretation
Standard choice for most tasks
Works well with higher beta values
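A scalar version of the sigmoid loss, as a minimal sketch that avoids a PyTorch dependency. The function names are assumptions for illustration; `log_sigmoid` is a hand-written stand-in for `F.logsigmoid`.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x)) for a scalar."""
    # log sigma(x) = min(x, 0) - log(1 + exp(-|x|))
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))


def sigmoid_loss(logits: float, beta: float, label_smoothing: float = 0.0) -> float:
    """Sigmoid SimPO loss for one pair, mirroring the formula above."""
    return (
        -log_sigmoid(beta * logits) * (1 - label_smoothing)
        - log_sigmoid(-beta * logits) * label_smoothing
    )


if __name__ == "__main__":
    # The loss shrinks smoothly as logits grow but never reaches exactly zero
    for logits in (-1.0, 0.0, 1.0, 3.0):
        print(logits, round(sigmoid_loss(logits, beta=2.0), 4))
```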

Hinge Loss

Formula:

L = max(0, 1 - β * logits)

Implementation:

losses = torch.relu(1 - self.beta * logits)

Characteristics:

Non-smooth (has kink at logits = 1/β)
Margin-based (SVM-style)
Can lead to sparser solutions
Less commonly used
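The same idea for the hinge loss, again as a scalar sketch with an assumed function name. It shows that the loss is exactly zero once logits reach 1/β.

```python
def hinge_loss(logits: float, beta: float) -> float:
    """Hinge SimPO loss for one pair: max(0, 1 - beta * logits)."""
    return max(0.0, 1.0 - beta * logits)


if __name__ == "__main__":
    beta = 2.0  # the kink sits at logits = 1 / beta = 0.5
    for logits in (0.0, 0.25, 0.5, 0.8):
        print(logits, hinge_loss(logits, beta))
```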

Comparison to DPO

DPO Loss (Reference Model Required)

Formula:

L_DPO = -E[log σ(β * log(π_θ(y_w|x)/π_ref(y_w|x)) - β * log(π_θ(y_l|x)/π_ref(y_l|x)))]

Key features:

Requires reference model π_ref
Normalizes by reference log probabilities
More conservative (stays close to reference)

SimPO Loss (Reference-Free)

Formula:

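L_SimPO = -log σ(β * (log π_θ(y_w|x) - log π_θ(y_l|x) - γ/β))

Here log π_θ(y|x) denotes the length-averaged log probability of the response (see Average Log Probability Reward).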

Key features:

No reference model needed
Direct preference optimization
Target margin γ/β controls preference strength
More efficient (fewer model forward passes)

Visual comparison:

DPO:    [Policy] - [Reference] → Loss
SimPO:  [Policy]               → Loss
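To make the difference concrete, here is a scalar sketch of both losses side by side. The function names and argument order are assumptions for illustration, and the log probabilities are passed in as plain floats rather than computed from a model.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_w: float, policy_l: float, ref_w: float, ref_l: float, beta: float) -> float:
    """DPO: each policy log prob is measured relative to the reference model."""
    return -log_sigmoid(beta * (policy_w - ref_w) - beta * (policy_l - ref_l))


def simpo_loss(policy_w: float, policy_l: float, beta: float, gamma_beta_ratio: float) -> float:
    """SimPO: policy log probs only, shifted by the target margin gamma/beta."""
    return -log_sigmoid(beta * (policy_w - policy_l - gamma_beta_ratio))


if __name__ == "__main__":
    # When the policy equals the reference, DPO sees no preference signal yet
    print(dpo_loss(-1.2, -2.5, ref_w=-1.2, ref_l=-2.5, beta=0.1))
    # SimPO needs no reference values at all
    print(simpo_loss(-1.2, -2.5, beta=2.0, gamma_beta_ratio=0.5))
```

Note that `dpo_loss` needs two extra inputs that come from a second model, which is where the additional forward passes originate.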

Average Log Probability Reward

Calculation

Per-token log probabilities:

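# Create mask to ignore padding, and give pad positions a valid index for gather
loss_mask = (labels != label_pad_token_id)
safe_labels = labels.masked_fill(~loss_mask, 0)

# Get log probs for each label token (logits here are the model's vocabulary logits)
per_token_logps = logits.log_softmax(-1).gather(dim=-1, index=safe_labels.unsqueeze(-1)).squeeze(-1)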

Average log probability (if average_log_prob=True):

avg_logp = (per_token_logps * loss_mask).sum(-1) / loss_mask.sum(-1)

Sum log probability (if average_log_prob=False):

sum_logp = (per_token_logps * loss_mask).sum(-1)

Why average?

Normalizes for sequence length
Keeps the reward from depending on response length (a summed log probability grows more negative with every extra token)
Standard practice in SimPO
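A small sketch of why averaging matters, using Python lists instead of tensors. The helper `sequence_logp` and the example values are made up for illustration. Both responses have the same per-token quality, yet the summed score differs by a factor of three.

```python
def sequence_logp(per_token_logps, loss_mask, average_log_prob=True):
    """Masked sum or average of per-token log probs (assumes at least one unmasked token)."""
    kept = [lp for lp, keep in zip(per_token_logps, loss_mask) if keep]
    total = sum(kept)
    return total / len(kept) if average_log_prob else total


# Two responses with identical per-token log probs but different lengths.
# The trailing entries are padding and are excluded by the mask.
short_logps, short_mask = [-0.5, -0.5, -9.0, -9.0], [True, True, False, False]
long_logps, long_mask = [-0.5] * 6, [True] * 6

if __name__ == "__main__":
    print("sum:", sequence_logp(short_logps, short_mask, False), sequence_logp(long_logps, long_mask, False))
    print("avg:", sequence_logp(short_logps, short_mask, True), sequence_logp(long_logps, long_mask, True))
```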

Reward Metrics

Chosen reward:

chosen_rewards = beta * policy_chosen_logps.detach()

Rejected reward:

rejected_rewards = beta * policy_rejected_logps.detach()

Reward margin:

reward_margin = chosen_rewards.mean() - rejected_rewards.mean()
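The three metrics can be computed for a small batch as follows. This is a list-based sketch with an assumed helper name; the `accuracy` field (fraction of pairs where the chosen reward is higher) is an extra added here for illustration and is not one of the metrics listed above.

```python
def reward_metrics(chosen_logps, rejected_logps, beta):
    """Per-batch reward statistics from lists of sequence log probs."""
    chosen_rewards = [beta * lp for lp in chosen_logps]
    rejected_rewards = [beta * lp for lp in rejected_logps]
    n = len(chosen_rewards)
    margin = sum(chosen_rewards) / n - sum(rejected_rewards) / n
    # Assumption: accuracy is an additional diagnostic, not part of the list above
    accuracy = sum(c > r for c, r in zip(chosen_rewards, rejected_rewards)) / n
    return {"reward_margin": margin, "accuracy": accuracy}


if __name__ == "__main__":
    print(reward_metrics([-1.2, -1.0], [-2.5, -0.8], beta=2.0))
```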

Label Smoothing

Formula with Smoothing

Sigmoid loss:

L = -log σ(β * logits) * (1 - ε) - log σ(-β * logits) * ε

Effect:

ε = 0.0: No smoothing (default)
ε = 0.1: 10% smoothing (soft labels)
ε = 0.5: Maximum smoothing (the loss becomes symmetric in the two responses, so the labels carry no preference signal)

When to use:

Noisy preference labels
Uncertain preferences
Prevent overconfidence

Config:

label_smoothing: 0.1  # 10% smoothing
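One way to see what smoothing does: with ε > 0 the sigmoid loss has a finite minimizer instead of pushing logits to infinity. Setting the derivative of the smoothed formula to zero gives β * logits = log((1 - ε) / ε). The sketch below checks this numerically; the function names are assumptions for illustration.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))


def smoothed_loss(scaled_logits: float, label_smoothing: float) -> float:
    """Sigmoid loss as a function of z = beta * logits."""
    return (
        -log_sigmoid(scaled_logits) * (1 - label_smoothing)
        - log_sigmoid(-scaled_logits) * label_smoothing
    )


def optimal_scaled_logits(label_smoothing: float) -> float:
    """Minimizer of smoothed_loss; requires 0 < label_smoothing < 1."""
    # dL/dz = sigmoid(z) - (1 - eps), which is zero at z = log((1 - eps) / eps)
    return math.log((1 - label_smoothing) / label_smoothing)


if __name__ == "__main__":
    for eps in (0.1, 0.3, 0.5):
        z_star = optimal_scaled_logits(eps)
        print(f"eps={eps}: loss is minimized at beta * logits = {z_star:.3f}")
```

At ε = 0.5 the minimizer is 0, meaning the loss no longer rewards separating chosen from rejected.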

SFT Regularization

Combined Loss

With SFT component:

L_total = L_SimPO + λ * L_SFT

Where:

L_SFT = cross-entropy loss on chosen responses
λ = sft_weight (0.0 to 1.0)

Implementation:

if self.sft_weight > 0:
    sft_loss = -policy_chosen_logps
    total_loss = simpo_loss + self.sft_weight * sft_loss

When to use:

Preserve model capabilities
Prevent catastrophic forgetting
Fine-tuning instruct models

Trade-off:

Higher sft_weight: Preserve capabilities, less alignment
Lower sft_weight: Stronger alignment, may forget capabilities

Config:

sft_weight: 0.1  # 10% SFT regularization
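The combined loss for a single pair, written as a standalone function. This is a sketch with an assumed name and illustrative numbers; it mirrors the snippet above but also returns the plain SimPO loss when `sft_weight` is 0.

```python
def combined_loss(simpo_loss: float, policy_chosen_logp: float, sft_weight: float) -> float:
    """L_total = L_SimPO + sft_weight * L_SFT for one preference pair."""
    if sft_weight > 0:
        # SFT term: negative log probability of the chosen response
        sft_loss = -policy_chosen_logp
        return simpo_loss + sft_weight * sft_loss
    return simpo_loss


if __name__ == "__main__":
    # Illustrative values only
    print(combined_loss(simpo_loss=0.5, policy_chosen_logp=-1.0, sft_weight=0.1))
    print(combined_loss(simpo_loss=0.5, policy_chosen_logp=-1.0, sft_weight=0.0))
```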

Loss Type Selection

Sigmoid vs Hinge

Aspect            Sigmoid           Hinge
Smoothness        Smooth            Non-smooth
Gradients         Continuous        Discontinuous at margin
Sparsity          Dense solutions   Sparse solutions
Interpretability  Probabilistic     Geometric margin
Use case          General purpose   Margin-based tasks
Recommendation    Default choice    Experimental

Config:

# Sigmoid (default)
loss_type: sigmoid

# Hinge (alternative)
loss_type: hinge

Mathematical Properties

Gradient Analysis

Sigmoid loss gradient:

∂L/∂logits = -β * σ(-β * logits) * (1 - ε) + β * σ(β * logits) * ε

Hinge loss gradient:

∂L/∂logits = -β   if logits < 1/β
             0     otherwise

Implications:

Sigmoid: Always provides gradient signal
Hinge: No gradient when margin satisfied
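The analytic gradients can be sanity-checked against a central finite difference. The sketch below uses scalar helpers with assumed names, and the plain `sigmoid` is only suitable for moderate input values.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_loss(logits: float, beta: float, eps: float = 0.0) -> float:
    return (
        -math.log(sigmoid(beta * logits)) * (1 - eps)
        - math.log(sigmoid(-beta * logits)) * eps
    )


def sigmoid_grad(logits: float, beta: float, eps: float = 0.0) -> float:
    """Analytic dL/dlogits for the sigmoid loss."""
    return -beta * sigmoid(-beta * logits) * (1 - eps) + beta * sigmoid(beta * logits) * eps


def hinge_grad(logits: float, beta: float) -> float:
    """Analytic dL/dlogits for the hinge loss (taken as 0 at the kink)."""
    return -beta if logits < 1.0 / beta else 0.0


def numeric_grad(f, x: float, h: float = 1e-6) -> float:
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


if __name__ == "__main__":
    beta, eps, logits = 2.0, 0.1, 0.8
    print("analytic:", sigmoid_grad(logits, beta, eps))
    print("numeric: ", numeric_grad(lambda z: sigmoid_loss(z, beta, eps), logits))
    print("hinge below / above margin:", hinge_grad(0.2, beta), hinge_grad(0.8, beta))
```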

Convergence Behavior

Sigmoid:

Asymptotically approaches zero loss
Continues optimizing even with large margins
Smoother training curves

Hinge:

Reaches zero loss at margin
Stops optimizing once margin satisfied
May have training plateaus
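The difference shows up in a toy gradient-descent run. As a loud simplification, the sketch below treats a single scalar logit as the only trainable parameter, which is not how a real model is trained. It uses the gradients from Gradient Analysis with label smoothing set to 0.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def descend(grad_fn, steps: int = 100, lr: float = 0.1) -> float:
    """Plain gradient descent on a single scalar logit, starting from 0."""
    logits = 0.0
    for _ in range(steps):
        logits -= lr * grad_fn(logits)
    return logits


beta = 2.0
# Sigmoid: the gradient shrinks but never vanishes, so logits keep growing
sigmoid_final = descend(lambda z: -beta * sigmoid(-beta * z))
# Hinge: the gradient is exactly 0 once logits reach 1 / beta, so updates stop
hinge_final = descend(lambda z: -beta if z < 1.0 / beta else 0.0)

if __name__ == "__main__":
    print(f"sigmoid logits after 100 steps: {sigmoid_final:.3f}")
    print(f"hinge logits after 100 steps:   {hinge_final:.3f}")
```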

Complete Loss Examples

Example 1: Basic SimPO (Sigmoid)

Config:

beta: 2.0
gamma_beta_ratio: 0.5
loss_type: sigmoid
label_smoothing: 0.0
sft_weight: 0.0

Loss calculation:

# Step 1: Compute log probs
chosen_logps = avg_log_prob(policy(chosen))    # e.g., -1.2
rejected_logps = avg_log_prob(policy(rejected)) # e.g., -2.5

# Step 2: Log ratio and margin
pi_logratios = -1.2 - (-2.5) = 1.3
logits = 1.3 - 0.5 = 0.8

# Step 3: Sigmoid loss
loss = -log(sigmoid(2.0 * 0.8))
     = -log(sigmoid(1.6))
     = -log(0.832)
     = 0.184
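The arithmetic in Example 1 can be reproduced with the standard library. The variable names are chosen here for illustration.

```python
import math

beta = 2.0
gamma_beta_ratio = 0.5
chosen_logps = -1.2
rejected_logps = -2.5

# Step 2: log ratio minus target margin
logits = (chosen_logps - rejected_logps) - gamma_beta_ratio

# Step 3: sigmoid loss without label smoothing
loss = -math.log(1.0 / (1.0 + math.exp(-beta * logits)))

print(round(logits, 3), round(loss, 3))
```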

Example 2: SimPO with SFT

Config:

beta: 2.5
gamma_beta_ratio: 0.5
loss_type: sigmoid
sft_weight: 0.1

Loss calculation:

# SimPO loss (same log probs as Example 1, but beta = 2.5)
simpo_loss = -log(sigmoid(2.5 * 0.8))
           = -log(sigmoid(2.0))
           = -log(0.881)
           = 0.127

# SFT loss
sft_loss = -chosen_logps = -(-1.2) = 1.2

# Total loss
total_loss = simpo_loss + 0.1 * sft_loss
           = 0.127 + 0.12
           = 0.247

Debugging

Check Reward Margins

Low margin (< 0.5):

Preferences not being learned
Increase beta or gamma_beta_ratio

High margin (> 5.0):

May be overfitting
Reduce beta or learning rate

Monitor:

reward_margin = chosen_rewards.mean() - rejected_rewards.mean()
print(f"Reward margin: {reward_margin:.2f}")
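The thresholds above can be wrapped in a small helper for logging. This is a sketch: the function name and message wording are assumptions, and the cutoffs are simply the rule-of-thumb values from this section.

```python
def diagnose_reward_margin(reward_margin: float) -> str:
    """Map a reward margin to the rule-of-thumb advice from this section."""
    if reward_margin < 0.5:
        return "low: preferences not being learned; try increasing beta or gamma_beta_ratio"
    if reward_margin > 5.0:
        return "high: may be overfitting; try reducing beta or the learning rate"
    return "ok"


if __name__ == "__main__":
    for margin in (0.2, 1.8, 7.5):
        print(f"{margin:.2f} -> {diagnose_reward_margin(margin)}")
```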

Check Log Probabilities

Typical values:

Chosen: -1.0 to -2.0 (closer to 0 is better)
Rejected: -2.0 to -4.0 (should sit below the chosen value)

Warning signs:

Both very negative (< -10): Model not learning
Either positive (> 0): A log probability cannot exceed 0, so this points to a bug or numerical instability
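A minimal checker for these warning signs, assuming average log probabilities are passed in as floats. The function name and messages are illustrative. It flags any positive value, since a log probability cannot exceed 0.

```python
def check_logps(chosen_logp: float, rejected_logp: float) -> list:
    """Return warning strings for suspicious average log probabilities."""
    warnings = []
    if chosen_logp > 0 or rejected_logp > 0:
        warnings.append("positive log probability: likely a bug or numerical instability")
    if chosen_logp < -10 and rejected_logp < -10:
        warnings.append("both below -10: model may not be learning")
    return warnings


if __name__ == "__main__":
    print(check_logps(-1.5, -3.0))    # typical values, no warnings expected
    print(check_logps(-12.0, -15.0))  # both very negative
```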

References

SimPO paper: https://arxiv.org/abs/2405.14734
DPO paper: https://arxiv.org/abs/2305.18290
Implementation: https://github.com/princeton-nlp/SimPO
