Sovereign backup of all Hermes Agent configuration and data. Excludes: secrets, auth tokens, sessions, caches, code (separate repo). Tracked: - config.yaml (model, fallback chain, toolsets, display prefs) - SOUL.md (Timmy personality charter) - memories/ (persistent MEMORY.md + USER.md) - skills/ (371 files — full skill library) - cron/jobs.json (scheduled tasks) - channel_directory.json (platform channels) - hooks/ (custom hooks)
6.9 KiB
Loss Functions
Complete guide to SimPO loss functions and mathematical formulations.
Overview
SimPO supports two loss types:
- Sigmoid (default) - Smooth, differentiable loss
- Hinge - Margin-based, sparse loss
Both are reference-free (no reference model needed).
SimPO Loss Formula
Core Calculation
Step 1: Log probability ratio:
pi_logratios = log P_θ(y_chosen|x) - log P_θ(y_rejected|x)
Step 2: Apply target margin:
logits = pi_logratios - γ/β
Where:
- γ/β =
gamma_beta_ratio(target margin)
Step 3: Compute loss (depends on loss type)
Sigmoid Loss (Default)
Formula:
L = -log σ(β * logits) * (1 - ε) - log σ(-β * logits) * ε
Where:
- β =
beta(reward scaling) - σ = sigmoid function
- ε =
label_smoothing(default 0.0)
Implementation:
losses = (
-F.logsigmoid(self.beta * logits) * (1 - self.label_smoothing)
- F.logsigmoid(-self.beta * logits) * self.label_smoothing
)
Characteristics:
- Smooth, continuous gradients
- Probabilistic interpretation
- Standard choice for most tasks
- Works well with higher beta values
Hinge Loss
Formula:
L = max(0, 1 - β * logits)
Implementation:
losses = torch.relu(1 - self.beta * logits)
Characteristics:
- Non-smooth (has kink at logits = 1/β)
- Margin-based (SVM-style)
- Can lead to sparser solutions
- Less commonly used
Comparison to DPO
DPO Loss (Reference Model Required)
Formula:
L_DPO = -E[log σ(β * log(π_θ(y_w|x)/π_ref(y_w|x)) - β * log(π_θ(y_l|x)/π_ref(y_l|x)))]
Key features:
- Requires reference model π_ref
- Normalizes by reference log probabilities
- More conservative (stays close to reference)
SimPO Loss (Reference-Free)
Formula:
L_SimPO = -log σ(β * (log π_θ(y_w|x) - log π_θ(y_l|x) - γ/β))
Key features:
- No reference model needed
- Direct preference optimization
- Target margin γ/β controls preference strength
- More efficient (fewer model forward passes)
Visual comparison:
DPO: [Policy] - [Reference] → Loss
SimPO: [Policy] → Loss
Average Log Probability Reward
Calculation
Per-token log probabilities:
# Get log probs for each token
per_token_logps = log_softmax(logits).gather(dim=-1, index=labels)
# Create mask to ignore padding
loss_mask = (labels != label_pad_token_id)
Average log probability (if average_log_prob=True):
avg_logp = (per_token_logps * loss_mask).sum(-1) / loss_mask.sum(-1)
Sum log probability (if average_log_prob=False):
sum_logp = (per_token_logps * loss_mask).sum(-1)
Why average?
- Normalizes for sequence length
- Prevents bias toward shorter/longer responses
- Standard practice in SimPO
Reward Metrics
Chosen reward:
chosen_rewards = beta * policy_chosen_logps.detach()
Rejected reward:
rejected_rewards = beta * policy_rejected_logps.detach()
Reward margin:
reward_margin = chosen_rewards.mean() - rejected_rewards.mean()
Label Smoothing
Formula with Smoothing
Sigmoid loss:
L = -log σ(β * logits) * (1 - ε) - log σ(-β * logits) * ε
Effect:
- ε = 0.0: No smoothing (default)
- ε = 0.1: 10% smoothing (soft labels)
- ε = 0.5: Maximum smoothing
When to use:
- Noisy preference labels
- Uncertain preferences
- Prevent overconfidence
Config:
label_smoothing: 0.1 # 10% smoothing
SFT Regularization
Combined Loss
With SFT component:
L_total = L_SimPO + λ * L_SFT
Where:
- L_SFT = cross-entropy loss on chosen responses
- λ =
sft_weight(0.0 to 1.0)
Implementation:
if self.sft_weight > 0:
sft_loss = -policy_chosen_logps
total_loss = simpo_loss + self.sft_weight * sft_loss
When to use:
- Preserve model capabilities
- Prevent catastrophic forgetting
- Fine-tuning instruct models
Trade-off:
- Higher sft_weight: Preserve capabilities, less alignment
- Lower sft_weight: Stronger alignment, may forget capabilities
Config:
sft_weight: 0.1 # 10% SFT regularization
Loss Type Selection
Sigmoid vs Hinge
| Aspect | Sigmoid | Hinge |
|---|---|---|
| Smoothness | Smooth | Non-smooth |
| Gradients | Continuous | Discontinuous at margin |
| Sparsity | Dense solutions | Sparse solutions |
| Interpretability | Probabilistic | Geometric margin |
| Use case | General purpose | Margin-based tasks |
| Recommendation | Default choice | Experimental |
Config:
# Sigmoid (default)
loss_type: sigmoid
# Hinge (alternative)
loss_type: hinge
Mathematical Properties
Gradient Analysis
Sigmoid loss gradient:
∂L/∂logits = -β * σ(-β * logits) * (1 - ε) + β * σ(β * logits) * ε
Hinge loss gradient:
∂L/∂logits = -β if logits < 1/β
0 otherwise
Implications:
- Sigmoid: Always provides gradient signal
- Hinge: No gradient when margin satisfied
Convergence Behavior
Sigmoid:
- Asymptotically approaches zero loss
- Continues optimizing even with large margins
- Smoother training curves
Hinge:
- Reaches zero loss at margin
- Stops optimizing once margin satisfied
- May have training plateaus
Complete Loss Examples
Example 1: Basic SimPO (Sigmoid)
Config:
beta: 2.0
gamma_beta_ratio: 0.5
loss_type: sigmoid
label_smoothing: 0.0
sft_weight: 0.0
Loss calculation:
# Step 1: Compute log probs
chosen_logps = avg_log_prob(policy(chosen)) # e.g., -1.2
rejected_logps = avg_log_prob(policy(rejected)) # e.g., -2.5
# Step 2: Log ratio and margin
pi_logratios = -1.2 - (-2.5) = 1.3
logits = 1.3 - 0.5 = 0.8
# Step 3: Sigmoid loss
loss = -log(sigmoid(2.0 * 0.8))
= -log(sigmoid(1.6))
= -log(0.832)
= 0.184
Example 2: SimPO with SFT
Config:
beta: 2.5
gamma_beta_ratio: 0.5
loss_type: sigmoid
sft_weight: 0.1
Loss calculation:
# SimPO loss (as above)
simpo_loss = 0.184
# SFT loss
sft_loss = -chosen_logps = -(-1.2) = 1.2
# Total loss
total_loss = simpo_loss + 0.1 * sft_loss
= 0.184 + 0.12
= 0.304
Debugging
Check Reward Margins
Low margin (< 0.5):
- Preferences not being learned
- Increase beta or gamma_beta_ratio
High margin (> 5.0):
- May be overfitting
- Reduce beta or learning rate
Monitor:
reward_margin = chosen_rewards.mean() - rejected_rewards.mean()
print(f"Reward margin: {reward_margin:.2f}")
Check Log Probabilities
Typical values:
- Chosen: -1.0 to -2.0 (higher is better)
- Rejected: -2.0 to -4.0 (lower is worse)
Warning signs:
- Both very negative (< -10): Model not learning
- Both very positive (> 0): Numerical instability
References
- SimPO paper: https://arxiv.org/abs/2405.14734
- DPO paper: https://arxiv.org/abs/2305.18290
- Implementation: https://github.com/princeton-nlp/SimPO