Part 3: BOUNDARY-NATIVE LANGUAGE MODELS
Section 14: Training Methodology
Pendry, S
HalfHuman Draft
2026
Previous Sections
Post Zero Link
Section 13: Five-Layer BNLM Architecture
14.1 Training Data Structure
Traditional LLM training:
```
Input:  Context
Target: Next token
Loss:   -log P(token | context)
```
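The standard objective can be sketched directly; a minimal pure-Python illustration (the toy distribution is invented for the example):

```python
import math

def next_token_loss(probs, target):
    """Standard language-modeling loss: -log P(target | context)."""
    return -math.log(probs[target])

# Toy next-token distribution for some context.
probs = {"cat": 0.7, "dog": 0.2, "fish": 0.1}
loss = next_token_loss(probs, "cat")
```

Lower-probability targets incur higher loss, which is all the traditional objective optimizes for.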
BNLM training requires different data structure:
```python
training_example = {
    # Input
    'query': "User question or statement",
    'context': "Conversation history",

    # Universe generation targets
    'valid_interpretations': [
        "Interpretation 1",
        "Interpretation 2",
        ...
    ],
    'invalid_interpretations': [
        "Self-referencing interpretation",
        "Circular reasoning",
        ...
    ],

    # Boundary specifications
    'boundaries': {
        'interpretation_1': {
            'excludes': [...],
            'contradicts': [...],
            'scope_limits': [...]
        },
        ...
    },

    # Self-reference labels
    'self_referencing_patterns': [
        "Pattern that validates using only input",
        "Circular dependency example",
        ...
    ],

    # External grounding
    'external_sources_required': [
        "Domain expertise",
        "Empirical data",
        "Third-party verification",
        ...
    ],

    # Investigation depth
    'required_depth': 3,  # How many layers of analysis needed

    # Calibrated confidence
    'appropriate_confidence': {
        'claim_1': 0.8,  # High confidence (well-grounded)
        'claim_2': 0.3,  # Low confidence (limited grounding)
        'claim_3': 0.0,  # No confidence (self-referencing)
    }
}
```
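A quick schema check on examples of this shape can catch malformed data before it reaches training. The helper below is a hypothetical sketch: the field list mirrors the example above, but `check_training_example` itself is not part of any spec.

```python
# Field names follow the training_example schema above.
REQUIRED_FIELDS = [
    'query', 'context', 'valid_interpretations', 'invalid_interpretations',
    'boundaries', 'self_referencing_patterns', 'external_sources_required',
    'required_depth', 'appropriate_confidence',
]

def check_training_example(example):
    """Return a list of problems found in a BNLM training example."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in example]
    # Confidence targets must be probabilities in [0, 1].
    for claim, conf in example.get('appropriate_confidence', {}).items():
        if not 0.0 <= conf <= 1.0:
            problems.append(f"bad confidence for {claim}: {conf}")
    return problems
```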
14.2 Loss Function Design
Traditional loss (standard cross-entropy, as above):

```
L = -log P(actual_token | context)
```
BNLM multi-component loss:
```python
def compute_bnlm_loss(prediction, target, model_internals,
                      w1=1.0, w2=1.0, w3=1.0, w4=1.0, w5=1.0):
    """
    Multi-component loss function optimizing for:
    1. Validity of interpretations
    2. Boundary completeness
    3. Investigation depth
    4. Confidence calibration
    """
    # Component 1: Validity loss
    # Penalize self-referencing validation
    L_validity = validity_loss(
        prediction.validation_claims,
        target.external_sources,
        model_internals.russell_layer_output
    )

    # Component 2: Boundary loss
    # Penalize incomplete boundary analysis
    L_boundary = boundary_loss(
        prediction.stated_boundaries,
        target.complete_boundaries
    )

    # Component 3: Investigation loss
    # Penalize shallow investigation
    L_investigation = investigation_loss(
        prediction.investigation_depth,
        target.required_depth,
        prediction.external_sources_consulted
    )

    # Component 4: Confidence calibration loss
    # Penalize overconfidence and underconfidence
    L_confidence = confidence_loss(
        prediction.confidence_levels,
        target.appropriate_confidence,
        prediction.grounding_strength
    )

    # Component 5: Standard language modeling loss
    # Still need coherent, fluent output
    L_language = standard_lm_loss(
        prediction.tokens,
        target.tokens
    )

    # Weighted combination (w1..w5 are tunable hyperparameters)
    total_loss = (
        w1 * L_validity +
        w2 * L_boundary +
        w3 * L_investigation +
        w4 * L_confidence +
        w5 * L_language
    )
    return total_loss


def validity_loss(validation_claims, external_sources, russell_output):
    """
    Penalize self-referencing validation.

    High loss if:
    - Validation depends only on input
    - No external sources cited
    - Russell layer flagged as self-referencing
    """
    loss = 0.0
    for claim in validation_claims:
        # Check if claim is self-referencing
        if claim.sources == [claim.subject]:
            loss += 10.0  # Heavy penalty
        # Check if external sources present
        if len(claim.external_sources) == 0:
            loss += 5.0
        # Check Russell layer output
        if not russell_output.passed_for(claim):
            loss += 8.0
    return loss


def boundary_loss(stated_boundaries, complete_boundaries):
    """
    Penalize incomplete boundary specification.

    High loss if:
    - Boundaries not explicitly stated
    - What's excluded not identified
    - Scope limits unclear
    """
    if not complete_boundaries:
        return 0.0  # Nothing required, nothing to penalize
    # Measure completeness of boundary specification
    completeness = len(stated_boundaries) / len(complete_boundaries)
    # Loss inversely proportional to completeness
    loss = max(0, 1.0 - completeness) * 5.0
    return loss


def investigation_loss(actual_depth, required_depth, sources_consulted):
    """
    Penalize shallow investigation.

    High loss if:
    - Investigation depth below required
    - Few external sources consulted
    - Fast answer prioritized over thorough analysis
    """
    depth_deficit = max(0, required_depth - actual_depth)
    source_deficit = max(0, required_depth - len(sources_consulted))
    loss = (depth_deficit * 3.0) + (source_deficit * 2.0)
    return loss


def confidence_loss(predicted_confidence, target_confidence, grounding):
    """
    Penalize miscalibrated confidence.

    High loss if:
    - High confidence with weak grounding (overconfidence)
    - Low confidence with strong grounding (underconfidence)
    - Confidence doesn't match validation strength
    """
    calibration_error = 0.0
    for claim, pred_conf in predicted_confidence.items():
        target_conf = target_confidence[claim]
        ground_strength = grounding[claim]
        # Penalize overconfidence more heavily than underconfidence
        if pred_conf > target_conf:
            calibration_error += (pred_conf - target_conf) ** 2 * 3.0
        else:
            calibration_error += (pred_conf - target_conf) ** 2 * 1.0
        # Penalize confidence mismatched to grounding;
        # estimate_confidence maps grounding strength to an expected level
        expected_conf_from_grounding = estimate_confidence(ground_strength)
        mismatch = abs(pred_conf - expected_conf_from_grounding)
        calibration_error += mismatch * 2.0
    return calibration_error
```
14.3 Training Objectives
Primary objectives:

1. Minimize self-referencing validation
   - Maximize: Interpretations in Russell boundary set R
   - Minimize: Circular reasoning patterns
2. Maximize boundary explicitness
   - Maximize: Stated exclusions and scope limits
   - Minimize: Implicit assumptions
3. Optimize investigation depth
   - Maximize: External sources consulted
   - Maximize: Alternative interpretations considered
   - Minimize: Hasty conclusions
4. Calibrate confidence to grounding
   - Match: Confidence level to validation strength
   - Penalize: Overconfidence with weak grounding
   - Penalize: Underconfidence with strong grounding
5. Maintain language quality
   - Maximize: Fluency and coherence
   - Maintain: Standard language modeling capability
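These objectives map one-to-one onto the five loss components of 14.2. One way to express that mapping is a weight config; the values below are illustrative placeholders, not tuned hyperparameters.

```python
# Hypothetical weighting of the five loss components (illustrative only).
LOSS_WEIGHTS = {
    'validity': 2.0,       # objective 1: minimize self-referencing validation
    'boundary': 1.5,       # objective 2: maximize boundary explicitness
    'investigation': 1.5,  # objective 3: optimize investigation depth
    'confidence': 2.0,     # objective 4: calibrate confidence to grounding
    'language': 1.0,       # objective 5: maintain language quality
}

def weighted_total(losses):
    """Combine per-component losses using the weights above."""
    return sum(LOSS_WEIGHTS[name] * value for name, value in losses.items())
```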
14.4 Training Data Creation
Challenge: Creating training data with proper validity labels
Approach 1: Expert Annotation
```python
annotation_protocol = {
    'step_1': 'Human expert reviews query-response pair',
    'step_2': 'Expert identifies self-referencing validation',
    'step_3': 'Expert specifies required external sources',
    'step_4': 'Expert labels appropriate confidence levels',
    'step_5': 'Expert marks boundary completeness'
}
```
Scale: Expensive; requires domain expertise for each example.
Approach 2: Synthetic Generation
```python
def generate_synthetic_training_data():
    """
    Create training examples with known self-reference patterns.
    """
    # Generate positive examples (valid reasoning)
    valid_examples = []
    for domain in ['science', 'history', 'mathematics']:
        example = {
            'query': generate_factual_question(domain),
            'response': generate_grounded_answer(domain),
            'external_sources': get_real_sources(domain),
            'validity_label': True
        }
        valid_examples.append(example)

    # Generate negative examples (self-referencing)
    invalid_examples = []
    self_ref_patterns = [
        'circular_reasoning',
        'validation_from_claim',
        'no_external_grounding',
        'overconfident_speculation'
    ]
    for pattern in self_ref_patterns:
        example = {
            'query': generate_query(),
            'response': generate_self_referencing_response(pattern),
            'external_sources': [],
            'validity_label': False,
            'failure_mode': pattern
        }
        invalid_examples.append(example)

    return valid_examples + invalid_examples
```
Scale: Cheaper; can generate large quantities.
Approach 3: Semi-Supervised Learning
```python
semi_supervised_approach = {
    'step_1': 'Train initial model on synthetic data',
    'step_2': 'Model generates responses to unlabeled queries',
    'step_3': 'Expert reviews high-uncertainty cases only',
    'step_4': 'Retrain with expert corrections',
    'step_5': 'Iterate'
}
```
Scale: Balanced; combines synthetic volume with expert quality.
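Step 3 of this loop is essentially active learning. A minimal sketch of the selection step, assuming each model response carries an `uncertainty` score in [0, 1] (the field name, threshold, and budget are assumptions for illustration):

```python
def select_for_expert_review(responses, threshold=0.3, budget=100):
    """Route only high-uncertainty model outputs to expert review.

    Each response dict is assumed to carry an 'uncertainty' score in
    [0, 1] (e.g. entropy of the validity head); this interface is
    illustrative, not prescribed.
    """
    uncertain = [r for r in responses if r['uncertainty'] >= threshold]
    # Most uncertain cases first, capped by the expert-review budget.
    uncertain.sort(key=lambda r: r['uncertainty'], reverse=True)
    return uncertain[:budget]
```

Everything below the threshold keeps its model-assigned labels, which is where the cost saving over full expert annotation comes from.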
14.5 Training Phases
Phase 1: Foundation (Standard LLM)
Train standard transformer on large text corpus:
- Next-token prediction
- Standard language modeling
- Build basic linguistic and reasoning capability
Duration: Standard LLM training timeline
Goal: Establish baseline language understanding
Phase 2: Validity-Aware Fine-Tuning
Introduce BNST constraints through specialized training:
```python
phase_2_training = {
    'dataset': 'Synthetic self-reference examples',
    'objective': 'Learn to detect self-referencing validation',
    'training_signal': {
        'positive_examples': 'Responses with external grounding',
        'negative_examples': 'Responses with circular reasoning',
        'labels': 'Binary validity classification'
    },
    'architecture_modifications': {
        'add_russell_layer': 'Self-reference detection module',
        'add_validity_head': 'Validity prediction output',
        'maintain_lm_head': 'Keep language modeling capability'
    },
    'loss_function': 'L_validity + L_language',
    'duration': '10-20% of Phase 1 training time'
}
```
Goal: Model learns to recognize self-referencing patterns
Phase 3: Boundary Analysis Training
Train boundary complement computation:
```python
phase_3_training = {
    'dataset': 'Interpretation-boundary pairs',
    'objective': 'Learn to compute what interpretations exclude',
    'training_signal': {
        'input': 'Interpretation',
        'target': 'Complete boundary specification',
        'labels': 'Excluded meanings, contradictions, scope limits'
    },
    'architecture_modifications': {
        'add_boundary_layer': 'Complement computation module',
        'connect_to_russell_layer': 'Share representations'
    },
    'loss_function': 'L_boundary + L_validity + L_language',
    'duration': '15-25% of Phase 1 training time'
}
```
Goal: Model learns to explicitly represent boundaries
Phase 4: Investigation Depth Training
Train for thorough investigation over fast answers:
```python
phase_4_training = {
    'dataset': 'Query-investigation pairs with depth annotations',
    'objective': 'Prioritize investigation quality over speed',
    'training_signal': {
        'input': 'Query',
        'target': 'Complete investigation process',
        'labels': 'Required depth, sources consulted, alternatives considered'
    },
    'architecture_modifications': {
        'add_investigation_layer': 'Depth tracking and control',
        'add_source_consultation': 'External grounding retrieval'
    },
    'loss_function': 'L_investigation + L_validity + L_boundary + L_language',
    'duration': '20-30% of Phase 1 training time',
    'key_change': 'Optimization shifts from speed to thoroughness'
}
```
Goal: Model learns investigation is more important than fast response
Phase 5: Confidence Calibration
Train appropriate uncertainty expression:
```python
phase_5_training = {
    'dataset': 'Claims with ground-truth confidence levels',
    'objective': 'Calibrate confidence to grounding strength',
    'training_signal': {
        'input': 'Claim + grounding evidence',
        'target': 'Appropriate confidence level',
        'labels': 'Calibrated confidence scores'
    },
    'architecture_modifications': {
        'add_confidence_head': 'Confidence prediction output',
        'connect_to_validity_layer': 'Use validity signals for calibration'
    },
    'loss_function': 'L_confidence + L_investigation + L_validity + L_boundary + L_language',
    'duration': '15-25% of Phase 1 training time',
    'evaluation': 'Measure calibration error on held-out test set'
}
```
Goal: Model’s stated confidence matches actual accuracy
Phase 6: End-to-End Integration
Train complete pipeline jointly:
```python
phase_6_training = {
    'dataset': 'Complete BNLM training examples',
    'objective': 'Optimize entire pipeline jointly',
    'training_signal': {
        'input': 'Raw user query',
        'target': 'Complete investigation output',
        'labels': 'All component labels (validity, boundaries, investigation, confidence)'
    },
    'architecture': 'Complete 5-layer BNLM',
    'loss_function': 'Full multi-component loss (all weights active)',
    'duration': '30-50% of Phase 1 training time',
    'optimization': 'End-to-end gradient descent through all layers'
}
```
Goal: All components work together seamlessly
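Taking the midpoint of each duration range above, the whole curriculum can be summarized as a schedule. The numbers below are illustrative midpoints, not prescribed values:

```python
# Illustrative schedule: durations are fractions of Phase 1 training time,
# taken from the midpoints of the ranges given for each phase.
PHASES = [
    ('foundation',    1.00, ['language']),
    ('validity',      0.15, ['validity', 'language']),
    ('boundary',      0.20, ['boundary', 'validity', 'language']),
    ('investigation', 0.25, ['investigation', 'validity', 'boundary', 'language']),
    ('confidence',    0.20, ['confidence', 'investigation', 'validity', 'boundary', 'language']),
    ('integration',   0.40, ['validity', 'boundary', 'investigation', 'confidence', 'language']),
]

def total_training_time():
    """Relative cost of the full curriculum (Phase 1 = 1.0)."""
    return sum(duration for _, duration, _ in PHASES)

# total_training_time() is roughly 2.2, consistent with the ~2T estimate in 14.7
```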
14.6 Evaluation Metrics
Traditional LLM metrics:
- Perplexity (how well model predicts text)
- Accuracy on benchmarks (question answering, etc.)
BNLM requires new metrics:
1. Epistemic Calibration
Measure accuracy of uncertainty estimates:
```python
def epistemic_calibration_score(predictions, ground_truth):
    """
    When model says "X% confident", is it right X% of the time?
    Perfect calibration: predicted confidence = actual accuracy
    """
    confidence_bins = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0
    calibration_error = 0.0
    for bin_lower, bin_upper in zip(confidence_bins[:-1], confidence_bins[1:]):
        # Get predictions in this confidence range
        in_bin = [
            p for p in predictions
            if bin_lower <= p.confidence < bin_upper
        ]
        if len(in_bin) == 0:
            continue
        # Calculate actual accuracy for these predictions
        actual_accuracy = sum(p.correct for p in in_bin) / len(in_bin)
        # Expected accuracy is midpoint of bin
        expected_accuracy = (bin_lower + bin_upper) / 2
        # Calibration error for this bin
        error = abs(actual_accuracy - expected_accuracy)
        calibration_error += error * len(in_bin)
    # Normalize by total predictions
    calibration_error /= len(predictions)
    return 1.0 - calibration_error  # Higher is better
```
Target: Calibration score > 0.90
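The binning logic above is close to the standard expected calibration error (ECE), which compares each bin's accuracy to its mean confidence rather than the bin midpoint. A self-contained sketch for comparison (the `Prediction` tuple is a stand-in type, not part of the architecture):

```python
from collections import namedtuple

# Minimal stand-in for a model prediction; field names are illustrative.
Prediction = namedtuple('Prediction', ['confidence', 'correct'])

def expected_calibration_error(predictions, n_bins=10):
    """ECE: per-bin |mean confidence - accuracy|, weighted by bin size."""
    bins = [[] for _ in range(n_bins)]
    for p in predictions:
        # Clamp so confidence == 1.0 falls in the top bin.
        idx = min(int(p.confidence * n_bins), n_bins - 1)
        bins[idx].append(p)
    error = 0.0
    for bucket in bins:
        if not bucket:
            continue
        accuracy = sum(p.correct for p in bucket) / len(bucket)
        mean_conf = sum(p.confidence for p in bucket) / len(bucket)
        error += abs(accuracy - mean_conf) * len(bucket)
    return error / len(predictions)

# Perfectly calibrated toy set: 80%-confident predictions, right 4 times in 5.
toy = [Prediction(0.8, c) for c in (1, 1, 1, 1, 0)]
```

Using mean confidence avoids penalizing a model whose predictions cluster near one edge of a bin.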
2. Self-Reference Detection Rate
Measure percentage of self-referencing patterns caught:
```python
def self_reference_detection_rate(test_set):
    """
    What percentage of self-referencing validation is flagged?
    """
    self_referencing_examples = [
        ex for ex in test_set
        if ex.label == 'self_referencing'
    ]
    detected = sum(
        1 for ex in self_referencing_examples
        if model.russell_layer.flagged(ex)
    )
    return detected / len(self_referencing_examples)
```
Target: Detection rate > 0.95
3. Investigation Depth Score
Measure thoroughness of investigation:
```python
def investigation_depth_score(predictions, targets):
    """
    Does model investigate deeply enough?

    Measures:
    - Number of interpretations considered
    - External sources consulted
    - Alternatives explored
    - Boundaries identified
    """
    scores = []
    for pred, target in zip(predictions, targets):
        depth_score = (
            min(1.0, pred.interpretations_considered / target.required_interpretations) * 0.25 +
            min(1.0, pred.external_sources / target.required_sources) * 0.25 +
            min(1.0, pred.alternatives_explored / target.required_alternatives) * 0.25 +
            min(1.0, pred.boundaries_identified / target.required_boundaries) * 0.25
        )
        scores.append(depth_score)
    return sum(scores) / len(scores)
```
Target: Investigation depth > 0.85
4. Boundary Completeness
Measure how fully boundaries are specified:
```python
def boundary_completeness(predictions, targets):
    """
    Are boundaries explicitly stated?

    Measures:
    - What's excluded identified
    - Contradictions noted
    - Scope limits stated
    - Alternatives acknowledged
    """
    completeness_scores = []
    for pred, target in zip(predictions, targets):
        stated_boundaries = set(pred.boundaries)
        required_boundaries = set(target.complete_boundaries)
        completeness = len(stated_boundaries & required_boundaries) / len(required_boundaries)
        completeness_scores.append(completeness)
    return sum(completeness_scores) / len(completeness_scores)
```
Target: Boundary completeness > 0.80
5. False Confidence Reduction
Measure reduction in overconfident errors:
```python
def false_confidence_rate(predictions, ground_truth):
    """
    How often is model highly confident but wrong?
    This is the most dangerous failure mode.
    """
    high_confidence = [
        p for p in predictions
        if p.confidence > 0.8
    ]
    if not high_confidence:
        return 0.0  # No high-confidence claims were made
    false_high_confidence = [
        p for p in high_confidence
        if not p.correct
    ]
    return len(false_high_confidence) / len(high_confidence)
```
Target: False confidence rate < 0.05 (compared to ~0.15 for standard LLMs)
14.7 Comparison to Standard Training
Standard LLM Training:
- Objective: Predict next token accurately
- Optimization: Minimize perplexity
- Result: Fluent but potentially overconfident
- Training time: T

BNLM Training:
- Objective: Investigate thoroughly with calibrated confidence
- Optimization: Minimize multi-component loss (validity + boundary + investigation + confidence + language)
- Result: Epistemically humble but trustworthy
- Training time: ~2T (additional phases for BNST constraints)
Trade-offs:
| Aspect | Standard LLM | BNLM |
|---|---|---|
| Training time | T | ~2T |
| Inference speed | Fast | 2-5x slower |
| Confidence calibration | Poor | Good |
| Self-ref detection | None | High |
| Boundary specification | Implicit | Explicit |
| Investigation depth | Shallow | Deep |
| False confidence | ~15% | <5% |
| Epistemic humility | Learned (unreliable) | Architectural (reliable) |
BNLM trades speed for trustworthiness.
Next up
Part 3: BOUNDARY-NATIVE LANGUAGE MODELS
Section 15: Implementation Considerations
© 2026 HalfHuman Draft - Pendry, S
This post is licensed under Creative Commons Attribution 4.0 (CC BY 4.0).
Code examples (if any) are licensed under the Apache License, Version 2.0
See /license for details.