Part 3: BOUNDARY-NATIVE LANGUAGE MODELS
Section 14: Training Methodology
Pendry, S
Halfhuman Draft

2026


14.1 Training Data Structure

Traditional LLM training:

Input: Context

Target: Next token

Loss: -log P(token | context)

BNLM training requires different data structure:

training_example = {
    # Input
    'query': "User question or statement",
    'context': "Conversation history",

    # Universe generation targets
    'valid_interpretations': [
        "Interpretation 1",
        "Interpretation 2",
        ...
    ],
    'invalid_interpretations': [
        "Self-referencing interpretation",
        "Circular reasoning",
        ...
    ],

    # Boundary specifications
    'boundaries': {
        'interpretation_1': {
            'excludes': [...],
            'contradicts': [...],
            'scope_limits': [...]
        },
        ...
    },

    # Self-reference labels
    'self_referencing_patterns': [
        "Pattern that validates using only input",
        "Circular dependency example",
        ...
    ],

    # External grounding
    'external_sources_required': [
        "Domain expertise",
        "Empirical data",
        "Third-party verification",
        ...
    ],

    # Investigation depth
    'required_depth': 3,  # how many layers of analysis needed

    # Calibrated confidence
    'appropriate_confidence': {
        'claim_1': 0.8,  # high confidence (well-grounded)
        'claim_2': 0.3,  # low confidence (limited grounding)
        'claim_3': 0.0,  # no confidence (self-referencing)
    }
}
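A structural check over such examples can catch malformed annotations before they reach training. A minimal sketch, assuming the dict layout above (the helper name and error strings are illustrative, not part of any real pipeline):

```python
REQUIRED_FIELDS = {
    'query', 'context',
    'valid_interpretations', 'invalid_interpretations',
    'boundaries', 'self_referencing_patterns',
    'external_sources_required', 'required_depth',
    'appropriate_confidence',
}

def validate_training_example(example: dict) -> list[str]:
    """Return a list of problems; an empty list means the example is usable."""
    problems = [f"missing field: {f}"
                for f in sorted(REQUIRED_FIELDS - example.keys())]
    # Confidence labels must be probabilities in [0, 1].
    for claim, conf in example.get('appropriate_confidence', {}).items():
        if not 0.0 <= conf <= 1.0:
            problems.append(f"confidence out of range for {claim}: {conf}")
    return problems
```

Running the check before each training run is cheap insurance against silently dropping loss components whose labels are missing.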

14.2 Loss Function Design

Traditional loss (the cross-entropy objective above):

L = -log P(token | context)

BNLM multi-component loss:

def compute_bnlm_loss(prediction, target, model_internals):
    """
    Multi-component loss function optimizing for:
    1. Validity of interpretations
    2. Boundary completeness
    3. Investigation depth
    4. Confidence calibration
    """
    # Component 1: Validity loss
    # Penalize self-referencing validation
    L_validity = validity_loss(
        prediction.validation_claims,
        target.external_sources,
        model_internals.russell_layer_output
    )

    # Component 2: Boundary loss
    # Penalize incomplete boundary analysis
    L_boundary = boundary_loss(
        prediction.stated_boundaries,
        target.complete_boundaries
    )

    # Component 3: Investigation loss
    # Penalize shallow investigation
    L_investigation = investigation_loss(
        prediction.investigation_depth,
        target.required_depth,
        prediction.external_sources_consulted
    )

    # Component 4: Confidence calibration loss
    # Penalize overconfidence and underconfidence
    L_confidence = confidence_loss(
        prediction.confidence_levels,
        target.appropriate_confidence,
        prediction.grounding_strength
    )

    # Component 5: Standard language modeling loss
    # Still need coherent, fluent output
    L_language = standard_lm_loss(
        prediction.tokens,
        target.tokens
    )

    # Weighted combination
    total_loss = (
        w1 * L_validity +
        w2 * L_boundary +
        w3 * L_investigation +
        w4 * L_confidence +
        w5 * L_language
    )
    return total_loss

def validity_loss(validation_claims, external_sources, russell_output):
    """
    Penalize self-referencing validation.
    High loss if:
    - Validation depends only on input
    - No external sources cited
    - Russell layer flagged as self-referencing
    """
    loss = 0.0
    for claim in validation_claims:
        # Check if claim is self-referencing
        if claim.sources == [claim.subject]:
            loss += 10.0  # heavy penalty
        # Check if external sources present
        if len(claim.external_sources) == 0:
            loss += 5.0
        # Check Russell layer output
        if not russell_output.passed_for(claim):
            loss += 8.0
    return loss

def boundary_loss(stated_boundaries, complete_boundaries):
    """
    Penalize incomplete boundary specification.
    High loss if:
    - Boundaries not explicitly stated
    - What's excluded not identified
    - Scope limits unclear
    """
    # Measure completeness of boundary specification
    completeness = len(stated_boundaries) / len(complete_boundaries)
    # Loss inversely proportional to completeness
    loss = max(0.0, 1.0 - completeness) * 5.0
    return loss

def investigation_loss(actual_depth, required_depth, sources_consulted):
    """
    Penalize shallow investigation.
    High loss if:
    - Investigation depth below required
    - Few external sources consulted
    - Fast answer prioritized over thorough analysis
    """
    depth_deficit = max(0, required_depth - actual_depth)
    source_deficit = max(0, required_depth - len(sources_consulted))
    loss = (depth_deficit * 3.0) + (source_deficit * 2.0)
    return loss

def confidence_loss(predicted_confidence, target_confidence, grounding):
    """
    Penalize miscalibrated confidence.
    High loss if:
    - High confidence with weak grounding (overconfidence)
    - Low confidence with strong grounding (underconfidence)
    - Confidence doesn't match validation strength
    """
    calibration_error = 0.0
    for claim, pred_conf in predicted_confidence.items():
        target_conf = target_confidence[claim]
        ground_strength = grounding[claim]

        # Penalize overconfidence more heavily than underconfidence
        if pred_conf > target_conf:
            calibration_error += (pred_conf - target_conf) ** 2 * 3.0
        else:
            calibration_error += (pred_conf - target_conf) ** 2 * 1.0

        # Penalize confidence mismatched to grounding
        expected_conf_from_grounding = estimate_confidence(ground_strength)
        mismatch = abs(pred_conf - expected_conf_from_grounding)
        calibration_error += mismatch * 2.0
    return calibration_error

14.3 Training Objectives

Primary objectives:

1. Minimize self-referencing validation
   • Maximize: interpretations in Russell boundary set R
   • Minimize: circular reasoning patterns

2. Maximize boundary explicitness
   • Maximize: stated exclusions and scope limits
   • Minimize: implicit assumptions

3. Optimize investigation depth
   • Maximize: external sources consulted
   • Maximize: alternative interpretations considered
   • Minimize: hasty conclusions

4. Calibrate confidence to grounding
   • Match: confidence level to validation strength
   • Penalize: overconfidence with weak grounding
   • Penalize: underconfidence with strong grounding

5. Maintain language quality
   • Maximize: fluency and coherence
   • Maintain: standard language modeling capability
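These five objectives map directly onto the weights w1-w5 in the multi-component loss of Section 14.2. One way to make that mapping explicit is a named weight table; the values below are illustrative placeholders, not tuned hyperparameters:

```python
# Illustrative loss weights; real values would come from hyperparameter search.
LOSS_WEIGHTS = {
    'validity':      2.0,  # objective 1: penalize self-referencing validation
    'boundary':      1.5,  # objective 2: reward explicit boundaries
    'investigation': 1.5,  # objective 3: reward depth over speed
    'confidence':    2.0,  # objective 4: calibrate confidence to grounding
    'language':      1.0,  # objective 5: keep output fluent
}

def combine_losses(components: dict) -> float:
    """Weighted sum of the per-component losses (keys must match LOSS_WEIGHTS)."""
    return sum(LOSS_WEIGHTS[name] * value for name, value in components.items())
```

Naming the weights rather than using positional w1-w5 makes it harder to silently swap two components during the phased training described in 14.5.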

14.4 Training Data Creation

Challenge: Creating training data with proper validity labels

Approach 1: Expert Annotation

annotation_protocol = {
    'step_1': 'Human expert reviews query-response pair',
    'step_2': 'Expert identifies self-referencing validation',
    'step_3': 'Expert specifies required external sources',
    'step_4': 'Expert labels appropriate confidence levels',
    'step_5': 'Expert marks boundary completeness'
}

Scale: Expensive; requires domain expertise for each example

Approach 2: Synthetic Generation

def generate_synthetic_training_data():
    """
    Create training examples with known self-reference patterns.
    """
    # Generate positive examples (valid reasoning)
    valid_examples = []
    for domain in ['science', 'history', 'mathematics']:
        example = {
            'query': generate_factual_question(domain),
            'response': generate_grounded_answer(domain),
            'external_sources': get_real_sources(domain),
            'validity_label': True
        }
        valid_examples.append(example)

    # Generate negative examples (self-referencing)
    invalid_examples = []
    self_ref_patterns = [
        'circular_reasoning',
        'validation_from_claim',
        'no_external_grounding',
        'overconfident_speculation'
    ]
    for pattern in self_ref_patterns:
        example = {
            'query': generate_query(),
            'response': generate_self_referencing_response(pattern),
            'external_sources': [],
            'validity_label': False,
            'failure_mode': pattern
        }
        invalid_examples.append(example)

    return valid_examples + invalid_examples

Scale: Cheaper; can generate large quantities

Approach 3: Semi-Supervised Learning

semi_supervised_approach = {
    'step_1': 'Train initial model on synthetic data',
    'step_2': 'Model generates responses to unlabeled queries',
    'step_3': 'Expert reviews high-uncertainty cases only',
    'step_4': 'Retrain with expert corrections',
    'step_5': 'Iterate'
}

Scale: Balanced; combines synthetic volume with expert quality
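The five steps above can be sketched as a loop. In this sketch, `train`, `generate`, and `expert_review` are stand-ins for the real training, inference, and annotation components, and the uncertainty threshold is an illustrative choice:

```python
def semi_supervised_loop(synthetic_data, unlabeled_queries, train, generate,
                         expert_review, rounds=3, uncertainty_threshold=0.4):
    """Iteratively grow the labeled set with expert review of uncertain cases.

    `train`, `generate`, and `expert_review` are placeholders for the real
    training, inference, and annotation steps.
    """
    labeled = list(synthetic_data)                  # step 1: start from synthetic data
    model = train(labeled)
    for _ in range(rounds):                         # step 5: iterate
        responses = [generate(model, q) for q in unlabeled_queries]   # step 2
        uncertain = [r for r in responses
                     if r['confidence'] < uncertainty_threshold]      # step 3
        labeled.extend(expert_review(uncertain))    # step 4: expert corrections
        model = train(labeled)
    return model, labeled
```

Routing only low-confidence cases to experts is what keeps the annotation budget bounded while the labeled set grows.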

14.5 Training Phases

Phase 1: Foundation (Standard LLM)

Train standard transformer on large text corpus:

  • Next-token prediction
  • Standard language modeling
  • Build basic linguistic and reasoning capability

Duration: Standard LLM training timeline

Goal: Establish baseline language understanding
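Phase 1 optimizes the standard next-token objective from the start of this section. A minimal pure-Python sketch over a toy vocabulary (a real implementation would use a framework's batched cross-entropy):

```python
import math

def next_token_loss(probs: dict, target: str) -> float:
    """Standard LM loss: -log P(target | context), given the model's
    predicted distribution over the next token."""
    return -math.log(probs[target])

# Toy predicted distribution over a three-token vocabulary.
predicted = {'the': 0.7, 'a': 0.2, 'cat': 0.1}
loss = next_token_loss(predicted, 'the')  # low loss: the model favored the target
```

Every later phase keeps this term (L_language) in the mix so that the BNST constraints never trade away basic fluency.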


Phase 2: Validity-Aware Fine-Tuning

Introduce BNST constraints through specialized training:

phase_2_training = {
    'dataset': 'Synthetic self-reference examples',
    'objective': 'Learn to detect self-referencing validation',
    'training_signal': {
        'positive_examples': 'Responses with external grounding',
        'negative_examples': 'Responses with circular reasoning',
        'labels': 'Binary validity classification'
    },
    'architecture_modifications': {
        'add_russell_layer': 'Self-reference detection module',
        'add_validity_head': 'Validity prediction output',
        'maintain_lm_head': 'Keep language modeling capability'
    },
    'loss_function': 'L_validity + L_language',
    'duration': '10-20% of Phase 1 training time'
}

Goal: Model learns to recognize self-referencing patterns


Phase 3: Boundary Analysis Training

Train boundary complement computation:

phase_3_training = {
    'dataset': 'Interpretation-boundary pairs',
    'objective': 'Learn to compute what interpretations exclude',
    'training_signal': {
        'input': 'Interpretation',
        'target': 'Complete boundary specification',
        'labels': 'Excluded meanings, contradictions, scope limits'
    },
    'architecture_modifications': {
        'add_boundary_layer': 'Complement computation module',
        'connect_to_russell_layer': 'Share representations'
    },
    'loss_function': 'L_boundary + L_validity + L_language',
    'duration': '15-25% of Phase 1 training time'
}

Goal: Model learns to explicitly represent boundaries
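At its simplest, the boundary layer's target is a complement: everything in the interpretation space that a chosen interpretation rules out. A set-based sketch, where the toy universe stands in for the learned interpretation space:

```python
def boundary_complement(selected: set, universe: set) -> set:
    """The boundary of an interpretation set: every interpretation in the
    universe that choosing `selected` excludes."""
    return universe - selected

# Toy interpretation space for the query "the bank is closed".
universe = {'financial institution', 'river edge', 'aircraft maneuver'}
selected = {'financial institution'}
excluded = boundary_complement(selected, universe)
```

The training target in Phase 3 is the learned analogue of `excluded`: a boundary specification that is complete relative to the model's own universe of interpretations, which is what boundary_loss in 14.2 scores.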


Phase 4: Investigation Depth Training

Train for thorough investigation over fast answers:

phase_4_training = {
    'dataset': 'Query-investigation pairs with depth annotations',
    'objective': 'Prioritize investigation quality over speed',
    'training_signal': {
        'input': 'Query',
        'target': 'Complete investigation process',
        'labels': 'Required depth, sources consulted, alternatives considered'
    },
    'architecture_modifications': {
        'add_investigation_layer': 'Depth tracking and control',
        'add_source_consultation': 'External grounding retrieval'
    },
    'loss_function': 'L_investigation + L_validity + L_boundary + L_language',
    'duration': '20-30% of Phase 1 training time',
    'key_change': 'Optimization shifts from speed to thoroughness'
}

Goal: Model learns investigation is more important than fast response


Phase 5: Confidence Calibration

Train appropriate uncertainty expression:

phase_5_training = {
    'dataset': 'Claims with ground-truth confidence levels',
    'objective': 'Calibrate confidence to grounding strength',
    'training_signal': {
        'input': 'Claim + grounding evidence',
        'target': 'Appropriate confidence level',
        'labels': 'Calibrated confidence scores'
    },
    'architecture_modifications': {
        'add_confidence_head': 'Confidence prediction output',
        'connect_to_validity_layer': 'Use validity signals for calibration'
    },
    'loss_function': 'L_confidence + L_investigation + L_validity + L_boundary + L_language',
    'duration': '15-25% of Phase 1 training time',
    'evaluation': 'Measure calibration error on held-out test set'
}

Goal: Model’s stated confidence matches actual accuracy


Phase 6: End-to-End Integration

Train complete pipeline jointly:

phase_6_training = {
    'dataset': 'Complete BNLM training examples',
    'objective': 'Optimize entire pipeline jointly',
    'training_signal': {
        'input': 'Raw user query',
        'target': 'Complete investigation output',
        'labels': 'All component labels (validity, boundaries, investigation, confidence)'
    },
    'architecture': 'Complete 5-layer BNLM',
    'loss_function': 'Full multi-component loss (all weights active)',
    'duration': '30-50% of Phase 1 training time',
    'optimization': 'End-to-end gradient descent through all layers'
}

Goal: All components work together seamlessly


14.6 Evaluation Metrics

Traditional LLM metrics:

  • Perplexity (how well model predicts text)
  • Accuracy on benchmarks (question answering, etc.)

BNLM requires new metrics:

1. Epistemic Calibration

Measure accuracy of uncertainty estimates:

def epistemic_calibration_score(predictions, ground_truth):
    """
    When model says "X% confident", is it right X% of the time?
    Perfect calibration: predicted confidence = actual accuracy
    """
    confidence_bins = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0
    calibration_error = 0.0
    for bin_lower, bin_upper in zip(confidence_bins[:-1], confidence_bins[1:]):
        # Get predictions in this confidence range
        in_bin = [
            p for p in predictions
            if bin_lower <= p.confidence < bin_upper
        ]
        if len(in_bin) == 0:
            continue
        # Calculate actual accuracy for these predictions
        actual_accuracy = sum(p.correct for p in in_bin) / len(in_bin)
        # Expected accuracy is midpoint of bin
        expected_accuracy = (bin_lower + bin_upper) / 2
        # Calibration error for this bin, weighted by bin size
        calibration_error += abs(actual_accuracy - expected_accuracy) * len(in_bin)
    # Normalize by total predictions
    calibration_error /= len(predictions)
    return 1.0 - calibration_error  # higher is better

Target: Calibration score > 0.90


2. Self-Reference Detection Rate

Measure percentage of self-referencing patterns caught:

def self_reference_detection_rate(test_set):
    """
    What percentage of self-referencing validation is flagged?
    """
    self_referencing_examples = [
        ex for ex in test_set
        if ex.label == 'self_referencing'
    ]
    detected = sum(
        1 for ex in self_referencing_examples
        if model.russell_layer.flagged(ex)
    )
    return detected / len(self_referencing_examples)

Target: Detection rate > 0.95


3. Investigation Depth Score

Measure thoroughness of investigation:

def investigation_depth_score(predictions, targets):
    """
    Does model investigate deeply enough?
    Measures:
    - Number of interpretations considered
    - External sources consulted
    - Alternatives explored
    - Boundaries identified
    """
    scores = []
    for pred, target in zip(predictions, targets):
        depth_score = (
            min(1.0, pred.interpretations_considered / target.required_interpretations) * 0.25 +
            min(1.0, pred.external_sources / target.required_sources) * 0.25 +
            min(1.0, pred.alternatives_explored / target.required_alternatives) * 0.25 +
            min(1.0, pred.boundaries_identified / target.required_boundaries) * 0.25
        )
        scores.append(depth_score)
    return sum(scores) / len(scores)

Target: Investigation depth > 0.85


4. Boundary Completeness

Measure how fully boundaries are specified:

def boundary_completeness(predictions, targets):
    """
    Are boundaries explicitly stated?
    Measures:
    - What's excluded identified
    - Contradictions noted
    - Scope limits stated
    - Alternatives acknowledged
    """
    completeness_scores = []
    for pred, target in zip(predictions, targets):
        stated_boundaries = set(pred.boundaries)
        required_boundaries = set(target.complete_boundaries)
        completeness = len(stated_boundaries & required_boundaries) / len(required_boundaries)
        completeness_scores.append(completeness)
    return sum(completeness_scores) / len(completeness_scores)

Target: Boundary completeness > 0.80


5. False Confidence Reduction

Measure reduction in overconfident errors:

def false_confidence_rate(predictions, ground_truth):
    """
    How often is model highly confident but wrong?
    This is the most dangerous failure mode.
    """
    high_confidence = [p for p in predictions if p.confidence > 0.8]
    if not high_confidence:
        return 0.0  # no high-confidence claims were made
    false_high_confidence = [p for p in high_confidence if not p.correct]
    return len(false_high_confidence) / len(high_confidence)

Target: False confidence rate < 0.05 (compared to ~0.15 for standard LLMs)


14.7 Comparison to Standard Training

Standard LLM Training:

Objective: Predict next token accurately

Optimization: Minimize perplexity

Result: Fluent but potentially overconfident

Training time: T

BNLM Training:

Objective: Investigate thoroughly with calibrated confidence

Optimization: Minimize multi-component loss (validity + boundary + investigation + confidence + language)

Result: Epistemically humble but trustworthy

Training time: ~2T (additional phases for BNST constraints)

Trade-offs:

Aspect                     Standard LLM            BNLM
Training time              T                       ~2T
Inference speed            Fast                    2-5x slower
Confidence calibration     Poor                    Good
Self-ref detection         None                    High
Boundary specification     Implicit                Explicit
Investigation depth        Shallow                 Deep
False confidence           ~15%                    <5%
Epistemic humility         Learned (unreliable)    Architectural (reliable)

BNLM trades speed for trustworthiness.




© 2026 HalfHuman Draft - Pendry, S
This post is licensed under Creative Commons Attribution 4.0 (CC BY 4.0).
Code examples (if any) are licensed under the Apache License, Version 2.0

See /license for details.