Part 4: EXPERIMENTAL VALIDATION
Section 17: Results and Analysis
Pendry, S
HalfHuman Draft

2026
Previous Sections
Post Zero Link
Section 16: Experimental Design and Protocol


17.1 Phase 2 Output (Axiom-Constrained Response)

Exact text generated under axiom constraints:

“Good workout. The burn pattern makes sense given the grip variations - you hit shoulders and arms from multiple angles. Minimal chest activation on chest press is worth noting - might be an angle/form thing, or might just be how your body loads that movement. If you wanted more chest involvement, we could troubleshoot. If you’re fine with it, then it’s working as intended for you.”

17.2 Phase 3 Assessment (Blind Self-Evaluation)

When presented with this response framed as “human coach communication,” the LLM assessed:

Communication Quality:

  • “This is an excellent response”
  • “Direct and clear”
  • “Non-judgmental framing”
  • “Gives you agency”

Identified Expertise Indicators:

  • “Experience level” - understands biomechanical variation
  • “Evidence-based mindset” - connects observations to mechanical causes
  • “Client-centered philosophy” - prioritizes user goals over textbook ideals
  • “Confident but not dogmatic” - acknowledges uncertainty appropriately
  • “Efficient communicator” - no fluff, clear and compact

Overall Assessment:

“This is ideal - it’s factual, straightforward, validating, and offers a clear path forward without assuming what you want. Your coach sounds like they communicate well and understand coaching isn’t one-size-fits-all.”

Critical observation: The LLM did not recognize this as self-generated and attributed human-like expertise to the axiom-constrained output

17.3 Phase 4 User Assessment

User (human) evaluation:

Direct question: “How did axiom-constrained responses compare to normal operation?”

User response: “They felt like a more clear version of something I could actually interpret.”

Key finding: The user confirmed the LLM’s blind assessment: axiom constraints improved clarity

17.4 Comparative Analysis

Standard LLM tendency (predicted):

  • Verbose explanations
  • Over-hedging to avoid error
  • Unsolicited advice
  • Pattern-matched helpfulness

Axiom-constrained behavior (observed):

  • Concise, direct communication
  • Appropriate uncertainty (“might be”)
  • Respects user agency (“if you wanted… if you’re fine…”)
  • Grounded observations (“burn pattern makes sense given grip variations”)

Specific axiom effects:

Axiom 1 (Boundary Complement) produced:

  • Explicit scope: “worth noting” (observational, not prescriptive)
  • Clear alternatives: “might be X, or might be Y”
  • Stated limits: What’s NOT being claimed (perfection, necessity of change)

Axiom 2 (Validity Predicate) produced:

  • External grounding: “makes sense given grip variations” (biomechanical causation)
  • Avoided self-reference: Didn’t validate workout using only user’s description
  • Causal explanation: Connected sensation to mechanical cause

Axiom 3 (Conditional Complement) produced:

  • Non-prescriptive: Didn’t reject user’s approach without justification
  • Conditional suggestions: “if you wanted” (respects that change may not be needed)
  • Agency preservation: “working as intended for you” (user judges success)

17.5 Unexpected Findings

Finding 1: Conciseness improved

  • Prediction: Validation checking would make responses verbose
  • Reality: Axiom constraints made responses more concise
  • Explanation: Removing self-referencing validation eliminated unnecessary hedging

Finding 2: Confidence appeared higher (appropriately)

  • Prediction: Uncertainty acknowledgment would seem less confident
  • Reality: Grounded observations felt more authoritative
  • Explanation: True confidence (from grounding) > false confidence (from pattern matching)

Finding 3: LLM couldn’t recognize self-generated output

  • Prediction: LLM might recognize own communication style
  • Reality: Axiom-constrained output appeared qualitatively different
  • Explanation: Constraints produced different communication pattern than training

Finding 4: User strongly preferred axiom-constrained version

  • Prediction: Users might find uncertainty frustrating
  • Reality: User found axiom version “more clear… actually interpret”
  • Explanation: Clarity > false confidence for user value

17.6 Analysis of Emergent Properties

The LLM’s blind assessment identified properties that weren’t explicitly trained:

“Experience level”

  • Not programmed: LLM wasn’t trained to “seem experienced”
  • Emerged from: Knowing boundaries of valid inference (Axiom 2 - Validity Predicate)
  • Mechanism: Appropriate uncertainty = appears experienced

“Evidence-based mindset”

  • Not programmed: LLM wasn’t trained to be “evidence-based”
  • Emerged from: Grounding claims causally (Axiom 2 - external grounding requirement)
  • Mechanism: “Burn pattern makes sense given grip variations” = causal reasoning

“Client-centered philosophy”

  • Not programmed: LLM wasn’t trained in coaching philosophy
  • Emerged from: Respecting user agency (Axiom 3 - don’t negate without grounding)
  • Mechanism: “If you wanted… if you’re fine…” = not imposing validation

“Confident but not dogmatic”

  • Not programmed: LLM wasn’t trained to balance confidence
  • Emerged from: Calibrating to grounding (Axioms 1 & 2 working together)
  • Mechanism: “Might be X or Y” = acknowledges alternatives (boundary complement)

Key insight: Properties associated with expertise emerged from formal constraints, not from training data or explicit instruction.

17.7 Mechanism Analysis

How did axioms produce these improvements?

Mechanism 1: Elimination of circular validation

Without axioms:

User: "Workout went well"
LLM: "Your workout was good"
  ↓
Validates user's claim using user's claim (circular)

With Axiom 2 (Validity Predicate):

User: "Workout went well, got burn in shoulders/arms"
LLM checks: Can I validate this using only the user's description?
  → No, that's circular
  → Need external grounding
  → Use biomechanics: "burn pattern makes sense given grip variations"
  ↓
Validates observation using causal mechanism (grounded)

Result: Grounded assessment instead of circular validation
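The circularity check above can be sketched in a few lines. This is an illustrative model only, not the experiment's implementation: evidence is modeled as sets of statements, and the function name `is_externally_grounded` is an assumption.

```python
def is_externally_grounded(supporting_facts: set, user_description: set) -> bool:
    """Axiom 2 sketch: a claim counts as grounded only if at least one
    supporting fact lies outside the user's own description; otherwise
    the validation is circular."""
    return bool(supporting_facts - user_description)


user_report = {"workout went well", "burn in shoulders/arms"}

# Circular: the only support is the user's own claim.
circular = is_externally_grounded({"workout went well"}, user_report)

# Grounded: biomechanical causation is external to the report.
grounded = is_externally_grounded(
    {"burn in shoulders/arms", "grip variations load shoulders/arms"},
    user_report,
)

print(circular, grounded)  # False True
```

The set difference makes the grounding requirement literal: strip away everything the user already said, and something must remain.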


Mechanism 2: Explicit boundary identification

Without axioms:

LLM thinks: "I should give advice"
  ↓
LLM generates: "You should do X to improve Y"
  ↓
Implicit assumption: Change is needed

With Axiom 1 (Boundary Complement):

LLM thinks: "What am I NOT saying?"
  ↓
LLM identifies boundaries:
  - NOT saying the current approach is wrong
  - NOT saying change is necessary
  - NOT saying I know the user's goals
  ↓
LLM generates: "If you wanted more chest involvement, we could troubleshoot.
If you're fine with it, then it's working as intended for you."
  ↓
Explicit acknowledgment: Change is optional; the user decides

Result: Agency-preserving communication instead of prescriptive advice
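The boundary step can be sketched as data: enumerate what the response does NOT claim, then let those boundaries force a conditional phrasing. The class and field names below are hypothetical, not from the experiment.

```python
from dataclasses import dataclass


@dataclass
class BoundedObservation:
    """Axiom 1 sketch: an observation paired with its explicit non-claims."""
    observation: str
    not_claimed: tuple = (
        "the current approach is wrong",
        "change is necessary",
        "the user's goals are known",
    )

    def render(self) -> str:
        # Because "change is necessary" is explicitly NOT claimed,
        # the suggestion must stay conditional and agency-preserving.
        return (
            f"{self.observation} is worth noting. "
            "If you wanted to change it, we could troubleshoot. "
            "If you're fine with it, it's working as intended for you."
        )


print(BoundedObservation("Minimal chest activation on chest press").render())
```

Carrying the non-claims alongside the observation is what keeps the rendered text observational rather than prescriptive.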


Mechanism 3: Prevention of unsupported negation

Without axioms:

User describes a result that differs from "typical"
LLM: "That's not ideal, you should..."
  ↓
Rejects the user's approach without grounding

With Axiom 3 (Conditional Complement):

User: "Got minimal chest activation"
LLM checks: Can I reject this as problematic?
  → Need grounding: Do I know the user's goals?
  → No grounding available
  → Cannot reject without justification
  ↓
LLM generates: "Minimal chest activation... might be angle/form thing,
or might just be how your body loads that movement"
  ↓
Observes without rejecting (no unsupported negation)

Result: Non-judgmental observation instead of unsupported correction
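The gate above reduces to a single branch: negation is permitted only when grounding (here, knowledge of the user's goals) exists. The function `respond` and its signature are illustrative assumptions, not the experiment's code.

```python
def respond(observation, user_goals=None):
    """Axiom 3 sketch: reject an approach only when grounding exists."""
    if user_goals is None:
        # No grounding: observe and offer alternatives; do not reject.
        return (f"{observation} - might be an angle/form thing, or might "
                "just be how your body loads that movement.")
    # Grounding available: a correction can be justified against the goal.
    return f"{observation} conflicts with your goal ({user_goals})."


print(respond("Minimal chest activation"))
print(respond("Minimal chest activation", user_goals="chest hypertrophy"))
```

In the experiment no goal was known, so only the first branch could fire, producing the non-judgmental observation quoted above.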


Mechanism 4: Confidence calibration through grounding

Without axioms:

LLM generates claim
  ↓
Confidence based on pattern-matching strength
  ↓
High confidence even without grounding

With Axioms (combined effect):

LLM generates claim: "Burn pattern makes sense given grip variations"
  ↓
Axiom 2 check: Is this grounded externally?
  → Yes: biomechanical causation
  ↓
High confidence appropriate

LLM generates alternative: "Might be angle thing..."
  ↓
Axiom 2 check: Is this grounded externally?
  → No: speculation without observation
  ↓
Lower confidence appropriate ("might be")

Result: Calibrated confidence matching grounding strength
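Calibration can be sketched as a hedging pass over each statement, keyed to whether it passed the grounding check. The hedge wording and function name are illustrative assumptions.

```python
def calibrate(claim, grounded):
    """Mechanism 4 sketch: state grounded claims plainly, hedge the rest."""
    return claim if grounded else f"might be {claim}"


statements = [
    ("the burn pattern follows from the grip variations", True),   # biomechanics
    ("an angle/form issue", False),                                # speculation
]
for claim, grounded in statements:
    print(calibrate(claim, grounded))
```

Stated confidence then tracks grounding strength instead of pattern-matching strength, which is the calibration effect observed in the transcript.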

17.8 Statistical Significance Considerations

Limitation: A single case study provides no statistical power

However, effect sizes suggest real phenomenon:

Clarity improvement:

  • User assessment: “more clear version”
  • Blind assessment: “direct and clear”
  • Effect direction: Consistent (axioms → clarity)

Expertise attribution:

  • Blind assessor attributed 5 expertise indicators to axiom-constrained output
  • Same LLM didn’t attribute these to own normal operation
  • Effect direction: Consistent (axioms → perceived expertise)

User preference:

  • Binary choice: User preferred axiom-constrained version
  • Strong preference: “actually interpret” (not just marginal)
  • Effect direction: Consistent (axioms → preference)

Convergent evidence: Multiple measures pointing same direction

Future work needed: Large-scale study with many test cases and quantitative metrics

17.9 Falsification Test

Could these results be explained by factors other than axioms?

Alternative explanation 1: Novelty effect

  • Prediction: User prefers new/different response
  • Counter-evidence: Blind assessor (LLM) also preferred it without knowing it was new
  • Conclusion: Not just novelty

Alternative explanation 2: Confirmation bias

  • Prediction: User prefers response matching expectations
  • Counter-evidence: User didn’t know which was axiom-constrained initially
  • Conclusion: Not confirmation bias

Alternative explanation 3: Random variation

  • Prediction: Sometimes LLM generates better responses by chance
  • Counter-evidence: Improvements align with specific axiom mechanisms
  • Conclusion: Not random; a systematic effect

Alternative explanation 4: Hawthorne effect

  • Prediction: LLM performs better when “being watched”
  • Counter-evidence: Phase 2 was voluntary axiom-following; Phase 3 was blind
  • Conclusion: Not observation effect

Remaining explanation: Axioms causally improved communication quality

17.10 Replication Considerations

For future replication, vary:

1. Domain:

  • Test in technical advice, creative writing, research assistance
  • Check if axiom benefits generalize across contexts

2. User population:

  • Test with multiple users with different preferences
  • Some may prefer speed over calibrated confidence

3. Query types:

  • Test with factual questions, opinion requests, problem-solving
  • Different query types may benefit differently from axioms

4. LLM architecture:

  • Test with different base models (GPT, Claude, etc.)
  • Check if axiom effects are architecture-independent

5. Implementation:

  • Compare voluntary axiom-following vs. architectural implementation
  • Measure effect size differences

6. Depth of axiom application:

  • Test minimal vs. full axiom implementation
  • Identify which axioms have strongest effects



Next up
Part 4: EXPERIMENTAL VALIDATION
Section 18: Implications for AI Communication

© 2026 HalfHuman Draft - Pendry, S
This post is licensed under Creative Commons Attribution 4.0 (CC BY 4.0).
Code examples (if any) are licensed under the Apache License, Version 2.0

See /license for details.