The Pendry Sort Paper
Sorting · Part 7 of 7
Five axioms, twelve algorithms, one table. The capstone of a series built from failure, rediscovery, and honest accounting. (Pendry Sort: #2 of 12. 4.6x over introsort.)
Series: 1 · Disorder Clusters | 2 · The C Revelation | 3 · Who Decided What Sorted Is | 4 · Why Look When You Can Just Do | 5 · Where the Thread Ends | 6 · Sieve Sort | 7 · The Pendry Sort Paper
I cannot read C.
The algorithm was written through a collaboration with Claude (an AI by Anthropic). I described what I wanted in plain English. Claude wrote the code. I ran the tests and told it what to fix. Every architectural decision came from me. Every performance regression was diagnosed from benchmark output, not from reading source code. Every claim in this post is backed by running the algorithm on actual data and measuring what happened.
This turned out to be a good thing. Because I could not just read the code and assume it was right, I tested everything.
The collaboration: Human intuition + AI implementation + empirical validation. A high school dropout who can't read C built a sorting algorithm that beats introsort 4.6x in aggregate.
The Five Axioms
Seven posts. Twelve algorithms. Three lineages. Five principles emerged that drove every decision and survived every failure:
- Structure exists; exploit it. Don't assume your data is random.
- Find inversions; fix locally. Don't tear the whole structure down and rebuild it.
- Work scales with disorder. Effort should match how much is actually wrong, not how much data exists.
- The consequence of the action contains the information. Collapse observe-plan-execute-verify into one motion.
- One touch, maximum extraction. Every time you load a value, get everything you can from it.
Axioms 1–3 came from Post 2. Axiom 4 from Post 4 and the FileMode.Create moment. Axiom 5 crystallized when the 16-point sample started extracting inversions AND range AND routing hints from the same 32 values.
The specific algorithms will be superseded. But these five questions are about design, not about sorting.
Origin map: Axiom 1: Post 1. Axiom 2: Post 1. Axiom 3: Post 2. Axiom 4: Post 4. Axiom 5: Post 5. Each emerged from a specific failure or insight.
The Architecture
Pendry Sort takes a source array of int32 elements and writes sorted output to a destination array. Not in-place. Not stable. Three phases:
Phase 0: Merged Scan-Route-Repair. A 16-point sample classifies the data while extracting an approximate value range from the same 32 loaded values. Routes to reverse detection, full merged scan, or straight to Phase 1/2. The merged scan counts inversions AND tracks the dirty window in a single walk. Window carry preserves work on bail. Boundary-only verification: two comparisons instead of O(n).
Phase 1: Direct-Fill Counting Sort. Range ≤ 4n. No prefix sums. Duplicate acceleration: 6.0x at 99% duplicates.
Phase 2: Bucket Scatter + Disorder Repair. Range > 4n. Flash Sort structure with inversion-scanning disorder repair per bucket instead of blind insertion sort.
Axiom 5 in practice: The 16-point sample extracts inversions + min + max. The merged scan extracts inversions + window bounds + bail coordinates. No pass serves only one purpose.
Little Pendry
Can the 384-line algorithm compress to its essentials? Little Pendry is 87 lines. In-place. Keeps the 16-point sample, merged scan-route-repair, and direct-fill counting sort. Drops bucket scatter and introsort; replaces them with shellsort (Ciura gaps).
In the unified benchmark: 474,821 µs total, 2 wins, 33/35 tested (2 FAILs on Two Blocks and Outlier). 2.7x faster than radix-256. 4.1x faster than introsort. The philosophy compresses.
Three proofs: Little Pendry proves the philosophy compresses (minimal machinery). Sieve Sort proves it generalizes (any data type). Pendry Sort proves it performs (maximum speed). One idea, three trade-offs.
The Full Family: 12 Algorithms, 35 Patterns, 1M Elements
Every algorithm from this series plus every standard sort. The progression from SafeSlot (16 N/A) to Pendry (0 N/A) is the series.
| Algorithm | Total (µs) | Wins | Tested | N/A | Fail | Origin |
|---|---|---|---|---|---|---|
| CAS-Binary | 18,205 | 0 | 9 | 26 | 0 | Post 4 |
| CAS-Core | 55,067 | 7 | 9 | 26 | 0 | Post 2/4 |
| MicroSSS | 56,675 | 0 | 9 | 26 | 0 | Post 2 |
| SafeSlot-C | 194,628 | 0 | 19 | 16 | 0 | Post 1/2 |
| Gravity Sort | 241,526 | 10 | 34 | 1 | 0 | Post 3 |
| Pendry Sort | 359,174 | 11 | 35 | 0 | 0 | Post 5 |
| Little Pendry | 474,821 | 2 | 33 | 0 | 2 | Addendum |
| Sieve Sort v3 | 810,240 | 4 | 35 | 0 | 0 | Post 6 |
| radix-256 | 897,877 | 1 | 35 | 0 | 0 | Industry |
| introsort | 1,637,971 | 0 | 35 | 0 | 0 | Industry |
| qsort | 2,736,355 | 0 | 35 | 0 | 0 | C stdlib |
| heapsort | 5,670,212 | 0 | 35 | 0 | 0 | Industry |
Totals are only comparable for algorithms tested on the same number of patterns. CAS-Core's 55K across 9 patterns is brilliant on its turf, but it can't play in the other 26.
Benchmark note: "introsort" in this table refers to GCC's
__gnu_cxx::__introsort_loopimplementation, the algorithm underlyingstd::sortin libstdc++. "qsort" is the C standard libraryqsort(), which uses function-pointer indirection per comparison and is therefore a softer target. The introsort comparison is the fairer one; the qsort number is included for completeness.
The N/A Column Tells the Story
SafeSlot-C: 16 N/A. The problem from Post 1 that started everything. MicroSSS/CAS-Core/CAS-Binary: 26 N/A. Brilliant on structured data, unusable on everything else. Gravity: 1 N/A. Pendry, Sieve, radix-256, introsort, qsort, heapsort: 0 N/A.
The progression from 16 N/A to 0 N/A is the series. Each post filled in more of the table. The algorithm got better by learning to handle patterns it previously couldn't.
The Family Tree
Inversion-scanning lineage: Safe-Slot → MicroSSS → CAS-Core → CAS-Binary. Best on nearly-sorted data. Worst on random. The insertion sort core is both the strength and the limitation.
Counting/distribution lineage: Gravity → Adaptive Sort → Pendry Sort → Little Pendry. Handles everything. Integer only.
Measurement-first lineage: Sieve Sort. Type-agnostic. Stable. Slower on integers than Pendry, but works on anything comparable.
The crossover: Disorder repair (inversion lineage) became a bucket finisher in Pendry (distribution lineage). The 16-point sample (CAS-Core's reverse detection) became Phase 0. The algorithms are a family. Each one carries DNA from the earliest experiments.
Honest Accounting
Integer only. No floats, strings, structs. For generic types, use Sieve Sort. Not stable. Direct-fill trading stability for cache performance. Not in-place. Needs a destination array. For in-place, use Little Pendry. Regime D is the weak point. Radix-256 beats Pendry on Uniform [-1B,1B] (26.9ms vs 38.0ms). Designed, not proven. Behavior class from measurement, not proof. Designed by someone who cannot read the code.
What I Learned
Failure modes are the curriculum. Every algorithm was born from the failure of the previous one. The thing that didn't work turned out to be the thing that worked, once it found the right container.
Honest accounting matters more than good results. Naming the rediscoveries myself kept the work credible.
The philosophy outlasted the code. The questions are about design, not about sorting.
You don't need credentials to contribute. I'm a high school dropout. I can't read C. The collaboration model worked.
Previously: Sieve Sort
This is the final post. Start from the beginning: Disorder Clusters