Res.Framework

The Pendry Sort Paper

April 6, 2026 · 5 min read · Pendry, S

pendry_benchmark_c

Convert to .c before compiling and running.

pendry_benchmark_c.txt

45 KB

pendry_benchmark_python

Convert to .py before running.

pendry_benchmark_python.txt

27 KB

Sorting · Part 7 of 7

Five axioms, twelve algorithms, one table. The capstone of a series built from failure, rediscovery, and honest accounting. (Pendry Sort: #2 of 12. 4.6x over introsort.)

I cannot read C.

The algorithm was written through a collaboration with Claude (an AI by Anthropic). I described what I wanted in plain English. Claude wrote the code. I ran the tests and told it what to fix. Every architectural decision came from me. Every performance regression was diagnosed from benchmark output, not from reading source code. Every claim in this post is backed by running the algorithm on actual data and measuring what happened.

This turned out to be a good thing. Because I could not just read the code and assume it was right, I tested everything.

The collaboration: Human intuition + AI implementation + empirical validation. A high school dropout who can't read C built a sorting algorithm that beats introsort 4.6x in aggregate.

The Five Axioms

Seven posts. Twelve algorithms. Three lineages. Five principles emerged that drove every decision and survived every failure:

Structure exists; exploit it. Don't assume your data is random.

Find inversions; fix locally. Don't tear the whole structure down and rebuild it.

Work scales with disorder. Effort should match how much is actually wrong, not how much data exists.

The consequence of the action contains the information. Collapse observe-plan-execute-verify into one motion.

One touch, maximum extraction. Every time you load a value, get everything you can from it.

Axioms 1–3 came from Post 2. Axiom 4 from Post 4 and the FileMode.Create moment. Axiom 5 crystallized when the 16-point sample started extracting inversions AND range AND routing hints from the same 32 values.

The specific algorithms will be superseded. But these five questions are about design, not about sorting.

Origin map: Axiom 1: Post 1. Axiom 2: Post 1. Axiom 3: Post 2. Axiom 4: Post 4. Axiom 5: Post 5. Each emerged from a specific failure or insight.

The Architecture

Pendry Sort takes a source array of int32 elements and writes sorted output to a destination array. Not in-place. Not stable. Three phases:

Phase 0: Merged Scan-Route-Repair. A 16-point sample classifies the data while extracting an approximate value range from the same 32 loaded values. Routes to reverse detection, full merged scan, or straight to Phase 1/2. The merged scan counts inversions AND tracks the dirty window in a single walk. Window carry preserves work on bail. Boundary-only verification: two comparisons instead of O(n).

Phase 1: Direct-Fill Counting Sort. Range ≤ 4n. No prefix sums. Duplicate acceleration: 6.0x at 99% duplicates.

Phase 2: Bucket Scatter + Disorder Repair. Range > 4n. Flash Sort structure with inversion-scanning disorder repair per bucket instead of blind insertion sort.

Axiom 5 in practice: The 16-point sample extracts inversions + min + max. The merged scan extracts inversions + window bounds + bail coordinates. No pass serves only one purpose.

Little Pendry

Can the 384-line algorithm compress to its essentials? Little Pendry is 87 lines. In-place. Keeps the 16-point sample, merged scan-route-repair, and direct-fill counting sort. Drops bucket scatter and introsort; replaces them with shellsort (Ciura gaps).

In the unified benchmark: 474,821 µs total, 2 wins, 33/35 tested (2 FAILs on Two Blocks and Outlier). 2.7x faster than radix-256. 4.1x faster than introsort. The philosophy compresses.

Three proofs: Little Pendry proves the philosophy compresses (minimal machinery). Sieve Sort proves it generalizes (any data type). Pendry Sort proves it performs (maximum speed). One idea, three trade-offs.

The Full Family: 12 Algorithms, 35 Patterns, 1M Elements

Every algorithm from this series plus every standard sort. The progression from SafeSlot (16 N/A) to Pendry (0 N/A) is the series.

Algorithm	Total (µs)	Wins	Tested	N/A	Fail	Origin
CAS-Binary	18,205	0	9	26	0	Post 4
CAS-Core	55,067	7	9	26	0	Post 2/4
MicroSSS	56,675	0	9	26	0	Post 2
SafeSlot-C	194,628	0	19	16	0	Post 1/2
Gravity Sort	241,526	10	34	1	0	Post 3
Pendry Sort	359,174	11	35	0	0	Post 5
Little Pendry	474,821	2	33	0	2	Addendum
Sieve Sort v3	810,240	4	35	0	0	Post 6
radix-256	897,877	1	35	0	0	Industry
introsort	1,637,971	0	35	0	0	Industry
qsort	2,736,355	0	35	0	0	C stdlib
heapsort	5,670,212	0	35	0	0	Industry

Totals are only comparable for algorithms tested on the same number of patterns. CAS-Core's 55K across 9 patterns is brilliant on its turf, but it can't play in the other 26.

Benchmark note: "introsort" in this table refers to GCC's __gnu_cxx::__introsort_loop implementation, the algorithm underlying std::sort in libstdc++. "qsort" is the C standard library qsort(), which uses function-pointer indirection per comparison and is therefore a softer target. The introsort comparison is the fairer one; the qsort number is included for completeness.

The N/A Column Tells the Story

SafeSlot-C: 16 N/A. The problem from Post 1 that started everything. MicroSSS/CAS-Core/CAS-Binary: 26 N/A. Brilliant on structured data, unusable on everything else. Gravity: 1 N/A. Pendry, Sieve, radix-256, introsort, qsort, heapsort: 0 N/A.

The progression from 16 N/A to 0 N/A is the series. Each post filled in more of the table. The algorithm got better by learning to handle patterns it previously couldn't.

The Family Tree

Inversion-scanning lineage: Safe-Slot → MicroSSS → CAS-Core → CAS-Binary. Best on nearly-sorted data. Worst on random. The insertion sort core is both the strength and the limitation.

Counting/distribution lineage: Gravity → Adaptive Sort → Pendry Sort → Little Pendry. Handles everything. Integer only.

Measurement-first lineage: Sieve Sort. Type-agnostic. Stable. Slower on integers than Pendry, but works on anything comparable.

The crossover: Disorder repair (inversion lineage) became a bucket finisher in Pendry (distribution lineage). The 16-point sample (CAS-Core's reverse detection) became Phase 0. The algorithms are a family. Each one carries DNA from the earliest experiments.

Honest Accounting

Integer only. No floats, strings, structs. For generic types, use Sieve Sort. Not stable. Direct-fill trading stability for cache performance. Not in-place. Needs a destination array. For in-place, use Little Pendry. Regime D is the weak point. Radix-256 beats Pendry on Uniform [-1B,1B] (26.9ms vs 38.0ms). Designed, not proven. Behavior class from measurement, not proof. Designed by someone who cannot read the code.

What I Learned

Failure modes are the curriculum. Every algorithm was born from the failure of the previous one. The thing that didn't work turned out to be the thing that worked, once it found the right container.

Honest accounting matters more than good results. Naming the rediscoveries myself kept the work credible.

The philosophy outlasted the code. The questions are about design, not about sorting.

You don't need credentials to contribute. I'm a high school dropout. I can't read C. The collaboration model worked.

Previously: Sieve Sort

This is the final post. Start from the beginning: Disorder Clusters