HalfHumanDraft

The Sorting Series: The Intro

Series Introduction

Seven posts, twelve algorithms, five axioms. How one line of C# turned into a sorting algorithm that beats introsort 4.6x across 35 test patterns.

Series: 1 · Disorder Clusters | 2 · The C Revelation | 3 · Who Decided What Sorted Is | 4 · Why Look When You Can Just Do | 5 · Where the Thread Ends | 6 · Sieve Sort | 7 · The Pendry Sort Paper


Inside a coworker's code, I found this:

new FileStream(path, FileMode.Create, FileAccess.Write);

I noticed there was no check to see whether the file already existed. The standard pattern is: check for the file, inspect it, delete it, then create the new one. Four operations.

I figured his reasoning was: we don't have many files. We have one, and we'll replace it soon anyway. So why look? Just create.

FileMode.Create doesn't ask if the file exists. It just creates. If something was there, it's gone. If nothing was there, now something is. One action. The state afterward IS the goal.

That snippet opened my eyes to a different way of seeing problems. Not just files. Everything. If you're already doing the thing that produces the right outcome, why spend time observing, planning, and verifying separately? The consequence of the action contains all the information those extra steps would have gathered.

I went home and started asking: what if a sorting algorithm worked the same way? Instead of sorting everything from scratch, what if you just looked at the data, found the parts that were wrong, and fixed those parts? Everything already in the right order gets left alone. The action (repair the disorder) replaces the traditional action (sort everything). The consequence is the same. The cost is lower.

That question produced seven posts, twelve algorithms, three rediscoveries, two novel contributions, and a set of five design axioms that I think matter more than any of the code.

Who I am: High school dropout. Operations Production Lead, night shift, medical device company. No CS degree. Cannot read C. Built every algorithm in this series by describing what I wanted in plain English, collaborating with Claude (Anthropic), running benchmarks, and being honest about the results. Every claim is backed by measurement.

The Five Axioms

Five principles emerged from the failures and iterations across this series. They are not specific to sorting. They are design principles for any system that processes structured data.

  1. Structure exists; exploit it. Don't assume your data is random. Most data in real systems isn't.
  2. Find inversions; fix locally. When something is broken, fix that thing. Don't tear the whole structure down and rebuild it.
  3. Work scales with disorder. The algorithm's effort should match how much is actually wrong, not how much data exists.
  4. The consequence contains the information. Don't observe, plan, execute, verify as separate phases. Collapse them.
  5. One touch, maximum extraction. Every time you load a value from memory, get everything you can out of it.

Axioms 1 through 3 crystallized during the C port in Post 2. Axiom 4 came from the FileMode.Create moment, formalized in Post 4. Axiom 5 emerged during the Pendry Sort development when I realized the algorithm was throwing away information it had already gathered, documented in Post 7.

These axioms produced an algorithm that beats GCC's introsort (std::sort) 4.6x in aggregate across 35 test patterns (7.6x over C's qsort, which pays additional function-pointer overhead per comparison), handles every data shape without failure, and was designed by someone who cannot read the language it's written in.

The collaboration: Human intuition + AI implementation + empirical validation. I described what I wanted. Claude wrote the code. I ran the benchmarks and diagnosed regressions from output, not from reading source. This turned out to produce more thorough testing than code-reading would have, because I had no way to assume anything was correct. I had to measure everything.

The Benchmark

Every algorithm in this series was tested against every other algorithm and four industry-standard sorts under identical conditions. One million 32-bit integers. 35 data patterns covering sorted, reversed, random, nearly-sorted (at seven corruption levels), adversarial shapes (pipe organ, sawtooth, two-block swaps), value-range variations (heavy duplicates, narrow range, binary, Gaussian, Zipf, wide uniform, huge range), and structural patterns (clustered, plateau, outlier displacement). Multiple trials per test. Median timing. Every result verified for correctness.

The benchmark harness, source code for every algorithm, and raw output are all freely available. Every number in every post comes from this one unified test.

The Full Table: 12 Algorithms, 35 Patterns, 1M Elements

Sorted by total time. N/A = algorithm can't handle that pattern (would produce wrong output or run indefinitely). FAIL = produced incorrect output. Totals are only comparable between algorithms tested on the same number of patterns.

| # | Algorithm | Total (µs) | Wins | Tested | N/A | Fail | Origin |
|---:|---|---:|---:|---:|---:|---:|---|
| 1 | Gravity Sort | 241,526 | 10 | 34 | 1 | 0 | Post 3 |
| 2 | Pendry Sort | 359,174 | 11 | 35 | 0 | 0 | Posts 5–7 |
| 3 | Little Pendry | 474,821 | 2 | 33 | 0 | 2 | Post 7 |
| 4 | CAS-Binary | 18,205 | 0 | 9 | 26 | 0 | Post 4 |
| 5 | SafeSlot-C | 194,628 | 0 | 19 | 16 | 0 | Posts 1–2 |
| 6 | Sieve Sort v3 | 810,240 | 4 | 35 | 0 | 0 | Post 6 |
| 7 | radix-256 | 897,877 | 1 | 35 | 0 | 0 | Industry |
| 8 | introsort | 1,637,971 | 0 | 35 | 0 | 0 | Industry |
| 9 | qsort | 2,736,355 | 0 | 35 | 0 | 0 | C stdlib |
| 10 | CAS-Core | 55,067 | 7 | 9 | 26 | 0 | Post 4 |
| 11 | MicroSSS | 56,675 | 0 | 9 | 26 | 0 | Post 2 |
| 12 | heapsort | 5,670,212 | 0 | 35 | 0 | 0 | Industry |

How to Read the Table

Gravity Sort (#1 by total) is a counting sort. It wins on aggregate because it runs in O(n + k) with perfect cache behavior on 34 of 35 patterns. It can't handle data with a value range exceeding four times the element count. It doesn't adapt. It just counts. It's the fastest algorithm in the benchmark for the same reason a hammer is the fastest tool for nails: it does exactly one thing.

Pendry Sort (#2 by total, #1 by wins) is the adaptive dispatch algorithm. It handles every single pattern. 11 wins, zero N/A, zero failures. It observes the data with a 16-point sample, routes to the cheapest strategy (counting sort for tight ranges, bucket scatter for wide ranges, window repair for nearly-sorted data), and extracts maximum information from every pass. Its total time of 359,174 microseconds across all 35 patterns gives it a 4.6x aggregate advantage over GCC's introsort (7.6x over C's qsort, which carries additional function-pointer indirection per comparison).

CAS-Core (#10) and CAS-Binary (#4, ranked by per-pattern average) look fast in the total column (55K and 18K microseconds respectively), but they were only tested on 9 of 35 patterns. Their insertion sort core goes O(n²) on anything that isn't nearly sorted. They dominate their 9 patterns. They can't play in the other 26. The N/A count IS their limitation, documented honestly.

The industry standards (introsort, qsort, heapsort, radix-256) all handle every pattern without failure. They're reliable. They're never the fastest. Introsort comes closest at 1,637,971 microseconds, which is 4.6x slower than Pendry Sort in aggregate. The gap exists because the industry sorts treat all input as potentially random. The adaptive sorts check first and skip the work that doesn't need doing.

About the N/A column: The progression from 16 N/A (SafeSlot) to 0 N/A (Pendry Sort) is the story of the series in one column. Each post filled in more patterns. Each failure mode that got resolved opened up a new row. The algorithm didn't get better by getting faster on patterns it could already handle. It got better by learning to handle patterns it previously couldn't.

What's Novel, What's Rediscovery

Honest accounting matters more than claiming novelty.

Rediscovered: CAS is insertion sort (Post 4). The bucket scatter path is Flash Sort, published by Neubert in 1998 (Post 5). The adaptive counting/bucket switch exists in Spreadsort from the Boost C++ library. Three convergent discoveries, all named honestly in the posts where they were found.

Novel contributions: Disorder repair as a bucket finisher (scanning each bucket for inversions and skipping already-ordered buckets, which I haven't found in published sorting literature). Direct-fill counting sort with no prefix sums (trading stability for sequential cache-friendly writes). The merged scan-route-repair pass (one walk that simultaneously counts inversions, tracks the dirty window, evaluates bail conditions, and produces range hints). The five axioms as a portable framework for adaptive algorithm design.

The real contribution: The path. The sequence of failures, wrong turns, rediscoveries, and honest admissions that teaches why the algorithms are what they are. Textbooks give you the finished algorithm. This series gives you the thirteen wrong turns that led to the right one.

The Posts

1. Disorder Clusters — The origin. SQL homework procrastination, the inversion insight, Safe-Slot Sort in Python and C. Where the three tenets first appear. The algorithm that started everything and all the ways it breaks. #5 of 12

2. The C Revelation — Python was hiding the algorithm. The C port reveals 18x speedups. The version evolution through four variants. Micro SSS emerges as the tenet-distilled core. The three axioms crystallize. SSS #5 · CAS #10 · µSSS #11

3. Who Decided What Sorted Is? — Stop comparing. Start counting. Gravity Sort takes #1 by total time. The beach was never unsorted. Three philosophies of sorting mapped against the benchmark. The first rediscovery. #1 of 12

4. Why Look When You Can Just Do? — The FileMode.Create moment. The 5-line truth. The uncomfortable admission that CAS is insertion sort. The SIMD failure. The parallel attempt. The fourth axiom is born. CAS-BIN #4 by avg

5. Where the Thread Ends — The adaptive sort that became Pendry Sort. Beating counting sort. Flash Sort rediscovery. Disorder repair as the bucket finisher. The full circle: Post 1's idea finds its home. #2 of 12 · 11 wins

6. Sieve Sort — The general-purpose branch. Pendry Sort only works on integers, so what about everything else? Window detection, natural merge routing, and proof the philosophy works on any data type. #6 of 12

7. The Pendry Sort Paper — The capstone. All five axioms. The behavior class definition. Little Pendry as proof the philosophy compresses. The family tree. The full table. Everything on one page. #2 of 12 · 4.6x over introsort

The Benchmark Methodology

Test Environment

Windows 11, AMD Ryzen, GCC (MinGW-w64) with -O2 optimization. All algorithms compiled into a single binary. Same machine, same conditions, same data for every test. Both Python (numpy + CPython) and C benchmarks were run, with the C benchmark serving as the unified reference for the series.

Test Patterns (35 total)

Ideal cases: sorted, reverse, all identical.
Nearly-sorted, local adjacent swaps: 0.1%, 1%, 5%, 10%.
Nearly-sorted, scattered random swaps: 0.1%, 1%, 5%, 10%, 25%, 50%, 100%.
Localized contiguous corruption: 0.1%, 1%, 5%, 10%, 25%.
Adversarial shapes: pipe organ (ascending then descending), sawtooth (repeated ramps), two-block swap, outlier displacement, organ with random noise.
Value-range variations: heavy duplicates (10 unique values), narrow range (100 values), binary (0/1), ternary (0–2), Gaussian narrow, Zipf-like skew, uniform wide [0, 10N], uniform huge [-1 billion, 1 billion], clustered (8 groups).
Structural: plateau (flat middle section).

Timing

High-resolution performance counters (QueryPerformanceCounter on Windows, clock_gettime CLOCK_MONOTONIC on Linux). Multiple trials per test, median reported. Every result verified for correctness with a post-sort scan.

N/A Policy

Algorithms are marked N/A on patterns where they would produce incorrect output (SafeSlot-C on adversarial inputs), require unreasonable memory (Gravity Sort when value range exceeds 4N), or take catastrophically long due to O(n²) behavior (Micro SSS, CAS-Core, CAS-Binary on random/adversarial data at 1M elements). N/A patterns are not timed. The reasons are documented in each post.

Reproducibility: The benchmark source code (both Python and C) is released under Apache 2.0. Download, compile, run, and get your own numbers. The relative performance between algorithms should hold across hardware. The absolute times will vary.

Start reading: Post 1: Disorder Clusters