HalfHumanDraft Subscribe
44 posts 13 dashboards

Where the Thread Ends

Sorting · Part 5 of 7

The adaptive sort that became Pendry Sort. Flash Sort rediscovery, disorder repair, and the full circle. (#2 of 12, 11 wins, 0 N/A in the unified benchmark)

Series: 1 · Disorder Clusters | 2 · The C Revelation | 3 · Who Decided What Sorted Is | 4 · Why Look When You Can Just Do | 5 · Where the Thread Ends | 6 · Sieve Sort | 7 · The Pendry Sort Paper


Post 4 ended with me realizing that CAS was insertion sort. That admission was hard but useful. This post goes further. I tried to beat counting sort. I failed, then succeeded, then discovered I'd reinvented a known algorithm, then discovered something I hadn't seen anywhere else. And the thing I hadn't seen anywhere else was the first idea from Post 1, applied somewhere I didn't expect it to work.

The arc: This post documents the birth of what became Pendry Sort. The adaptive architecture, counting sort fast path, bucket scatter, and disorder repair all originate here. The full paper in Post 7 refines these through six more versions.

The Challenge

Counting sort is O(n + k) where k is the value range. For integer data with a tight range, it's essentially unbeatable. Three passes (count, prefix sum, place) and you're done. No comparisons. No branches in the hot path.

What if each element already knew approximately where it belonged? In a uniformly distributed array, the value v belongs approximately at position (v - lo) * n / range. You don't need a frequency array. Each value carries its own destination.

Just scatter every element to its approximate position. Then fix the approximation errors locally. That's the whole idea: one scatter pass, then local repair.

Attempt 1: Direct Scatter

Compute the approximate destination, write it there, linear probe on collision. This fails. Badly. The disorder after scatter isn't localized, it's scattered. And insertion sort on scattered disorder is exactly the problem Safe-Slot Sort couldn't solve in Post 1. Same structural failure, different algorithm. The diffusion problem never left.

Attempt 2: Bucket Scatter

Instead of placing each element at an exact position, drop it into a bucket. Sort each bucket independently. With n/8 buckets (roughly 8 elements per bucket), each bucket is tiny and insertion sort on 8 elements is essentially free. This works.

It was faster than counting sort on wide-range data. But it couldn't beat counting sort on tight-range data.

The Adaptive Switch

One if statement. If the range is tight (≤ 4n), use counting sort with direct sequential fill. If the range is wide, use bucket scatter. The counting sort path skips prefix sums and backwards scan entirely. Sequential writes, perfect cache behavior. The trade-off is stability.

Rediscovery #3: The bucket path is Flash Sort (Neubert, 1998). The adaptive switching exists in Spreadsort (Ross, Boost C++). Three people arriving at the same place from different directions. Convergent discovery confirmed.

The Last Experiment: Disorder Repair

Every bucket gets insertion-sorted. Every one. Even if the bucket is already in order. After scatter, each bucket holds roughly 8 elements. If the input was nearly sorted, most buckets receive their elements in nearly-sorted order. Standard insertion sort doesn't know that.

But Post 1 had a solution to exactly this problem. Scan for inversions first. If none: return immediately. Skip the bucket entirely.

The original insight (disorder clusters around inversions) failed as a standalone sort because elements could be displaced beyond the repair window. But inside a bucket, the problem is bounded. Each bucket is 8 elements. Disorder can't travel further than 8 positions. The window that was too small for the full array is exactly right for a bucket.

static inline void disorder_repair(int32_t *a, int n) {
    if (n <= 1) return;
    int first_inv = -1;
    for (int i = 0; i < n-1; i++)
        if (a[i] > a[i+1]) { first_inv = i; break; }
    if (first_inv < 0) return;  // Already sorted
    if (n <= 24) { isort(a, n); return; }
    // Find dirty window, sort only that, verify
    // Safety fallback to full isort if window missed something
}

Flash Sort uses insertion sort per bucket. Spreadsort uses pdqsort. I haven't found anyone using an inversion scan to skip already-ordered buckets and window-repair the rest. This is one of two novel contributions from this series, alongside the direct-fill counting sort.

Pendry Sort: Unified Benchmark at 1M Elements

The adaptive sort from this post evolved into Pendry Sort. Here is its standing in the unified 35-pattern benchmark.

Pendry Sort vs the Field

Pattern Pendry Gravity radix256 qsort
Sorted 1.6ms 7.2ms 26.5ms 53.2ms
Random 10.6ms 10.6ms 26.2ms 165.9ms
Pipe Organ 6.0ms 6.2ms 26.2ms 241.9ms
Heavy Dupes(10) 4.8ms 4.9ms 22.4ms 30.7ms
Uniform[-1B,1B] 38.0ms N/A 26.9ms 176.2ms
Scatter 0.1% 2.2ms 7.2ms 26.5ms 52.3ms
Localized 1% 2.4ms 7.1ms 26.4ms 52.9ms

Aggregate

Metric Pendry Sort Context
Total time 359,174 µs #2 of 12 (behind Gravity at 241,526)
Wins 11 of 35 Most wins of any algorithm
Tested 35 of 35 Zero N/A, zero FAIL
vs qsort 7.6x faster 2,736,355 / 359,174

The Full Circle

Five posts. Started with SQL homework procrastination that turned into "what if disorder clusters?" Ended with an adaptive integer sort that handles every pattern in the unified benchmark without a single N/A or FAIL.

The counting sort fast path came from Post 3. Stop comparing. Start counting.

The bucket scatter came from asking what if values already know their destinations. That's Flash Sort, convergent discovery confirmed.

The disorder repair came from Post 1. The idea that failed as a standalone sort found its home inside a container that bounds the problem to exactly the domain where it works.

Every post contributed something. Nothing was wasted. Even the failures shaped what came next.

Knowledge should never die on the back of a napkin in a dive bar. Even if most of it turns out to be a rediscovery. The path someone takes to rediscover something is often more useful than the discovery itself.


Previously: Why Look When You Can Just Do? — CAS, insertion sort, and the honesty of rediscovery.

Next: Sieve Sort — The general-purpose branch. Proof the philosophy works on any data type.