HalfHumanDraft

Disorder Clusters

Sorting · Part 1 of 7

What if you found the pockets of chaos and fixed just those? (#5 of 12 in the unified benchmark)

Series: 1 · Disorder Clusters | 2 · The C Revelation | 3 · Who Decided What Sorted Is | 4 · Why Look When You Can Just Do | 5 · Where the Thread Ends | 6 · Sieve Sort | 7 · The Pendry Sort Paper


I was procrastinating on SQL homework when I noticed something about sorted data that I couldn't stop thinking about.

If you take a sorted array and mess up a small piece of it (a batch write failure, a sensor glitch, a handful of records arriving out of order), the disorder isn't spread evenly. It clusters. There are pockets of chaos in a sea of order. And every sorting algorithm I could find ignores that. They treat the whole array as a uniform problem: scan everything, compare everything, sort everything.

What if you found the pockets and fixed just those?

That question produced Safe-Slot Sort. This post is the story of where it started, what worked, what didn't, and how far a simple idea can go before it hits its limits.

Series context: This is Part 1 of a 7-part series documenting the development of sorting algorithms from first principles. All benchmark data in this series comes from a unified 35-pattern test suite at 1M elements, tested against 12 algorithms including industry standards (qsort, introsort, heapsort, radix-256).

The Insight

An inversion is a position i where array[i] > array[i+1], an adjacent pair that's out of order. When you find one, the elements nearby are usually also displaced. By capturing a window around each inversion (I used ±8 elements), you grab the entire problem region.

If two inversions are far apart, their windows don't overlap. You can sort them separately. The problem decomposes into small, independent subproblems.

On nearly-sorted data, inversions are sparse. The algorithm scans O(n) to find them, then sorts O(k) elements where k is much smaller than n. Total work is O(n + k log k) instead of O(n log n).
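To make that sparsity concrete, here is a small sketch (plain NumPy, my own illustration rather than code from the benchmark) that corrupts one patch of a sorted array and shows the inversions clustering there:

```python
import numpy as np

# A sorted array of one million elements with one corrupted patch:
# a simulated "batch write failure" that reverses 16 records.
n = 1_000_000
a = np.arange(n)
a[5000:5016] = a[5000:5016][::-1]

# Find every adjacent out-of-order pair in one vectorized scan.
inversions = np.where(a[:-1] > a[1:])[0]

print(len(inversions))                     # 15 inversions out of ~1M adjacent pairs
print(inversions.min(), inversions.max())  # all clustered near index 5000
```

Here k is 16 displaced elements in a million: the scan is O(n), but the repair work is tiny.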

That's the core. Everything else is implementation.

Complexity: O(n + k log k) where k = number of displaced elements. When k ≪ n, this approaches linear time. When k ≈ n, it degrades to O(n log n) per iteration with up to 100 iterations.

The Algorithm

// The Safe-Slot Sort loop
1. Scan array for inversions (a[i] > a[i+1])
2. If no inversions: done, array is sorted
3. For each inversion, mark indices within ±window
4. Collect all marked indices (overlapping windows merge)
5. Sort the values at those indices
6. Place sorted values back
7. Repeat until no inversions remain

The window size of 8 was found experimentally. Large enough to capture related disorder, small enough to keep neighborhoods from merging unnecessarily.

The presort check matters. Without it, a reverse-sorted array takes 97x longer than Python's built-in sorted(). With it, Safe-Slot is actually faster than sorted() on reversed input. One sample pass saves the algorithm from its worst case.

Design choice: Window size of 8 was never optimized. Every benchmark in the entire series uses this value. Smaller windows might work better for sparse inversions. Larger windows for dense local corruption. This remains an unexplored dimension.

The Python Implementation

# Copyright 2026 HalfHuman Draft - Pendry, S
# Licensed under the Apache License, Version 2.0

import numpy as np

def safe_slot_sort(arr, window=8):
    a = np.asarray(arr, dtype=np.int64).copy()  # accept lists as well as arrays
    n = len(a)
    if n <= 1: return a, 0

    # Presort check: detect reverse-sorted input
    samples = np.linspace(0, n-2, min(150, n-1), dtype=int)
    if np.sum(a[samples] > a[samples+1]) / len(samples) > 0.8:
        a = a[::-1].copy()

    for iteration in range(100):
        inversions = np.where(a[:-1] > a[1:])[0]
        if len(inversions) == 0: return a, iteration

        mask = np.zeros(n, dtype=bool)
        for inv in inversions:
            mask[max(0,inv-window):min(n,inv+window+2)] = True

        indices = np.where(mask)[0]
        a[indices] = np.sort(a[indices])

    return a, 100

Implementation: Python + NumPy. The presort check samples 150 positions. The iteration cap of 100 is a safety valve for pathological inputs. In practice, nearly-sorted data converges in 1–2 iterations.

Unified Benchmark Results

Safe-Slot Sort (Python) tested at 1M elements, 5 trials, median timing. Compared against Python sorted(), numpy.sort(), numpy mergesort, and numpy heapsort.

Where Safe-Slot Wins

| Pattern | Safe-Slot | numpy.sort() | sorted() | Result |
|---|---|---|---|---|
| Sorted | 4.3ms | 19.8ms | 71.8ms | 4.6x vs numpy |
| Reverse | 4.7ms | 18.8ms | 81.3ms | 4.0x vs numpy |
| Plateau | 2.6ms | 9.5ms | 61.6ms | 3.7x vs numpy |
| Localized 0.1% | 3.5ms | 16.0ms | 62.3ms | 4.6x vs numpy |
| Scatter 0.1% | 7.3ms | 16.9ms | 78.1ms | 2.3x vs numpy |

Where Safe-Slot Loses

| Pattern | Safe-Slot | numpy.sort() | Result |
|---|---|---|---|
| Local Swap 5% | 58.2ms | 16.4ms | 3.5x slower |
| Scatter 5% | 105.0ms | 18.5ms | 5.7x slower |
| Heavy Dupes (10) | 405.0ms | 4.6ms | 88x slower |
| Random | N/A | 16.4ms | Would not converge |
| Pipe Organ | N/A | 17.8ms | Would not converge |

Aggregate Standing

| Metric | Value |
|---|---|
| Total time (tested patterns) | 2,784.9ms |
| Wins | 2 of 35 |
| Tested | 22 of 35 (12 N/A, 1 FAIL) |
| Ranking | #6 of 8 Python algorithms by total time |

What Didn't Work: The Failure Modes

The benchmarks paint a clear picture: Safe-Slot wins when disorder is sparse and localized, loses when disorder is dense or scattered. But the real problem was deeper than crossover points. Safe-Slot had patterns that could break it entirely.

The diffusion problem. Take one element and move it from position 0 to position 10,000,000. The algorithm sees one inversion. It captures a window of 8 elements around it. Sorts those 8. The displaced element moves about 8 positions. Next iteration: one inversion, another tiny fix. At 8 positions per iteration with a cap of 100 iterations, an element that needs to travel 10 million positions never gets home. The output is unsorted.
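This failure is easy to reproduce. Below is a sketch (my condensed re-statement of the loop, without the presort check) that displaces a single element far from home and watches the iteration cap kill convergence:

```python
import numpy as np

def core_loop(a, window=8, max_iters=100):
    # Condensed Safe-Slot loop (no presort check): find inversions,
    # sort a +/- window neighborhood around each, repeat.
    a = a.copy()
    for iteration in range(max_iters):
        inversions = np.where(a[:-1] > a[1:])[0]
        if len(inversions) == 0:
            return a, iteration
        mask = np.zeros(len(a), dtype=bool)
        for inv in inversions:
            mask[max(0, inv - window):min(len(a), inv + window + 2)] = True
        indices = np.where(mask)[0]
        a[indices] = np.sort(a[indices])
    return a, max_iters

# One element displaced far: the value 0 parked at the end of the array.
a = np.append(np.arange(1, 10_001), 0)

result, iters = core_loop(a)
print(iters)                                    # 100: hit the cap
print(bool(np.all(result[:-1] <= result[1:])))  # False: output is unsorted
```

The stray element moves only about one window-width per pass, so at a cap of 100 iterations it strands thousands of positions from where it belongs.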

Producing unsorted output isn't just slow. It's broken. That's not acceptable.

The benchmark confirms this. Safe-Slot is marked N/A on 12 of 35 test patterns and FAIL on Binary(0/1). On the 22 patterns where it does run, its total time of 2,784.9ms ranks it #6 of 8 Python algorithms. Only list.sort() and sorted() are slower in aggregate, and they handle every pattern without failure.

Honest accounting: 12 N/A patterns is a lot. The algorithm can't handle: random data, pipe organ, sawtooth, two-block swaps, outlier displacement, organ+noise, wide-range data, clustered data, or high-percentage scatter. That's most real-world scenarios.

What I Learned

Three things came out of this first attempt.

The insight is real. Disorder clusters around inversions, and exploiting that clustering produces genuine speedups on structured data. Safe-Slot is 4.6x faster than numpy.sort() on sorted data and 4.0x faster on reversed data. This isn't a benchmarking artifact.

The failure modes are predictable. Far displacement, periodic patterns, and sawtooth inputs all break the algorithm in the same way: elements need to travel further than the window allows per iteration. The fix is either bailout (detect and fall back to a standard sort) or adaptation (detect and expand the window).

Python was lying to me about the algorithm's potential. The 2–4x speedups felt meaningful but modest. I wondered if the overhead of Python's interpreter, numpy's internal dispatch, and object allocation were masking what the algorithm could actually do in a language where those costs disappear.

There was only one way to find out.

What comes next: The C port in Post 2 reveals the real algorithm. 18x faster than qsort on nearly-sorted data. The Python version was showing us a shadow of what the approach could do.

The Adaptive Attempt

Before moving to C, I tried to fix the failure modes in Python with an adaptive wrapper: estimate disorder upfront, then pick the faster path.

# If disorder < threshold: use Safe-Slot
# If reverse-sorted: flip array, use Safe-Slot
# Otherwise: use numpy.sort()
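As a sketch of that dispatch logic (the helper names and the exact threshold here are illustrative, not the post's calibrated values):

```python
import numpy as np

def sampled_disorder(a, probes=150):
    # Fraction of sampled adjacent pairs that are out of order.
    idx = np.linspace(0, len(a) - 2, min(probes, len(a) - 1), dtype=int)
    return np.mean(a[idx] > a[idx + 1])

def choose_path(a, threshold=0.015):
    # Illustrative dispatcher: estimate disorder once, then pick a path.
    d = sampled_disorder(a)
    if d > 0.8:
        return "flip-then-safe-slot"   # looks reverse-sorted
    if d <= threshold:
        return "safe-slot"             # sparse disorder: local repair wins
    return "numpy-sort"                # dense disorder: hand off

rng = np.random.default_rng(0)
print(choose_path(np.arange(1000)))        # safe-slot
print(choose_path(np.arange(1000)[::-1]))  # flip-then-safe-slot
print(choose_path(rng.permutation(1000)))  # numpy-sort
```

One upfront estimate costs O(probes); picking the wrong path is what costs real time, which is exactly where the calibration went wrong.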

First test, zero parameter tuning: the adaptive version finished 1.42x faster than numpy.sort() in aggregate across the 30 patterns where it ran. That's the best overall result of any algorithm in the Python benchmark. The architecture worked: it correctly identified already-sorted, reverse-sorted, and fully-random inputs. But the threshold between "use Safe-Slot" and "use numpy" was miscalibrated. It kept using Safe-Slot in the 1.5–5% disorder range where it should have switched.

And five patterns still tripped it. Sawtooth data, pipe organ, two-block swaps, outlier displacement, and organ+noise all fooled the disorder sampler into thinking the data was nearly sorted when it wasn't. The sampler uses evenly-spaced probes that can land on ascending runs in sawtooth patterns, completely missing the resets to zero between teeth.
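The sawtooth blind spot is easy to demonstrate (again an illustration, not the benchmark's code): adjacent-pair probes see an almost entirely ascending signal, while nearly every element is actually far from its sorted position:

```python
import numpy as np

# Sawtooth: 100 ascending teeth of 1,000 elements, with a reset between teeth.
saw = np.tile(np.arange(1000), 100)

# Adjacent-pair view: only 99 of ~100,000 pairs are descending (the resets).
adjacent_disorder = np.mean(saw[:-1] > saw[1:])

# Positional view: almost every element is far from where sorting puts it.
displaced = np.mean(saw != np.sort(saw))

print(round(float(adjacent_disorder), 4))  # ~0.001: "looks nearly sorted"
print(displaced > 0.9)                     # True: it very much isn't
```

Evenly spaced probes sample from that adjacent-pair view, so the sampler reports "nearly sorted" and dispatches down the wrong path.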

Benchmark: Safe-Slot Adaptive: 365.5ms total, 1 win, 30/35 tested. 1.42x faster than numpy.sort(). Best aggregate of any Python algorithm in the suite.

Classification

| Property | Value |
|---|---|
| Category | Adaptive comparison sort (inversion-based) |
| Best case | O(n), already sorted, one scan |
| Sweet spot | O(n + k log k), sparse localized disorder |
| Worst case | O(n²) or unsorted output on pathological inputs |
| Auxiliary space | O(n) for mask, indices, values buffers |
| Stable | No |
| Data type | Integer (Python: int64 via numpy) |

Effective domain: Data that was recently sorted and received small, localized perturbations. Sensor data with occasional glitches. Database indices after a handful of inserts. Log files with minor reordering. Anything where you can say "this was sorted, and then a little bit went wrong."

Failure domain: Everything else. Random data, adversarial patterns, wide-range values, heavy duplicates, periodic structures, and any scenario where disorder is spread across the array rather than clustered in a few spots.

The Tenets

Somewhere during this work, three principles emerged that would drive everything that came after. Not just the sorting algorithms, but the entire approach to problem-solving that the series documents:

Structure exists; exploit it. Don't assume your data is random. Most data in real systems isn't.

Find inversions; fix locally. When something is broken, fix that thing. Don't tear the whole structure down and rebuild it.

Work scales with disorder. A nearly-sorted array should take nearly no work. A fully random array should take full work. The algorithm's effort should be a function of how much is actually wrong, not how much data exists.

These aren't clever. They're obvious, in retrospect. But every comparison-based sort I could find violates at least one of them. Quicksort violates the first: it partitions everything, even the parts that are already in order. Merge sort violates the second: it tears the array into pieces and rebuilds from scratch. Heapsort violates the third: it does O(n log n) work regardless of how ordered the input is.

Safe-Slot Sort obeys all three. That's why it's fast on structured data. That's also why it fails on unstructured data: when disorder doesn't cluster, the tenets don't apply, and the algorithm has no fallback.

The rest of this series is about building that fallback. And discovering, along the way, that the tenets lead somewhere much further than one algorithm.

The tenets: These three principles survive through all seven posts. They drive the C port, the counting sort philosophy, the CAS insight, the adaptive dispatch in Pendry Sort, and the window detector in Sieve Sort. They are the actual contribution of this series.


Next: The C Revelation — What happened when Python overhead disappeared and the real algorithm showed up.