HalfHumanDraft

Why Look When You Can Just Do?

Sorting · Part 4 of 7

The FileMode.Create moment, the insertion sort admission, and the birth of the fourth axiom. (CAS-Binary: #4 of 12 by total on tested patterns)

Series: 1 · Disorder Clusters | 2 · The C Revelation | 3 · Who Decided What Sorted Is | 4 · Why Look When You Can Just Do | 5 · Where the Thread Ends | 6 · Sieve Sort | 7 · The Pendry Sort Paper


Inside a coworker's code I found this:

new FileStream(path, FileMode.Create, FileAccess.Write);

I noticed there were no checks to see whether the file existed first. The standard pattern is: check for the file, inspect it, delete it, then create it. Four operations.

I figured his reasoning was: we don't have many files. We have one, and we'll replace it soon anyway. So why look? Just create.

FileMode.Create doesn't ask if the file exists. It just creates. If something was there, it's gone. If nothing was there, now something is. One action. The state afterward IS the goal.
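The same one-call semantics exist in C: fopen with "w" creates the file if it's missing and truncates it if it's there. A minimal sketch (the function name and file contents are mine, just for illustration):

```c
#include <stdio.h>

/* Create-or-truncate in one action: no existence check, no delete.
   Whatever was there before, the state afterward is the goal. */
int create_or_truncate(const char *path) {
    FILE *f = fopen(path, "w");  /* creates if absent, truncates if present */
    if (!f) return -1;
    fputs("fresh contents\n", f);
    fclose(f);
    return 0;
}
```

Calling it twice in a row succeeds both times; the second call silently replaces the first file, exactly the FileMode.Create behavior.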

I stared at that line for a while. Then I went home and rewrote everything I'd been doing with sorting.

The principle: Observe → Plan → Execute → Verify becomes just DO. The consequence of the action contains everything the other three phases would have gathered.

The 5-Line Truth

for (int i = 0; i < n - 1; i++)
    if (a[i] > a[i+1]) {
        int v = a[i+1], j = i;
        while (j >= 0 && a[j] > v) { a[j+1] = a[j]; j--; }
        a[j+1] = v;
    }

Walk through positions. When something is broken (a[i] > a[i+1]), grab the misplaced element and cascade it backward until it settles. Where it stops IS where it belongs. The elements that shifted ARE sub-fixes. The fact that you reach the end without triggering IS the proof of sorted.

No counting. No planning. No verification. The consequence of fixing IS the information about what was broken.
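Wrapped into a callable function (cas_core is just my label for the 5-line loop above), it looks like this:

```c
/* The 5-line truth as a function: walk the positions, and when
   a[i] > a[i+1], cascade the misplaced element backward until it
   settles. Reaching the end without triggering is the proof of sorted. */
void cas_core(int *a, int n) {
    for (int i = 0; i < n - 1; i++)
        if (a[i] > a[i+1]) {
            int v = a[i+1], j = i;
            while (j >= 0 && a[j] > v) { a[j+1] = a[j]; j--; }
            a[j+1] = v;
        }
}
```

Running it on { 5, 2, 8, 1, 9, 3 } triggers three cascades and leaves { 1, 2, 3, 5, 8, 9 }.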

I called it Consequence-as-Action Sort. CAS.

The Uncomfortable Admission

This is insertion sort.

Not "inspired by" or "related to." It is insertion sort. The 5-line truth is textbook insertion sort with one optimization: the if (a[i] > a[i+1]) check skips elements that are already in place.

And here's the thing that took me longer to accept: gravity sort from Post 3 (the comparison-based fallback version, not the counting sort), echo sort, and micro_sss from Post 2 were all the same algorithm too. Same motion, different variable names. "An element falls to its floor" and "a ripple propagates until it settles" and "a fix cascades through shifted elements" are three metaphors for one motion: find the insertion point, shift everything above it, place the element.

Post 3 asked "who decided what sorted is?" and I spent pages on three sorts that turned out to be one sort wearing different clothes. That's embarrassing. It's also honest. And the embarrassment taught me something: metaphors can fool you into thinking you've found something new when you've found a new way to see something old.

But a new way to see something old IS sometimes the contribution.

Rediscovery count: This is rediscovery #2 in the series. Gravity/Echo as counting sort was #1. CAS as insertion sort is #2. Flash Sort as bucket sort in Post 5 will be #3. The pattern of convergent discovery becomes a lesson in itself.

What the Framing Reveals

The CAS framing isn't novel code. It's a novel question about code: what information does the fix itself produce?

When you move an element from position i+1 to position j+1:

Distance traveled (i − j) = displacement. You didn't calculate it. It emerged.
Settling point (j+1) = correct position. You didn't search for it. It's where the cascade stopped.
Elements shifted (i − j of them) = sub-fixes. You didn't plan them. They happened.
No more triggers = sorted. You didn't verify. The absence IS the proof.

Traditional algorithm design treats these as separate problems. CAS treats them as consequences of a single action. The difference is design philosophy, not computational complexity.
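To make the "it emerged" claim concrete, here is one cascade instrumented to report what the fix produced for free (an illustrative sketch; the struct and field names are mine):

```c
/* Everything in this report falls out of the cascade itself. */
typedef struct {
    int from;          /* where the misplaced element was: i+1            */
    int to;            /* where it settled: j+1                           */
    int displacement;  /* distance traveled = from - to; never computed
                          in advance, it is just where the loop stopped   */
    int shifted;       /* elements moved right = sub-fixes, same count    */
} fix_info;

/* Cascade the element at i+1 backward, reporting what the fix revealed. */
fix_info cas_fix(int *a, int i) {
    int v = a[i+1], j = i;
    while (j >= 0 && a[j] > v) { a[j+1] = a[j]; j--; }
    a[j+1] = v;
    fix_info f = { i + 1, j + 1, i - j, i - j };
    return f;
}
```

On { 3, 5, 7, 2 } with i = 2, the element 2 travels from index 3 to index 0: displacement 3, three sub-fixes, and nobody measured anything.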

And design philosophy matters. FileMode.Create and check-then-delete-then-create produce the same result. But one is three syscalls, three failure modes, and three things to maintain. The other is one.

CAS-Binary: The Production Variant

The cascade in the 5-line truth does a linear scan backward to find where the element belongs. For a large displacement, that's O(k) comparisons per element. Binary search makes it O(log k):

#include <string.h>   /* memmove */

void cas_binary(int* a, int n) {
    if (n <= 1) return;
    detect_and_flip_reverse(a, n);  /* series helper: flips a fully reversed input */
    for (int i = 0; i < n - 1; i++) {
        if (a[i] <= a[i+1]) continue;   /* already in place: skip */
        int v = a[i+1];
        /* upper-bound binary search: first index in a[0..i] with a[mid] > v */
        int left = 0, right = i + 1;
        while (left < right) {
            int mid = (left + right) >> 1;
            if (a[mid] <= v) left = mid + 1;
            else right = mid;
        }
        if (left <= i) {
            /* bulk-shift a[left..i] right by one slot, then drop v in */
            memmove(&a[left+1], &a[left],
                    (i+1-left) * sizeof(int));
            a[left] = v;
        }
    }
}

The memmove is key. Instead of shifting elements one at a time in a while loop, it issues a single bulk memory move, which modern CPUs are built for. On adversarial data, CAS-Binary cuts times by 4–5x versus CAS-Core.

Benchmark: CAS-Binary: 18,205 µs total across the 9 tested patterns, with 0 outright wins — the lowest total per pattern of any algorithm. But only 9 patterns are safe to run, which limits its competitive standing.

What Failed

SIMD Scan

AVX-512 to scan 16 elements at once for inversions. Result: everything got worse, nearly 2x worse on adversarial data. Finding inversions was never the bottleneck. The scalar if (a[i] > a[i+1]) check is already O(n) and branch-predicted well on structured data. The bottleneck is the fixing. SIMD made the cheap part faster and added overhead that slowed down the expensive part.

Repair Blocks

Identify a contiguous block of disorder and fix it all at once. Results: identical to CAS-Core within noise. Once you allow the cascade to go back past the block boundary (which you must for correctness), you're just doing CAS-Core with extra steps. One sentence in the blog is all it deserved.

4-Way Parallel

Four threads, each sorting a disjoint quarter of the array, then merge. Cuts adversarial times by ~60%. "Two halves swapped" drops to 0ms because each half is already sorted. But adds O(n) memory for the merge buffer and thread overhead on small arrays.
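A sketch of the idea with POSIX threads (the thread count, quarter boundaries, and merge-into-scratch-buffer scheme here are my choices; the post doesn't show its implementation):

```c
#include <pthread.h>
#include <stdlib.h>

typedef struct { int *a; int n; } span;

/* Plain insertion sort on one quarter. */
static void *sort_span(void *p) {
    span *s = p;
    for (int i = 1; i < s->n; i++) {
        int v = s->a[i], j = i - 1;
        while (j >= 0 && s->a[j] > v) { s->a[j+1] = s->a[j]; j--; }
        s->a[j+1] = v;
    }
    return NULL;
}

/* Merge two sorted runs a[0..na) and b[0..nb) into out. */
static void merge2(const int *a, int na, const int *b, int nb, int *out) {
    int i = 0, j = 0, k = 0;
    while (i < na && j < nb) out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
    while (i < na) out[k++] = a[i++];
    while (j < nb) out[k++] = b[j++];
}

/* Four threads sort disjoint quarters, then two rounds of merging.
   Costs O(n) scratch memory -- the trade-off noted above. */
void cas_parallel4(int *a, int n) {
    if (n < 8) { span s = { a, n }; sort_span(&s); return; }
    int q = n / 4;
    span parts[4] = {
        { a,       q },
        { a + q,   q },
        { a + 2*q, q },
        { a + 3*q, n - 3*q },
    };
    pthread_t t[4];
    for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, sort_span, &parts[i]);
    for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);

    int *buf = malloc(n * sizeof(int));
    merge2(parts[0].a, parts[0].n, parts[1].a, parts[1].n, buf);
    merge2(parts[2].a, parts[2].n, parts[3].a, parts[3].n, buf + 2*q);
    merge2(buf, 2*q, buf + 2*q, n - 2*q, a);
    free(buf);
}
```

The "two halves swapped" win drops out of the structure: each quarter of that input is already sorted, so the threads do O(n) work and the merges do the rest.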

Lessons: Three failures, three lessons. (1) Profile before you optimize. (2) Extra steps that don't change the fundamental motion are waste. (3) Parallelism helps when the problem decomposes cleanly.

The Fourth Axiom

Four posts in, and a new principle had emerged from the CAS work that complemented the three tenets from Post 2:

The consequence of the action contains the information. Don't observe, plan, execute, verify as separate phases. Collapse them. The fix IS the observation IS the plan IS the proof.

This principle didn't produce a faster algorithm. CAS is insertion sort. Its complexity is well-known. But it produced a design question that drove everything after: what information does each touch already contain?

That question eventually became the fifth axiom in the Pendry Sort paper: "one touch, maximum extraction." Every time you load a value from memory, get everything you can from it. The 16-point sample in Pendry Sort extracts inversions AND range AND routing hints from the same 32 values. The merged scan counts inversions AND tracks the dirty window AND determines bail conditions in a single walk. No pass exists that serves only one purpose.
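As an illustration of the axiom (a sketch of the shape of a merged scan, not the Pendry Sort code; the struct and field names are mine): one walk can extract the inversion count, the dirty window, and the bail signal simultaneously.

```c
typedef struct {
    int inversions;  /* adjacent out-of-order pairs seen            */
    int dirty_lo;    /* first index touched by disorder, or -1      */
    int dirty_hi;    /* last index touched by disorder, or -1       */
    int bail;        /* 1 = already sorted, nothing left to do      */
} scan_info;

/* One touch per element, three answers per walk. */
scan_info merged_scan(const int *a, int n) {
    scan_info s = { 0, -1, -1, 0 };
    for (int i = 0; i + 1 < n; i++)
        if (a[i] > a[i+1]) {
            s.inversions++;
            if (s.dirty_lo < 0) s.dirty_lo = i;  /* window opens here  */
            s.dirty_hi = i + 1;                  /* window ends here   */
        }
    s.bail = (s.inversions == 0);  /* absence of triggers IS the proof */
    return s;
}
```

No pass here serves only one purpose: the same comparison that counts an inversion also stretches the dirty window, and the same counter that measures disorder also decides whether to bail.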

CAS is where that idea started. Not as an optimization. As a way of seeing.

Classification

Variant          Best Case   Worst Case     Memory   Lines   Rank
CAS Core         O(n)        O(n²)          O(1)     5       #10 of 12
CAS Binary       O(n)        O(n²)*         O(1)     12      #4 of 12 (by per-test avg)
4-Way Parallel   O(n)        O(n²/p + n)    O(n)     ~40     Not in unified benchmark

* Same worst-case complexity, but O(log k) comparisons per element and a much faster constant factor.

What CAS actually is: Not a new algorithm. A design philosophy applied to a known algorithm, producing a version that's maximally simple, maximally readable, and maximally honest about what it does. The 5-line truth is the entire algorithm. There's nothing hidden, nothing clever, nothing that requires a textbook to understand.

What CAS contributed to the series: The fourth axiom. The question "what information does each touch contain?" That question is what separates Pendry Sort from a naive cascade of if-else sorting paths. Every pass in Pendry Sort serves multiple purposes because CAS taught me to ask: what else is this touch already telling me?


Previously: Who Decided What Sorted Is? — Counting sort and the question of comparison.

Next: Where the Thread Ends — Adaptive integer sort, Flash Sort rediscovery, and the full circle.