HalfHumanDraft

Lateral mesh pipeline explorer

Interactive companion to the architecture white paper

[Interactive explorer: a chains × stages mesh with clickable node toggles. Panels track node status (alive, failed, operational %), instruction counts (done, dropped, in flight, total), pipeline performance (throughput, average and peak latency in cycles, overhead), contention (diverts, queued, queue %, divert %), and an event log. Shunt range ±1: neighbors only, shortest signal path, lowest overhead, most faithful to the paper; fails when two adjacent chains have dead nodes at the same stage.]

1. The fundamental trade

Traditional chips treat a defective node as a die-killing failure. This architecture treats it as a latency event. The instruction diverts laterally to a neighbor, borrows that neighbor's equivalent pipeline stage, then returns diagonally to the origin chain at the next stage down.
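The divert-and-return path can be sketched as a small routing function. The `(chain, stage)` coordinate scheme, the function name, and the hard-coded ±1 shunt range are illustrative assumptions, not the white paper's interface:

```python
# Minimal sketch of the lateral divert-and-return path, assuming a ±1 shunt
# range and (chain, stage) node coordinates.

def divert_path(chain, stage, failed, num_chains):
    """Return the hop sequence for an instruction arriving at (chain, stage).

    If the node is alive, the instruction stays put. Otherwise it moves
    laterally to an alive neighbor at the same stage, borrows that stage,
    then returns diagonally to the origin chain at the next stage down.
    """
    if (chain, stage) not in failed:
        return [(chain, stage)]
    for neighbor in (chain - 1, chain + 1):          # ±1 shunt range
        if 0 <= neighbor < num_chains and (neighbor, stage) not in failed:
            return [(chain, stage),        # arrive at the dead node
                    (neighbor, stage),     # lateral divert: borrow the stage
                    (chain, stage + 1)]    # diagonal return to the home chain
    raise RuntimeError("no alive neighbor within shunt range")
```

A defect thus costs two extra hops of latency instead of the die: `divert_path(2, 5, {(2, 5)}, 4)` yields `[(2, 5), (1, 5), (2, 6)]`.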

2. Tagged addressing

Every instruction carries an origin address prefix identifying its home chain. When an instruction is diverted, the mesh reads this tag to re-inject it into its home chain once the borrowed stage completes.
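The tag mechanics can be sketched in a few lines. The field names (`origin_chain`, `payload`) are assumptions; the paper only specifies that each instruction carries a home-chain address prefix that the mesh reads at re-injection time:

```python
from dataclasses import dataclass

# Hedged sketch of tagged addressing: the instruction carries its home-chain
# prefix, and re-injection is a pure function of that tag.

@dataclass
class TaggedInstruction:
    origin_chain: int   # home-chain address prefix
    stage: int          # pipeline stage currently executing
    payload: str        # opaque instruction body

def reinject(instr: TaggedInstruction) -> tuple:
    """After a borrowed stage completes, the mesh reads the origin tag and
    routes the result back to the home chain at the next stage down."""
    return (instr.origin_chain, instr.stage + 1)
```

Because re-injection depends only on the tag, the borrowing neighbor needs no knowledge of where the instruction came from.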

3. Mutual exclusion and queuing

Each node processes one instruction at a time. When a borrowed instruction occupies a neighbor, that neighbor's own next instruction queues behind it. Contention is resolved by waiting, not by arbitration hardware.
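A toy cycle model makes the wait-not-arbitrate point concrete. The FIFO queue discipline is an assumption; the source only says contention is resolved by waiting:

```python
from collections import deque

# Toy model of per-node mutual exclusion: one instruction per node per cycle,
# so a borrowed instruction makes the host's own next instruction wait.

class Node:
    def __init__(self):
        self.busy_with = None       # instruction currently occupying the node
        self.queue = deque()        # instructions waiting their turn

    def offer(self, instr):
        """Present an instruction (local or borrowed). It runs immediately
        if the node is free, otherwise it queues."""
        if self.busy_with is None:
            self.busy_with = instr
        else:
            self.queue.append(instr)

    def tick(self):
        """Advance one cycle: the occupant finishes, the next queued runs."""
        done = self.busy_with
        self.busy_with = self.queue.popleft() if self.queue else None
        return done
```

If a borrowed instruction arrives first, the host's own instruction simply waits one cycle; no arbiter decides the order.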

4. Verification code reuse

Non-adjacent chains share verification sequences. Code space scales with interaction radius, not chain count.
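One way to see the scaling: chains more than the interaction radius apart never exchange instructions, so they can share a verification sequence. Assigning sequences round-robin modulo radius + 1 (an illustrative scheme, not necessarily the paper's) keeps the number of distinct sequences fixed no matter how many chains exist:

```python
def sequence_id(chain: int, radius: int) -> int:
    """Assign a verification sequence so that any two chains within the
    interaction radius get distinct sequences. Chains sharing an id are at
    least radius + 1 apart, so they never interact and can share code."""
    return chain % (radius + 1)

def distinct_sequences(num_chains: int, radius: int) -> int:
    """Code space holds radius + 1 sequences regardless of chain count."""
    return len({sequence_id(c, radius) for c in range(num_chains)})
```

Growing the mesh from 256 to 1024 chains leaves the code space untouched; only widening the radius enlarges it.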

5. The economic thesis

Trade silicon purity cost for mesh routing overhead. If per-node routing is cheaper than purity investment, the architecture wins.

6. Failure cascade boundary

The mesh degrades gracefully under sparse defects. The critical pattern is adjacent chains failing at the same stage: at a ±1 shunt, two adjacent failures kill routing; at ±4, it takes 9.
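The boundary can be probed with a single-stage sketch. This assumes a linear mesh and that a failed node is stranded only when every chain within the shunt range is also dead at that stage, which reproduces the ±4 figure above:

```python
# Sketch of the cascade boundary at one pipeline stage: a failed chain is
# stranded when every chain within the shunt range has also failed there.
# Linear mesh and single-stage view are simplifying assumptions.

def stuck_chains(failed, shunt, num_chains):
    """Chains whose instructions cannot divert anywhere at this stage."""
    return sorted(c for c in failed
                  if all(n in failed or not (0 <= n < num_chains)
                         for n in range(c - shunt, c + shunt + 1)))
```

With a ±4 shunt in a 16-chain mesh, 8 consecutive interior failures still route around the hole; the 9th strands the center chain.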

7. Shunt range tradeoff

±1 reaches neighbors only: shortest signal paths, most realistic to build. ±4 and beyond extends reach and tolerates denser defects, but at the cost of longer traces, worse signal integrity, and more routing and address complexity.

| Architecture | Similarity | Key difference |
|---|---|---|
| TMR | Fault tolerance for sequential logic | TMR adds 3x hardware; this uses existing neighbors |
| Network-on-Chip | Packet routing with address tags | Operates within pipeline stages, not between clusters |
| Neuromorphic | Fault-tolerant mesh compute | Targets conventional sequential silicon |
| Cerebras | On-die routing around dead zones | Borrows and returns at node level, not cluster |

Where this sits

One abstraction layer deeper than NoC. NoC routes packets between compute clusters. This routes instructions between pipeline stages within a single execution unit.

Crossover economics model

Explore where mesh routing overhead becomes cheaper than purity investment.

Inputs (sliders): 5% · 256 · $8 · $3 · 10
Purity cost: $2048
Mesh cost: $768
Savings: 63%
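The displayed figures can be reproduced with a two-line model. Reading the $8 and $3 inputs as per-node purity and per-node mesh-routing costs is my interpretation of the unlabeled sliders, not something the dashboard states:

```python
def crossover(nodes, purity_per_node, mesh_per_node):
    """Compare total purity investment against total mesh-routing overhead.
    The mesh wins whenever its per-node routing cost is the cheaper of the
    two; savings is the fractional cost reduction."""
    purity = nodes * purity_per_node
    mesh = nodes * mesh_per_node
    savings = 1 - mesh / purity
    return purity, mesh, savings

purity, mesh, savings = crossover(256, 8, 3)
# 256 * $8 = $2048, 256 * $3 = $768, savings = 0.625 (displayed as 63%)
```

At these inputs the crossover is decisively in the mesh's favor; the architecture loses only when per-node routing costs approach the per-node purity investment.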