HalfHumanDraft

Lateral mesh pipeline explorer

Interactive companion to the architecture white paper

[Interactive explorer: a chains × stages mesh with clickable node toggles. Panels track node status (alive, failed, operational %), instruction counts (done, dropped, in flight, total), pipeline performance (throughput, average and peak latency in cycles, overhead), contention (diverts, queued, queue %, divert %), and an event log. Shunt range ±1: neighbors only, shortest signal path, lowest overhead, most faithful to the paper; fails when two adjacent chains have dead nodes at the same stage.]

1. The fundamental trade

Traditional chips treat a defective node as a die-killing failure. This architecture treats it as a latency event. The instruction diverts laterally to a neighbor, borrows that neighbor's equivalent pipeline stage, then returns diagonally to the origin chain at the next stage down.
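The divert-and-return path can be sketched as a small routing function. The `(chain, stage)` coordinate scheme, the function name, and the hard-coded ±1 shunt range are illustrative assumptions, not the white paper's interface:

```python
# Minimal sketch of the lateral divert-and-return path, assuming a ±1 shunt
# range and (chain, stage) node coordinates.

def divert_path(chain, stage, failed, num_chains):
    """Return the hop sequence for an instruction arriving at (chain, stage).

    If the node is alive, the instruction stays put. Otherwise it moves
    laterally to an alive neighbor at the same stage, borrows that stage,
    then returns diagonally to the origin chain at the next stage down.
    """
    if (chain, stage) not in failed:
        return [(chain, stage)]
    for neighbor in (chain - 1, chain + 1):          # ±1 shunt range
        if 0 <= neighbor < num_chains and (neighbor, stage) not in failed:
            return [(chain, stage),        # arrive at the dead node
                    (neighbor, stage),     # lateral divert: borrow the stage
                    (chain, stage + 1)]    # diagonal return to the home chain
    raise RuntimeError("no alive neighbor within shunt range")
```

A defect thus costs two extra hops of latency instead of the die: `divert_path(2, 5, {(2, 5)}, 4)` yields `[(2, 5), (1, 5), (2, 6)]`.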

2. Tagged addressing

Every instruction carries an origin address prefix identifying its home chain. When an instruction is diverted, the mesh reads this tag to re-inject it into its home chain once the borrowed stage completes.
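The tag mechanics can be sketched in a few lines. The field names (`origin_chain`, `payload`) are assumptions; the paper only specifies that each instruction carries a home-chain address prefix that the mesh reads at re-injection time:

```python
from dataclasses import dataclass

# Hedged sketch of tagged addressing: the instruction carries its home-chain
# prefix, and re-injection is a pure function of that tag.

@dataclass
class TaggedInstruction:
    origin_chain: int   # home-chain address prefix
    stage: int          # pipeline stage currently executing
    payload: str        # opaque instruction body

def reinject(instr: TaggedInstruction) -> tuple:
    """After a borrowed stage completes, the mesh reads the origin tag and
    routes the result back to the home chain at the next stage down."""
    return (instr.origin_chain, instr.stage + 1)
```

Because re-injection depends only on the tag, the borrowing neighbor needs no knowledge of where the instruction came from.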

3. Mutual exclusion and queuing

Each node processes one instruction at a time. When a borrowed instruction occupies a neighbor, that neighbor's own next instruction queues behind it. Contention is resolved by waiting, not by arbitration hardware.
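A toy cycle model makes the wait-not-arbitrate point concrete. The FIFO queue discipline is an assumption; the source only says contention is resolved by waiting:

```python
from collections import deque

# Toy model of per-node mutual exclusion: one instruction per node per cycle,
# so a borrowed instruction makes the host's own next instruction wait.

class Node:
    def __init__(self):
        self.busy_with = None       # instruction currently occupying the node
        self.queue = deque()        # instructions waiting their turn

    def offer(self, instr):
        """Present an instruction (local or borrowed). It runs immediately
        if the node is free, otherwise it queues."""
        if self.busy_with is None:
            self.busy_with = instr
        else:
            self.queue.append(instr)

    def tick(self):
        """Advance one cycle: the occupant finishes, the next queued runs."""
        done = self.busy_with
        self.busy_with = self.queue.popleft() if self.queue else None
        return done
```

If a borrowed instruction arrives first, the host's own instruction simply waits one cycle; no arbiter decides the order.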

4. Verification code reuse

Non-adjacent chains share verification sequences. Code space scales with interaction radius, not chain count.
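One way to see the scaling: chains more than the interaction radius apart never exchange instructions, so they can share a verification sequence. Assigning sequences round-robin modulo radius + 1 (an illustrative scheme, not necessarily the paper's) keeps the number of distinct sequences fixed no matter how many chains exist:

```python
def sequence_id(chain: int, radius: int) -> int:
    """Assign a verification sequence so that any two chains within the
    interaction radius get distinct sequences. Chains sharing an id are at
    least radius + 1 apart, so they never interact and can share code."""
    return chain % (radius + 1)

def distinct_sequences(num_chains: int, radius: int) -> int:
    """Code space holds radius + 1 sequences regardless of chain count."""
    return len({sequence_id(c, radius) for c in range(num_chains)})
```

Growing the mesh from 256 to 1024 chains leaves the code space untouched; only widening the radius enlarges it.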

5. The economic thesis

Trade silicon purity cost for mesh routing overhead. If per-node routing is cheaper than purity investment, the architecture wins.

6. Failure cascade boundary

The mesh degrades gracefully under sparse defects. The critical pattern is adjacent chains failing at the same stage: at a ±1 shunt, two adjacent failures kill routing; at ±4, it takes 9.
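The boundary can be probed with a single-stage sketch. This assumes a linear mesh and that a failed node is stranded only when every chain within the shunt range is also dead at that stage, which reproduces the ±4 figure above:

```python
# Sketch of the cascade boundary at one pipeline stage: a failed chain is
# stranded when every chain within the shunt range has also failed there.
# Linear mesh and single-stage view are simplifying assumptions.

def stuck_chains(failed, shunt, num_chains):
    """Chains whose instructions cannot divert anywhere at this stage."""
    return sorted(c for c in failed
                  if all(n in failed or not (0 <= n < num_chains)
                         for n in range(c - shunt, c + shunt + 1)))
```

With a ±4 shunt in a 16-chain mesh, 8 consecutive interior failures still route around the hole; the 9th strands the center chain.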

7. Shunt range tradeoff

±1 reaches neighbors only: shortest signal paths, most realistic to build. ±4 and beyond extends reach and tolerates denser defects, but at the cost of longer traces, worse signal integrity, and more routing and address complexity.

| Architecture | Similarity | Key difference |
|---|---|---|
| TMR | Fault tolerance for sequential logic | TMR adds 3x hardware; this uses existing neighbors |
| Network-on-Chip | Packet routing with address tags | Operates within pipeline stages, not between clusters |
| Neuromorphic | Fault-tolerant mesh compute | Targets conventional sequential silicon |
| Cerebras | On-die routing around dead zones | Borrows and returns at node level, not cluster |

Where this sits

One abstraction layer deeper than NoC. NoC routes packets between compute clusters. This routes instructions between pipeline stages within a single execution unit.

Crossover economics model

Explore where mesh routing overhead becomes cheaper than purity investment.

Inputs (sliders): 5% · 256 · $8 · $3 · 10
Purity cost: $2048
Mesh cost: $768
Savings: 63%
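The displayed figures can be reproduced with a two-line model. Reading the $8 and $3 inputs as per-node purity and per-node mesh-routing costs is my interpretation of the unlabeled sliders, not something the dashboard states:

```python
def crossover(nodes, purity_per_node, mesh_per_node):
    """Compare total purity investment against total mesh-routing overhead.
    The mesh wins whenever its per-node routing cost is the cheaper of the
    two; savings is the fractional cost reduction."""
    purity = nodes * purity_per_node
    mesh = nodes * mesh_per_node
    savings = 1 - mesh / purity
    return purity, mesh, savings

purity, mesh, savings = crossover(256, 8, 3)
# 256 * $8 = $2048, 256 * $3 = $768, savings = 0.625 (displayed as 63%)
```

At these inputs the crossover is decisively in the mesh's favor; the architecture loses only when per-node routing costs approach the per-node purity investment.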