Multi-Raft Consensus
NodeDB uses Multi-Raft: each vShard is its own independent Raft group with its own leader, log, and snapshot schedule. This avoids funnelling every write in the cluster through the single leader and log that one cluster-wide Raft group would impose.
Per-vShard Raft
Each Raft group handles:
- Leader election — automatic failover when the current leader becomes unreachable
- Log replication — WAL entries replicated to followers before acknowledgement
- Snapshots — periodic state snapshots to truncate the Raft log
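The per-group responsibilities above can be sketched as state that each group carries on its own. This is a minimal illustration under stated assumptions, not NodeDB's implementation: `RaftGroupState`, `LogEntry`, and `take_snapshot` are hypothetical names. The sketch only shows how a snapshot lets one group drop the log prefix it covers, without involving any other group.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LogEntry:
    index: int      # position in this group's Raft log (1-based)
    term: int       # leader term in which the entry was created
    payload: bytes  # the replicated WAL record


@dataclass
class RaftGroupState:
    """Hypothetical per-group state: one instance per Raft group (e.g. per vShard)."""
    group_id: int
    leader: Optional[str] = None   # node currently leading this group, if known
    current_term: int = 0          # advances on each election in this group only
    log: List[LogEntry] = field(default_factory=list)
    snapshot_index: int = 0        # last log index covered by the latest snapshot

    def take_snapshot(self, applied_index: int) -> None:
        """Record a snapshot through applied_index and drop the log prefix it covers.

        The caller is assumed to pass an index that has already been applied
        to the state machine; entries after it stay in the log.
        """
        if applied_index <= self.snapshot_index:
            return  # an older or equal snapshot adds nothing
        self.snapshot_index = applied_index
        self.log = [e for e in self.log if e.index > applied_index]


# Group 7 holds entries 1..5; snapshotting through index 3 keeps only 4 and 5.
group = RaftGroupState(group_id=7, leader="node-a", current_term=3,
                       log=[LogEntry(i, 3, b"write") for i in range(1, 6)])
group.take_snapshot(3)
print([e.index for e in group.log])  # [4, 5]
```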
Write Path (Replicated)
- Client sends write to the vShard leader
- Leader appends to local WAL
- Leader replicates to Raft followers
- A quorum (a majority of replicas, counting the leader) acknowledges
- Leader commits and responds to client
Writes are linearizable within each Raft group.
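The replicated write path can be sketched as follows, with comments numbered to match the steps above. This is a deliberately simplified, single-threaded model: `Replica`, `GroupLeader`, and `append_entries` are hypothetical names, replication is synchronous rather than parallel, and terms, retries, and log-matching checks are omitted. What it does show is the quorum rule: the client is acknowledged only once a majority of the group, counting the leader's own copy, holds the entry.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Replica:
    name: str
    reachable: bool = True
    wal: List[bytes] = field(default_factory=list)

    def append_entries(self, entry: bytes) -> bool:
        """Append a replicated entry and acknowledge it; unreachable nodes never ack."""
        if not self.reachable:
            return False
        self.wal.append(entry)
        return True


@dataclass
class GroupLeader:
    followers: List[Replica]
    wal: List[bytes] = field(default_factory=list)
    commit_index: int = 0  # highest WAL index known to be on a majority

    def write(self, entry: bytes) -> bool:
        """Return True only once the entry is committed and the client may be acked."""
        self.wal.append(entry)                   # 2. append to the leader's local WAL
        index = len(self.wal)
        acks = 1                                 # the leader's own copy counts toward quorum
        for follower in self.followers:          # 3. replicate to followers
            if follower.append_entries(entry):
                acks += 1
        group_size = len(self.followers) + 1
        if acks >= group_size // 2 + 1:          # 4. majority quorum reached
            self.commit_index = index            # 5. commit, then respond to the client
            return True
        return False                             # no quorum: the client gets no success reply


# Three-replica group: one follower down still leaves a 2-of-3 majority.
leader = GroupLeader(followers=[Replica("b"), Replica("c", reachable=False)])
print(leader.write(b"set k=1"), leader.commit_index)  # True 1
```

With three replicas the group tolerates one unreachable node; with five it tolerates two.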
Raft group kinds
NodeDB runs three kinds of Raft groups simultaneously:
| Kind | Purpose | Count |
|---|---|---|
| Data | Replicates WAL entries for one vShard's data | One per vShard |
| Meta | Cluster membership, catalog, schema | One per cluster |
| Sequencer | Cross-shard transaction ordering (Calvin epoch log) | One per cluster |
Each kind has independent leader election. A sequencer leader failure does not affect data-group leaders, and vice versa.
Sequencer Raft group
The sequencer group exists solely to produce a globally ordered log of cross-shard transaction batches (epochs). It uses a dedicated group ID outside the data-group ID range, so it can never alias a vShard's group. See Cross-Shard Transactions for how the sequencer group interacts with the scheduler and executor.
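One way to keep the sequencer's group ID out of the data-group range is to reserve IDs just past it. The sketch below assumes, purely for illustration, that data-group IDs equal vShard numbers; `NUM_VSHARDS`, `META_GROUP_ID`, `SEQUENCER_GROUP_ID`, and both functions are hypothetical, and reserving a similar ID for the meta group is an extra assumption the text above does not state.

```python
NUM_VSHARDS = 1024                     # hypothetical number of vShards
# Hypothetical reserved IDs, placed just past the data-group range so no
# vShard number can ever map onto them.
META_GROUP_ID = NUM_VSHARDS
SEQUENCER_GROUP_ID = NUM_VSHARDS + 1


def data_group_id(vshard: int) -> int:
    """Map a vShard to its data-group ID; here the ID is simply the vShard number."""
    if not 0 <= vshard < NUM_VSHARDS:
        raise ValueError(f"vShard {vshard} is outside 0..{NUM_VSHARDS - 1}")
    return vshard


def is_data_group(group_id: int) -> bool:
    return 0 <= group_id < NUM_VSHARDS


# No valid vShard can produce the sequencer's ID.
print(is_data_group(SEQUENCER_GROUP_ID))  # False
```

Rejecting out-of-range vShard numbers in `data_group_id` is what makes the "can never alias" property hold in this sketch.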
Single-shard writes never touch the sequencer group; they go directly through the Raft log of the owning vShard's data group.
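The routing rule reduces to a check on how many vShards a write touches. This is a hedged sketch: `Route` and `route_write` are hypothetical names, and how NodeDB determines the set of touched vShards is not covered here, so the function simply takes that set as input.

```python
from enum import Enum
from typing import Optional, Set, Tuple


class Route(Enum):
    DATA_GROUP = "data"        # the owning vShard's own Raft group
    SEQUENCER = "sequencer"    # ordered into an epoch by the sequencer group


def route_write(touched_vshards: Set[int]) -> Tuple[Route, Optional[int]]:
    """Pick a Raft path from the set of vShards a write touches.

    Returns the route plus the target vShard for single-shard writes.
    """
    if not touched_vshards:
        raise ValueError("a write must touch at least one vShard")
    if len(touched_vshards) == 1:
        (vshard,) = touched_vshards
        return Route.DATA_GROUP, vshard
    return Route.SEQUENCER, None


print(route_write({42}))     # single-shard: vShard 42's data group
print(route_write({3, 17}))  # cross-shard: goes to the sequencer group
```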
Advantages of Multi-Raft
- Independent leaders — different vShards can have leaders on different nodes, distributing write load
- Parallel commits — vShards commit independently, no global ordering bottleneck
- Granular failover — a node failure only triggers leader election for the vShards it led, not the entire cluster
- Failure isolation — sequencer leader election is independent of data and meta group elections
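The first, third, and fourth advantages can be illustrated with a leader map. The sketch assumes a hypothetical snapshot of which node leads each group (`leaders`, `leaders_per_node`, and `groups_needing_election` are illustrative names): counting leaders per node shows where write load lands, and a node failure triggers elections only in the groups that node led. It ignores groups where the failed node was merely a follower; those keep their leader as long as a quorum remains.

```python
from collections import Counter
from typing import Dict, Set

# Hypothetical snapshot of which node currently leads each Raft group.
leaders: Dict[str, str] = {
    "vshard-0": "n1",
    "vshard-1": "n2",
    "vshard-2": "n3",
    "meta": "n2",
    "sequencer": "n1",
}


def leaders_per_node(leaders: Dict[str, str]) -> Counter:
    """Count how many groups each node leads, i.e. where write load lands."""
    return Counter(leaders.values())


def groups_needing_election(leaders: Dict[str, str], failed_node: str) -> Set[str]:
    """Only groups led by the failed node need a new leader; all others keep theirs."""
    return {group for group, node in leaders.items() if node == failed_node}


print(sorted(groups_needing_election(leaders, "n3")))  # ['vshard-2']
print(sorted(groups_needing_election(leaders, "n1")))  # ['sequencer', 'vshard-0']
```

Losing `n3` leaves the meta and sequencer groups untouched, while losing `n1` forces elections in the sequencer group and one data group but not in the meta group.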