CSR Index

CSR (Compressed Sparse Row) is the graph engine's core index format. It stores adjacency data in contiguous arrays for cache-resident traversal.

Layout

CsrIndex (per tenant):
  out_offsets:  Vec<u32>                 [num_nodes + 1]  — offset into target array per node
  out_targets:  DenseArray<u32>          [num_edges]      — destination node IDs (contiguous, mmap-capable)
  out_labels:   DenseArray<u32>          [num_edges]      — edge labels (parallel array)
  out_weights:  Option<DenseArray<f64>>                   — optional, allocated only when weighted

  in_offsets / in_targets / in_labels / in_weights — symmetric for inbound

Tenant Partitioning

The in-memory index is ShardedCsrIndex — one CsrIndex per tenant. Algorithms and traversals receive a single tenant's partition; there is no lexical tenant prefix on node names. Each CsrIndex is assigned a unique partition tag at construction, and public APIs that return dense node indices hand out LocalNodeId { id, partition_tag }. Using a node id from one partition with another partition's API panics at the boundary.

Memory Efficiency

At 1 billion edges, CSR uses ~10 GB vs ~60 GB for naive adjacency lists (6x improvement). Node IDs are interned as u32, labels as u32.

Storage

Edges are persisted in a redb B-Tree with forward and reverse indexes, both keyed by (tenant_id: u32, "src\x00label\x00dst") tuples. Tenant isolation is structural (first-class key component), not lexical. The CSR index is built at query time for bulk operations. Writes go to a mutable buffer and become visible immediately. Compaction merges the buffer into dense CSR arrays when the buffer exceeds 10% of the dense size.

Graph Operations

The CSR index supports all traversal and algorithm operations:

  • BFS/DFS traversal
  • Shortest path (Dijkstra)
  • All 13 native graph algorithms
  • MATCH pattern matching
  • GraphRAG fusion
View page sourceLast updated on Apr 24, 2026 by Farhan Syah