Inverted Index
The inverted index maps terms to document IDs for full-text search. It uses an LSM architecture with posting compression and Block-Max WAND scoring.
Creating a Search Index
CREATE SEARCH INDEX ON articles FIELDS title, body ANALYZER 'english' FUZZY true;
Architecture
Memtable: Writes accumulate in memory in a HashMap<String, Vec<CompactPosting>>. When the memtable exceeds either threshold (32M posting entries or 100K unique terms), it is flushed to an immutable segment.
Segments: Compressed, immutable on-disk segments store delta-encoded, bitpacked posting lists. Segments are merged by level-based compaction (8 levels, 8 segments per level).
Query merge: Each search merges results from the active memtable with those from all persisted segments.
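The write and read paths described above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the `CompactPosting` fields, the `Segment` and `Index` types, and the method names are all assumptions, segments are kept uncompressed in memory, and the flush thresholds are constructor parameters so that small values can be used for experimentation (the text quotes 32M posting entries and 100K unique terms).

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical stand-in for the real `CompactPosting`; its fields are assumed.
#[derive(Clone, Debug, PartialEq)]
pub struct CompactPosting {
    pub doc_id: u32,
    pub term_freq: u32,
}

/// Immutable segment. Kept uncompressed here to keep the sketch short.
pub struct Segment {
    postings: BTreeMap<String, Vec<CompactPosting>>,
}

pub struct Index {
    memtable: HashMap<String, Vec<CompactPosting>>,
    posting_count: usize,
    segments: Vec<Segment>,
    max_postings: usize, // e.g. 32M posting entries
    max_terms: usize,    // e.g. 100K unique terms
}

impl Index {
    pub fn new(max_postings: usize, max_terms: usize) -> Self {
        Index {
            memtable: HashMap::new(),
            posting_count: 0,
            segments: Vec::new(),
            max_postings,
            max_terms,
        }
    }

    /// Buffer a posting in the memtable; flush once either threshold is exceeded.
    pub fn add(&mut self, term: &str, posting: CompactPosting) {
        self.memtable.entry(term.to_string()).or_default().push(posting);
        self.posting_count += 1;
        if self.posting_count > self.max_postings || self.memtable.len() > self.max_terms {
            self.flush();
        }
    }

    /// Turn the memtable into an immutable segment with doc-ID-sorted posting lists.
    pub fn flush(&mut self) {
        if self.memtable.is_empty() {
            return;
        }
        let mut postings: BTreeMap<String, Vec<CompactPosting>> =
            std::mem::take(&mut self.memtable).into_iter().collect();
        for list in postings.values_mut() {
            list.sort_by_key(|p| p.doc_id);
        }
        self.segments.push(Segment { postings });
        self.posting_count = 0;
    }

    /// Query merge: combine postings from every segment and the active memtable.
    pub fn lookup(&self, term: &str) -> Vec<CompactPosting> {
        let mut out = Vec::new();
        for segment in &self.segments {
            if let Some(list) = segment.postings.get(term) {
                out.extend(list.iter().cloned());
            }
        }
        if let Some(list) = self.memtable.get(term) {
            out.extend(list.iter().cloned());
        }
        out.sort_by_key(|p| p.doc_id);
        out
    }

    pub fn segment_count(&self) -> usize {
        self.segments.len()
    }
}
```

Compaction and on-disk persistence are left out of this sketch; it only shows why a lookup has to consult both the memtable and every segment.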
Posting Compression
- Delta encoding for sorted doc IDs
- Variable-width bitpacking (3-byte header: [count: u16][bit_width: u8])
- SIMD-accelerated unpack (SSE2 on x86_64, NEON on AArch64)
- SmallFloat fieldnorms (1 byte per document, 4x space reduction)
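To make the delta encoding and the 3-byte header concrete, here is a scalar sketch of packing and unpacking one posting block. Several details are assumptions made for illustration: the function names `encode_block` and `decode_block`, the little-endian `count`, the LSB-first bit order, and taking the first delta relative to 0. The SIMD unpack paths are not shown.

```rust
/// Encode sorted doc IDs as [count: u16][bit_width: u8] followed by bitpacked deltas.
/// Byte order and bit order are assumptions of this sketch.
pub fn encode_block(doc_ids: &[u32]) -> Vec<u8> {
    assert!(doc_ids.len() <= u16::MAX as usize, "count must fit in a u16");

    // Delta encoding: store gaps between consecutive doc IDs.
    let mut deltas = Vec::with_capacity(doc_ids.len());
    let mut prev = 0u32;
    for &id in doc_ids {
        assert!(id >= prev, "doc IDs must be sorted");
        deltas.push(id - prev);
        prev = id;
    }

    // The widest delta determines the bit width used for every value in the block.
    let max_delta = deltas.iter().copied().max().unwrap_or(0);
    let bit_width = (32 - max_delta.leading_zeros()) as u8;

    let mut out = Vec::new();
    out.extend_from_slice(&(doc_ids.len() as u16).to_le_bytes());
    out.push(bit_width);

    // Pack each delta into `bit_width` bits, least significant bits first.
    let mut acc: u64 = 0;
    let mut bits: u32 = 0;
    for delta in deltas {
        acc |= (delta as u64) << bits;
        bits += bit_width as u32;
        while bits >= 8 {
            out.push(acc as u8);
            acc >>= 8;
            bits -= 8;
        }
    }
    if bits > 0 {
        out.push(acc as u8);
    }
    out
}

/// Inverse of `encode_block`: unpack the deltas and prefix-sum them back into doc IDs.
pub fn decode_block(bytes: &[u8]) -> Vec<u32> {
    let count = u16::from_le_bytes([bytes[0], bytes[1]]) as usize;
    let bit_width = bytes[2] as u32;
    let mask: u64 = if bit_width == 0 { 0 } else { (1u64 << bit_width) - 1 };

    let mut out = Vec::with_capacity(count);
    let (mut acc, mut bits, mut pos) = (0u64, 0u32, 3usize);
    let mut prev = 0u32;
    for _ in 0..count {
        // Refill the accumulator until it holds at least one full value.
        while bits < bit_width {
            acc |= (bytes[pos] as u64) << bits;
            pos += 1;
            bits += 8;
        }
        let delta = (acc & mask) as u32;
        acc >>= bit_width;
        bits -= bit_width;
        prev += delta;
        out.push(prev);
    }
    out
}
```

For example, the doc IDs 5, 8, 9, 20, 21 have deltas 5, 3, 1, 11, 1. The largest delta needs 4 bits, so the five values pack into 3 bytes and the whole block takes 6 bytes including the header, compared with 20 bytes as raw `u32` values.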
Block-Max WAND
Posting lists are split into 128-document blocks with precomputed block_max_tf and block_min_fieldnorm. During scoring, blocks that can't beat the current top-k threshold are skipped entirely.
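The block-skipping idea can be illustrated with a deliberately simplified sketch. It handles a single term only, so it omits the multi-term pivot selection that full WAND performs, and it is not the engine's actual scorer. The `Block` layout, the `top_k` function, the BM25 constants, and the use of a decoded field length in place of the 1-byte SmallFloat fieldnorm are all assumptions. The key point is that a block's best possible score comes from pairing its highest term frequency with its shortest field.

```rust
/// One posting block with precomputed bounds (128 documents in the text; any size here).
pub struct Block {
    pub block_max_tf: u32,
    pub block_min_fieldnorm: u32,
    /// (doc_id, term frequency, field length). The layout is an assumption.
    pub postings: Vec<(u32, u32, u32)>,
}

impl Block {
    pub fn new(postings: Vec<(u32, u32, u32)>) -> Self {
        let block_max_tf = postings.iter().map(|p| p.1).max().unwrap_or(0);
        let block_min_fieldnorm = postings.iter().map(|p| p.2).min().unwrap_or(0);
        Block { block_max_tf, block_min_fieldnorm, postings }
    }
}

/// BM25 term-frequency component (IDF omitted, since it is constant for one term).
/// Increases with `tf` and decreases with `len`, which is what makes the bound valid.
fn bm25_tf(tf: u32, len: u32, avg_len: f32) -> f32 {
    const K1: f32 = 1.2;
    const B: f32 = 0.75;
    let tf = tf as f32;
    tf * (K1 + 1.0) / (tf + K1 * (1.0 - B + B * len as f32 / avg_len))
}

/// Score of the current k-th best hit, or negative infinity while fewer than k are held.
fn threshold(top: &[(u32, f32)], k: usize) -> f32 {
    if top.len() == k { top[k - 1].1 } else { f32::NEG_INFINITY }
}

/// Returns the top-k (doc_id, score) pairs, best first, and the number of blocks skipped.
pub fn top_k(blocks: &[Block], k: usize, avg_len: f32) -> (Vec<(u32, f32)>, usize) {
    assert!(k > 0, "k must be positive");
    let mut top: Vec<(u32, f32)> = Vec::new(); // kept sorted by descending score
    let mut skipped = 0;

    for block in blocks {
        // Upper bound: highest tf in the block combined with the shortest field.
        let bound = bm25_tf(block.block_max_tf, block.block_min_fieldnorm, avg_len);
        if bound <= threshold(&top, k) {
            skipped += 1; // no document in this block can enter the top k
            continue;
        }
        for &(doc_id, tf, len) in &block.postings {
            let score = bm25_tf(tf, len, avg_len);
            if score > threshold(&top, k) {
                let pos = top.partition_point(|&(_, s)| s >= score);
                top.insert(pos, (doc_id, score));
                top.truncate(k);
            }
        }
    }
    (top, skipped)
}
```

With three blocks where the second contains only low-frequency matches in long fields, a top-2 query scores the first and third blocks and skips the second without touching its postings.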