What is NodeDB
NodeDB is a single Rust binary that provides eight peer engines — Document (schemaless), Document (strict), Key-Value, Columnar, Timeseries, Spatial, Vector, and Array — sharing one storage core, plus Graph and Full-Text Search as cross-engine overlays on any collection. Seven of the eight are selected per-collection via WITH (engine='<name>'); Array uses its own CREATE ARRAY DDL family. Each engine is built with purpose-specific data structures. All share the same storage, memory, and query planner. Cross-engine queries execute in one process with zero network hops.
The Problem
Modern applications don't fit in one database. A healthcare app needs patient records (relational), medical imaging embeddings (vector), care team relationships (graph), device telemetry (timeseries), and offline-first sync for field workers (CRDT). The industry answer is a polyglot stack: PostgreSQL for relational, Qdrant for vectors, Neo4j for graphs, ClickHouse for timeseries, Redis for caching. Each system has its own protocol, deployment, and failure modes. Cross-database queries require application-level joins.
Some databases claim to solve this by bolting on capabilities — "graph" as recursive JOINs, "timeseries" without columnar compression or continuous aggregation. The features exist in name but not in performance.
Eight Engines
Vector — HNSW index with SQ8, PQ, and IVF-PQ quantization. SIMD-accelerated distance math. Adaptive bitmap pre-filtering.
Graph — CSR adjacency index with 13 native algorithms (PageRank, WCC, Louvain, SSSP, etc.) and a Cypher-subset MATCH pattern engine. GraphRAG fusion with vector search.
Document — Two modes per collection. Schemaless: MessagePack blobs with CRDT sync. Strict: Binary Tuples with O(1) field extraction and 3-4x cache density over BSON.
Columnar / Timeseries / Spatial — Three peer engines sharing one compressed-column storage core (ALP, FastLanes, FSST, Gorilla, LZ4 codecs; block statistics; predicate pushdown). columnar for general analytics; timeseries adds append-only ingest, retention, and continuous aggregation; spatial adds R*-tree, geohash, and H3 indexing.
Key-Value — Hash-indexed O(1) point lookups with typed value fields. Native TTL, secondary indexes, atomic INCR/CAS, and predicate-filtered scans. SQL-queryable and joinable.
Full-Text Search — Block-Max WAND optimized BM25 with 16 Snowball stemmers, 27-language stop words, CJK bigram tokenization, posting compression, fuzzy matching, and native hybrid vector fusion.
Array — ND sparse multi-dimensional engine with tile-based storage, Z-order indexing, per-tile MBR statistics, and bitemporal cells. Replaces TileDB / Zarr / SciDB / Rasdaman for genomics, single-cell biology, raster cubes, and climate models.
CRDT — Loro-backed conflict-free replicated data types. AP on the edge, CP in the cloud. SQL constraint validation at sync time with compensation hints.
Three Deployment Modes
Origin (server) — Full distributed database. Multi-Raft consensus, Thread-per-Core Data Plane with io_uring, PostgreSQL-compatible SQL over pgwire. Horizontal scaling with automatic shard rebalancing.
Origin (local) — Same binary, single-node. No cluster overhead. Like running PostgreSQL locally.
NodeDB-Lite (embedded) — In-process library for phones, browsers (WASM), and desktops. All eight engines run locally with sub-millisecond reads. CRDT sync to Origin via WebSocket.
PostgreSQL Compatible
Connect with psql, any PostgreSQL driver, or ORM. Six wire protocols are available:
- pgwire — PostgreSQL wire protocol (port 6432)
- HTTP — REST, SSE, WebSocket (port 6480)
- NDB — Native MessagePack protocol (port 6433)
- RESP — Redis-compatible KV protocol (optional)
- ILP — InfluxDB Line Protocol for timeseries ingest (optional)
- Sync — WebSocket sync for NodeDB-Lite clients (port 9090)
What NodeDB Replaces
The combination of PostgreSQL + pgvector + Redis + Neo4j + ClickHouse + Elasticsearch — unified into one binary with shared storage and zero network hops between engines.