---

# AI & LLM Applications (use-cases/ai-llm)

---
title: AI & LLM Applications
description: Build RAG pipelines, agent memory stores, and hybrid semantic search on NodeDB — vector, graph, full-text, and document engines in one query.
---

Modern AI applications need at least three databases to function: a vector store for embeddings, a graph database for entity relationships, and a search index for keyword recall. NodeDB collapses all three into one, with a fourth engine — schemaless Document — for storing raw LLM outputs and conversation history.

## Engines used

| Engine | Role |
|---|---|
| **Vector** | HNSW similarity search, filtered ANN, multivec MaxSim |
| **Full-Text Search** | BM25 keyword recall, 27-language analyzers |
| **Graph** | Entity relationships, GraphRAG context expansion |
| **Document (schemaless)** | Conversation history, LLM outputs, agent state |

## Hybrid RAG retrieval

A basic RAG retrieval combines vector similarity with BM25 keyword recall, then fuses the two ranked lists with Reciprocal Rank Fusion. `rrf_score(...)` does the fusion inside the planner — one statement, no application-side merge loop.

```sql
-- Hybrid search: vector ANN + BM25, fused with RRF
SELECT id, content, metadata,
       rrf_score(
         vector_distance(embedding, $query_vec),
         bm25_score(content, $query_text)
       ) AS score
FROM documents
WHERE metadata ->> 'source' = 'internal'
ORDER BY score DESC
LIMIT 10;
```

## GraphRAG: expand context through entity relationships

Entities and their relationships are graph edges layered on the same `documents` collection — graph is an overlay, not a separate store.
After vector search finds seed chunks, the fusion DSL walks those edges and merges the rankings:

```sql
-- Wire entities as edges on the documents collection
GRAPH INSERT EDGE IN 'documents' FROM $chunk_id TO $entity_id TYPE 'mentions';
GRAPH INSERT EDGE IN 'documents' FROM $entity_a TO $entity_b TYPE 'related_to'
  PROPERTIES { weight: 0.8 };

-- Vector search seeds → BFS along 'related_to' → RRF merge of vector rank + hop distance
GRAPH RAG FUSION ON documents
  QUERY $query_vec
  VECTOR_FIELD 'embedding'
  VECTOR_TOP_K 50
  EXPANSION_DEPTH 2
  EDGE_LABEL 'related_to'
  FINAL_TOP_K 10
  RRF_K (60.0, 35.0);
```

Need keyword relevance in the same pass? Add the `BM25` leg and the `RRF_K` tuple becomes a triple — vector, graph expansion, and BM25 all fused in one plan:

```sql
GRAPH RAG FUSION ON documents
  QUERY $query_vec
  VECTOR_FIELD 'embedding'
  VECTOR_TOP_K 50
  BM25 $query_text ON 'content'
  EXPANSION_DEPTH 2
  EDGE_LABEL 'related_to'
  FINAL_TOP_K 10
  RRF_K (60.0, 35.0, 50.0);
```

You can also walk the entity graph directly with a Cypher-subset `MATCH`:

```sql
MATCH (c:Chunk)-[:mentions]->(e:Entity)-[:related_to*1..2]->(related:Entity)
WHERE c.id = $chunk_id
RETURN DISTINCT related.id
LIMIT 20;
```

## Agent memory store

Long-running agents need persistent, queryable memory. A schemaless Document collection stores arbitrary turn payloads; a vector index makes semantic recall fast.
```sql
CREATE COLLECTION agent_memory;
CREATE VECTOR INDEX idx_mem ON agent_memory METRIC cosine DIM 1536;

-- Store a turn (object-literal insert on a schemaless collection)
INSERT INTO agent_memory {
  session_id: $sid,
  turn: $n,
  role: $role,
  content: $text,
  embedding: $vec,
  created_at: now()
};

-- Recall: recent turns from this session + semantically similar turns from any session
(SELECT content, role, created_at
 FROM agent_memory
 WHERE session_id = $sid
 ORDER BY turn DESC
 LIMIT 10)
UNION ALL
(SELECT content, role, created_at
 FROM agent_memory
 WHERE session_id <> $sid
 ORDER BY embedding <=> $query_vec
 LIMIT 5)
ORDER BY created_at;
```

## Filtered vector search

Production vector workloads almost always filter by metadata before ranking. NodeDB's NaviX adaptive-local traversal keeps ANN recall high even with tight filters — the planner builds a roaring bitmap of matching IDs and picks pre-filter, post-filter, or brute-force based on selectivity.

```sql
-- Similar products, restricted to in-stock items in the user's region
SELECT id, name, price, embedding <=> $query_vec AS distance
FROM products
WHERE in_stock = true
  AND region = $region
  AND category_id = ANY($category_ids)
ORDER BY embedding <=> $query_vec
LIMIT 20;
```

Store full-precision embeddings and rank on the first N dimensions for a fast first pass via the `query_dim` tuning argument — then re-rank the top candidates at full precision. No reindexing:

```sql
SELECT id, vector_distance(embedding, $query_vec, query_dim => 256) AS distance
FROM products
WHERE in_stock = true
ORDER BY distance
LIMIT 100;
```

## Why not a dedicated vector database?

Dedicated vector databases force you to maintain a separate store for every non-vector data shape. When your RAG pipeline needs keyword fallback, entity relationships, and conversation history, you end up with four systems, four consistency boundaries, and cross-system joins that travel over the network.
NodeDB keeps everything in one process with a single shared snapshot. A hybrid search + graph expansion + memory recall query is one SQL statement with zero inter-process calls.

---

# Fintech & Trading (use-cases/fintech-trading)

---
title: Fintech & Trading
description: Tick ingestion, OHLCV aggregates, bitemporal audit ledgers, and risk models — unified in NodeDB without Kafka fan-out or separate OLAP clusters.
---

Financial systems typically spread across a time-series database for tick data, a columnar store for analytics, a strict relational database for ledger records, and a key-value cache for positions. NodeDB unifies all four workloads, adds bitemporal support for regulatory audit trails, and handles the ingestion throughput fintech demands.

## Engines used

| Engine | Role |
|---|---|
| **Timeseries** | Tick ingestion, OHLCV, continuous aggregates, retention |
| **Columnar** | Risk models, portfolio analytics, aggregation-heavy OLAP |
| **Key-Value** | Current positions, order-book snapshots, hot-path cache |
| **Document (strict)** | Immutable transaction records, bitemporal audit ledger |

## Tick ingestion and OHLCV aggregates

The Timeseries engine is append-only with per-column compression, a `TIME_KEY` column driving partition-by-time and block skip, and built-in retention.
```sql
CREATE COLLECTION ticks (
  ts TIMESTAMP TIME_KEY,
  symbol VARCHAR,
  price FLOAT,
  volume BIGINT
) WITH (engine='timeseries', partition_by='1d', retention='90d');

-- Ingest ticks (batch insert for throughput)
INSERT INTO ticks (ts, symbol, price, volume) VALUES
  ('2026-01-15 09:30:00.123', 'AAPL', 185.42, 1200),
  ('2026-01-15 09:30:00.456', 'AAPL', 185.45, 800);

-- 1-minute OHLCV bars
SELECT DISTINCT
  time_bucket('1m', ts) AS bucket,
  symbol,
  first_value(price) OVER w_ord AS open,
  max(price) OVER w AS high,
  min(price) OVER w AS low,
  last_value(price) OVER w_ord AS close,
  sum(volume) OVER w AS volume
FROM ticks
WHERE symbol = $symbol AND ts >= now() - INTERVAL '1 hour'
WINDOW
  w AS (PARTITION BY time_bucket('1m', ts), symbol),
  w_ord AS (w ORDER BY ts ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
ORDER BY bucket DESC;
```

## Bitemporal audit ledger

Regulators demand both when a transaction occurred (valid time) and when it was recorded in the system (system time). NodeDB tracks system time *in the key* (derived from the WAL LSN at Raft commit) and valid time *in the value* — no application-side audit table.

```sql
-- Bitemporal transaction ledger on the strict Document engine
CREATE COLLECTION transactions (
  id UUID DEFAULT gen_uuid_v7(),
  account_id UUID,
  symbol VARCHAR,
  quantity DECIMAL,
  price DECIMAL,
  side VARCHAR,
  executed_at TIMESTAMP
) WITH (engine='document_strict', bitemporal=true);

-- Record a trade with explicit valid time
INSERT INTO transactions (account_id, symbol, quantity, price, side, executed_at, valid_time)
VALUES ($account_id, 'MSFT', 100, 374.50, 'BUY', '2026-01-15 10:00:00', '2026-01-15 10:00:00');

-- Regulatory query: account state as the database recorded it on a past day,
-- valid as of noon that day. Times are milliseconds since the Unix epoch.
SELECT symbol, quantity, price, side, executed_at
FROM transactions
AS OF SYSTEM TIME 1768521599000  -- 2026-01-15 23:59:59 UTC
AS OF VALID TIME 1768478400000   -- 2026-01-15 12:00:00 UTC
WHERE account_id = $account_id
ORDER BY executed_at;

-- Full lineage of a single transaction in system-time order
SELECT id, side, quantity, price, system_time
FROM transactions
AS OF SYSTEM TIME NULL  -- special: returns every version
WHERE id = $txn_id
ORDER BY system_time ASC;
```

## Current positions — Key-Value hot path

The KV engine provides O(1) point lookups for current position state, updated on every fill.

```sql
CREATE COLLECTION positions (key TEXT PRIMARY KEY) WITH (engine='kv');

-- Upsert the position record on a fill
UPSERT INTO positions {
  key: $account_symbol,
  symbol: $symbol,
  quantity: $qty,
  cost_basis: $cost_basis,
  updated_at: now()
};

-- Atomic running-quantity adjustment without read-modify-write
SELECT KV_INCR_FLOAT('positions', $account_symbol_qty_key, $delta_qty);

-- Single-millisecond position lookup
SELECT * FROM positions WHERE key = $account_symbol;
```

## Portfolio risk analytics — Columnar engine

Risk calculations scan every position in a portfolio, applying factor models across millions of rows. The Columnar engine with predicate pushdown and per-column compression handles this without a separate OLAP cluster.
```sql
CREATE COLLECTION portfolio_risk (
  account_id UUID,
  as_of_date TIMESTAMP TIME_KEY,
  market_value DECIMAL,
  beta FLOAT,
  daily_pnl DECIMAL,
  position_status VARCHAR
) WITH (engine='columnar');

-- Value-at-Risk across all active positions
SELECT account_id,
       sum(market_value) AS total_exposure,
       sum(market_value * beta) AS beta_adjusted,
       approx_percentile(daily_pnl, 0.05) AS var_95
FROM portfolio_risk
WHERE as_of_date >= now() - INTERVAL '1 day'
  AND position_status = 'active'
GROUP BY account_id
HAVING sum(market_value) > 100000;
```

## Cross-engine P&L attribution

One query joins the timeseries tick data with the KV position record — zero network hops between engines, one shared snapshot.

```sql
-- Intraday unrealised P&L: latest tick price vs. cost basis
SELECT p.value ->> 'symbol' AS symbol,
       (p.value ->> 'quantity')::numeric AS quantity,
       (p.value ->> 'cost_basis')::numeric AS cost_basis,
       t.last_price,
       ((t.last_price - (p.value ->> 'cost_basis')::numeric)
         * (p.value ->> 'quantity')::numeric) AS unrealised_pnl
FROM positions p
CROSS JOIN LATERAL (
  SELECT price AS last_price
  FROM ticks
  WHERE symbol = p.value ->> 'symbol'
  ORDER BY ts DESC
  LIMIT 1
) t;
```

Define a continuous aggregate once and NodeDB refreshes it incrementally as new ticks arrive — no cron job, no separate aggregation pipeline:

```sql
CREATE CONTINUOUS AGGREGATE ohlcv_1m ON ticks AS
SELECT time_bucket('1m', ts) AS bucket,
       symbol,
       min(price) AS low,
       max(price) AS high,
       sum(volume) AS volume
FROM ticks
GROUP BY bucket, symbol
WITH (refresh_interval = '5s');
```

## Why not a dedicated time-series database?

Purpose-built time-series databases force all non-tick data — ledger records, positions, risk factors — into external systems. Cross-system queries (tick price × position quantity × risk factor) travel over the network and require application-side joins. NodeDB keeps every engine co-located: cross-engine queries reduce to in-process roaring-bitmap intersections over a shared surrogate identity space.
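To make the last point concrete, here is a minimal Python sketch of what a cross-engine join degenerates to once each engine has resolved its predicate to a set of global surrogate row IDs. The candidate sets and IDs are hypothetical, and plain Python sets stand in for compressed roaring bitmaps — NodeDB performs this step in-process inside the planner.

```python
# Illustrative only: each engine returns a set of global surrogate row IDs;
# the cross-engine join is then just a set intersection. Plain Python sets
# stand in for the compressed roaring bitmaps NodeDB uses internally.

def intersect_engine_results(*id_sets):
    """Intersect per-engine candidate ID sets, smallest first."""
    ordered = sorted(id_sets, key=len)
    result = set(ordered[0])
    for s in ordered[1:]:
        result &= s          # bitmap AND
        if not result:
            break            # short-circuit on an empty intersection
    return result

# Hypothetical candidate sets from three engines for one query
timeseries_ids = {1, 2, 3, 5, 8, 13}   # ticks in the last hour
kv_ids         = {2, 3, 5, 21}         # accounts with open positions
columnar_ids   = {3, 5, 8, 34}         # rows passing the risk filter

print(sorted(intersect_engine_results(timeseries_ids, kv_ids, columnar_ids)))
# → [3, 5]
```

Intersecting smallest-first keeps the working set small, which is the same reason bitmap engines reorder their AND chains by cardinality.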
---

# Healthcare (use-cases/healthcare)

---
title: Healthcare
description: Schema-enforced patient records, bitemporal clinical timelines, and row-level security for HIPAA-grade data governance — all in NodeDB.
---

Healthcare data has two hard requirements that pull in opposite directions: strict schema enforcement for regulatory compliance, and flexible temporal queries for clinical timelines that capture both when care was delivered and when it was recorded. NodeDB satisfies both with its strict Document engine and native bitemporal support, while RBAC and row-level security enforce access control at the storage layer rather than in application middleware.

## Engines used

| Engine | Role |
|---|---|
| **Document (strict)** | Patient records, encounter notes, medication orders — schema-enforced Binary Tuples |
| **Timeseries** | Vital signs, lab results, continuous monitoring streams |
| **Full-Text Search** | Clinical note search, ICD/SNOMED code lookup |
| **Columnar** | Population health analytics, cohort queries, reporting |

## Schema-enforced patient records

The strict Document engine (`engine='document_strict'`) validates every write against the declared columns and types — bad data is rejected at the storage layer, not caught later in application code. `bitemporal=true` adds system-time and valid-time tracking.
```sql
CREATE COLLECTION patients (
  id UUID DEFAULT gen_uuid_v7(),
  mrn VARCHAR,
  dob DATE,
  given_name VARCHAR,
  family_name VARCHAR,
  allergies JSON,
  org_id UUID,
  created_at TIMESTAMP DEFAULT now()
) WITH (engine='document_strict', bitemporal=true);

-- NodeDB rejects table-level constraints; use a unique index for MRN uniqueness
CREATE UNIQUE INDEX ON patients(mrn);

INSERT INTO patients (mrn, dob, given_name, family_name, allergies, org_id)
VALUES ('MRN-00001234', '1982-07-14', 'Jane', 'Doe', '["penicillin"]', $org_id);
```

## Bitemporal clinical timeline

Clinical decisions depend on what was known at the time of treatment, not just what is currently recorded. System time is stored in the key (derived from the WAL LSN at Raft commit); valid time is client-assigned and stored in the value. Corrections add a new version — old versions remain queryable.

```sql
CREATE COLLECTION diagnoses (
  id UUID DEFAULT gen_uuid_v7(),
  patient_id UUID,
  encounter_id UUID,
  icd10_code VARCHAR,
  description VARCHAR,
  is_primary BOOLEAN
) WITH (engine='document_strict', bitemporal=true);

-- Initial diagnosis, valid as of when the condition was present
INSERT INTO diagnoses (patient_id, encounter_id, icd10_code, description, is_primary, valid_time)
VALUES ($patient_id, $enc_id, 'J18.9', 'Pneumonia, unspecified', true, '2026-01-10');

-- Later correction: it was actually J18.1. New version, same patient.
INSERT INTO diagnoses (patient_id, encounter_id, icd10_code, description, is_primary, valid_time)
VALUES ($patient_id, $enc_id, 'J18.1', 'Lobar pneumonia, unspecified', true, '2026-01-10');

-- Audit query: what the database held on Jan 11, valid as of Jan 12.
-- AS OF times are milliseconds since the Unix epoch.
SELECT icd10_code, description
FROM diagnoses
AS OF SYSTEM TIME 1768175999000  -- 2026-01-11 23:59:59 UTC
AS OF VALID TIME 1768176000000   -- 2026-01-12 00:00:00 UTC
WHERE patient_id = $patient_id;

-- Full version history of a patient's diagnoses, in system-time order
SELECT icd10_code, description, system_time
FROM diagnoses
AS OF SYSTEM TIME NULL  -- special: returns every version
WHERE patient_id = $patient_id
ORDER BY system_time ASC;
```

## Vital signs — Timeseries engine

Continuous monitoring produces millions of readings per patient per day. The Timeseries engine's append-only design and per-column compression handle this efficiently, with per-collection retention.

```sql
CREATE COLLECTION vitals (
  recorded_at TIMESTAMP TIME_KEY,
  patient_id UUID,
  metric VARCHAR,
  value FLOAT,
  unit VARCHAR
) WITH (engine='timeseries', partition_by='1d', retention='3650d');

INSERT INTO vitals (recorded_at, patient_id, metric, value, unit) VALUES
  (now(), $patient_id, 'heart_rate', 72, 'bpm'),
  (now(), $patient_id, 'spo2', 98.5, '%'),
  (now(), $patient_id, 'systolic_bp', 118, 'mmHg');

-- 5-minute rolling averages for a deterioration alert
SELECT time_bucket('5m', recorded_at) AS bucket,
       avg(CASE WHEN metric = 'heart_rate' THEN value END) AS avg_hr,
       avg(CASE WHEN metric = 'spo2' THEN value END) AS avg_spo2
FROM vitals
WHERE patient_id = $patient_id
  AND recorded_at >= now() - INTERVAL '30 minutes'
GROUP BY bucket
ORDER BY bucket;
```

## Row-level security

RLS policies filter rows transparently on every query — no application code can bypass them. Policies are injected at plan time, before engine dispatch, so they apply to vector, graph, FTS, and KV access too.
```sql
-- Records are visible only within the patient's organization
CREATE RLS POLICY patient_org ON patients
  FOR READ USING (org_id = $auth.org_id);

-- Writes are limited to the same organization
CREATE RLS POLICY patient_org_write ON patients
  FOR WRITE USING (org_id = $auth.org_id);

-- Restrictive policy: only clinical roles may read patient records at all
CREATE RLS POLICY clinical_roles_only ON patients
  FOR READ USING ($auth.role = 'physician' OR $auth.role = 'nurse'
               OR $auth.role = 'pharmacist' OR $auth.role = 'admin')
  RESTRICTIVE;
```

Substitution is fail-closed: if the session's auth context lacks a referenced variable, the query rejects the row rather than allowing wide-open access.

## Population health cohort query — Columnar engine

Aggregate queries across millions of patients use the Columnar engine with predicate pushdown and per-column compression.

```sql
CREATE COLLECTION encounters (
  id UUID DEFAULT gen_uuid_v7(),
  patient_id UUID,
  admit_date TIMESTAMP,
  discharge_date TIMESTAMP TIME_KEY
) WITH (engine='columnar');

-- 30-day readmission rate by primary-diagnosis chapter
SELECT SUBSTR(d.icd10_code, 1, 3) AS icd_chapter,
       count(DISTINCT e.patient_id) AS discharges,
       count(DISTINCT r.patient_id) AS readmissions,
       round(
         count(DISTINCT r.patient_id)::numeric
           / nullif(count(DISTINCT e.patient_id), 0) * 100, 1
       ) AS readmit_rate_pct
FROM encounters e
JOIN diagnoses d ON d.encounter_id = e.id AND d.is_primary = true
LEFT JOIN encounters r
  ON r.patient_id = e.patient_id
 AND r.admit_date BETWEEN e.discharge_date AND e.discharge_date + INTERVAL '30 days'
 AND r.id <> e.id
WHERE e.discharge_date >= now() - INTERVAL '90 days'
GROUP BY icd_chapter
ORDER BY readmit_rate_pct DESC;
```

## Clinical note search — Full-Text Search overlay

FTS runs as an overlay on a Document collection, indexing note content without a separate Elasticsearch cluster. Negation uses the `NOT` keyword (at least one positive term required).
```sql
CREATE SEARCH INDEX ON clinical_notes FIELDS note_text ANALYZER 'english';

-- Notes mentioning a symptom cluster, excluding ones flagged resolved
SELECT patient_id, note_date, note_text,
       bm25_score(note_text, 'dyspnea tachycardia NOT resolved') AS score
FROM clinical_notes
WHERE text_match(note_text, 'dyspnea tachycardia NOT resolved')
ORDER BY score DESC
LIMIT 25;
```

NodeDB's RBAC, RLS, and TLS form the access-control perimeter. De-identification, minimum-necessary determinations, and BAA obligations are legal and operational responsibilities — consult your compliance team before exposing data to analytics pipelines.

---

# Geospatial & Logistics (use-cases/geospatial-logistics)

---
title: Geospatial & Logistics
description: Real-time fleet tracking, OGC spatial predicates, graph-based routing, and time-stamped position history — all in NodeDB without PostGIS or a separate graph database.
---

Logistics platforms need at least three data systems: a spatial database for geographic queries, a time-series store for position history, and a graph database for route planning. NodeDB provides all three in one binary — OGC-compliant spatial predicates over an R*-tree index, a `TIME_KEY`-partitioned timeseries engine, and 13 graph algorithms over a CSR adjacency index — and they share a single snapshot.

## Engines used

| Engine | Role |
|---|---|
| **Spatial** | Fleet positions, zone boundaries, proximity queries, OGC predicates |
| **Timeseries** | Position history, speed logs, ETA telemetry |
| **Graph** | Road network, depot-to-destination routing, delivery sequencing |
| **Key-Value** | Live vehicle state, driver assignments, hot-path lookups |

## Real-time fleet tracking

A `GEOMETRY SPATIAL_INDEX` column on a `WITH (engine='spatial')` collection gets an automatic R*-tree. Proximity queries over thousands of vehicles complete in milliseconds.
```sql
CREATE COLLECTION vehicles (
  vehicle_id VARCHAR PRIMARY KEY,
  location GEOMETRY SPATIAL_INDEX,
  heading FLOAT,
  speed_kmh FLOAT,
  updated_at TIMESTAMP
) WITH (engine='spatial');

CREATE COLLECTION zones (
  zone_id VARCHAR PRIMARY KEY,
  name VARCHAR,
  zone_type VARCHAR,
  boundary GEOMETRY SPATIAL_INDEX,
  active BOOLEAN
) WITH (engine='spatial');

-- Update a vehicle's live position on each GPS ping
INSERT INTO vehicles (vehicle_id, location, heading, speed_kmh, updated_at)
VALUES ($vehicle_id, ST_Point($lon, $lat), $heading, $speed, now())
ON CONFLICT (vehicle_id) DO UPDATE SET
  location = EXCLUDED.location,
  heading = EXCLUDED.heading,
  speed_kmh = EXCLUDED.speed_kmh,
  updated_at = EXCLUDED.updated_at;

-- All vehicles within 5 km of a point, recently seen
SELECT vehicle_id,
       ST_Distance(location, ST_Point($lon, $lat)) AS distance_m,
       speed_kmh, heading
FROM vehicles
WHERE ST_DWithin(location, ST_Point($lon, $lat), 5000)
  AND updated_at >= now() - INTERVAL '2 minutes'
ORDER BY distance_m;
```

## Zone containment and geofencing

Delivery zones, restricted areas, and service regions are polygons. Containment is a single OGC predicate.

```sql
-- Which active zone contains this delivery point?
SELECT z.zone_id, z.name, z.zone_type
FROM zones z
WHERE ST_Contains(z.boundary, ST_Point($delivery_lon, $delivery_lat))
  AND z.active = true;

-- Deliveries per zone in the last day (spatial join)
SELECT z.name, count(d.id) AS deliveries
FROM deliveries d
JOIN zones z ON ST_Contains(z.boundary, d.delivery_point)
WHERE d.delivered_at >= now() - INTERVAL '1 day'
GROUP BY z.name
ORDER BY deliveries DESC;
```

## Position history — Timeseries engine

Every GPS ping is appended to a timeseries collection for audit, replay, and ETA recalculation — without bloating the live-position spatial index.
```sql
CREATE COLLECTION position_history (
  ts TIMESTAMP TIME_KEY,
  vehicle_id VARCHAR,
  location GEOMETRY SPATIAL_INDEX,
  speed_kmh FLOAT,
  odometer_km FLOAT
) WITH (engine='timeseries', partition_by='1d', retention='365d');

INSERT INTO position_history (ts, vehicle_id, location, speed_kmh, odometer_km)
VALUES (now(), $vehicle_id, ST_Point($lon, $lat), $speed, $odometer);

-- Reconstruct a vehicle's route for a shift
SELECT ts, location, speed_kmh
FROM position_history
WHERE vehicle_id = $vehicle_id
  AND ts BETWEEN $shift_start AND $shift_end
ORDER BY ts;

-- Average speed per hour to surface traffic patterns
SELECT time_bucket('1h', ts) AS hour,
       vehicle_id,
       avg(speed_kmh) AS avg_speed
FROM position_history
WHERE ts >= now() - INTERVAL '7 days'
GROUP BY hour, vehicle_id;
```

## Graph-based routing

The road network is stored as graph edges overlaid on a document collection. NodeDB's `GRAPH PATH` does hop-shortest paths; `GRAPH ALGO SSSP` (Dijkstra) does weighted single-source shortest paths over an edge cost property — no separate graph database.

```sql
CREATE COLLECTION road_network;  -- holds the nodes; edges are an overlay

-- Each road segment is a weighted edge
GRAPH INSERT EDGE IN 'road_network' FROM $node_a TO $node_b TYPE 'road'
  PROPERTIES { distance_m: 120, travel_time_s: 14 };

-- Shortest hop path between two nodes along 'road' edges
GRAPH PATH FROM $depot_node TO $dest_node MAX_DEPTH 50 LABEL 'road';

-- Weighted shortest paths from a depot to every reachable node (Dijkstra)
GRAPH ALGO SSSP ON road_network FROM $depot_node;
```

## Delivery sequencing — cross-engine query

Combine spatial proximity (which deliveries fall in which zone) with a sequencing pass (visit order within each zone) in one query.
```sql
WITH zone_deliveries AS (
  SELECT d.id AS delivery_id, d.address_point, z.zone_id, z.depot_point
  FROM deliveries d
  JOIN zones z ON ST_Contains(z.boundary, d.address_point)
  WHERE d.scheduled_date = current_date AND d.status = 'pending'
)
SELECT zone_id,
       row_number() OVER (
         PARTITION BY zone_id
         ORDER BY ST_Distance(address_point, depot_point)
       ) AS visit_order,
       delivery_id,
       address_point
FROM zone_deliveries
ORDER BY zone_id, visit_order;
```

Spatial and Graph share the same global surrogate identity space, so you can match a road node by geographic proximity (Spatial R*-tree) and immediately run a traversal or `GRAPH ALGO SSSP` from it — no ID translation, no network round-trip between engines.

## Live ETAs via LIVE SELECT

Push position changes to rider apps the moment a vehicle moves — no polling.

```sql
-- The rider app subscribes to one vehicle's position stream
LIVE SELECT vehicle_id, location, speed_kmh
FROM vehicles
WHERE vehicle_id = $vehicle_id;

-- On each GPS ping the subscriber receives the new row, recomputes the ETA
-- against the routing graph, and pushes it to the app.
```

---

# Gaming & Social (use-cases/gaming-social)

---
title: Gaming & Social
description: Social graphs, real-time leaderboards, offline-first mobile with CRDT sync, and live change streams — without Redis, Neo4j, or a separate pub/sub broker.
---

Gaming and social platforms stress databases in ways most workloads do not: millions of concurrent writes, graph traversals for friend feeds and matchmaking, sub-millisecond leaderboard reads, and mobile clients that must work offline and sync automatically. NodeDB handles all of these as peer engines and overlays in a single deployment.
## Engines used

| Engine | Role |
|---|---|
| **Graph** | Friend networks, guild membership, match-history traversal |
| **Key-Value** | Leaderboards (sorted index), session tokens, inventory hot path |
| **Document (schemaless)** | Player profiles, game state, match records with CRDT sync |
| **Timeseries** | Match telemetry, event logs, achievement timestamps |
| **Full-Text Search** | Player search by username, guild discovery |

## Social graph — friends, guilds, follows

Graph edges are an overlay on a document collection — here `players`. The CSR adjacency index handles millions of nodes and edges; friend-of-friend, mutual connections, and guild traversals run as Cypher-subset `MATCH` patterns.

```sql
CREATE COLLECTION players;

-- Add a friendship (bidirectional → two edges)
GRAPH INSERT EDGE IN 'players' FROM $player_a TO $player_b TYPE 'friend';
GRAPH INSERT EDGE IN 'players' FROM $player_b TO $player_a TYPE 'friend';

-- Friends-of-friends: people you might know (2 hops, excluding direct friends)
MATCH (me:Player)-[:friend]->(friend:Player)-[:friend]->(candidate:Player)
WHERE me.id = $me
  AND candidate.id <> $me
  AND NOT EXISTS { MATCH (me)-[:friend]->(candidate) }
RETURN DISTINCT candidate.id, candidate.username
LIMIT 20;

-- Guild members reachable through a player's guilds
MATCH (me:Player)-[:member_of]->(guild:Guild)<-[:member_of]-(member:Player)
WHERE me.id = $me AND member.id <> $me
RETURN member.id, member.username, guild.name
LIMIT 50;
```

## Real-time leaderboard — sorted index

A `SORTED INDEX` on the score column gives O(log n) rank lookups and top-k scans without a Redis sidecar. The leaderboard functions (`RANK`, `TOPK`, `SORTED_COUNT`) operate directly on it.
```sql
CREATE COLLECTION scores (player_id TEXT PRIMARY KEY) WITH (engine='kv');
CREATE SORTED INDEX lb ON scores (score DESC) KEY player_id;

-- Record a player's score after a match, keeping their best
INSERT INTO scores (player_id, score)
VALUES ($player_id, $new_score)
ON CONFLICT (player_id) DO UPDATE SET
  score = greatest(scores.score, EXCLUDED.score);

-- Top 100 global ranks
SELECT * FROM TOPK('lb', 100);

-- A single player's rank, and the total leaderboard size
SELECT RANK('lb', $player_id) AS rank, SORTED_COUNT('lb') AS total;

-- Everyone in a score band (e.g. a bracket)
SELECT * FROM RANGE('lb', $min_score, $max_score);
```

## Offline-first mobile — CRDT sync

Mobile games must function without a network connection. NodeDB-Lite runs the same engines embedded on iOS and Android; declaring a `conflict_policy` on a collection makes its writes mergeable, and Lite syncs deltas to Origin automatically when connectivity returns.

```sql
-- Mergeable player state: concurrent edits to different fields both survive
CREATE COLLECTION player_state WITH (conflict_policy = 'field_merge');

-- The device subscribes only to its own slice of the data
SUBSCRIBE SHAPE ON player_state WHERE player_id = $me;

-- Offline: the player collects items, finishes quests — ordinary local writes
UPSERT INTO player_state {
  player_id: $me,
  inventory: $inventory,
  quests: $quests,
  updated_at: now()
};

-- When the network returns, NodeDB-Lite ships the CRDT deltas to Origin. The
-- collection's conflict_policy decides the outcome: 'field_merge' merges
-- concurrent edits to different fields; same-field clashes fall back to LWW.
```

`lww` (last-writer-wins) is the default; `field_merge` keeps concurrent edits to different fields. Set it at creation or with `ALTER COLLECTION player_state SET conflict_policy = 'field_merge'`. See the [Conflict Policies docs](/docs/crdt-sync/conflict-policies).
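The `field_merge` semantics can be pictured with a minimal Python sketch. This is illustrative only — NodeDB's real sync ships Loro CRDT deltas, not whole documents — and the `field_merge` function, the `(value, write_ts)` tuples, and the sample documents are all hypothetical stand-ins:

```python
# Illustrative sketch of 'field_merge' semantics: concurrent edits to
# different fields both survive; edits to the same field fall back to
# last-writer-wins on a per-field write timestamp.

def field_merge(base, ours, theirs):
    """Merge two divergent versions of one document, field by field.

    Each version maps field -> (value, write_ts); 'base' is the common
    ancestor both replicas started from.
    """
    merged = dict(base)
    for field in set(ours) | set(theirs):
        a, b = ours.get(field), theirs.get(field)
        if a is None:
            merged[field] = b            # only the other side touched it
        elif b is None:
            merged[field] = a
        else:
            # Same-field clash: last-writer-wins on the write timestamp
            merged[field] = a if a[1] >= b[1] else b
    return merged

base   = {"inventory": (["sword"], 100), "quests": (["intro"], 100)}
device = {"inventory": (["sword", "potion"], 205), "quests": (["intro"], 100)}
server = {"inventory": (["sword"], 100), "quests": (["intro", "cave"], 210)}

merged = field_merge(base, device, server)
print(merged["inventory"][0], merged["quests"][0])
# → ['sword', 'potion'] ['intro', 'cave']
```

The device's inventory edit and the server's quest edit touch different fields, so both survive the merge — exactly the behaviour the `field_merge` policy promises; under `lww` the whole document from the later writer would win instead.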
## Live match updates — LIVE SELECT

Push score updates and match events to spectators and dashboards in real time without a separate pub/sub broker. `LIVE SELECT` registers a filtered query and streams matching rows over pgwire, WebSocket, or the native protocol.

```sql
-- A spectator subscribes to one match's event feed
LIVE SELECT * FROM match_events WHERE match_id = $match_id;

-- The match engine inserts events as they occur:
INSERT INTO match_events (match_id, event_type, player_id, data, ts)
VALUES ($match_id, 'kill', $player_id, $event_data, now());

-- Every subscriber receives the row immediately. No polling, no WebSocket
-- server. Cancel with: CANCEL LIVE SELECT ;
```

## Match history and telemetry — Timeseries engine

Every match generates thousands of telemetry events. The Timeseries engine ingests them at high throughput and makes them queryable for replay, anti-cheat, and balance analytics.

```sql
CREATE COLLECTION match_telemetry (
  ts TIMESTAMP TIME_KEY,
  match_id UUID,
  player_id UUID,
  event VARCHAR,
  data JSON
) WITH (engine='timeseries', partition_by='1d', retention='90d');

INSERT INTO match_telemetry (ts, match_id, player_id, event, data)
VALUES (now(), $match_id, $player_id, 'damage_dealt', '{"amount": 342, "weapon": "rifle"}');

-- Average damage per weapon in a match — for balance analysis
SELECT data ->> 'weapon' AS weapon,
       avg((data ->> 'amount')::int) AS avg_damage,
       count(*) AS events
FROM match_telemetry
WHERE match_id = $match_id AND event = 'damage_dealt'
GROUP BY weapon
ORDER BY avg_damage DESC;
```

## Player search — Full-Text Search overlay

Username and guild discovery uses the FTS overlay on the `players` collection — no Elasticsearch cluster.
```sql
CREATE SEARCH INDEX ON players FIELDS username, bio ANALYZER 'simple' FUZZY true;

SELECT id, username, bio,
       bm25_score(username, $term) AS score
FROM players
WHERE text_match(username, $term, { fuzzy: true, distance: 1 })
ORDER BY score DESC
LIMIT 10;
```

---

# IoT & Edge (use-cases/iot-edge)

---
title: IoT & Edge
description: High-throughput sensor ingestion, ND array tiles for batch telemetry, retention policies, and CRDT edge-to-cloud sync via NodeDB-Lite — one stack from device to cloud.
---

IoT stacks typically span a time-series database for sensor data, an object store or array database for bulk telemetry, and a synchronisation layer for edge devices. NodeDB covers all three: the Timeseries engine handles streaming ingestion, the Array engine stores multi-dimensional batch telemetry (removing the need for separate InfluxDB and TileDB deployments), and NodeDB-Lite runs the same engines embedded on edge hardware, syncing deltas to Origin over CRDT when connectivity allows.

## Engines used

| Engine | Role |
|---|---|
| **Timeseries** | Streaming sensor ingestion, continuous aggregates, retention |
| **Array** | ND telemetry tiles (spectrometry, radar, camera frames), batch analytics |
| **Document (schemaless)** | Device registry, configuration, alert rules with CRDT sync |
| **Key-Value** | Last-known device state, command queues |

## Streaming sensor ingestion — Timeseries engine

The Timeseries engine is append-only, with a `TIME_KEY` column driving partition-by-time and block skip, plus per-collection retention.
```sql CREATE COLLECTION sensor_readings ( ts TIMESTAMP TIME_KEY, device_id VARCHAR, metric VARCHAR, value FLOAT, unit VARCHAR ) WITH (engine='timeseries', partition_by='1d', retention='180d'); -- Bulk import (NDJSON / JSON array / CSV auto-detected) COPY sensor_readings FROM '/var/spool/readings.ndjson'; -- Or INSERT for low-volume devices INSERT INTO sensor_readings (ts, device_id, metric, value, unit) VALUES (now(), 'device-001', 'temperature', 23.4, 'C'), (now(), 'device-001', 'humidity', 61.2, '%'), (now(), 'device-001', 'co2_ppm', 412, 'ppm'); -- 15-minute summaries SELECT time_bucket('15m', ts) AS bucket, device_id, avg(CASE WHEN metric = 'temperature' THEN value END) AS avg_temp, max(CASE WHEN metric = 'co2_ppm' THEN value END) AS max_co2 FROM sensor_readings WHERE ts >= now() - INTERVAL '24 hours' GROUP BY bucket, device_id ORDER BY bucket DESC; ``` For line-protocol producers (Telegraf, Vector), enable the ILP listener (`ports.ilp = 8086`) and push directly: ```bash echo "env,device=device-001 temperature=23.4,humidity=61.2 1735689600000000000" | nc localhost 8086 ``` ## Multi-dimensional telemetry — Array engine Instruments like spectrometers, LiDAR scanners, and thermal cameras produce ND arrays of readings. The Array engine stores them in compressed Z-order tiles with per-tile ND-MBR statistics — replacing a separate TileDB or Zarr deployment. It uses its own DDL family (`CREATE ARRAY`), not `CREATE COLLECTION`. ```sql -- A 3D spectrometry array: (device, wavelength, time). Dimensions are -- integer-typed with half-open domains [lo, hi). 
CREATE ARRAY spectral_readings
DIMS (
  device_id     INT64 DOMAIN [0, 100000),
  wavelength_nm INT32 DOMAIN [300, 1100),
  ts_epoch      INT64 DOMAIN [0, 9223372036854775807)
)
ATTRS (intensity FLOAT32, noise_floor FLOAT32)
TILE_EXTENTS (1, 128, 3600)
WITH (cell_order = 'Z-ORDER', audit_retain_ms = 7776000000); -- 90 days

-- Insert cells from a scan session
INSERT INTO ARRAY spectral_readings (device_id, wavelength_nm, ts_epoch, intensity, noise_floor) VALUES
  ($dev, 500, $t, 0.41, 0.02),
  ($dev, 501, $t, 0.43, 0.02),
  ($dev, 502, $t, 0.40, 0.02);

-- Force in-memory tiles to durable storage
SELECT ARRAY_FLUSH('spectral_readings');

-- Slice: wavelengths 500–600 nm for one device over a time window
SELECT wavelength_nm, avg(intensity) AS avg_intensity
FROM ARRAY_SLICE(
  'spectral_readings',
  { device_id: [$dev, $dev + 1), wavelength_nm: [500, 600), ts_epoch: [$t0, $t1) },
  ['intensity']
)
GROUP BY wavelength_nm
ORDER BY wavelength_nm;

-- Reduce a dimension: total intensity per wavelength, collapsing time
SELECT * FROM ARRAY_AGG('spectral_readings', 'intensity', 'SUM', 'ts_epoch');
```

## Edge deployment — NodeDB-Lite

Edge gateways run NodeDB-Lite: the full engine set in an embeddable library with no network dependencies. Data is written locally and synced to Origin via Loro CRDT deltas when connectivity is available.
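The delta exchange can be pictured with a small last-writer-wins merge. This is an illustrative Python sketch of the `lww` conflict policy, not the Loro implementation, and the clock-plus-replica version stamp is an assumption made for the example:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FieldVersion:
    """One field write: a value plus a (clock, replica) version stamp."""
    value: object
    clock: int    # logical timestamp of the write
    replica: str  # deterministic tiebreaker when clocks are equal

def lww_merge(local: dict, remote: dict) -> dict:
    """Field-level last-writer-wins: keep the higher-versioned write per
    field. The merge is commutative and idempotent, so edge and Origin
    converge no matter the order in which queued deltas arrive."""
    merged = dict(local)
    for field, rv in remote.items():
        lv = merged.get(field)
        if lv is None or (rv.clock, rv.replica) > (lv.clock, lv.replica):
            merged[field] = rv
    return merged
```

Because the merge is order-independent, an edge device can replay deltas queued during an outage without coordination; conflicting writes resolve the same way on both sides.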
```sql
-- On the edge device (NodeDB-Lite, embedded in a Rust/C/Swift process)
-- Local timeseries writes survive network outages
INSERT INTO local_readings (ts, sensor_id, metric, value)
VALUES (now(), $sensor_id, 'vibration_g', $g);

-- Device configuration arrives via a shape subscription scoped to this device
SUBSCRIBE SHAPE ON device_config WHERE device_id = $me;
SELECT value FROM device_config WHERE key = 'alert_thresholds';
```

```swift
// iOS / embedded Swift
let db = NodeDbLite.open(path: "edge.db")
db.execute("INSERT INTO local_readings ...")
db.sync(url: "wss://origin.example.com/sync", token: authToken)
```

CRDT sync is transparent — the edge process doesn't implement retry logic, conflict resolution, or delta tracking. Declare a `conflict_policy` on the collection (`lww` or `field_merge`) and the engine produces and ships the deltas; Origin validates SQL constraints at Raft commit and replies with a `CompensationHint` if a write loses.

## Last-known state — Key-Value engine

The KV engine stores the current state of every device for sub-millisecond dashboard reads without scanning the timeseries collection.

```sql
CREATE COLLECTION device_state (key TEXT PRIMARY KEY) WITH (engine='kv');

-- Updated on every sensor publish
UPSERT INTO device_state { key: $device_id, last_seen: now(), status: $status, battery_pct: $batt };

-- Dashboard: current state of many devices at once
SELECT * FROM device_state WHERE key = ANY($device_ids);
```

## Alerting via LIVE SELECT

Trigger downstream alert handlers the moment a sensor value crosses a threshold — no polling loop.

```sql
-- Subscribe to readings that breach the CO2 alert threshold
LIVE SELECT device_id, value, ts
FROM sensor_readings
WHERE metric = 'co2_ppm' AND value > 1000;

-- The handler receives each qualifying row immediately and fans out to
-- PagerDuty, SMS, or a dashboard socket. Cancel with: CANCEL LIVE SELECT ;
```

## Retention and downsampling

Raw readings age out via the collection's `retention`.
A continuous aggregate keeps incrementally-maintained hourly summaries — no cron job, no separate pipeline.

```sql
CREATE CONTINUOUS AGGREGATE sensor_hourly ON sensor_readings AS
SELECT time_bucket('1h', ts) AS bucket,
       device_id,
       metric,
       avg(value) AS avg_val,
       min(value) AS min_val,
       max(value) AS max_val
FROM sensor_readings
GROUP BY bucket, device_id, metric
WITH (refresh_interval = '5m');
```

## Why not InfluxDB + TileDB?

A line-protocol time-series database can't store multi-dimensional instrument output, so you bolt on TileDB or Zarr — and now telemetry lives in two systems with two query languages and no shared identity. NodeDB keeps streaming readings and ND array tiles in one process: a query can prefilter cells by the same surrogate-identity bitmaps the rest of the engines use, and the edge runs the identical engine set embedded.

---

# E-commerce & Search (use-cases/ecommerce-search)

---
title: E-commerce & Search
description: Hybrid BM25 + vector product search, RRF fusion ranking, recommendation graphs, and real-time inventory analytics — without Elasticsearch, a vector database, or a separate OLAP cluster.
---

E-commerce search stacks are notoriously complex: Elasticsearch for keyword search, a vector database for semantic similarity, a recommendation engine backed by a graph database, and a data warehouse for sales analytics. NodeDB covers all four with its peer engines and the Graph and FTS overlays, sharing one storage core, one query planner, and one operational surface.
## Engines used

| Engine | Role |
|---|---|
| **Full-Text Search** | BM25 keyword search, faceted filtering, autocomplete |
| **Vector** | Semantic product search, visual similarity, embedding-based recommendations |
| **Graph** | "Customers also bought", category navigation, brand-product relationships |
| **Document (schemaless)** | Product catalog, flexible attributes, variant metadata |
| **Columnar** | Sales analytics, inventory reporting, cohort analysis |

## Hybrid product search

The strongest e-commerce search combines keyword precision with semantic recall, then fuses the ranked lists. `rrf_score(...)` does the fusion in the planner — no application-side merge, no Elasticsearch + Pinecone glue code.

```sql
CREATE COLLECTION products;
CREATE SEARCH INDEX ON products FIELDS title, description ANALYZER 'english' FUZZY true;

-- One vector index per embedding column on the same collection.
CREATE VECTOR INDEX idx_text ON products (text_embedding) METRIC cosine DIM 384;
CREATE VECTOR INDEX idx_image ON products (image_embedding) METRIC cosine DIM 512;

-- Hybrid search: BM25 keyword + vector semantic, fused with RRF
SELECT id, title, price, brand,
       rrf_score(
         vector_distance(text_embedding, $query_vec),
         bm25_score(description, $query_text)
       ) AS relevance
FROM products
WHERE in_stock = true
  AND price BETWEEN $min_price AND $max_price
  AND category_id = ANY($category_ids)
ORDER BY relevance DESC
LIMIT 20;
```

## Autocomplete and faceted filtering

```sql
-- Autocomplete: fuzzy title match, ranked by popularity
SELECT id, title
FROM products
WHERE text_match(title, $prefix, { fuzzy: true, distance: 1 })
ORDER BY popularity_score DESC
LIMIT 8;

-- Facet counts: products per brand in the current keyword result set
SELECT brand, count(*) AS count
FROM products
WHERE text_match(description, $query_text) AND in_stock = true
GROUP BY brand
ORDER BY count DESC
LIMIT 15;
```

## Visual similarity search

For fashion, furniture, and home goods, image-embedding similarity
lets shoppers find products that look alike — searching the `image_embedding` index created above, right alongside the catalog row:

```sql
-- "Shop similar items" — products whose image embedding is closest to the
-- viewed one, restricted to the same in-stock category.
SELECT id, title, price, image_url,
       image_embedding <=> $viewed_vec AS visual_distance
FROM products
WHERE category_id = $category_id
  AND in_stock = true
  AND id <> $viewed_product_id
ORDER BY image_embedding <=> $viewed_vec
LIMIT 12;
```

## Recommendation graph — "Customers also bought"

Co-purchase relationships are graph edges layered on the `products` collection — graph is an overlay, so you don't create a separate "graph collection." Traversal-based recommendations run alongside the catalog.

```sql
-- A batch job recomputes co-purchase counts and (re)writes the edge with the
-- current weight — DELETE then INSERT, since GRAPH INSERT EDGE is insert-only.
GRAPH DELETE EDGE IN 'products' FROM $product_a TO $product_b TYPE 'co_purchased';
GRAPH INSERT EDGE IN 'products' FROM $product_a TO $product_b TYPE 'co_purchased'
  PROPERTIES { weight: $copurchase_count };

-- "Customers also bought": direct co-purchase neighbours of a product,
-- ranked by co-purchase weight. Graph nodes are document rows, so their
-- catalog fields are available in RETURN.
MATCH (seed:Product)-[e:co_purchased]->(rec:Product)
WHERE seed.id = $product_id AND rec.in_stock = true
RETURN rec.id, rec.title, rec.price, e.weight AS affinity
ORDER BY e.weight DESC
LIMIT 8;
```

## Real-time inventory — LIVE SELECT

Push in-stock / out-of-stock transitions to product pages without polling. `LIVE SELECT` registers a query and streams matching changes over pgwire, WebSocket, or the native protocol.
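The mechanics are easy to picture as a register-and-match loop. This is a toy Python model with invented names (`LiveQueryBus` and friends are hypothetical, for illustration only; the real engine evaluates predicates inside the storage layer and streams over its wire protocols):

```python
from typing import Callable

Row = dict
Predicate = Callable[[Row], bool]
Callback = Callable[[Row], None]

class LiveQueryBus:
    """Toy model of LIVE SELECT: each subscription is a predicate plus a
    callback; every committed row change is tested against all registered
    predicates and pushed to matching subscribers, instead of polled for."""

    def __init__(self) -> None:
        self._subs: dict[int, tuple[Predicate, Callback]] = {}
        self._next_id = 0

    def live_select(self, predicate: Predicate, on_row: Callback) -> int:
        self._next_id += 1
        self._subs[self._next_id] = (predicate, on_row)
        return self._next_id  # handle used to cancel the subscription

    def cancel(self, sub_id: int) -> None:
        self._subs.pop(sub_id, None)

    def commit(self, row: Row) -> None:
        for predicate, on_row in list(self._subs.values()):
            if predicate(row):
                on_row(row)

# A wishlist subscription: notify when a watched product is in stock.
bus = LiveQueryBus()
hits: list[Row] = []
wishlist = {"p1", "p9"}
sub = bus.live_select(
    lambda r: r["id"] in wishlist and r["in_stock"],
    hits.append,
)
bus.commit({"id": "p1", "in_stock": True})  # delivered to the subscriber
bus.commit({"id": "p2", "in_stock": True})  # ignored: not on the wishlist
```

The point of the sketch is the inversion of control: the shopper's page holds an open subscription and receives rows, rather than re-running the wishlist query on a timer.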
```sql
-- A shopper viewing their wishlist subscribes to back-in-stock events
LIVE SELECT id, title
FROM products
WHERE id = ANY($wishlist_ids) AND in_stock = true;

-- When a back-ordered item is restocked, every subscriber gets the row
-- delivered immediately. Cancel with: CANCEL LIVE SELECT ;
```

## Sales analytics — Columnar engine

The Columnar engine handles aggregation-heavy analytics across millions of orders without a separate data warehouse — per-column compression, block statistics, and predicate pushdown.

```sql
CREATE COLLECTION orders (
  id TEXT PRIMARY KEY,
  placed_at TIMESTAMP TIME_KEY,
  status TEXT
) WITH (engine='columnar');

-- Revenue by category, last 30 days
SELECT c.name AS category,
       count(DISTINCT o.id) AS orders,
       sum(oi.quantity * oi.price) AS revenue,
       avg(oi.quantity * oi.price) AS avg_order_value
FROM orders o
JOIN order_items oi ON oi.order_id = o.id
JOIN products p ON p.id = oi.product_id
JOIN categories c ON c.id = p.category_id
WHERE o.placed_at >= now() - INTERVAL '30 days'
  AND o.status = 'completed'
GROUP BY c.name
ORDER BY revenue DESC;

-- Conversion funnel: views → add-to-cart → purchase
SELECT time_bucket('1d', e.ts) AS day,
       count(CASE WHEN e.event = 'product_view' THEN 1 END) AS views,
       count(CASE WHEN e.event = 'add_to_cart' THEN 1 END) AS add_to_cart,
       count(CASE WHEN e.event = 'purchase' THEN 1 END) AS purchases,
       round(
         count(CASE WHEN e.event = 'purchase' THEN 1 END)::numeric
         / nullif(count(CASE WHEN e.event = 'product_view' THEN 1 END), 0) * 100,
         2
       ) AS conversion_pct
FROM events e
WHERE e.ts >= now() - INTERVAL '14 days'
GROUP BY day
ORDER BY day;
```

## Coarse-to-fine ranking with `query_dim`

Store full-precision embeddings and rank on the first N dimensions for a fast first pass via the `query_dim` tuning argument, then re-rank the top candidates at full precision — no reindexing:

```sql
SELECT id, title,
       vector_distance(text_embedding, $query_vec, query_dim => 256) AS distance
FROM products
WHERE in_stock = true
ORDER BY distance
LIMIT 100;
```

## Why not Elasticsearch + a vector database?

Elasticsearch handles keyword search but requires a separate vector database for semantic recall, a graph database for recommendations, and a data warehouse for analytics. Synchronising product-catalog changes across four systems is the dominant operational burden. NodeDB keeps every engine in the same process: a hybrid search + recommendation + inventory check is one SQL statement, not four API calls.
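For comparison, the application-side merge loop that `rrf_score(...)` eliminates is standard Reciprocal Rank Fusion: each ranked list contributes `1 / (k + rank)` per document, and documents are reordered by the summed score. A minimal Python sketch (the product ids and the conventional default k = 60 are illustrative):

```python
def rrf_fuse(rankings: list[list[str]], k: float = 60.0) -> list[str]:
    """Reciprocal Rank Fusion: every list contributes 1 / (k + rank) for
    each id it ranks (rank starts at 1); ids come back sorted by the
    summed score, so items strong in either list surface near the top."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["p7", "p3", "p9", "p1"]  # ANN order
bm25_hits = ["p3", "p1", "p7", "p5"]    # keyword order
fused = rrf_fuse([vector_hits, bm25_hits])
# "p3" and "p7" appear high in both lists, so they lead the fused result.
```

Running this in the planner rather than the application is the whole argument: no second network round trip, no id-reconciliation code, and the fused score is available to `ORDER BY` like any other expression.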