E-commerce search stacks are notoriously complex: Elasticsearch for keyword search, a vector database for semantic similarity, a recommendation engine backed by a graph database, and a data warehouse for sales analytics. NodeDB provides all four as peer engines (plus Graph and FTS overlays) sharing one storage core, one query planner, and one operational surface.

Engines used

EngineRole
Full-Text SearchBM25 keyword search, faceted filtering, autocomplete
VectorSemantic product search, visual similarity, embedding-based recommendations
Graph"Customers also bought", category navigation, brand-product relationships
Document (schemaless)Product catalog, flexible attributes, variant metadata
ColumnarSales analytics, inventory reporting, cohort analysis

The strongest e-commerce search combines keyword precision with semantic recall, then fuses the ranked lists. rrf_score(...) does the fusion in the planner — no application-side merge, no Elasticsearch + Pinecone glue code.

CREATE COLLECTION products;
CREATE SEARCH INDEX ON products FIELDS title, description ANALYZER 'english' FUZZY true;
-- One vector index per embedding column on the same collection.
CREATE VECTOR INDEX idx_text  ON products (text_embedding)  METRIC cosine DIM 384;
CREATE VECTOR INDEX idx_image ON products (image_embedding) METRIC cosine DIM 512;

-- Hybrid search: BM25 keyword + vector semantic, fused with RRF
SELECT
  id,
  title,
  price,
  brand,
  rrf_score(
    vector_distance(text_embedding, $query_vec),
    bm25_score(description, $query_text)
  ) AS relevance
FROM products
WHERE in_stock     = true
  AND price        BETWEEN $min_price AND $max_price
  AND category_id  = ANY($category_ids)
ORDER BY relevance DESC
LIMIT 20;

Autocomplete and faceted filtering

-- Autocomplete: fuzzy title match, ranked by popularity
SELECT DISTINCT id, title
FROM products
WHERE text_match(title, $prefix, { fuzzy: true, distance: 1 })
ORDER BY popularity_score DESC
LIMIT 8;

-- Facet counts: products per brand in the current keyword result set
SELECT brand, count(*) AS count
FROM products
WHERE text_match(description, $query_text)
  AND in_stock = true
GROUP BY brand
ORDER BY count DESC
LIMIT 15;

For fashion, furniture, and home goods, image-embedding similarity lets shoppers find products that look alike — searching the image_embedding index created above, right alongside the catalog row:

-- "Shop similar items" — products whose image embedding is closest to the
-- viewed one, restricted to the same in-stock category.
SELECT id, title, price, image_url,
       image_embedding <=> $viewed_vec AS visual_distance
FROM products
WHERE category_id = $category_id
  AND in_stock = true
  AND id <> $viewed_product_id
ORDER BY image_embedding <=> $viewed_vec
LIMIT 12;

Recommendation graph — "Customers also bought"

Co-purchase relationships are graph edges layered on the products collection — graph is an overlay, you don't create a separate "graph collection." Traversal-based recommendations run alongside the catalog.

-- A batch job recomputes co-purchase counts and (re)writes the edge with the
-- current weight — DELETE then INSERT, since GRAPH INSERT EDGE is insert-only.
GRAPH DELETE EDGE IN 'products' FROM $product_a TO $product_b TYPE 'co_purchased';
GRAPH INSERT EDGE IN 'products' FROM $product_a TO $product_b TYPE 'co_purchased'
  PROPERTIES { weight: $copurchase_count };

-- "Customers also bought": direct co-purchase neighbours of a product,
-- ranked by co-purchase weight. Graph nodes are document rows, so their
-- catalog fields are available in RETURN.
MATCH (seed:Product)-[e:co_purchased]->(rec:Product)
WHERE seed.id = $product_id
  AND rec.in_stock = true
RETURN rec.id, rec.title, rec.price, e.weight AS affinity
ORDER BY e.weight DESC
LIMIT 8;

Real-time inventory — LIVE SELECT

Push in-stock / out-of-stock transitions to product pages without polling. LIVE SELECT registers a query and streams matching changes over pgwire, WebSocket, or the native protocol.

-- A shopper viewing their wishlist subscribes to back-in-stock events
LIVE SELECT id, title FROM products
WHERE id = ANY($wishlist_ids)
  AND in_stock = true;

-- When a back-ordered item is restocked, every subscriber gets the row
-- delivered immediately. Cancel with: CANCEL LIVE SELECT <subscription_id>;

Sales analytics — Columnar engine

The Columnar engine handles aggregation-heavy analytics across millions of orders without a separate data warehouse — per-column compression, block statistics, and predicate pushdown.

CREATE COLLECTION orders (
  id          TEXT PRIMARY KEY,
  placed_at   TIMESTAMP TIME_KEY,
  status      TEXT
) WITH (engine='columnar');

-- Revenue by category, last 30 days
SELECT
  c.name                       AS category,
  count(DISTINCT o.id)         AS orders,
  sum(oi.quantity * oi.price)  AS revenue,
  avg(oi.quantity * oi.price)  AS avg_order_value
FROM orders o
JOIN order_items oi  ON oi.order_id    = o.id
JOIN products    p   ON p.id           = oi.product_id
JOIN categories  c   ON c.id           = p.category_id
WHERE o.placed_at >= now() - INTERVAL '30 days'
  AND o.status    = 'completed'
GROUP BY c.name
ORDER BY revenue DESC;

-- Conversion funnel: views → add-to-cart → purchase
SELECT
  time_bucket('1d', e.ts) AS day,
  count(CASE WHEN e.event = 'product_view' THEN 1 END) AS views,
  count(CASE WHEN e.event = 'add_to_cart'  THEN 1 END) AS add_to_cart,
  count(CASE WHEN e.event = 'purchase'     THEN 1 END) AS purchases,
  round(
    count(CASE WHEN e.event = 'purchase' THEN 1 END)::numeric
    / nullif(count(CASE WHEN e.event = 'product_view' THEN 1 END), 0) * 100,
    2
  ) AS conversion_pct
FROM events e
WHERE e.ts >= now() - INTERVAL '14 days'
GROUP BY day
ORDER BY day;

Matryoshka embeddings for tiered search

Store full-precision embeddings and rank the first N dimensions for a fast first pass via the query_dim tuning argument, then re-rank the top candidates at full precision — no reindexing:

SELECT id, title, vector_distance(text_embedding, $query_vec, query_dim => 256) AS distance
FROM products
WHERE in_stock = true
ORDER BY distance
LIMIT 100;

Why not Elasticsearch + a vector database?

Elasticsearch handles keyword search but requires a separate vector database for semantic recall, a graph database for recommendations, and a data warehouse for analytics. Synchronising product-catalog changes across four systems is the dominant operational burden. NodeDB keeps every engine in the same process: a hybrid search + recommendation + inventory check is one SQL statement, not four API calls.

View page sourceLast updated on May 12, 2026 by Farhan Syah