E-Commerce Product Search That Understands Intent - No Algolia Required
Build semantic product search with text-to-product matching, visual similarity via CLIP, and auto-categorization - all running in the browser at $0/month. Complete code walkthrough with LocalMode's VectorDB, embedImage, and classifyImageZeroShot APIs.
A customer types "warm winter jacket under $100" into your search bar. A keyword search engine tokenizes that into warm, winter, jacket, under, $100 and returns nothing - because your product titles say "Insulated Puffer Coat" and "Fleece-Lined Parka." The customer leaves. You just lost a sale to a search box.
This is the fundamental problem with keyword search in e-commerce. Products are described by merchants. Searches are described by customers. The two vocabularies rarely overlap.
Algolia, Elasticsearch, and Searchspring solve this with cloud-hosted semantic layers - and charge you for every single query. Algolia's Grow plan bills $0.50 per 1,000 search requests. Their AI-powered Grow Plus tier charges $1.75 per 1,000. A mid-size Shopify store running 500,000 searches per month pays $250–$875/month just for search, before you count record storage fees ($0.40/1,000 records), analytics add-ons, or merchandising tools. Enterprise tiers start at $50K+ annually.
What if the search engine ran entirely in the customer's browser? No API keys. No per-request billing. No data leaving the device. That is what we built in the Product Search demo - and in this post, we will walk through every line of code.
The Three Search Modes
Our approach uses CLIP/SigLIP models - neural networks trained on hundreds of millions of image-text pairs - to map both text and images into the same 768-dimensional vector space. Once everything lives in the same space, similarity is just a distance calculation.
This unlocks three search modes that would each require a separate paid service with cloud providers:
| Mode | What the customer does | What happens under the hood |
|---|---|---|
| Text-to-product | Types "warm winter jacket" | Text is embedded via CLIP text encoder, compared against product image vectors |
| Image-to-product | Uploads a photo from Instagram | Photo is embedded via CLIP vision encoder, nearest neighbors returned |
| Auto-categorization | Nothing - it is automatic | Zero-shot classification assigns each product to a taxonomy on upload |
All three modes share one model (Xenova/siglip-base-patch16-224, ~400MB downloaded once) and one VectorDB. No backend. No API keys. No recurring cost.
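Once text and images live in the same vector space, "similarity is just a distance calculation" concretely means cosine similarity. As a library-free sketch of what happens under the hood (the real VectorDB uses an HNSW index rather than this brute-force scan, which is shown only for intuition):

```typescript
// Cosine similarity between two embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute-force nearest-neighbor search over a small catalog.
function topK(
  query: number[],
  catalog: { id: string; vector: number[] }[],
  k: number,
): { id: string; score: number }[] {
  return catalog
    .map((p) => ({ id: p.id, score: cosineSimilarity(query, p.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

An HNSW index gives the same top-k answer without comparing the query against every vector, which is why search stays sub-millisecond even for thousands of products.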
Step 1: Create the Vector Database
Every product's visual fingerprint is stored in a VectorDB with HNSW indexing for sub-millisecond search:
```typescript
import { createVectorDB } from '@localmode/core';

const db = await createVectorDB({
  name: 'product-catalog',
  dimensions: 768, // SigLIP-Base produces 768-d vectors
  storage: 'indexeddb', // Persists across sessions
});
```

The dimensions value must match the model output. SigLIP-Base-Patch16-224 produces 768-dimensional vectors. Standard CLIP ViT-Base models produce 512. LocalMode's TransformersCLIPEmbeddingModel infers this automatically from the model ID, but the VectorDB needs the explicit number.
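Because a dimension mismatch otherwise surfaces only when the first vector is inserted, a cheap guard at ingestion time is worth having. A hypothetical sanity check (this helper is not part of the LocalMode API):

```typescript
// Throw early if an embedding does not match the index dimensionality.
function assertDimensions(vector: number[], expected: number): void {
  if (vector.length !== expected) {
    throw new Error(
      `Embedding has ${vector.length} dimensions, index expects ${expected}`,
    );
  }
}
```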
Memory vs IndexedDB
For a production catalog with thousands of products, use 'indexeddb' storage so vectors persist across page reloads. The showcase demo uses 'memory' storage because uploaded images are ephemeral. For a Shopify plugin, you would persist to IndexedDB and only re-index when the catalog changes.
Step 2: Index Products with Image Features
When a product is added to the catalog, we extract its visual fingerprint using SigLIP's vision encoder and store it in the VectorDB:
```typescript
import { extractImageFeatures } from '@localmode/core';
import { transformers } from '@localmode/transformers';

// Create the image feature model (lazy-loads on first use)
const imageModel = transformers.imageFeatures('Xenova/siglip-base-patch16-224');

async function indexProduct(product, imageDataUrl) {
  // Extract a 768-d feature vector from the product image
  const { features } = await extractImageFeatures({
    model: imageModel,
    image: imageDataUrl,
  });

  // Store in VectorDB with metadata for filtering
  await db.add({
    id: product.id,
    vector: features,
    metadata: {
      name: product.name,
      category: product.category,
      price: product.price,
    },
  });
}
```

The extractImageFeatures function accepts any ImageInput - a base64 data URL, a Blob, an ImageData object, or an ArrayBuffer. For a Shopify integration, you would fetch product image URLs and pass them directly. The vision encoder loads lazily on the first call and stays in memory for subsequent products.
Batch ingestion for large catalogs
For catalogs with hundreds or thousands of products, LocalMode's adaptive batching automatically sizes batches to the customer's device:
```typescript
import { computeOptimalBatchSize } from '@localmode/core';

const { batchSize, reasoning } = computeOptimalBatchSize({
  taskType: 'ingestion',
  modelDimensions: 768,
});

console.log(reasoning);
// → "16 cores, 32GB RAM, GPU: Yes → batch size 128"
```

The showcase app displays this in the header as a "Batch: N" badge - customers on a MacBook Pro with 32GB RAM and a GPU get batch size 128, while a budget Chromebook might get 16. The ingestion pipeline never overwhelms the device.
Step 3: Text-to-Product Search
This is where CLIP's cross-modal alignment shines. The customer types natural language; the text encoder maps it into the same vector space as the product images:
```typescript
import { embed, semanticSearch } from '@localmode/core';

// Create the text embedding model (shares weights with image model)
const textModel = transformers.embedding('Xenova/siglip-base-patch16-224');

// Option A: One-step semantic search
const { results } = await semanticSearch({
  db,
  model: textModel,
  query: 'warm winter jacket under $100',
  k: 20,
});

for (const result of results) {
  console.log(`${result.id}: ${(result.score * 100).toFixed(0)}% match`);
}

// Option B: Embed + search separately (useful for caching the embedding)
const { embedding } = await embed({
  model: textModel,
  value: 'warm winter jacket under $100',
});

const matches = await db.search(embedding, {
  k: 20,
  filter: { category: 'Clothing' }, // Metadata filter
  threshold: 0.2, // Cross-modal similarity threshold
});
```

Notice the threshold: 0.2. Cross-modal CLIP/SigLIP similarity scores are lower than same-modality text embeddings. A score of 0.25 in cross-modal search represents strong relevance, while the same score in a text-only embedding model like BGE would be poor. The showcase app uses getDefaultThreshold() from @localmode/core to look up per-model presets, falling back to 0.2 for SigLIP.
The filter parameter enables metadata-based narrowing without a separate faceting service. Algolia charges extra for faceted search. Here, it is a single object on the search call, evaluated against typed metadata stored alongside each vector.
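Conceptually, that filter is just a predicate applied to each candidate's metadata before ranking. A hypothetical exact-match version (the real VectorDB may support richer operators):

```typescript
type Metadata = Record<string, string | number | boolean>;

// Keep only results whose metadata matches every key in the filter.
function applyFilter<T extends { metadata: Metadata }>(
  results: T[],
  filter: Metadata,
): T[] {
  return results.filter((r) =>
    Object.entries(filter).every(([key, value]) => r.metadata[key] === value),
  );
}
```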
Step 4: Image-to-Product Search (Visual Similarity)
A customer screenshots a jacket from Instagram and drops it into your search bar. The vision encoder produces a feature vector from the query image, and the VectorDB finds the nearest neighbors:
```typescript
async function searchByImage(queryImageDataUrl) {
  // Extract features from the query image
  const { features } = await extractImageFeatures({
    model: imageModel,
    image: queryImageDataUrl,
  });

  // Search for visually similar products
  const results = await db.search(features, { k: 20 });

  return results.map((r) => ({
    productId: r.id,
    similarity: r.score,
    metadata: r.metadata,
  }));
}
```

Because both the catalog images and the query image pass through the same vision encoder, similarity scores are directly comparable. A score above 0.7 indicates near-identical items (same product, different angle). Scores between 0.4 and 0.7 capture visually related items (similar style, different brand).
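Those rough tiers translate directly into UI decisions - "exact match" badges versus a "similar styles" rail. A small helper that encodes the thresholds described above (the cut-offs are the article's rules of thumb, not API constants):

```typescript
type MatchTier = 'near-identical' | 'visually-related' | 'weak';

// Bucket a same-modality (image-to-image) similarity score using
// the rough thresholds from the text: >0.7 near-identical,
// 0.4–0.7 visually related, below that weak.
function matchTier(score: number): MatchTier {
  if (score > 0.7) return 'near-identical';
  if (score >= 0.4) return 'visually-related';
  return 'weak';
}
```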
This is the same pipeline that powers Google Lens and Pinterest Visual Search - except it runs in the customer's browser tab, not on a fleet of GPU servers.
Step 5: Auto-Categorization with Zero-Shot Classification
When a merchant uploads a new product image, we automatically assign it to a category without any training data:
```typescript
import { classifyImageZeroShot } from '@localmode/core';

const classifierModel = transformers.zeroShotImageClassifier(
  'Xenova/siglip-base-patch16-224'
);

const CATEGORIES = [
  'Electronics', 'Clothing', 'Home & Garden', 'Toys',
  'Food & Beverage', 'Sports', 'Books', 'Automotive', 'Health',
];

async function categorizeProduct(imageDataUrl) {
  const { labels, scores } = await classifyImageZeroShot({
    model: classifierModel,
    image: imageDataUrl,
    candidateLabels: CATEGORIES,
  });

  return {
    category: labels[0], // "Clothing"
    confidence: scores[0], // 0.87
  };
}
```

All three models - imageFeatures, embedding, and zeroShotImageClassifier - share the same underlying SigLIP weights. The text encoder, vision encoder, and projection heads are loaded once and reused across all three operations. This is not three separate 400MB downloads; it is one model serving three distinct capabilities.
The Cost Comparison
Here is what these features cost with cloud services versus LocalMode, for a mid-size store processing 500,000 search queries and 5,000 product uploads per month:
| Capability | Cloud Service | Monthly Cost | LocalMode Cost |
|---|---|---|---|
| Text search (500K queries) | Algolia Grow | $250/mo | $0 |
| Text search (500K queries) | Algolia Grow Plus (AI) | $875/mo | $0 |
| Visual similarity search | Google Vision AI | $7.50/mo (5K images at $1.50/1K) | $0 |
| Auto-categorization | Google Vision Labels | $7.50/mo | $0 |
| Record storage (50K products) | Algolia | $20/mo | $0 |
| Infrastructure | Elasticsearch Cloud | $95–$175/mo | $0 |
| Annual total | | $3,300–$12,900 | $0 |
Sources: Algolia pricing, Elastic Cloud pricing, Google Cloud Vision pricing.
The tradeoff is a one-time ~400MB model download on the customer's first visit. After that, the model is cached in the browser and all subsequent searches run locally in 50–200ms. For a Shopify plugin, you can trigger the download during onboarding or preload it in a service worker.
Scaling math
At 1 million searches/month, Algolia Grow costs $500/mo ($6,000/year). At 5 million, it is $2,500/mo ($30,000/year). LocalMode costs $0 at every scale because compute happens on the customer's device. The more users you have, the more compute capacity you gain - for free.
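The scaling math is easy to verify from the per-request rate quoted earlier - metered pricing is linear in query volume:

```typescript
// Metered search pricing: a flat rate per 1,000 requests,
// e.g. $0.50/1K on Algolia's Grow tier.
function monthlyCost(searches: number, ratePer1K: number): number {
  return (searches / 1000) * ratePer1K;
}

monthlyCost(500_000, 0.5);   // $250/mo
monthlyCost(1_000_000, 0.5); // $500/mo → $6,000/year
monthlyCost(5_000_000, 0.5); // $2,500/mo → $30,000/year
```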
Putting It All Together: The React Integration
The Product Search showcase app wires everything together with @localmode/react hooks. Here is the simplified architecture:
```typescript
import { useSemanticSearch, useBatchOperation } from '@localmode/react';

function useProductSearch() {
  // Text search delegates to the React hook
  const { results, isSearching, search } = useSemanticSearch({
    model: textModel,
    db: vectorDB,
    topK: 20,
  });

  // Image search calls the service layer
  const searchByImage = async (file) => {
    const dataUrl = await readFileAsDataUrl(file);
    const { features } = await extractImageFeatures({
      model: imageModel,
      image: dataUrl,
    });
    return db.search(features, { k: 20 });
  };

  // Upload pipeline with batch processing
  const uploadBatch = useBatchOperation({
    fn: async (file, signal) => {
      const dataUrl = await readFileAsDataUrl(file);
      const { category } = await categorizeProduct(dataUrl);
      const product = { id: crypto.randomUUID(), category, ... };
      await indexProduct(product, dataUrl);
      return product;
    },
    concurrency: 1,
  });

  return { results, isSearching, search, searchByImage, uploadBatch };
}
```

The useSemanticSearch hook manages loading states, error handling, and cancellation. The useBatchOperation hook processes file uploads sequentially with progress tracking and abort support. Components receive clean state - results, isSearching, error - and never touch model instances directly.
Model Quality: How Does Browser-Side CLIP Compare?
Skepticism about browser-based ML is warranted. Can a quantized SigLIP model running in WebAssembly actually match cloud search quality?
Recent benchmarks paint an encouraging picture. A 2025 study on image embeddings for e-commerce found that SigLIP achieved state-of-the-art retrieval performance across five of six product datasets, outperforming standard CLIP and domain-specific models. Amazon's research team demonstrated that CLIP-based unified search enables effective cross-modal product retrieval with a single model serving both text and image queries. And Alibaba's VL-CLIP paper showed that multimodal product embeddings increase click-through rate by 18.6% and add-to-cart rate by 15.5% compared to text-only approaches.
The quantized models in LocalMode (q8 precision via ONNX Runtime) retain over 99% of the full-precision model's accuracy while running 2–3x faster. The quality gap between a browser-side SigLIP search and a cloud-hosted Elasticsearch semantic layer is far smaller than the gap between keyword search and any semantic approach.
When to Use This (And When Not To)
This approach works well for:
- Shopify/WooCommerce plugins where you cannot control the backend
- Privacy-sensitive catalogs (medical devices, defense, luxury goods)
- Offline-capable applications (trade shows, field sales)
- Startups that want search without a recurring cloud bill
- Internal tools where data should not leave the company network
Consider a cloud solution when:
- Your catalog exceeds 100,000 products (browser memory becomes a constraint)
- You need real-time collaborative merchandising and A/B testing
- Your search requires inventory-aware ranking tied to a live database
- Sub-10ms latency at the 99th percentile is a hard requirement
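The 100,000-product guideline falls out of vector memory. A quick back-of-the-envelope estimate, assuming raw float32 storage (quantized vectors or on-disk indexes shift this boundary):

```typescript
// Approximate in-memory size of a float32 vector index.
function indexSizeMB(products: number, dimensions: number): number {
  const bytes = products * dimensions * 4; // 4 bytes per float32
  return bytes / (1024 * 1024);
}

indexSizeMB(100_000, 768); // ≈ 293 MB of raw vectors, before HNSW graph overhead
```

At 100K products that is roughly 293MB of raw vectors - plus HNSW graph overhead - which starts to strain a browser tab's memory budget.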
For most small-to-mid e-commerce stores - which represent the vast majority of Shopify's 5.5 million active stores - a 400MB model download and 50–200ms search latency are a better deal than $3,000–$30,000 per year in search API fees.
Get Started
Install the packages:
```bash
npm install @localmode/core @localmode/transformers @localmode/react
```

The complete source code for the Product Search app is in the showcase repository under apps/showcase-nextjs/src/app/(apps)/product-search/. The service layer in _services/search.service.ts contains the full indexing and search pipeline. The hook in _hooks/use-product-search.ts shows the React integration with useSemanticSearch and useBatchOperation.
Methodology
Cost figures sourced from official pricing pages accessed March 2026:
- Algolia Pricing - Grow: $0.50/1K requests, Grow Plus: $1.75/1K requests, records: $0.40/1K beyond 100K included
- Elastic Cloud Pricing - $95–$175+/mo for managed clusters
- Algolia pricing analysis by Meilisearch - independent cost breakdown
- CLIP for e-commerce product retrieval (Amazon) - architecture guide for unified text-image search
- Benchmarking image embeddings for e-commerce - SigLIP state-of-the-art on 5/6 product datasets
- VL-CLIP multimodal recommendations - 18.6% CTR uplift, 15.5% ATC uplift
- Fashion CLIP for product similarity - domain-specific CLIP evaluation
- Shopify statistics 2026 - 5.5M+ active stores
Try it yourself
Visit localmode.ai to try 30+ AI demo apps running entirely in your browser. No sign-up, no API keys, no data leaves your device.
Read the Getting Started guide to add local AI to your application in under 5 minutes.