E-Commerce Product Search That Understands Intent - No Algolia Required
Build semantic product search with text-to-product matching, visual similarity via CLIP, and auto-categorization - all running in the browser at $0/month. Complete code walkthrough with LocalMode's VectorDB, embedImage, and classifyImageZeroShot APIs.
A customer types "warm winter jacket under $100" into your search bar. A keyword search engine tokenizes that into warm, winter, jacket, under, $100 and returns nothing - because your product titles say "Insulated Puffer Coat" and "Fleece-Lined Parka." The customer leaves. You just lost a sale to a search box.
This is the fundamental problem with keyword search in e-commerce. Products are described by merchants. Searches are described by customers. The two vocabularies rarely overlap.
Algolia, Elasticsearch, and Searchspring solve this with cloud-hosted semantic layers - and charge you for every single query. Algolia's Grow plan bills $0.50 per 1,000 search requests. Their AI-powered Grow Plus tier charges $1.75 per 1,000. A mid-size Shopify store running 500,000 searches per month pays $250–$875/month just for search, before you count record storage fees ($0.40/1,000 records), analytics add-ons, or merchandising tools. Enterprise tiers start at $50K+ annually.
What if the search engine ran entirely in the customer's browser? No API keys. No per-request billing. No data leaving the device. That is what we built in the Product Search demo - and in this post, we will walk through every line of code.
The Three Search Modes
Our approach uses CLIP/SigLIP models - neural networks trained on hundreds of millions of image-text pairs - to map both text and images into the same 768-dimensional vector space. Once everything lives in the same space, similarity is just a distance calculation.
This unlocks three search modes that would each require a separate paid service with cloud providers:
| Mode | What the customer does | What happens under the hood |
|---|---|---|
| Text-to-product | Types "warm winter jacket" | Text is embedded via CLIP text encoder, compared against product image vectors |
| Image-to-product | Uploads a photo from Instagram | Photo is embedded via CLIP vision encoder, nearest neighbors returned |
| Auto-categorization | Nothing - it is automatic | Zero-shot classification assigns each product to a taxonomy on upload |
All three modes share one model (Xenova/siglip-base-patch16-224, ~400MB downloaded once) and one VectorDB. No backend. No API keys. No recurring cost.
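Once text and images live in the same vector space, "similarity is just a distance calculation" concretely means cosine similarity. As a library-free sketch of what happens under the hood (the real VectorDB uses an HNSW index rather than this brute-force scan, which is shown only for intuition):

```typescript
// Cosine similarity between two embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute-force nearest-neighbor search over a small catalog.
function topK(
  query: number[],
  catalog: { id: string; vector: number[] }[],
  k: number,
): { id: string; score: number }[] {
  return catalog
    .map((p) => ({ id: p.id, score: cosineSimilarity(query, p.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

An HNSW index gives the same top-k answer without comparing the query against every vector, which is why search stays sub-millisecond even for thousands of products.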
Step 1: Create the Vector Database
Every product's visual fingerprint is stored in a VectorDB with HNSW indexing for sub-millisecond search:
```typescript
import { createVectorDB } from '@localmode/core';

const db = await createVectorDB({
  name: 'product-catalog',
  dimensions: 768, // SigLIP-Base produces 768-d vectors
  storage: 'indexeddb', // Persists across sessions
});
```

The dimensions value must match the model output. SigLIP-Base-Patch16-224 produces 768-dimensional vectors. Standard CLIP ViT-Base models produce 512. LocalMode's TransformersCLIPEmbeddingModel infers this automatically from the model ID, but the VectorDB needs the explicit number.
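Because a dimension mismatch otherwise surfaces only when the first vector is inserted, a cheap guard at ingestion time is worth having. A hypothetical sanity check (this helper is not part of the LocalMode API):

```typescript
// Throw early if an embedding does not match the index dimensionality.
function assertDimensions(vector: number[], expected: number): void {
  if (vector.length !== expected) {
    throw new Error(
      `Embedding has ${vector.length} dimensions, index expects ${expected}`,
    );
  }
}
```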
Memory vs IndexedDB
For a production catalog with thousands of products, use 'indexeddb' storage so vectors persist across page reloads. The showcase demo uses 'memory' storage because uploaded images are ephemeral. For a Shopify plugin, you would persist to IndexedDB and only re-index when the catalog changes.
Step 2: Index Products with Image Features
When a product is added to the catalog, we extract its visual fingerprint using SigLIP's vision encoder and store it in the VectorDB:
```typescript
import { extractImageFeatures } from '@localmode/core';
import { transformers } from '@localmode/transformers';

// Create the image feature model (lazy-loads on first use)
const imageModel = transformers.imageFeatures('Xenova/siglip-base-patch16-224');

async function indexProduct(product, imageDataUrl) {
  // Extract a 768-d feature vector from the product image
  const { features } = await extractImageFeatures({
    model: imageModel,
    image: imageDataUrl,
  });

  // Store in VectorDB with metadata for filtering
  await db.add({
    id: product.id,
    vector: features,
    metadata: {
      name: product.name,
      category: product.category,
      price: product.price,
    },
  });
}
```

The extractImageFeatures function accepts any ImageInput - a base64 data URL, a Blob, an ImageData object, or an ArrayBuffer. For a Shopify integration, you would fetch product image URLs and pass them directly. The vision encoder loads lazily on the first call and stays in memory for subsequent products.
Batch ingestion for large catalogs
For catalogs with hundreds or thousands of products, LocalMode's adaptive batching automatically sizes batches to the customer's device:
```typescript
import { computeOptimalBatchSize } from '@localmode/core';

const { batchSize, reasoning } = computeOptimalBatchSize({
  taskType: 'ingestion',
  modelDimensions: 768,
});

console.log(reasoning);
// → "16 cores, 32GB RAM, GPU: Yes → batch size 128"
```

The showcase app displays this in the header as a "Batch: N" badge - customers on a MacBook Pro with 32GB RAM and a GPU get batch size 128, while a budget Chromebook might get 16. The ingestion pipeline never overwhelms the device.
Step 3: Text-to-Product Search
This is where CLIP's cross-modal alignment shines. The customer types natural language; the text encoder maps it into the same vector space as the product images:
```typescript
import { embed, semanticSearch } from '@localmode/core';

// Create the text embedding model (shares weights with image model)
const textModel = transformers.embedding('Xenova/siglip-base-patch16-224');

// Option A: One-step semantic search
const { results } = await semanticSearch({
  db,
  model: textModel,
  query: 'warm winter jacket under $100',
  k: 20,
});

for (const result of results) {
  console.log(`${result.id}: ${(result.score * 100).toFixed(0)}% match`);
}

// Option B: Embed + search separately (useful for caching the embedding)
const { embedding } = await embed({
  model: textModel,
  value: 'warm winter jacket under $100',
});

const matches = await db.search(embedding, {
  k: 20,
  filter: { category: 'Clothing' }, // Metadata filter
  threshold: 0.2, // Cross-modal similarity threshold
});
```

Notice the threshold: 0.2. Cross-modal CLIP/SigLIP similarity scores are lower than same-modality text embeddings. A score of 0.25 in cross-modal search represents strong relevance, while the same score in a text-only embedding model like BGE would be poor. The showcase app uses getDefaultThreshold() from @localmode/core to look up per-model presets, falling back to 0.2 for SigLIP.
The filter parameter enables metadata-based narrowing without a separate faceting service. Algolia charges extra for faceted search. Here, it is a single object on the search call, evaluated against typed metadata stored alongside each vector.
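Conceptually, that filter is just a predicate applied to each candidate's metadata before ranking. A hypothetical exact-match version (the real VectorDB may support richer operators):

```typescript
type Metadata = Record<string, string | number | boolean>;

// Keep only results whose metadata matches every key in the filter.
function applyFilter<T extends { metadata: Metadata }>(
  results: T[],
  filter: Metadata,
): T[] {
  return results.filter((r) =>
    Object.entries(filter).every(([key, value]) => r.metadata[key] === value),
  );
}
```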
Step 4: Image-to-Product Search (Visual Similarity)
A customer screenshots a jacket from Instagram and drops it into your search bar. The vision encoder produces a feature vector from the query image, and the VectorDB finds the nearest neighbors:
```typescript
async function searchByImage(queryImageDataUrl) {
  // Extract features from the query image
  const { features } = await extractImageFeatures({
    model: imageModel,
    image: queryImageDataUrl,
  });

  // Search for visually similar products
  const results = await db.search(features, { k: 20 });

  return results.map((r) => ({
    productId: r.id,
    similarity: r.score,
    metadata: r.metadata,
  }));
}
```

Because both the catalog images and the query image pass through the same vision encoder, similarity scores are directly comparable. A score above 0.7 indicates near-identical items (same product, different angle). Scores between 0.4 and 0.7 capture visually related items (similar style, different brand).
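Those rough tiers translate directly into UI decisions - "exact match" badges versus a "similar styles" rail. A small helper that encodes the thresholds described above (the cut-offs are the article's rules of thumb, not API constants):

```typescript
type MatchTier = 'near-identical' | 'visually-related' | 'weak';

// Bucket a same-modality (image-to-image) similarity score using
// the rough thresholds from the text: >0.7 near-identical,
// 0.4–0.7 visually related, below that weak.
function matchTier(score: number): MatchTier {
  if (score > 0.7) return 'near-identical';
  if (score >= 0.4) return 'visually-related';
  return 'weak';
}
```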
This is the same pipeline that powers Google Lens and Pinterest Visual Search - except it runs in the customer's browser tab, not on a fleet of GPU servers.
Step 5: Auto-Categorization with Zero-Shot Classification
When a merchant uploads a new product image, we automatically assign it to a category without any training data:
```typescript
import { classifyImageZeroShot } from '@localmode/core';

const classifierModel = transformers.zeroShotImageClassifier(
  'Xenova/siglip-base-patch16-224'
);

const CATEGORIES = [
  'Electronics', 'Clothing', 'Home & Garden', 'Toys',
  'Food & Beverage', 'Sports', 'Books', 'Automotive', 'Health',
];

async function categorizeProduct(imageDataUrl) {
  const { labels, scores } = await classifyImageZeroShot({
    model: classifierModel,
    image: imageDataUrl,
    candidateLabels: CATEGORIES,
  });

  return {
    category: labels[0], // "Clothing"
    confidence: scores[0], // 0.87
  };
}
```

All three models - imageFeatures, embedding, and zeroShotImageClassifier - share the same underlying SigLIP weights. The text encoder, vision encoder, and projection heads are loaded once and reused across all three operations. This is not three separate 400MB downloads; it is one model serving three distinct capabilities.
The Cost Comparison
Here is what these features cost with cloud services versus LocalMode, for a mid-size store processing 500,000 search queries and 5,000 product uploads per month:
| Capability | Cloud Service | Monthly Cost | LocalMode Cost |
|---|---|---|---|
| Text search (500K queries) | Algolia Grow | $250/mo | $0 |
| Text search (500K queries) | Algolia Grow Plus (AI) | $875/mo | $0 |
| Visual similarity search | Google Vision AI | $7.50/mo (5K images at $1.50/1K) | $0 |
| Auto-categorization | Google Vision Labels | $7.50/mo | $0 |
| Record storage (50K products) | Algolia | $20/mo | $0 |
| Infrastructure | Elasticsearch Cloud | $95–$175/mo | $0 |
| Annual total | | $3,300–$12,900 | $0 |
Sources: Algolia pricing, Elastic Cloud pricing, Google Cloud Vision pricing.
The tradeoff is a one-time ~400MB model download on the customer's first visit. After that, the model is cached in the browser and all subsequent searches run locally in 50–200ms. For a Shopify plugin, you can trigger the download during onboarding or preload it in a service worker.
Scaling math
At 1 million searches/month, Algolia Grow costs $500/mo ($6,000/year). At 5 million, it is $2,500/mo ($30,000/year). LocalMode costs $0 at every scale because compute happens on the customer's device. The more users you have, the more compute capacity you gain - for free.
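The scaling math is easy to verify from the per-request rate quoted earlier - metered pricing is linear in query volume:

```typescript
// Metered search pricing: a flat rate per 1,000 requests,
// e.g. $0.50/1K on Algolia's Grow tier.
function monthlyCost(searches: number, ratePer1K: number): number {
  return (searches / 1000) * ratePer1K;
}

monthlyCost(500_000, 0.5);   // $250/mo
monthlyCost(1_000_000, 0.5); // $500/mo → $6,000/year
monthlyCost(5_000_000, 0.5); // $2,500/mo → $30,000/year
```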
Putting It All Together: The React Integration
The Product Search showcase app wires everything together with @localmode/react hooks. Here is the simplified architecture:
```typescript
import { useSemanticSearch, useBatchOperation } from '@localmode/react';

function useProductSearch() {
  // Text search delegates to the React hook
  const { results, isSearching, search } = useSemanticSearch({
    model: textModel,
    db: vectorDB,
    topK: 20,
  });

  // Image search calls the service layer
  const searchByImage = async (file) => {
    const dataUrl = await readFileAsDataUrl(file);
    const { features } = await extractImageFeatures({
      model: imageModel,
      image: dataUrl,
    });
    return db.search(features, { k: 20 });
  };

  // Upload pipeline with batch processing
  const uploadBatch = useBatchOperation({
    fn: async (file, signal) => {
      const dataUrl = await readFileAsDataUrl(file);
      const { category } = await categorizeProduct(dataUrl);
      const product = { id: crypto.randomUUID(), category, ... };
      await indexProduct(product, dataUrl);
      return product;
    },
    concurrency: 1,
  });

  return { results, isSearching, search, searchByImage, uploadBatch };
}
```

The useSemanticSearch hook manages loading states, error handling, and cancellation. The useBatchOperation hook processes file uploads sequentially with progress tracking and abort support. Components receive clean state - results, isSearching, error - and never touch model instances directly.
Model Quality: How Does Browser-Side CLIP Compare?
Skepticism about browser-based ML is warranted. Can a quantized SigLIP model running in WebAssembly actually match cloud search quality?
Recent benchmarks paint an encouraging picture. A 2025 study on image embeddings for e-commerce found that SigLIP achieved state-of-the-art retrieval performance across five of six product datasets, outperforming standard CLIP and domain-specific models. Amazon's research team demonstrated that CLIP-based unified search enables effective cross-modal product retrieval with a single model serving both text and image queries. And Alibaba's VL-CLIP paper showed that multimodal product embeddings increase click-through rate by 18.6% and add-to-cart rate by 15.5% compared to text-only approaches.
The quantized models in LocalMode (q8 precision via ONNX Runtime) retain over 99% of the full-precision model's accuracy while running 2–3x faster. The quality gap between a browser-side SigLIP search and a cloud-hosted Elasticsearch semantic layer is far smaller than the gap between keyword search and any semantic approach.
When to Use This (And When Not To)
This approach works well for:
- Shopify/WooCommerce plugins where you cannot control the backend
- Privacy-sensitive catalogs (medical devices, defense, luxury goods)
- Offline-capable applications (trade shows, field sales)
- Startups that want search without a recurring cloud bill
- Internal tools where data should not leave the company network
Consider a cloud solution when:
- Your catalog exceeds 100,000 products (browser memory becomes a constraint)
- You need real-time collaborative merchandising and A/B testing
- Your search requires inventory-aware ranking tied to a live database
- Sub-10ms latency at the 99th percentile is a hard requirement
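The 100,000-product guideline falls out of vector memory. A quick back-of-the-envelope estimate, assuming raw float32 storage (quantized vectors or on-disk indexes shift this boundary):

```typescript
// Approximate in-memory size of a float32 vector index.
function indexSizeMB(products: number, dimensions: number): number {
  const bytes = products * dimensions * 4; // 4 bytes per float32
  return bytes / (1024 * 1024);
}

indexSizeMB(100_000, 768); // ≈ 293 MB of raw vectors, before HNSW graph overhead
```

At 100K products that is roughly 293MB of raw vectors - plus HNSW graph overhead - which starts to strain a browser tab's memory budget.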
For most small-to-mid e-commerce stores - which represent the vast majority of Shopify's 5.5 million active stores - a 400MB model download and 50–200ms search latency are a better deal than $3,000–$30,000 per year in search API fees.
Get Started
Install the packages:
```bash
npm install @localmode/core @localmode/transformers @localmode/react
```

The complete source code for the Product Search app is in the showcase repository under apps/showcase-nextjs/src/app/(apps)/product-search/. The service layer in _services/search.service.ts contains the full indexing and search pipeline. The hook in _hooks/use-product-search.ts shows the React integration with useSemanticSearch and useBatchOperation.
Methodology
Cost figures sourced from official pricing pages accessed March 2026:
- Algolia Pricing - Grow: $0.50/1K requests, Grow Plus: $1.75/1K requests, records: $0.40/1K beyond 100K included
- Elastic Cloud Pricing - $95–$175+/mo for managed clusters
- Algolia pricing analysis by Meilisearch - independent cost breakdown
- CLIP for e-commerce product retrieval (Amazon) - architecture guide for unified text-image search
- Benchmarking image embeddings for e-commerce - SigLIP state-of-the-art on 5/6 product datasets
- VL-CLIP multimodal recommendations - 18.6% CTR uplift, 15.5% ATC uplift
- Fashion CLIP for product similarity - domain-specific CLIP evaluation
- Shopify statistics 2026 - 5.5M+ active stores
Try it yourself
Visit localmode.ai to try 30+ AI demo apps running entirely in your browser. No sign-up, no API keys, no data leaves your device.
Read the Getting Started guide to add local AI to your application in under 5 minutes.