Does LocalMode semantic search work with Next.js?

Yes. LocalMode models run client-side, so add the 'use client' directive to any component that calls LocalMode hooks. Model downloads happen in the browser, not during SSR. The LocalMode showcase app itself is built with Next.js 16.

How fast is browser-based semantic search?

After the model has loaded, embedding a query takes 5-20ms and a VectorDB search over 10K vectors takes under 5ms, so total search latency stays under 30ms. That is typically less than a single network round trip to a hosted search service.
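
These figures depend on the device, the model, and the index size, so it is worth measuring on your own target hardware. A minimal sketch of a timing wrapper, assuming only the standard `performance.now()` clock; `timed` is a hypothetical helper, not a LocalMode API:

```typescript
// Hypothetical helper (not part of LocalMode): runs any async step and
// reports how long it took, in milliseconds.
async function timed<T>(fn: () => Promise<T>): Promise<{ value: T; ms: number }> {
  const start = performance.now();
  const value = await fn();
  return { value, ms: performance.now() - start };
}
```

Wrap whichever call performs the search in your app, for example `timed(() => runSearch(query))` where `runSearch` is your own function, and log `ms` across a few dozen queries rather than relying on a single sample.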

Do I need a backend server for semantic search in React?

No. The entire search system (embedding model, vector database, and search logic) runs in the browser with zero backend code. Just install @localmode/core, @localmode/transformers, and @localmode/react.

AI Search in React Apps

Add semantic search to any React app in 10 minutes with LocalMode hooks - search by meaning, not keywords.

Category: Developer Guide

The Problem

Adding search to a React app typically means setting up Algolia, Elasticsearch, or a backend API. These require server infrastructure, API keys, and ongoing costs. For many apps - documentation sites, note-taking tools, content managers - a client-side semantic search would be simpler and more privacy-friendly.

Traditional approaches force a trade-off: they compromise on privacy (by sending data to cloud APIs), require server infrastructure (adding cost and maintenance burden), or sacrifice functionality (by avoiding AI entirely). LocalMode provides a fourth option: run the AI locally in the browser.

The Solution

Use @localmode/react hooks to build semantic search directly in your React components. useSemanticSearch() handles both embedding queries and vector store search in one hook, and usePipeline() orchestrates multi-step RAG flows. The entire search system runs in the browser with zero backend code.
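
Conceptually, searching by meaning comes down to comparing a query vector against stored document vectors. The sketch below illustrates that idea with brute-force cosine similarity. It is not LocalMode's implementation, which may use an index structure instead of a linear scan, and the names `cosineSimilarity` and `topK` are illustrative only.

```typescript
// Illustrative only: brute-force nearest-neighbor search over embeddings.
interface StoredVector {
  id: string;
  vector: number[];
}

// Cosine similarity: dot product divided by the product of the vector norms.
function cosineSimilarity(a: ArrayLike<number>, b: ArrayLike<number>): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Scores every stored vector against the query and keeps the k best matches.
function topK(query: number[], docs: StoredVector[], k: number): { id: string; score: number }[] {
  return docs
    .map((d) => ({ id: d.id, score: cosineSimilarity(query, d.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

In a real app the vectors come from the embedding model, so texts with similar meaning end up with a high score even when they share no keywords.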

Why Local-First?

Building this feature with on-device inference provides three structural advantages over cloud-based alternatives:

Zero marginal cost - After the initial model download, every inference operation is free. No per-token fees, no monthly API bills, no surprise invoices. This matters especially for features used frequently or by many users.
Architectural privacy - User data never leaves the device. This is not a policy promise ("we won't look at your data") but an architectural guarantee: the data physically cannot reach any server because the processing happens in the browser tab.
Offline capability - Once models are cached in IndexedDB, the entire feature works without internet. This is critical for field deployments, mobile apps with spotty connectivity, and enterprise environments with restricted networks.

Technology Stack

| Package | Purpose |
| --- | --- |
| `@localmode/core` | createVectorDB(), embed(), semanticSearch() |
| `@localmode/transformers` | BGE-small embedding model |
| `@localmode/react` | useEmbed(), useSemanticSearch(), usePipeline() |

Install the required packages:

npm install @localmode/core @localmode/transformers @localmode/react

Implementation

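import { useSemanticSearch } from '@localmode/react';
import { transformers } from '@localmode/transformers';
import { createVectorDB } from '@localmode/core';
import { useState, useEffect } from 'react';

// Create the model handle once at module scope so re-renders reuse it.
const model = transformers.embedding('Xenova/bge-small-en-v1.5');

function SearchComponent() {
  // The vector DB opens asynchronously, so it starts out as null.
  const [db, setDb] = useState(null);
  const { results, isSearching, search } = useSemanticSearch({ model, db, topK: 10 });
  const [query, setQuery] = useState('');

  useEffect(() => {
    let cancelled = false;
    // dimensions must match the embedding size of bge-small-en-v1.5 (384).
    createVectorDB({ name: 'docs', dimensions: 384 }).then(instance => {
      // Ignore the result if the component unmounted while the DB was opening.
      if (!cancelled) setDb(instance);
    });
    return () => {
      cancelled = true;
    };
  }, []);

  const handleSearch = () => {
    // Skip the search until the DB is ready and the query is non-empty.
    if (db && query.trim()) search(query);
  };

  return (
    <div>
      <input value={query} onChange={e => setQuery(e.target.value)} />
      <button onClick={handleSearch} disabled={isSearching || !db}>
        {isSearching ? 'Searching...' : 'Search'}
      </button>
      {/* ResultsList is an app-defined component, not part of LocalMode. */}
      {results.length > 0 && <ResultsList items={results} />}
    </div>
  );
}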
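
The component above queries the 'docs' index but does not show how documents get into it. The sketch below shows the general shape of an indexing step. The embedding function and the store are passed in as hypothetical interfaces (`EmbedFn`, `VectorSink`), so none of the names here are LocalMode's real method names or signatures; take those from the library's documentation.

```typescript
// Hypothetical shapes for illustration; these are NOT LocalMode's real types.
type EmbedFn = (text: string) => Promise<number[]>;

interface VectorSink {
  add(entry: { id: string; vector: number[]; metadata: { text: string } }): Promise<void>;
}

interface SourceDoc {
  id: string;
  text: string;
}

// Embeds each document and writes it to the store, one at a time.
// Returns the number of documents indexed.
async function indexDocuments(
  docs: SourceDoc[],
  embedText: EmbedFn,
  sink: VectorSink,
  expectedDims = 384, // matches the dimensions used when the DB was created
): Promise<number> {
  let count = 0;
  for (const doc of docs) {
    const vector = await embedText(doc.text);
    if (vector.length !== expectedDims) {
      throw new Error(`expected ${expectedDims} dimensions, got ${vector.length}`);
    }
    await sink.add({ id: doc.id, vector, metadata: { text: doc.text } });
    count++;
  }
  return count;
}
```

The dimension check is the important part of the design: a mismatch between the model's output size and the size the index was created with is easier to diagnose at write time than at search time.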

How This Works

The code above demonstrates the complete pipeline. Let us walk through the key decisions:

Model selection - The example uses Xenova/bge-small-en-v1.5, a small English embedding model that produces 384-dimensional vectors, which is why the vector database is created with dimensions: 384. It was chosen for its balance of size, speed, and quality. Smaller models load faster and use less memory; larger models generally produce better results. Start with the recommended model and upgrade only if quality is insufficient for your users.
Browser APIs - LocalMode uses IndexedDB for persistent storage (vectors, model cache), Web Workers for background processing (keeping the UI responsive during inference), and the Web Crypto API for optional encryption.
Error handling - All LocalMode functions throw typed errors (ModelLoadError, StorageError, ValidationError) with actionable hints. Wrap calls in try/catch and use the error's hint property to display user-friendly messages.
Cancellation - Pass an AbortSignal to any long-running operation. This lets users cancel searches, embeddings, or generation without waiting for completion.
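
The error-handling and cancellation points can be combined in one small wrapper. This is a sketch under stated assumptions: errors are read structurally through the `hint` property described above rather than by importing LocalMode's error classes, and `run` stands in for whichever LocalMode call you are making. How the signal is actually passed to that call (option name, position) should be checked against the API.

```typescript
// Reads the `hint` property described above. Typed structurally so the
// sketch does not depend on the library's error classes.
function toUserMessage(err: unknown, fallback = "Something went wrong. Please try again."): string {
  if (typeof err === "object" && err !== null && "hint" in err) {
    const hint = (err as { hint?: unknown }).hint;
    if (typeof hint === "string" && hint.length > 0) return hint;
  }
  return fallback;
}

type Outcome<T> =
  | { status: "ok"; value: T }
  | { status: "cancelled" }
  | { status: "error"; message: string };

// Runs an operation that accepts an AbortSignal and maps the result to a
// state the UI can render. A failure after abort() counts as a cancellation.
async function runCancellable<T>(
  run: (signal: AbortSignal) => Promise<T>,
  controller: AbortController,
): Promise<Outcome<T>> {
  try {
    return { status: "ok", value: await run(controller.signal) };
  } catch (err) {
    if (controller.signal.aborted) return { status: "cancelled" };
    return { status: "error", message: toUserMessage(err) };
  }
}
```

Keep the `AbortController` where a cancel button can reach it, and call `controller.abort()` from its click handler.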

Production Considerations

When deploying this solution to production, consider these factors:

Model preloading: Download models during user onboarding or application setup, not on first use. Use preloadModel() with an onProgress callback to show download progress. This avoids the poor experience of a loading spinner on the first AI interaction.

Storage management: IndexedDB has browser-specific quotas (Chrome allows up to 60% of total disk size per origin; iOS Safari is more restrictive). Use getStorageQuota() to check available space and navigator.storage.persist() to request persistent storage that survives browser storage pressure.
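
A minimal sketch of a pre-download storage check. It uses the standard `navigator.storage.estimate()` and `navigator.storage.persist()` Web APIs directly rather than LocalMode's getStorageQuota(), whose return shape is not shown in this guide. The `hasRoomFor` and `checkStorage` names and the 10% headroom are assumptions for illustration.

```typescript
// Subset of what the standard navigator.storage.estimate() call resolves to.
interface StorageEstimateLike {
  usage?: number;
  quota?: number;
}

// Pure check: is there room for `requiredBytes` plus a safety margin?
function hasRoomFor(estimate: StorageEstimateLike, requiredBytes: number, headroom = 0.1): boolean {
  const free = (estimate.quota ?? 0) - (estimate.usage ?? 0);
  return free >= requiredBytes * (1 + headroom);
}

// Browser wrapper around the StorageManager API. Resolves to null where the
// API is unavailable (older browsers, non-browser runtimes).
async function checkStorage(requiredBytes: number): Promise<{ ok: boolean; persisted: boolean } | null> {
  const storage = (globalThis as any).navigator?.storage;
  if (!storage?.estimate) return null;
  const estimate: StorageEstimateLike = await storage.estimate();
  // persist() asks the browser to exempt this origin from eviction under
  // storage pressure; the browser may decline, in which case it resolves false.
  const persisted: boolean = storage.persist ? await storage.persist() : false;
  return { ok: hasRoomFor(estimate, requiredBytes), persisted };
}
```

Run the check before starting a model download so that a quota problem surfaces as a clear message instead of a failed write partway through.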

Device adaptation: Not all users have the same hardware. Use detectCapabilities() to inspect the device, then pass the result to recommendModels(caps, { task }) to select models appropriate for it. A desktop with a discrete GPU can handle 3GB models; a mobile phone with 3GB RAM should use models under 300MB.
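
To make the sizing rule concrete, here is an illustrative budget function built only from the two data points above. It is not how recommendModels() works internally; `DeviceSummary`, `sizeBudgetMB`, and `pickModel` are hypothetical, and the rule of roughly 10% of RAM is an assumption extrapolated from the phone example.

```typescript
// Hypothetical capability summary. detectCapabilities() returns its own
// shape, which is not reproduced here.
interface DeviceSummary {
  memoryGB: number;
  hasDiscreteGPU: boolean;
}

interface ModelOption {
  id: string;
  sizeMB: number;
}

// Illustrative budget: about 3 GB on a discrete-GPU desktop, otherwise about
// 10% of RAM (so a 3 GB phone gets a 300 MB ceiling).
function sizeBudgetMB(device: DeviceSummary): number {
  if (device.hasDiscreteGPU) return 3000;
  return Math.floor(device.memoryGB * 100);
}

// Picks the largest candidate strictly under the budget for non-GPU devices,
// or at most the budget for GPU desktops; undefined if nothing fits.
function pickModel(device: DeviceSummary, candidates: ModelOption[]): ModelOption | undefined {
  const budget = sizeBudgetMB(device);
  const fits = (m: ModelOption) => (device.hasDiscreteGPU ? m.sizeMB <= budget : m.sizeMB < budget);
  return candidates.filter(fits).sort((a, b) => b.sizeMB - a.sizeMB)[0];
}
```

In practice, prefer the library's own recommendation and treat a rule like this as a sanity check on the result.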

Error boundaries: Wrap AI-powered components in error boundaries. If model loading fails (network error, storage quota exceeded, incompatible browser), fall back gracefully - show the non-AI version of the feature rather than crashing the page.
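
A sketch of the graceful-fallback idea at the data level, assuming the non-AI version of the feature is a plain substring match. `semantic` stands in for whatever function performs the LocalMode search in your app; all names here are illustrative.

```typescript
interface Item {
  id: string;
  text: string;
}

// Plain substring match: the non-AI version of search.
function keywordSearch(items: Item[], query: string): Item[] {
  const q = query.trim().toLowerCase();
  if (q === "") return [];
  return items.filter((item) => item.text.toLowerCase().includes(q));
}

// Tries the semantic path first; on any failure, falls back to keyword
// matching so the user still gets results.
async function searchWithFallback(
  items: Item[],
  query: string,
  semantic: (q: string) => Promise<Item[]>,
): Promise<{ mode: "semantic" | "keyword"; results: Item[] }> {
  try {
    return { mode: "semantic", results: await semantic(query) };
  } catch {
    return { mode: "keyword", results: keywordSearch(items, query) };
  }
}
```

This complements an error boundary instead of replacing it: React error boundaries catch errors thrown during rendering, not rejections inside event handlers or other asynchronous code, so failures from a search call need handling like the above.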

Methodology

Every hook name, return shape, and API in this guide was verified directly against the LocalMode source code: packages/react/src/index.ts (full export list), packages/react/src/hooks/use-embed.ts, packages/react/src/hooks/use-semantic-search.ts, packages/react/src/core/use-operation.ts (confirming the { data, error, isLoading, execute, cancel, reset } return shape), and packages/core/src/embeddings/types.ts (EmbedResult). The IndexedDB storage quota figure ("up to 60% of total disk size per origin") was verified against MDN's Storage quotas and eviction criteria article and cross-checked against other LocalMode blog posts that cite the same source. Code examples use the real exported API surface - useSemanticSearch from @localmode/react and createVectorDB / semanticSearch from @localmode/core.

Sources

LocalMode source - packages/react/src/index.ts - full hook export list (56 use* hooks confirmed)
LocalMode source - packages/react/src/hooks/use-semantic-search.ts - useSemanticSearch signature and return shape
LocalMode source - packages/react/src/hooks/use-embed.ts - useEmbed signature and EmbedResult return
LocalMode source - packages/react/src/core/use-operation.ts - base hook return shape { data, error, isLoading, execute, cancel, reset }
LocalMode source - packages/core/src/embeddings/types.ts - EmbedResult interface
LocalMode source - packages/core/src/capabilities/recommend.ts - recommendModels(capabilities, options) signature
MDN - Storage quotas and eviction criteria - Chrome IndexedDB quota: up to 60% of total disk size per origin
