WebGPU Vector Search
Accelerate vector distance computation with GPU compute shaders for 5-50x faster batch search operations.
LocalMode can optionally use WebGPU compute shaders to accelerate vector distance computation during HNSW search. This is entirely opt-in and falls back to CPU distance functions silently when WebGPU is unavailable.
See it in action
Try PDF Search and Semantic Search for working demos of these APIs.
Quick Start
Enable GPU acceleration with a single flag:
```ts
import { createVectorDB } from '@localmode/core';

const db = await createVectorDB({
  name: 'docs',
  dimensions: 384,
  enableGPU: true,
});

// Search works exactly the same — GPU acceleration is transparent
const results = await db.search(queryVector, { k: 10 });
```

Automatic Fallback
If WebGPU is not available (older browsers, Node.js, SSR), the VectorDB automatically uses CPU distance functions. No code changes needed.
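If you want to know ahead of time which path will be used, the same check can be performed manually with standard WebGPU feature detection; a minimal sketch (the `webgpuAvailable` helper name is ours, not a LocalMode export):

```ts
// Standard WebGPU feature detection. navigator.gpu exists only in
// browsers that ship the API, and requestAdapter() can still return
// null (e.g. blocklisted drivers), so probe both.
async function webgpuAvailable(): Promise<boolean> {
  const gpu = (globalThis as any).navigator?.gpu;
  if (!gpu) return false; // Node.js, SSR, older browsers
  try {
    return (await gpu.requestAdapter()) !== null;
  } catch {
    return false;
  }
}
```

In Node.js this resolves to `false`, matching the CPU fallback described above.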
How It Works
HNSW search computes hundreds of vector distances per query. With GPU acceleration:
- Below threshold (default: 64 candidates) — Uses CPU distance functions (GPU dispatch overhead would dominate)
- Above threshold — Batches all candidate vectors, dispatches a single GPU compute shader, reads back results
- Result — Each GPU dispatch computes 256+ distances in parallel, achieving 5-50x speedup for large candidate sets
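The routing logic above can be sketched as follows. The `batchThreshold` default comes from the configuration documented below; `computeCandidateDistances` and the `gpuBatch` callback are hypothetical stand-ins for the internal code paths:

```ts
// Hypothetical sketch of the CPU/GPU routing described above.
type DistanceFn = (a: Float32Array, b: Float32Array) => number;

async function computeCandidateDistances(
  query: Float32Array,
  candidates: Float32Array[],
  cpuDistance: DistanceFn,
  gpuBatch: ((q: Float32Array, c: Float32Array[]) => Promise<Float32Array>) | null,
  batchThreshold = 64, // default from the docs
): Promise<Float32Array> {
  // Small batches: GPU dispatch overhead would dominate, stay on CPU.
  if (!gpuBatch || candidates.length < batchThreshold) {
    const out = new Float32Array(candidates.length);
    for (let i = 0; i < candidates.length; i++) {
      out[i] = cpuDistance(query, candidates[i]);
    }
    return out;
  }
  // Large batches: one GPU dispatch computes all distances in parallel.
  return gpuBatch(query, candidates);
}
```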
WGSL Compute Shaders
Three WGSL compute shaders are compiled at initialization (one per distance metric):
- Cosine distance — `1 - dot(a,b) / (||a|| * ||b||)`
- Euclidean distance — `sqrt(sum((a[d] - b[d])^2))`
- Dot product distance — `-sum(a[d] * b[d])` (negated for HNSW)
Each shader uses @workgroup_size(256), processing 256 distances per workgroup dispatch.
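The shader source is not shown here, but the three formulas are easy to state in plain TypeScript. This CPU reference captures the semantics the shaders and the CPU fallback must agree on (the function names are illustrative, not LocalMode exports):

```ts
// CPU reference for the three shader formulas. Lower = closer, which is
// why the dot product is negated for HNSW's min-oriented ordering.
function cosineDistance(a: Float32Array, b: Float32Array): number {
  let dot = 0, na = 0, nb = 0;
  for (let d = 0; d < a.length; d++) {
    dot += a[d] * b[d];
    na += a[d] * a[d];
    nb += b[d] * b[d];
  }
  return 1 - dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function euclideanDistance(a: Float32Array, b: Float32Array): number {
  let sum = 0;
  for (let d = 0; d < a.length; d++) sum += (a[d] - b[d]) ** 2;
  return Math.sqrt(sum);
}

function dotProductDistance(a: Float32Array, b: Float32Array): number {
  let dot = 0;
  for (let d = 0; d < a.length; d++) dot += a[d] * b[d];
  return -dot;
}
```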
Configuration
VectorDB Level
The simplest way to enable GPU acceleration:
```ts
const db = await createVectorDB({
  name: 'my-db',
  dimensions: 384,
  enableGPU: true,
});
```

HNSW Index Level
For fine-grained control, configure the gpu options on the HNSW index:
```ts
const db = await createVectorDB({
  name: 'my-db',
  dimensions: 384,
  indexOptions: {
    gpu: {
      enabled: true,
      batchThreshold: 128, // Default: 64
      onFallback: (reason) => {
        console.log('GPU fallback:', reason);
      },
    },
  },
});
```

When both enableGPU and indexOptions.gpu are provided, the explicit indexOptions.gpu settings take precedence:
```ts
const db = await createVectorDB({
  name: 'my-db',
  dimensions: 384,
  enableGPU: true, // sets gpu.enabled = true
  indexOptions: {
    gpu: { batchThreshold: 128 }, // overrides default threshold
  },
});
```

HNSWGPUOptions
Standalone GPU Distance Computer
For use cases outside of VectorDB (e.g., custom similarity search, batch deduplication), use the standalone API:
```ts
import { createGPUDistanceComputer } from '@localmode/core';

const gpu = await createGPUDistanceComputer({
  batchThreshold: 32,
  onFallback: (reason) => console.warn(reason),
});

// Compute distances between a query and 1000 candidates
const distances = await gpu.computeDistances(
  queryVector,
  candidateVectors,
  'cosine',
);

// Clean up GPU resources when done
gpu.destroy();
```

GPUDistanceOptions
GPUDistanceComputer
Browser Requirements
| Browser | WebGPU Support | Notes |
|---|---|---|
| Chrome 113+ | Stable | Full support |
| Edge 113+ | Stable | Full support |
| Firefox 141+ | Stable | Since late 2025 |
| Safari 26+ | Stable | WWDC 2025 |
| Node.js | Not available | CPU fallback |
WebGPU covers approximately 90% of browser traffic.
Performance Characteristics
| Candidate Count | CPU (384d) | GPU Expected | Speedup |
|---|---|---|---|
| 64 | ~1.2ms | ~0.8ms | 1.5x |
| 256 | ~5ms | ~0.5ms | 10x |
| 1024 | ~20ms | ~0.7ms | 28x |
| 4096 | ~80ms | ~1.5ms | 53x |
First Search Overhead
The first search after page load includes shader compilation time (1-5ms per pipeline). The browser caches compiled shaders for subsequent page loads.
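One common mitigation is to issue a throwaway search during app startup, so shader compilation happens off the user's critical path. A sketch, under our own assumptions: `warmUp` and the zero-vector query are illustrative patterns, while `db.search` is the API shown above:

```ts
// Pay the one-time shader compilation cost at startup instead of on the
// user's first real query. Any query vector works; results are discarded.
async function warmUp(
  db: { search(q: Float32Array, opts: { k: number }): Promise<unknown> },
  dimensions: number,
): Promise<void> {
  await db.search(new Float32Array(dimensions), { k: 1 });
}
```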
When to Enable GPU
GPU acceleration is most beneficial when:
- Your index has 10K+ vectors
- You use `efSearch` values of 100+ (more candidates per search)
- You perform batch operations (bulk search, reranking, deduplication)
- Vector dimensions are 384+ (more computation per distance)
For small indexes (< 1K vectors) with the default `efSearch: 50`, GPU dispatch overhead may outweigh any gain, so acceleration may provide no net benefit.
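The criteria above can be folded into a simple heuristic. The thresholds are taken straight from the list; the function itself is our illustration, not a LocalMode API:

```ts
// Rough heuristic from the criteria above; tune against your own workload.
function gpuLikelyWorthwhile(opts: {
  vectorCount: number;
  efSearch: number;
  dimensions: number;
}): boolean {
  if (opts.vectorCount < 1_000) return false; // tiny index: dispatch overhead wins
  return (
    opts.vectorCount >= 10_000 || // large index
    opts.efSearch >= 100 ||       // many candidates per search
    opts.dimensions >= 384        // heavy per-distance computation
  );
}
```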
Resource Management
Buffer Pooling
The GPU distance manager maintains a buffer pool to minimize GPU memory allocation overhead. Buffers are reused across consecutive search calls. The pool is automatically cleaned up when:
- `db.close()` is called
- `gpu.destroy()` is called
- The GPU device is lost (e.g., tab backgrounding)
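A size-keyed pool of the kind described can be sketched as follows. This is purely illustrative; the real pool's internals are not part of the public API:

```ts
// Minimal size-keyed buffer pool: reuse buffers of a given byte length
// across calls instead of reallocating on every dispatch.
class BufferPool<T> {
  private free = new Map<number, T[]>();
  constructor(private alloc: (bytes: number) => T) {}

  acquire(bytes: number): T {
    const list = this.free.get(bytes);
    return list?.pop() ?? this.alloc(bytes);
  }

  release(bytes: number, buf: T): void {
    const list = this.free.get(bytes);
    if (list) list.push(buf);
    else this.free.set(bytes, [buf]);
  }

  destroy(): void {
    this.free.clear(); // real GPU buffers would also need buffer.destroy()
  }
}
```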
GPU Memory
For typical usage (384 dimensions, 256 candidates per dispatch), the buffer pool uses approximately 512KB of GPU memory — negligible compared to model weights.
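The figure is easy to sanity-check with back-of-envelope arithmetic. The byte counts below follow from 4-byte f32 values; the exact pool layout is an assumption, and a staging copy for readback plus alignment padding plausibly accounts for the gap up to the quoted ~512KB:

```ts
// Back-of-envelope GPU memory for one dispatch at 384 dims / 256 candidates.
const DIMS = 384;
const CANDIDATES = 256;
const F32 = 4; // bytes per float

const candidateBytes = CANDIDATES * DIMS * F32; // 393,216 bytes
const queryBytes = DIMS * F32;                  //   1,536 bytes
const resultBytes = CANDIDATES * F32;           //   1,024 bytes

const perDispatch = candidateBytes + queryBytes + resultBytes; // ~395 KB
```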
Device Lost Recovery
If the GPU device is lost (hardware error, tab backgrounding), the system:
- Falls back to CPU for the current operation
- Invokes the `onFallback` callback
- Attempts to re-acquire the GPU device on the next search
Fallback Behavior
GPU acceleration falls back to CPU silently in these cases:
| Scenario | Behavior |
|---|---|
| WebGPU not available | CPU fallback at initialization |
| Candidate count < batchThreshold | CPU per-pair distance |
| GPU device lost | CPU fallback + recovery attempt |
| GPU dispatch error | CPU fallback for that operation |
| Node.js / SSR environment | CPU fallback |
The onFallback callback is invoked in all cases, providing observability without breaking the application.
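Because `onFallback` fires in every fallback case, it is a natural hook for lightweight telemetry. A sketch: the counter is our own, and the string-typed `reason` follows the usage shown in the configuration examples above:

```ts
// Count fallback reasons so GPU availability issues show up in metrics
// instead of silently degrading to CPU performance.
const fallbackCounts = new Map<string, number>();

function onFallback(reason: string): void {
  fallbackCounts.set(reason, (fallbackCounts.get(reason) ?? 0) + 1);
}

// Pass it through the config:
// const db = await createVectorDB({
//   name: 'docs', dimensions: 384,
//   indexOptions: { gpu: { enabled: true, onFallback } },
// });
```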
Cleanup
Always clean up GPU resources when done:
```ts
// VectorDB — close() handles GPU cleanup
await db.close();

// Standalone — call destroy() explicitly
gpu.destroy();
```

Failing to clean up may leak GPU buffers until the page is unloaded.