BERT NER & Reranking Models in the Browser

Specialized BERT models for named entity recognition (NER) and search result reranking in the browser.

Overview

The BERT NER & Reranking family is available through Transformers.js in LocalMode, with model sizes ranging from 23MB to 279MB. The family's primary task is listed as ner, but two of the four variants are rerankers. All of them can be used with any application built on the LocalMode SDK.

Running BERT NER & Reranking models locally in the browser eliminates API costs, removes network latency, and keeps all user data on-device. After the initial model download, inference needs no network round trip and works offline. Each model variant targets a different trade-off between size, speed, and quality - choose based on your users' device capabilities and your application's requirements.

Architecture and History

Named entity recognition and search reranking are two specialized NLP tasks that are well suited to browser-local inference. BERT-base-NER (110MB) detects persons, organizations, locations, and miscellaneous entities in text using BIO tagging: each token is labeled B- (beginning of an entity), I- (inside an entity), or O (outside any entity). It powers LocalMode's extractEntities() function and is useful for applications like contract analysis, content tagging, and data extraction.
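To make BIO tagging concrete, here is a minimal sketch of how per-token tags can be merged into entity spans. The TaggedToken and EntitySpan types and the mergeBioTags function are hypothetical names for illustration only; they are not part of LocalMode or Transformers.js, and the real extractEntities() output format may differ.

```typescript
// Illustrative types only; not the LocalMode or Transformers.js API.
interface TaggedToken {
  word: string;
  tag: string; // "O", "B-PER", "I-PER", "B-ORG", "I-LOC", ...
}

interface EntitySpan {
  type: string; // e.g. "PER", "ORG", "LOC", "MISC"
  text: string;
}

// Merge BIO-tagged tokens into entity spans.
// Assumption: words are joined with a single space, and a stray I- tag
// that does not continue an open span of the same type is ignored.
function mergeBioTags(tokens: TaggedToken[]): EntitySpan[] {
  const spans: EntitySpan[] = [];
  let current: EntitySpan | null = null;

  for (const { word, tag } of tokens) {
    const prefix = tag.slice(0, 2); // "B-", "I-", or "O"
    const type = tag.slice(2);      // "" for the "O" tag

    if (prefix === "B-") {
      // A B- tag always starts a new span, closing any open one.
      if (current) spans.push(current);
      current = { type, text: word };
    } else if (prefix === "I-" && current !== null && current.type === type) {
      // An I- tag of the same type extends the open span.
      current.text += " " + word;
    } else {
      // "O" or a non-matching I- tag closes the open span.
      if (current) spans.push(current);
      current = null;
    }
  }

  if (current) spans.push(current);
  return spans;
}
```

For the tokens Angela (B-PER), Merkel (I-PER), visited (O), New (B-LOC), York (I-LOC), this yields two spans: a PER span "Angela Merkel" and a LOC span "New York".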

ms-marco-MiniLM-L-6-v2 (23MB) is a cross-encoder model that scores query-document pairs for relevance. Unlike embedding similarity (which computes the query and document vectors independently), a cross-encoder attends to both query and document simultaneously, producing more accurate relevance scores. In a typical RAG pipeline, you'd use embeddings for initial retrieval (fast, approximate) and then rerank the top results with MiniLM for precision (slower, more accurate).
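The retrieve-then-rerank pattern can be sketched as follows. This is an illustration under stated assumptions, not LocalMode's implementation: Doc, PairScorer, cosine, and retrieveThenRerank are hypothetical names, and the scorer is passed in as a plain synchronous function to keep the sketch short, whereas a real cross-encoder call would be asynchronous.

```typescript
// Hypothetical document shape with a precomputed embedding.
interface Doc {
  id: string;
  text: string;
  embedding: number[];
}

// Stand-in for a cross-encoder: scores one (query, document) pair.
// Assumption: synchronous here; a real model call would return a Promise.
type PairScorer = (query: string, document: string) => number;

// Cosine similarity between two vectors of equal length.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Stage 1: shortlist by embedding similarity (fast, approximate).
// Stage 2: rescore only the shortlist with the pair scorer (slower, more accurate).
function retrieveThenRerank(
  query: string,
  queryEmbedding: number[],
  docs: Doc[],
  scorePair: PairScorer,
  candidates = 20,
  topK = 5,
): Doc[] {
  const shortlist = docs
    .map((doc) => ({ doc, sim: cosine(queryEmbedding, doc.embedding) }))
    .sort((x, y) => y.sim - x.sim)
    .slice(0, candidates);

  return shortlist
    .map(({ doc }) => ({ doc, score: scorePair(query, doc.text) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .map((entry) => entry.doc);
}
```

The design point is that the expensive pair scorer runs only on the shortlist, so its cost scales with candidates rather than with the size of the whole collection.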

Both models are remarkably compact - MiniLM at 23MB is one of the smallest useful models in the entire catalog. Together they enable a complete information extraction and retrieval pipeline running entirely in the browser: extract entities from documents, embed and store them, search with semantic similarity, then rerank results for maximum precision.

Variant Comparison

The following table lists every BERT NER & Reranking variant available through LocalMode, across all supported providers. Each model ID is also the name of its HuggingFace repository, where the model card can be found.

Model ID                                     Provider         Size   Speed   Quality  Context  Device
Xenova/bert-base-NER                         Transformers.js  110MB  Medium  Good     -        WASM
Xenova/ms-marco-MiniLM-L-6-v2                Transformers.js  23MB   Fast    Good     -        WASM
Xenova/bert-base-multilingual-cased-ner-hrl  Transformers.js  178MB  Medium  Good     -        WASM
Xenova/bge-reranker-base                     Transformers.js  279MB  Medium  High     -        WASM

Size Distribution

Size Range    Count
Under 200MB   3 variants
200MB–500MB   1 variant

How to choose a variant: First pick the task, since NER and reranker variants are not interchangeable. Then start with the smallest model that meets your quality requirements. For prototyping and development, use the fastest variant (smallest size, "Fast" speed tier). For production, test your specific use case against 2–3 variants and measure the quality difference against user expectations. In many applications, users cannot distinguish between "Good" and "High" quality tiers - the smaller model saves download time and memory.
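The "smallest model that meets your quality requirements" rule can be expressed as a small lookup over the variant table. This is only a sketch: the Variant type, the VARIANTS array, and pickSmallest are hypothetical helpers, not a LocalMode API. Sizes and tiers are copied from the table above, while the task field is an assumption inferred from each model's name.

```typescript
type Task = "ner" | "reranker";
type Quality = "Good" | "High";

interface Variant {
  id: string;
  task: Task; // assumption: inferred from the model name, not a table column
  sizeMB: number;
  speed: "Fast" | "Medium";
  quality: Quality;
}

// Values copied from the variant comparison table.
const VARIANTS: Variant[] = [
  { id: "Xenova/bert-base-NER", task: "ner", sizeMB: 110, speed: "Medium", quality: "Good" },
  { id: "Xenova/ms-marco-MiniLM-L-6-v2", task: "reranker", sizeMB: 23, speed: "Fast", quality: "Good" },
  { id: "Xenova/bert-base-multilingual-cased-ner-hrl", task: "ner", sizeMB: 178, speed: "Medium", quality: "Good" },
  { id: "Xenova/bge-reranker-base", task: "reranker", sizeMB: 279, speed: "Medium", quality: "High" },
];

const QUALITY_RANK: Record<Quality, number> = { Good: 1, High: 2 };

// Return the smallest variant for a task that meets the minimum quality tier
// and an optional size budget, or undefined if none qualifies.
function pickSmallest(
  task: Task,
  minQuality: Quality,
  maxSizeMB: number = Infinity,
): Variant | undefined {
  return VARIANTS
    .filter((v) => v.task === task)
    .filter((v) => QUALITY_RANK[v.quality] >= QUALITY_RANK[minQuality])
    .filter((v) => v.sizeMB <= maxSizeMB)
    .sort((a, b) => a.sizeMB - b.sizeMB)[0];
}
```

A lookup like this only narrows the candidates. The tiers are coarse labels, so the chosen variant should still be benchmarked on target devices.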

Provider-Specific Code Examples

NER variants use the NERModel interface and reranker variants use the RerankerModel interface, both from @localmode/core. Switching between providers requires changing only the import and model ID - no application logic changes.

Transformers.js

Transformers.js runs ONNX-optimized models via ONNX Runtime Web. It uses WebGPU acceleration where available and falls back to WASM otherwise.

import { transformers } from '@localmode/transformers';

const model = transformers.ner('Xenova/bert-base-NER');
// Use the model with the corresponding @localmode/core function

Choosing Between NER and Reranker

Use transformers.ner() for entity extraction and transformers.reranker() for search result reranking - these serve different tasks and use different core interfaces.

import { transformers } from '@localmode/transformers';

// Named entity recognition (NERModel interface)
const nerModel = transformers.ner('Xenova/bert-base-NER');

// Search result reranking (RerankerModel interface)
const reranker = transformers.reranker('Xenova/ms-marco-MiniLM-L-6-v2');

When to Use BERT NER & Reranking

BERT NER & Reranking models are a strong choice when:

  • You need NER or reranking - The family covers entity extraction and search result reranking, with models across multiple size tiers.
  • Browser compatibility matters - Available through one provider (Transformers.js), which covers Chrome, Firefox, Safari, and Edge.
  • Size flexibility is important - The 23MB–279MB range means you can target everything from mobile devices to high-end desktops with the same model family.

Methodology

Model sizes were verified against the actual ONNX file listings in each Xenova HuggingFace repository (specifically model_quantized.onnx in the /onnx subfolder, which is the default file loaded by Transformers.js). Interface names (NERModel, RerankerModel) and API method names (transformers.ner(), transformers.reranker()) were verified against the LocalMode source code in packages/transformers/src/provider.ts and packages/core/src/classification/types.ts. Entity type labels and language counts were verified against the upstream base model cards (dslim/bert-base-NER, Davlan/bert-base-multilingual-cased-ner-hrl, cross-encoder/ms-marco-MiniLM-L6-v2, BAAI/bge-reranker-base). Performance tiers (speed/quality) are LocalMode's curated assessments - always benchmark on target devices before production deployment.

Sources