# Reranking

Improve RAG accuracy by reranking retrieved documents.
Reranking improves the accuracy of RAG (Retrieval-Augmented Generation) pipelines by re-scoring documents based on their relevance to a query. After an initial vector search retrieves candidates, reranking provides a more precise ordering.
## Why Rerank?
Vector search retrieves documents based on embedding similarity, but rerankers use cross-attention to directly score query-document pairs, often producing more accurate rankings for the final generation step.
Typical RAG pipeline:
- Retrieve — Get 20-50 candidates via vector search (fast, approximate)
- Rerank — Score and reorder candidates (precise, slower)
- Generate — Use top 5-10 documents for LLM context
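To make the difference concrete, here is a minimal sketch that scores a single query-document pair both ways: once by comparing embeddings (as vector search does) and once with a cross-encoder reranker. The `cosineSimilarity` helper is written out purely for illustration and is not assumed to be part of the library; the `embed` and `rerank` calls follow the shapes used in the examples below.

```ts
import { embed, rerank } from '@localmode/core';
import { transformers } from '@localmode/transformers';

const embeddingModel = transformers.embedding('Xenova/all-MiniLM-L6-v2');
const rerankerModel = transformers.reranker('Xenova/ms-marco-MiniLM-L-6-v2');

// Illustrative helper, not part of the library API
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const query = 'What is machine learning?';
const doc = 'Machine learning is a type of artificial intelligence...';

// Bi-encoder path: embed query and document independently, then compare vectors
const { embedding: queryVector } = await embed({ model: embeddingModel, value: query });
const { embedding: docVector } = await embed({ model: embeddingModel, value: doc });
const similarity = cosineSimilarity(queryVector, docVector);

// Cross-encoder path: the reranker scores the query-document pair jointly
const { results } = await rerank({
  model: rerankerModel,
  query,
  documents: [doc],
});

console.log({ similarity, rerankScore: results[0].score });
```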
## Basic Usage
```ts
import { rerank } from '@localmode/core';
import { transformers } from '@localmode/transformers';

// Create reranker model
const rerankerModel = transformers.reranker('Xenova/ms-marco-MiniLM-L-6-v2');

const { results } = await rerank({
  model: rerankerModel,
  query: 'What is machine learning?',
  documents: [
    'Machine learning is a type of artificial intelligence...',
    'Cooking pasta requires boiling water...',
    'Deep learning is a subset of machine learning...',
  ],
  topK: 2,
});

// results: [
//   { index: 0, score: 0.95, text: 'Machine learning is a type of...' },
//   { index: 2, score: 0.88, text: 'Deep learning is a subset of...' }
// ]
```

## RAG Pipeline Example
### Perform Initial Vector Search
Retrieve more candidates than you need—reranking will filter to the best ones.
```ts
import { semanticSearch, createVectorDB, embed } from '@localmode/core';
import { transformers } from '@localmode/transformers';

const embeddingModel = transformers.embedding('Xenova/all-MiniLM-L6-v2');

// `db` is assumed to be a vector database created and populated earlier
// (for example with createVectorDB and embeddings produced by embed).
const { embedding: queryVector } = await embed({
  model: embeddingModel,
  value: 'What is machine learning?',
});

// Get 20 candidates from vector search
const candidates = await db.search(queryVector, { k: 20 });
```

### Rerank the Candidates
Score each document against the query for precise relevance ranking.
```ts
const rerankerModel = transformers.reranker('Xenova/ms-marco-MiniLM-L-6-v2');

const { results } = await rerank({
  model: rerankerModel,
  query: 'What is machine learning?',
  documents: candidates.map((c) => c.metadata.text),
  topK: 5, // Keep only top 5 after reranking
});
```

### Use Top Results for Generation
Pass the reranked documents as context to your LLM.
```ts
// Assumes `streamText` and `languageModel` are set up elsewhere in your
// application (any LLM interface that accepts a prompt will do).
const context = results.map((r) => r.text).join('\n\n');

const response = await streamText({
  model: languageModel,
  prompt: `Based on the following context, answer the question.

Context:
${context}

Question: What is machine learning?`,
});
```

## API Reference
### rerank(options)

Reranks documents by relevance to a query.

| Prop | Type |
|---|---|
| `model` | `RerankerModel` |
| `query` | `string` |
| `documents` | `string[]` |
| `topK` | `number` (optional) |
| `abortSignal` | `AbortSignal` (optional) |

#### Return Type: RerankResult

| Prop | Type |
|---|---|
| `results` | `RankedDocument[]` |

#### RankedDocument

| Prop | Type |
|---|---|
| `index` | `number` |
| `score` | `number` |
| `text` | `string` |
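For reference, here is a single call that exercises every option above. It reuses the `rerankerModel` created in Basic Usage; the document strings are placeholders.

```ts
const controller = new AbortController();

const { results } = await rerank({
  model: rerankerModel,                  // RerankerModel
  query: 'What is machine learning?',    // string
  documents: ['First passage...', 'Second passage...'],
  topK: 2,                               // optional: cap the number of results
  abortSignal: controller.signal,        // optional: allow cancellation
});

// Each result is a RankedDocument: { index, score, text }
for (const { index, score, text } of results) {
  console.log(index, score.toFixed(3), text);
}
```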
## Supported Models
Cross-encoder models score query-document pairs directly:
| Model | Size | Speed | Quality | Use Case |
|---|---|---|---|---|
| `Xenova/ms-marco-MiniLM-L-6-v2` | 23 MB | Fast | Good | General purpose |
| `Xenova/ms-marco-MiniLM-L-12-v2` | 33 MB | Medium | Better | Higher accuracy |
| `Xenova/bge-reranker-base` | 110 MB | Slower | Best | Maximum quality |
Choose based on your needs:

- Speed-critical: Use `ms-marco-MiniLM-L-6-v2` for fast inference
- Balanced: Use `ms-marco-MiniLM-L-12-v2` for good accuracy with reasonable speed
- Quality-critical: Use `bge-reranker-base` when accuracy matters most
Start with `ms-marco-MiniLM-L-6-v2`: it's a great balance of speed and quality for most applications.
## Cancellation Support
All reranking operations support AbortSignal for cancellation:
```ts
const controller = new AbortController();

// Cancel after 5 seconds
setTimeout(() => controller.abort(), 5000);

try {
  const { results } = await rerank({
    model: rerankerModel,
    query: 'What is AI?',
    documents: largeDocumentSet,
    abortSignal: controller.signal,
  });
} catch (error) {
  if (error.name === 'AbortError') {
    console.log('Reranking was cancelled');
  }
}
```

## Performance Tips
Optimize your reranking pipeline:
- Limit candidates: Retrieve 20-50 candidates, not hundreds
- Use topK: Only return the documents you need
- Batch when possible: Rerank multiple queries together if your use case allows
- Cache results: Consider caching reranked results for repeated queries
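As a sketch of the caching tip, the wrapper below memoizes reranked results per query in a plain in-memory Map. The key scheme, eviction, and invalidation are left out and would depend on your application; it also assumes the candidate set for a given query is stable between calls.

```ts
// Shape of a reranked result, as documented above
type CachedResult = { index: number; score: number; text: string };

// Minimal in-memory cache keyed by query text (no eviction or TTL)
const rerankCache = new Map<string, CachedResult[]>();

async function cachedRerank(query: string, documents: string[]): Promise<CachedResult[]> {
  const hit = rerankCache.get(query);
  if (hit) return hit;

  const { results } = await rerank({
    model: rerankerModel,
    query,
    documents,
    topK: 5,
  });

  rerankCache.set(query, results);
  return results;
}
```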
## Custom Reranker Implementation
Implement the RerankerModel interface to create custom rerankers:
```ts
import type { RerankerModel, DoRerankOptions, DoRerankResult } from '@localmode/core';

class MyCustomReranker implements RerankerModel {
  readonly modelId = 'custom:my-reranker';
  readonly provider = 'custom';

  async doRerank(options: DoRerankOptions): Promise<DoRerankResult> {
    const { query, documents, topK } = options;

    // Your scoring logic here
    const scored = documents.map((doc, index) => ({
      index,
      score: this.scoreDocument(query, doc),
      text: doc,
    }));

    // Sort by score descending
    scored.sort((a, b) => b.score - a.score);

    // Apply topK
    const results = topK ? scored.slice(0, topK) : scored;

    return {
      results,
      usage: {
        inputTokens: query.length + documents.join('').length,
        durationMs: 0,
      },
    };
  }

  private scoreDocument(query: string, document: string): number {
    // Implement your scoring logic
    return 0.5;
  }
}
```
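Once defined, a custom reranker is passed to rerank() the same way as a built-in model (a minimal usage sketch, assuming rerank() accepts any RerankerModel implementation as this section implies):

```ts
const customReranker = new MyCustomReranker();

const { results } = await rerank({
  model: customReranker,
  query: 'What is machine learning?',
  documents: [
    'Machine learning is a type of artificial intelligence...',
    'Cooking pasta requires boiling water...',
  ],
  topK: 1,
});

console.log(results[0]);
```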