Inference Queue
Priority-based task scheduling for multi-feature AI applications.
When your application runs multiple AI features concurrently (search, indexing, classification), the inference queue ensures interactive tasks take priority over background work. Tasks execute in priority order with configurable concurrency.
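To make the scheduling model concrete, here is a minimal sketch of priority-ordered execution with bounded concurrency. This is a simplified illustration only, not the library's implementation; `TinyQueue` and its numeric priorities are invented for this sketch.

```typescript
// A toy priority queue: pending tasks are drained highest-priority first
// (lower number = higher priority), with at most `concurrency` running at once.
type PendingTask = { run: () => Promise<unknown>; priority: number };

class TinyQueue {
  private pending: PendingTask[] = [];
  private active = 0;

  constructor(private concurrency: number) {}

  add<T>(run: () => Promise<T>, priority: number): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      this.pending.push({ run: () => run().then(resolve, reject), priority });
      this.drain();
    });
  }

  private drain(): void {
    while (this.active < this.concurrency && this.pending.length > 0) {
      // Pick the highest-priority pending task.
      this.pending.sort((a, b) => a.priority - b.priority);
      const task = this.pending.shift()!;
      this.active++;
      task.run().finally(() => {
        this.active--;
        this.drain();
      });
    }
  }
}
```

With `concurrency: 1`, a high-priority task added while a low-priority task is queued will jump ahead of it, which is the behavior the real queue provides for interactive vs. background work.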
See it in action
Try PDF Search for a working demo of these APIs.
createInferenceQueue()
Create a priority-based queue:
import { createInferenceQueue } from '@localmode/core';
const queue = createInferenceQueue({ concurrency: 1 });

InferenceQueueConfig
queue.add()
Add a task to the queue. Returns a promise that resolves with the task result:
import { embed } from '@localmode/core';
// Runs at the highest priority (first in the priorities array)
const result = await queue.add(
() => embed({ model, value: 'search query' })
);
console.log(result.embedding.length); // 384

// Interactive search — runs first
const searchResult = await queue.add(
() => embed({ model, value: userQuery }),
{ priority: 'interactive' }
);
// Background indexing — yields to interactive tasks
queue.add(
() => embedMany({ model, values: newDocuments }),
{ priority: 'background' }
);
// Prefetch — lowest priority, runs when idle
queue.add(
() => embedMany({ model, values: predictedQueries }),
{ priority: 'prefetch' }
);const controller = new AbortController();
const result = await queue.add(
() => embed({ model, value: 'query' }),
{ priority: 'interactive', abortSignal: controller.signal }
);
// Cancel before the task starts (removes from queue)
controller.abort();

QueueAddOptions
If a task is aborted while waiting in the queue, it is removed and its promise rejects with an AbortError. Tasks that are already executing are not interrupted by the queue — pass the signal into the task function itself for that.
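The in-flight cancellation pattern looks like the following sketch. It is illustrative only: `cancellableWork` is an invented stand-in for a real inference call that accepts a signal, and the comment shows where the real `queue.add` call would go.

```typescript
// A task that cooperatively checks its signal between units of work.
async function cancellableWork(signal: AbortSignal): Promise<string> {
  for (let step = 0; step < 5; step++) {
    if (signal.aborted) throw new Error('AbortError');
    await new Promise((resolve) => setTimeout(resolve, 20));
  }
  return 'done';
}

// Reuse one controller so aborting both removes the task if still pending
// and interrupts it if already running. With the real queue this would be:
//   queue.add(() => cancellableWork(controller.signal),
//             { priority: 'interactive', abortSignal: controller.signal });
const controller = new AbortController();
const result = cancellableWork(controller.signal);
setTimeout(() => controller.abort(), 30); // abort roughly mid-task

result.catch((err) => console.log('task rejected:', (err as Error).message));
```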
Queue Stats
Monitor queue health with real-time statistics:
// Read stats at any time
const stats = queue.stats;
console.log(`Pending: ${stats.pending}, Active: ${stats.active}`);
console.log(`Completed: ${stats.completed}, Failed: ${stats.failed}`);
console.log(`Avg latency: ${stats.avgLatencyMs}ms`);

Subscribe to Stats Events
The stats event fires after every task completes or fails:
const unsubscribe = queue.on('stats', (stats) => {
console.log(`Queue: ${stats.pending} pending, ${stats.active} active`);
updateProgressBar(stats);
});
// Later: stop listening
unsubscribe();

QueueStats
queue.clear() and queue.destroy()
// Remove all pending (not yet executing) tasks
queue.clear();
// Destroy the queue — rejects all pending tasks, prevents new additions
queue.destroy();

Example: Interactive Search vs Background Indexing
A common pattern in which user-facing search always takes priority over document ingestion:
import { createInferenceQueue, embed, embedMany, semanticSearch } from '@localmode/core';
const queue = createInferenceQueue({
concurrency: 1, // Single model, one operation at a time
priorities: ['interactive', 'background'],
});
// User searches — interactive priority, runs immediately
async function search(query: string) {
const { embedding } = await queue.add(
() => embed({ model, value: query }),
{ priority: 'interactive' }
);
return db.search(embedding, { k: 10 });
}
// Background indexing — yields whenever a search comes in
async function indexDocuments(texts: string[]) {
// Split into small batches so interactive tasks can interleave
const batchSize = 10;
for (let i = 0; i < texts.length; i += batchSize) {
const batch = texts.slice(i, i + batchSize);
await queue.add(
() => embedMany({ model, values: batch }),
{ priority: 'background' }
);
}
}

Custom Priority Levels
Define your own priority hierarchy:
const queue = createInferenceQueue({
concurrency: 2,
priorities: ['critical', 'user', 'batch', 'idle'],
});
// 'critical' tasks always run before 'user', which run before 'batch', etc.
await queue.add(fn, { priority: 'critical' });
await queue.add(fn, { priority: 'idle' });

Passing a priority string that is not in the configured priorities array will reject the task's promise with an error.
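As a hypothetical illustration of that validation (the actual error message and shape are not specified in these docs):

```typescript
// Invented helper mirroring the check described above.
const priorities = ['critical', 'user', 'batch', 'idle'] as const;

function assertKnownPriority(priority: string): void {
  if (!(priorities as readonly string[]).includes(priority)) {
    throw new Error(
      `Unknown priority "${priority}"; expected one of: ${priorities.join(', ')}`
    );
  }
}
```

In practice this means priority typos surface as rejected promises, so wrap `queue.add` calls in try/catch (or attach `.catch`) if the priority string comes from dynamic input.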
React Integration
Use the useInferenceQueue hook from @localmode/react for queue management in React:
import { useInferenceQueue } from '@localmode/react';
function AIFeature() {
const { queue, stats } = useInferenceQueue({ concurrency: 1 });
const handleSearch = async () => {
if (!queue) return;
const result = await queue.add(
() => embed({ model, value: searchText }),
{ priority: 'interactive' }
);
// Use result...
};
return (
<div>
<button onClick={handleSearch}>Search</button>
<p>Pending: {stats.pending} | Active: {stats.active}</p>
</div>
);
}

The hook creates the queue on mount, subscribes to stats for reactive updates, and destroys the queue on unmount.
Next Steps
Pipelines
Compose multi-step AI workflows.
Embeddings
Generate embeddings for text and semantic search.
Events
Type-safe event system for reactive updates.