Inference Queue
Priority-based task scheduling for multi-feature AI applications.
When your application runs multiple AI features concurrently (search, indexing, classification), the inference queue ensures interactive tasks take priority over background work. Tasks execute in priority order with configurable concurrency.
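To make the scheduling model concrete, here is a minimal sketch of priority-ordered execution with bounded concurrency. This is a simplified illustration only, not the library's implementation; `TinyQueue` and its numeric priorities are invented for this sketch.

```typescript
// A toy priority queue: pending tasks are drained highest-priority first
// (lower number = higher priority), with at most `concurrency` running at once.
type PendingTask = { run: () => Promise<unknown>; priority: number };

class TinyQueue {
  private pending: PendingTask[] = [];
  private active = 0;

  constructor(private concurrency: number) {}

  add<T>(run: () => Promise<T>, priority: number): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      this.pending.push({ run: () => run().then(resolve, reject), priority });
      this.drain();
    });
  }

  private drain(): void {
    while (this.active < this.concurrency && this.pending.length > 0) {
      // Pick the highest-priority pending task.
      this.pending.sort((a, b) => a.priority - b.priority);
      const task = this.pending.shift()!;
      this.active++;
      task.run().finally(() => {
        this.active--;
        this.drain();
      });
    }
  }
}
```

With `concurrency: 1`, a high-priority task added while a low-priority task is queued will jump ahead of it, which is the behavior the real queue provides for interactive vs. background work.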
See it in action
Try PDF Search for a working demo of these APIs.
createInferenceQueue()
Create a priority-based queue:
import { createInferenceQueue } from '@localmode/core';
const queue = createInferenceQueue({ concurrency: 1 });

InferenceQueueConfig
queue.add()
Add a task to the queue. Returns a promise that resolves with the task result:
import { embed } from '@localmode/core';
// Runs at the highest priority (first in the priorities array)
const result = await queue.add(
() => embed({ model, value: 'search query' })
);
console.log(result.embedding.length); // 384

// Interactive search — runs first
const searchResult = await queue.add(
() => embed({ model, value: userQuery }),
{ priority: 'interactive' }
);
// Background indexing — yields to interactive tasks
queue.add(
() => embedMany({ model, values: newDocuments }),
{ priority: 'background' }
);
// Prefetch — lowest priority, runs when idle
queue.add(
() => embedMany({ model, values: predictedQueries }),
{ priority: 'prefetch' }
);const controller = new AbortController();
const result = await queue.add(
() => embed({ model, value: 'query' }),
{ priority: 'interactive', abortSignal: controller.signal }
);
// Cancel before the task starts (removes from queue)
controller.abort();

QueueAddOptions
If a task is aborted while waiting in the queue, it is removed and its promise rejects with an AbortError. Tasks that are already executing are not interrupted by the queue — pass the signal into the task function itself for that.
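The in-flight cancellation pattern looks like the following sketch. It is illustrative only: `cancellableWork` is an invented stand-in for a real inference call that accepts a signal, and the comment shows where the real `queue.add` call would go.

```typescript
// A task that cooperatively checks its signal between units of work.
async function cancellableWork(signal: AbortSignal): Promise<string> {
  for (let step = 0; step < 5; step++) {
    if (signal.aborted) throw new Error('AbortError');
    await new Promise((resolve) => setTimeout(resolve, 20));
  }
  return 'done';
}

// Reuse one controller so aborting both removes the task if still pending
// and interrupts it if already running. With the real queue this would be:
//   queue.add(() => cancellableWork(controller.signal),
//             { priority: 'interactive', abortSignal: controller.signal });
const controller = new AbortController();
const result = cancellableWork(controller.signal);
setTimeout(() => controller.abort(), 30); // abort roughly mid-task

result.catch((err) => console.log('task rejected:', (err as Error).message));
```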
Queue Stats
Monitor queue health with real-time statistics:
// Read stats at any time
const stats = queue.stats;
console.log(`Pending: ${stats.pending}, Active: ${stats.active}`);
console.log(`Completed: ${stats.completed}, Failed: ${stats.failed}`);
console.log(`Avg latency: ${stats.avgLatencyMs}ms`);

Subscribe to Stats Events
The stats event fires after every task completes or fails:
const unsubscribe = queue.on('stats', (stats) => {
console.log(`Queue: ${stats.pending} pending, ${stats.active} active`);
updateProgressBar(stats);
});
// Later: stop listening
unsubscribe();

QueueStats
queue.clear() and queue.destroy()
// Remove all pending (not yet executing) tasks
queue.clear();
// Destroy the queue — rejects all pending tasks, prevents new additions
queue.destroy();

Example: Interactive Search vs Background Indexing
A common pattern in which user-facing search always takes priority over document ingestion:
import { createInferenceQueue, embed, embedMany, semanticSearch } from '@localmode/core';
const queue = createInferenceQueue({
concurrency: 1, // Single model, one operation at a time
priorities: ['interactive', 'background'],
});
// User searches — interactive priority, runs immediately
async function search(query: string) {
const { embedding } = await queue.add(
() => embed({ model, value: query }),
{ priority: 'interactive' }
);
return db.search(embedding, { k: 10 });
}
// Background indexing — yields whenever a search comes in
async function indexDocuments(texts: string[]) {
// Split into small batches so interactive tasks can interleave
const batchSize = 10;
for (let i = 0; i < texts.length; i += batchSize) {
const batch = texts.slice(i, i + batchSize);
await queue.add(
() => embedMany({ model, values: batch }),
{ priority: 'background' }
);
}
}

Custom Priority Levels
Define your own priority hierarchy:
const queue = createInferenceQueue({
concurrency: 2,
priorities: ['critical', 'user', 'batch', 'idle'],
});
// 'critical' tasks always run before 'user', which run before 'batch', etc.
await queue.add(fn, { priority: 'critical' });
await queue.add(fn, { priority: 'idle' });

Passing a priority string that is not in the configured priorities array will reject the task's promise with an error.
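As a hypothetical illustration of that validation (the actual error message and shape are not specified in these docs):

```typescript
// Invented helper mirroring the check described above.
const priorities = ['critical', 'user', 'batch', 'idle'] as const;

function assertKnownPriority(priority: string): void {
  if (!(priorities as readonly string[]).includes(priority)) {
    throw new Error(
      `Unknown priority "${priority}"; expected one of: ${priorities.join(', ')}`
    );
  }
}
```

In practice this means priority typos surface as rejected promises, so wrap `queue.add` calls in try/catch (or attach `.catch`) if the priority string comes from dynamic input.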
React Integration
Use the useInferenceQueue hook from @localmode/react for queue management in React:
import { useInferenceQueue } from '@localmode/react';
function AIFeature() {
const { queue, stats } = useInferenceQueue({ concurrency: 1 });
const handleSearch = async () => {
if (!queue) return;
const result = await queue.add(
() => embed({ model, value: searchText }),
{ priority: 'interactive' }
);
// Use result...
};
return (
<div>
<button onClick={handleSearch}>Search</button>
<p>Pending: {stats.pending} | Active: {stats.active}</p>
</div>
);
}

The hook creates the queue on mount, subscribes to stats for reactive updates, and destroys the queue on unmount.
Next Steps
Pipelines
Compose multi-step AI workflows.
Embeddings
Generate embeddings for text and semantic search.
Events
Type-safe event system for reactive updates.