Document Loaders

Load and parse documents in several formats (text, JSON, CSV, HTML) for processing in RAG pipelines. All loaders are zero-dependency and run entirely in the browser.

Quick Start

import { loadDocument, loadDocuments } from '@localmode/core';

// Auto-detect format and load
const docs = await loadDocument(myFile);

// Load with explicit type
const csvDocs = await loadDocument(csvFile, { loader: 'csv', textColumn: 'content' });

// Batch load multiple sources
const allDocs = await loadDocuments([file1, file2, file3]);

Built-in Loaders

TextLoader

Load plain text content:

import { TextLoader, createTextLoader } from '@localmode/core';

const loader = new TextLoader();
const docs = await loader.load('Hello world');

// Or with options
const loader2 = createTextLoader({ trim: true, separator: '\n\n' });

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| separator | string | - | Split text by separator into multiple documents |
| trim | boolean | false | Trim whitespace from content |
| encoding | string | 'utf-8' | Text encoding |
| maxSize | number | - | Maximum file size in bytes |
| abortSignal | AbortSignal | - | Cancellation signal |
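
To make the effect of `separator` and `trim` concrete, here is a minimal standalone sketch of how one string could be split into several document texts. The helper `splitIntoTexts` and the `SketchTextOptions` type are hypothetical and are not part of `@localmode/core`; dropping empty chunks is an assumption of the sketch, not documented loader behavior.

```typescript
// Hypothetical helper for illustration only; not the TextLoader implementation.
interface SketchTextOptions {
  separator?: string;
  trim?: boolean;
}

function splitIntoTexts(input: string, options: SketchTextOptions = {}): string[] {
  const { separator, trim = false } = options;
  // Without a separator the whole input is treated as a single document.
  const parts = separator ? input.split(separator) : [input];
  return parts
    .map((part) => (trim ? part.trim() : part))
    // Assumption of this sketch: chunks that end up empty are dropped.
    .filter((part) => part.length > 0);
}

const noteTexts = splitIntoTexts('  First note \n\n Second note  ', {
  separator: '\n\n',
  trim: true,
});
console.log(noteTexts); // two entries: 'First note' and 'Second note'
```

Each resulting string would then become the `text` of its own document.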

JSONLoader

Extract text from JSON structures:

import { JSONLoader, createJSONLoader } from '@localmode/core';

const loader = new JSONLoader();
const docs = await loader.load(jsonBlob);

// Extract specific fields
const loader2 = createJSONLoader({
  textFields: ['title', 'body'],
  fieldSeparator: '\n\n',
  recordsPath: 'data.articles',
});

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| textFields | string[] | - | Fields to extract text from |
| extractAllStrings | boolean | false | Extract from all string fields |
| fieldSeparator | string | ' ' | Separator when combining fields |
| recordsPath | string | - | Path to array of records (e.g., 'data.items') |
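
The interplay of `recordsPath`, `textFields`, and `fieldSeparator` can be sketched without the library. The helpers `resolvePath` and `recordsToTexts` below are hypothetical names; the sketch assumes `recordsPath` is a dot-separated key path and that non-string fields are skipped, which may differ from what JSONLoader actually does.

```typescript
// Hypothetical helpers for illustration only; not the JSONLoader implementation.
type JsonRecord = Record<string, unknown>;

function resolvePath(root: unknown, path: string): unknown {
  // Walk one key at a time; a missing segment yields undefined.
  return path.split('.').reduce<unknown>((node, key) => {
    if (node !== null && typeof node === 'object') {
      return (node as JsonRecord)[key];
    }
    return undefined;
  }, root);
}

function recordsToTexts(
  root: unknown,
  recordsPath: string,
  textFields: string[],
  fieldSeparator = ' ',
): string[] {
  const records = resolvePath(root, recordsPath);
  if (!Array.isArray(records)) return [];
  return records.map((record: unknown) => {
    const fields =
      record !== null && typeof record === 'object' ? (record as JsonRecord) : {};
    return textFields
      .map((field) => fields[field])
      // Assumption of this sketch: only string values contribute text.
      .filter((value): value is string => typeof value === 'string')
      .join(fieldSeparator);
  });
}

const payload = {
  data: {
    articles: [
      { title: 'Intro', body: 'Hello', views: 3 },
      { title: 'Next', body: 'World' },
    ],
  },
};
const articleTexts = recordsToTexts(payload, 'data.articles', ['title', 'body'], '\n\n');
console.log(articleTexts.length); // 2
```

Here each article yields one text made of its title and body joined by a blank line.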

CSVLoader

Load CSV/TSV data:

import { CSVLoader, createCSVLoader } from '@localmode/core';

const loader = createCSVLoader({
  textColumn: 'content',
  idColumn: 'id',
  hasHeader: true,
});
const docs = await loader.load(csvFile);

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| textColumn | string \| number | - | Column for text content |
| textColumns | (string \| number)[] | - | Multiple columns to combine |
| columnSeparator | string | ' ' | Separator for combined columns |
| idColumn | string \| number | - | Column for document IDs |
| columnDelimiter | string | ',' | Column delimiter |
| rowDelimiter | string | '\n' | Row delimiter |
| hasHeader | boolean | true | First row is header |
| skipEmpty | boolean | false | Skip empty rows |
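
As a rough illustration of how `textColumn` and `idColumn` map rows to documents, the sketch below parses a small CSV string by hand. `csvToDocs` and `SketchCsvDoc` are hypothetical names; the sketch assumes a header row, skips blank rows, and does not handle quoted fields, so it should not be read as CSVLoader's actual parsing logic.

```typescript
// Hypothetical helper for illustration only; not the CSVLoader implementation.
interface SketchCsvDoc {
  id: string;
  text: string;
}

function csvToDocs(
  csv: string,
  textColumn: string,
  idColumn: string,
  columnDelimiter = ',',
  rowDelimiter = '\n',
): SketchCsvDoc[] {
  const rows = csv
    .split(rowDelimiter)
    .filter((row) => row.trim().length > 0) // skip blank rows
    .map((row) => row.split(columnDelimiter)); // naive: no quoted-field support
  const [header, ...body] = rows;
  if (!header) return [];

  const textIndex = header.indexOf(textColumn);
  const idIndex = header.indexOf(idColumn);
  if (textIndex === -1) {
    throw new Error(`Column "${textColumn}" not found in header`);
  }

  return body.map((cells, rowNumber) => ({
    // Assumption of this sketch: fall back to the row number when no ID is available.
    id: idIndex === -1 ? String(rowNumber) : (cells[idIndex] ?? String(rowNumber)),
    text: cells[textIndex] ?? '',
  }));
}

const csvSample = 'id,content\n1,First row\n2,Second row';
const rowDocs = csvToDocs(csvSample, 'content', 'id');
console.log(rowDocs.length); // 2
```

Each data row becomes one document whose ID comes from the `id` column and whose text comes from the `content` column.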

HTMLLoader

Extract text from HTML content:

import { HTMLLoader, createHTMLLoader } from '@localmode/core';

const loader = createHTMLLoader({
  selector: 'article',
  extractMetadata: true,
  ignoreTags: ['script', 'style', 'nav'],
});
const docs = await loader.load(htmlString);

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| selector | string | - | CSS selector to extract from |
| selectors | string[] | - | Multiple selectors |
| extractMetadata | boolean | false | Extract metadata from <head> |
| preserveFormatting | boolean | false | Preserve paragraph breaks |
| ignoreTags | string[] | - | Tags to ignore |
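
The idea behind `ignoreTags` can be shown with a deliberately naive, regex-based sketch: remove the ignored elements with their contents, strip the remaining tags, and collapse whitespace. `htmlToText` is a hypothetical helper; it assumes plain alphanumeric tag names, does not decode entities, and does not support selectors. A real loader running in the browser would more plausibly rely on a DOM parser, so treat this only as an illustration of the option's effect.

```typescript
// Hypothetical helper for illustration only; not the HTMLLoader implementation.
function htmlToText(html: string, ignoreTags: string[] = []): string {
  let remaining = html;
  for (const tag of ignoreTags) {
    // Drop the whole element, including its contents (e.g. <script>...</script>).
    const element = new RegExp(`<${tag}\\b[^>]*>[\\s\\S]*?</${tag}\\s*>`, 'gi');
    remaining = remaining.replace(element, ' ');
  }
  return remaining
    .replace(/<[^>]+>/g, ' ') // strip the remaining tags, keep their text
    .replace(/\s+/g, ' ') // collapse runs of whitespace
    .trim();
}

const page =
  '<nav>Menu</nav><article><h1>Title</h1><p>Body text</p><script>track()</script></article>';
const pageText = htmlToText(page, ['script', 'nav']);
console.log(pageText); // Title Body text
```

Without the `ignoreTags` argument, the navigation label and the script source would end up in the extracted text as well.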

DocumentLoader Interface

Implement custom loaders for any format:

import type { DocumentLoader, LoaderSource, LoadedDocument, LoaderOptions } from '@localmode/core';

class MyCustomLoader implements DocumentLoader {
  readonly supports = ['.custom', 'application/x-custom'];

  canLoad(source: LoaderSource): boolean {
    if (source instanceof File) {
      return source.name.endsWith('.custom');
    }
    return false;
  }

  async load(source: LoaderSource, options?: LoaderOptions): Promise<LoadedDocument[]> {
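    // Replace this placeholder with your format's parser; it only reads File contents as text.
    const text = source instanceof File ? await source.text() : '';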
    return [{
      id: crypto.randomUUID(),
      text,
      metadata: { source: 'custom-file', mimeType: 'application/x-custom' },
    }];
  }
}

LoadedDocument

interface LoadedDocument {
  id: string;
  text: string;
  metadata: {
    source: string;
    mimeType?: string;
    title?: string;
    pageCount?: number;
    sizeBytes?: number;
    [key: string]: unknown;
  };
}

Loader Registry

Create a custom registry with your own loaders for automatic format detection:

import { createLoaderRegistry, TextLoader, JSONLoader, CSVLoader } from '@localmode/core';

const registry = createLoaderRegistry([
  new TextLoader(),
  new JSONLoader(),
  new CSVLoader(),
  new MyCustomLoader(), // Your custom loader
]);

// Auto-detect format
const docs = await registry.load(someFile);

// Batch load
const allDocs = await registry.loadMany([file1, file2, file3]);

// Check which loader handles a source
const loader = registry.getLoader(someFile);
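
One plausible way such a registry could pick a loader is to return the first registered loader whose `canLoad` accepts the source. The sketch below illustrates that dispatch with simplified, hypothetical types (`SketchLoader`, `SketchRegistry`) that only handle strings; whether `createLoaderRegistry` actually resolves loaders in registration order is an assumption here.

```typescript
// Hypothetical types for illustration only; not the @localmode/core registry.
interface SketchLoader {
  readonly name: string;
  canLoad(source: string): boolean;
  load(source: string): string[];
}

class SketchRegistry {
  private readonly loaders: SketchLoader[];

  constructor(loaders: SketchLoader[]) {
    this.loaders = loaders;
  }

  // Assumption of this sketch: the first loader that accepts the source wins.
  getLoader(source: string): SketchLoader | undefined {
    return this.loaders.find((loader) => loader.canLoad(source));
  }

  load(source: string): string[] {
    const loader = this.getLoader(source);
    if (!loader) throw new Error('No registered loader can handle this source');
    return loader.load(source);
  }
}

const jsonSketchLoader: SketchLoader = {
  name: 'json',
  canLoad: (source) => /^\s*[\[{]/.test(source), // looks like a JSON object or array
  load: (source) => [JSON.stringify(JSON.parse(source))],
};

const textSketchLoader: SketchLoader = {
  name: 'text',
  canLoad: () => true, // catch-all fallback
  load: (source) => [source],
};

// Specific loaders go first, the catch-all goes last.
const sketchRegistry = new SketchRegistry([jsonSketchLoader, textSketchLoader]);
console.log(sketchRegistry.getLoader('{"a": 1}')?.name); // json
console.log(sketchRegistry.getLoader('plain words')?.name); // text
```

Under a first-match rule, registration order matters: a catch-all loader placed first would shadow every loader after it.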

Accepted Source Types

All loaders accept these source types:

| Source Type | Description |
| --- | --- |
| string | Raw text content |
| Blob | Binary blob |
| ArrayBuffer | Raw binary data |
| File | Browser File object (from input or drag-and-drop) |
| { type: 'url', url: string } | URL to fetch |
| { type: 'custom', data: unknown } | Custom source data |
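
A custom loader usually has to turn whichever source it receives into text before parsing. The sketch below shows one way to do that with standard web APIs (`TextDecoder`, `Blob.text()`, `fetch`). The `SketchSource` type is a local stand-in that mirrors the table above rather than the library's own `LoaderSource`, and `sourceToText` is a hypothetical helper that assumes UTF-8 content.

```typescript
// Local stand-in for illustration; mirrors the table above, not the library's LoaderSource.
type SketchSource =
  | string
  | Blob // File extends Blob, so File objects are covered here too
  | ArrayBuffer
  | { type: 'url'; url: string }
  | { type: 'custom'; data: unknown };

async function sourceToText(source: SketchSource): Promise<string> {
  if (typeof source === 'string') return source;
  if (source instanceof ArrayBuffer) {
    // Assumption of this sketch: binary data is UTF-8 encoded text.
    return new TextDecoder('utf-8').decode(source);
  }
  if (source instanceof Blob) return source.text();
  if (source.type === 'url') {
    const response = await fetch(source.url);
    if (!response.ok) {
      throw new Error(`Request failed with status ${response.status}`);
    }
    return response.text();
  }
  // Custom data has no generic text form; a dedicated loader must interpret it.
  throw new Error('Custom sources need a loader that understands their data');
}

sourceToText('raw text').then((text) => console.log(text)); // raw text
```

The order of the checks matters only in that the object-shaped sources are tested last, after the built-in types have been ruled out.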
