Document Loaders

Load documents from various formats (text, JSON, CSV, HTML) for processing in RAG pipelines. All loaders are zero-dependency and run entirely in the browser.

Quick Start

import { loadDocument, loadDocuments } from '@localmode/core';

// Auto-detect format and load
const docs = await loadDocument(myFile);

// Load with explicit type
const csvDocs = await loadDocument(csvFile, { loader: 'csv', textColumn: 'content' });

// Batch load multiple sources
const allDocs = await loadDocuments([file1, file2, file3]);

Built-in Loaders

TextLoader

Load plain text content:

import { TextLoader, createTextLoader } from '@localmode/core';

const loader = new TextLoader();
const docs = await loader.load('Hello world');

// Or with options
const loader2 = createTextLoader({ trim: true, separator: '\n\n' });

Option	Type	Default	Description
`separator`	`string`	-	Split text by separator into multiple documents
`trim`	`boolean`	`false`	Trim whitespace from content
`encoding`	`string`	`'utf-8'`	Text encoding
`maxSize`	`number`	-	Maximum file size in bytes
`abortSignal`	`AbortSignal`	-	Cancellation signal
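
To make the `separator` and `trim` options concrete, here is a minimal sketch of how they might interact. `splitText` is a hypothetical helper written for illustration, and dropping empty pieces is an assumption; the library's actual behavior may differ.

```typescript
// Illustrative only: a guess at how `separator` and `trim` could combine.
// `splitText` is a hypothetical helper, not part of @localmode/core.
interface SplitOptions {
  separator?: string;
  trim?: boolean;
}

function splitText(text: string, options: SplitOptions = {}): string[] {
  const { separator, trim = false } = options;
  // Without a separator (or with an empty one) the whole input is one document.
  const parts = separator ? text.split(separator) : [text];
  const cleaned = trim ? parts.map((part) => part.trim()) : parts;
  // Assumption: pieces that end up empty (e.g. after a trailing separator) are dropped.
  return cleaned.filter((part) => part.length > 0);
}

const pieces = splitText('  First paragraph.\n\nSecond paragraph.\n\n', {
  separator: '\n\n',
  trim: true,
});
console.log(pieces);
```

With a blank-line separator, each paragraph would become its own document, which is often a convenient starting point before chunking.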

JSONLoader

Extract text from JSON structures:

import { JSONLoader, createJSONLoader } from '@localmode/core';

const loader = new JSONLoader();
const docs = await loader.load(jsonBlob);

// Extract specific fields
const loader2 = createJSONLoader({
  textFields: ['title', 'body'],
  fieldSeparator: '\n\n',
  recordsPath: 'data.articles',
});

Option	Type	Default	Description
`textFields`	`string[]`	-	Fields to extract text from
`extractAllStrings`	`boolean`	`false`	Extract from all string fields
`fieldSeparator`	`string`	`' '`	Separator when combining fields
`recordsPath`	`string`	-	Path to array of records (e.g., `'data.items'`)
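
The sketch below illustrates one plausible reading of `recordsPath`, `textFields`, and `fieldSeparator`: walk a dot-separated path to an array of records, then join the selected string fields. `resolvePath` and `recordToText` are hypothetical helpers, and skipping missing or non-string fields is an assumption rather than documented behavior.

```typescript
// Hypothetical helpers sketching the option semantics; not the library's code.
type JsonRecord = Record<string, unknown>;

// Walk a dot-separated path such as 'data.articles' through nested objects.
function resolvePath(root: unknown, path: string): unknown {
  let current: unknown = root;
  for (const key of path.split('.')) {
    if (current === null || typeof current !== 'object') return undefined;
    current = (current as JsonRecord)[key];
  }
  return current;
}

// Join the string values of the chosen fields, skipping missing or non-string ones.
function recordToText(record: JsonRecord, textFields: string[], fieldSeparator = ' '): string {
  return textFields
    .map((field) => record[field])
    .filter((value): value is string => typeof value === 'string')
    .join(fieldSeparator);
}

const payload = {
  data: { articles: [{ title: 'Intro', body: 'Hello', views: 3 }, { title: 'Next' }] },
};
const records = resolvePath(payload, 'data.articles');
const texts = Array.isArray(records)
  ? records.map((record) => recordToText(record as JsonRecord, ['title', 'body'], '\n\n'))
  : [];
console.log(texts);
```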

CSVLoader

Load CSV/TSV data:

import { CSVLoader, createCSVLoader } from '@localmode/core';

const loader = createCSVLoader({
  textColumn: 'content',
  idColumn: 'id',
  hasHeader: true,
});
const docs = await loader.load(csvFile);

Option	Type	Default	Description
`textColumn`	`string | number`	-	Column for text content
`textColumns`	`(string | number)[]`	-	Multiple columns to combine
`columnSeparator`	`string`	`' '`	Separator for combined columns
`idColumn`	`string | number`	-	Column for document IDs
`columnDelimiter`	`string`	`','`	Column delimiter
`rowDelimiter`	`string`	`'\n'`	Row delimiter
`hasHeader`	`boolean`	`true`	First row is header
`skipEmpty`	`boolean`	`false`	Skip empty rows
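
As a rough illustration of how these options could fit together, the sketch below turns CSV text into one document per row. `csvToDocs` is a hypothetical function, not the library's parser: it splits naively, so quoted fields containing delimiters or newlines are not handled, and treating a numeric column as a zero-based index is an assumption.

```typescript
// Hypothetical sketch: naive splitting only, no support for quoted fields.
interface CsvSketchOptions {
  textColumn: string | number;
  idColumn?: string | number;
  columnDelimiter?: string;
  rowDelimiter?: string;
  hasHeader?: boolean;
  skipEmpty?: boolean;
}

interface CsvDoc {
  id: string;
  text: string;
}

function csvToDocs(csv: string, options: CsvSketchOptions): CsvDoc[] {
  const {
    textColumn, idColumn, columnDelimiter = ',', rowDelimiter = '\n',
    hasHeader = true, skipEmpty = false,
  } = options;
  let rows = csv.split(rowDelimiter).map((row) => row.split(columnDelimiter));
  const header = hasHeader ? rows[0] ?? [] : [];
  if (hasHeader) rows = rows.slice(1);

  // Assumption: a string is looked up by header name, a number is a zero-based index.
  const indexOf = (column: string | number): number =>
    typeof column === 'number' ? column : header.indexOf(column);
  const textIndex = indexOf(textColumn);
  const idIndex = idColumn === undefined ? -1 : indexOf(idColumn);
  if (textIndex < 0) throw new Error(`Unknown text column: ${String(textColumn)}`);

  const docs: CsvDoc[] = [];
  rows.forEach((cells, rowNumber) => {
    const isEmpty = cells.every((cell) => cell.trim() === '');
    if (skipEmpty && isEmpty) return;
    docs.push({
      // Fall back to the row number when no ID column is configured or present.
      id: idIndex >= 0 ? cells[idIndex] ?? String(rowNumber) : String(rowNumber),
      text: cells[textIndex] ?? '',
    });
  });
  return docs;
}

const csvDocsSketch = csvToDocs('id,content\na1,Hello\n\na2,World', {
  textColumn: 'content',
  idColumn: 'id',
  skipEmpty: true,
});
console.log(csvDocsSketch);
```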

HTMLLoader

Extract text from HTML content:

import { HTMLLoader, createHTMLLoader } from '@localmode/core';

const loader = createHTMLLoader({
  selector: 'article',
  extractMetadata: true,
  ignoreTags: ['script', 'style', 'nav'],
});
const docs = await loader.load(htmlString);

Option	Type	Default	Description
`selector`	`string`	-	CSS selector to extract from
`selectors`	`string[]`	-	Multiple selectors
`extractMetadata`	`boolean`	`false`	Extract metadata from `<head>`
`preserveFormatting`	`boolean`	`false`	Preserve paragraph breaks
`ignoreTags`	`string[]`	-	Tags to ignore
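
The following sketch shows the general idea behind `ignoreTags` and `preserveFormatting`. It is a rough approximation using regular expressions, which cannot parse HTML in general; in a browser, `DOMParser` would be the sturdier choice. `htmlToText` is a hypothetical helper that assumes plain tag names and does not decode entities such as `&amp;`.

```typescript
// Hypothetical regex-based sketch; not how the library extracts text.
function htmlToText(html: string, ignoreTags: string[] = [], preserveFormatting = false): string {
  let working = html;
  for (const tag of ignoreTags) {
    // Remove the ignored element together with everything inside it.
    const element = new RegExp(`<${tag}\\b[^>]*>[\\s\\S]*?</${tag}\\s*>`, 'gi');
    working = working.replace(element, ' ');
  }
  // Line breaks in HTML source are ordinary whitespace, so collapse them first.
  working = working.replace(/\s+/g, ' ');
  if (preserveFormatting) {
    // Turn paragraph ends and <br> tags into newlines before stripping tags.
    working = working.replace(/<\/p\s*>|<br\s*\/?>/gi, '\n');
  }
  // Replace every remaining tag with a space, then tidy each line.
  const stripped = working.replace(/<[^>]+>/g, ' ');
  return stripped
    .split('\n')
    .map((line) => line.replace(/[ \t]+/g, ' ').trim())
    .filter((line) => line.length > 0)
    .join('\n');
}

const sampleHtml =
  '<nav>Menu</nav><p>Hello <b>world</b></p><script>alert(1)</script><p>Bye</p>';
console.log(htmlToText(sampleHtml, ['script', 'nav'], true));
console.log(htmlToText(sampleHtml, ['script', 'nav'], false));
```

One limitation of this approach is that replacing inline tags with spaces can split a word whose letters are partly wrapped in a tag.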

DocumentLoader Interface

Implement custom loaders for any format:

import type { DocumentLoader, LoaderSource, LoadedDocument, LoaderOptions } from '@localmode/core';

class MyCustomLoader implements DocumentLoader {
  readonly supports = ['.custom', 'application/x-custom'];

  canLoad(source: LoaderSource): boolean {
    if (source instanceof File) {
      return source.name.endsWith('.custom');
    }
    return false;
  }

  async load(source: LoaderSource, options?: LoaderOptions): Promise<LoadedDocument[]> {
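    // canLoad() only accepts File sources, so reject anything else here.
    if (!(source instanceof File)) {
      throw new Error('MyCustomLoader only supports File sources');
    }
    // Placeholder: read the file as text, then swap in your format's parser.
    const text = await source.text();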
    return [{
      id: crypto.randomUUID(),
      text,
      metadata: { source: 'custom-file', mimeType: 'application/x-custom' },
    }];
  }
}

LoadedDocument

Every loader's `load()` method resolves to an array of objects with this shape:

interface LoadedDocument {
  id: string;
  text: string;
  metadata: {
    source: string;
    mimeType?: string;
    title?: string;
    pageCount?: number;
    sizeBytes?: number;
    [key: string]: unknown;
  };
}

Loader Registry

Create a custom registry with your own loaders for automatic format detection:

import { createLoaderRegistry, TextLoader, JSONLoader, CSVLoader } from '@localmode/core';

const registry = createLoaderRegistry([
  new TextLoader(),
  new JSONLoader(),
  new CSVLoader(),
  new MyCustomLoader(), // Your custom loader
]);

// Auto-detect format
const docs = await registry.load(someFile);

// Batch load
const allDocs = await registry.loadMany([file1, file2, file3]);

// Check which loader handles a source
const loader = registry.getLoader(someFile);
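
To show what automatic detection could look like internally, here is a minimal registry sketch. It assumes loaders are asked in registration order and the first one whose `canLoad` returns true wins; the source does not state the real resolution order. `SketchLoader` and `createSketchRegistry` are hypothetical, simplified to synchronous string sources.

```typescript
// Minimal stand-in type; the real DocumentLoader interface is richer and async.
interface SketchLoader {
  name: string;
  canLoad(source: string): boolean;
  load(source: string): string[];
}

// Hypothetical registry: asks loaders in registration order, first match wins.
function createSketchRegistry(loaders: SketchLoader[]) {
  const getLoader = (source: string): SketchLoader | undefined =>
    loaders.find((loader) => loader.canLoad(source));
  const load = (source: string): string[] => {
    const loader = getLoader(source);
    if (!loader) throw new Error('No loader can handle this source');
    return loader.load(source);
  };
  // Batch loading flattens each source's documents into one list.
  const loadMany = (sources: string[]): string[] => sources.flatMap((source) => load(source));
  return { getLoader, load, loadMany };
}

const jsonSketchLoader: SketchLoader = {
  name: 'json',
  canLoad: (source) => source.trim().startsWith('{'),
  load: (source) => [JSON.stringify(JSON.parse(source))],
};
const textSketchLoader: SketchLoader = {
  name: 'text',
  canLoad: () => true, // catch-all, so it is registered last
  load: (source) => [source],
};

const sketchRegistry = createSketchRegistry([jsonSketchLoader, textSketchLoader]);
console.log(sketchRegistry.getLoader('{"a": 1}')?.name);
console.log(sketchRegistry.loadMany(['plain text', '{"a": 1}']));
```

Under this first-match assumption, a catch-all loader placed first would shadow every more specific loader after it, so order matters.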

Accepted Source Types

All loaders accept these source types:

Source Type	Description
`string`	Raw text content
`Blob`	Binary blob
`ArrayBuffer`	Raw binary data
`File`	Browser File object (from input or drag-and-drop)
`{ type: 'url', url: string }`	URL to fetch
`{ type: 'custom', data: unknown }`	Custom source data
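
A custom loader usually needs to tell these source types apart before reading them. The sketch below shows one way to do that with type guards. `SketchSource` mirrors the table above but is declared locally as an assumption about the shape of `LoaderSource`, and `sourceKind` is a hypothetical helper.

```typescript
// Local stand-in for the source union described in the table above.
type SketchSource =
  | string
  | Blob
  | ArrayBuffer
  | { type: 'url'; url: string }
  | { type: 'custom'; data: unknown };

type SourceKind = 'string' | 'file' | 'blob' | 'arraybuffer' | 'url' | 'custom';

function sourceKind(source: SketchSource): SourceKind {
  if (typeof source === 'string') return 'string';
  if (source instanceof ArrayBuffer) return 'arraybuffer';
  // File extends Blob, so File must be tested first. The typeof guards keep
  // this working in runtimes that lack a global File or Blob.
  if (typeof File !== 'undefined' && source instanceof File) return 'file';
  if (typeof Blob !== 'undefined' && source instanceof Blob) return 'blob';
  // What remains are the tagged object sources.
  return source.type === 'url' ? 'url' : 'custom';
}

console.log(sourceKind('raw text'));
console.log(sourceKind(new ArrayBuffer(8)));
console.log(sourceKind({ type: 'url', url: 'https://example.com/doc.txt' }));
```

Checking `File` before `Blob` matters because every `File` is also a `Blob`; reversing the order would report files as plain blobs and lose access to the file name.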