Open-Source npm Package

react-native-pageindex

Vectorless RAG. Reasoning-based document indexing.

Build hierarchical tree indexes from PDFs, Word docs, spreadsheets, CSV, and Markdown — using any LLM provider. Fast local search, offline retrieval, and conversational AI over your documents. No vector database required.

$ npm install react-native-pageindex
No vector database needed
Works offline with keyword mode
Any LLM provider — OpenAI, Claude, Ollama, Gemini
What is react-native-pageindex?

Document indexing that thinks, not just embeds

react-native-pageindex is a TypeScript library that builds structured, hierarchical indexes from documents — PDFs, Word files, CSV, Excel, and Markdown. Instead of converting text into opaque vector embeddings, it uses LLM reasoning to understand document structure and produce a navigable tree that maps chapters, sections, and subsections with precise page attribution.

The library was built to solve a specific problem: making document content genuinely searchable and retrievable inside mobile and edge applications, without requiring a vector database or constant network access. Traditional RAG pipelines depend on embedding models, vector stores, and cloud infrastructure. That works for server-side systems, but it falls apart when you need retrieval on a phone in a field with no signal.

react-native-pageindex takes a different approach. It produces a forward index (the hierarchical tree) and an inverted reverse index (term-to-node mappings using TF-IDF scoring). The keyword mode requires zero LLM calls — it runs entirely locally with no API key, no network, and no dependencies beyond the library itself. When you do have LLM access, the full pipeline adds summaries, semantic understanding, and richer structure.
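The idea behind the reverse index can be sketched in a few lines. This is a minimal, self-contained illustration of TF-IDF term-to-node indexing, not the library's actual types or implementation; `IndexedNode`, `buildReverseIndexSketch`, and the tokenizer are all hypothetical names for illustration.

```typescript
// Minimal sketch: build a TF-IDF inverted ("reverse") index over tree nodes.
// All names here are illustrative, not the library's real API.

interface IndexedNode {
  id: string;   // node ID in the hierarchical tree
  text: string; // page text belonging to this node
}

// term -> nodeId -> TF-IDF score
type ReverseIndex = Map<string, Map<string, number>>;

function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

function buildReverseIndexSketch(nodes: IndexedNode[]): ReverseIndex {
  const termFreq = new Map<string, Map<string, number>>(); // term -> nodeId -> count
  const docFreq = new Map<string, number>();               // term -> # nodes containing it

  for (const node of nodes) {
    const seen = new Set<string>();
    for (const term of tokenize(node.text)) {
      let perNode = termFreq.get(term);
      if (!perNode) {
        perNode = new Map();
        termFreq.set(term, perNode);
      }
      perNode.set(node.id, (perNode.get(node.id) ?? 0) + 1);
      if (!seen.has(term)) {
        seen.add(term);
        docFreq.set(term, (docFreq.get(term) ?? 0) + 1);
      }
    }
  }

  // Weight each term count by inverse document frequency: rare terms score higher.
  const index: ReverseIndex = new Map();
  const n = nodes.length;
  for (const [term, perNode] of termFreq) {
    const idf = Math.log(1 + n / (docFreq.get(term) ?? 1));
    const scored = new Map<string, number>();
    for (const [nodeId, tf] of perNode) scored.set(nodeId, tf * idf);
    index.set(term, scored);
  }
  return index;
}
```

Because the whole structure is plain maps over strings and numbers, it serializes easily and needs no runtime beyond JavaScript itself, which is what makes the offline, zero-dependency mode possible.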

The result is a library that fits cleanly into offline-first architectures, on-device AI systems, and retrieval-augmented workflows where you need grounded answers with source attribution, not just similarity scores.

Hierarchical Tree Index

LLM-reasoned document structure — chapters, sections, subsections — with page ranges, node IDs, and optional summaries.

Inverted Reverse Index

TF-IDF-based term-to-node mappings for fast keyword search. No LLM required. Runs entirely on-device.

Conversational Chat

Multi-turn Q&A with cited answers backed by the reverse index. Grounded context from actual page text, not hallucinated responses.

Why It Matters

Real benefits for real applications

Fast Local Retrieval

The keyword-mode reverse index delivers instant search results using TF-IDF scoring — no network round-trips, no embedding computation, no latency from cloud lookups.

Offline-First Architecture

Build your index once (with or without an LLM), then search and retrieve locally indefinitely. The entire retrieval loop works with zero connectivity after indexing.

Low-Latency Mobile Experience

On-device indexing and search mean results appear as fast as the user can type. No waiting for API responses. No loading spinners for basic document lookups.

Privacy Through Local Processing

When using keyword mode, document content never leaves the device. No text is sent to external servers. Sensitive documents stay where they belong — on the user's device.

Works in Low-Connectivity Environments

Field workers, remote clinics, agricultural advisors, and frontline teams need tools that work where networks don't. Keyword mode delivers full search capability without any connection.

Practical for Real Industries

Healthtech, agritech, education, enterprise field apps, and legal document systems — anywhere structured document access matters more than internet availability.

Who Should Use This

Built for teams that ship real products

React Native Developers

Drop document indexing and search into any React Native app. TypeScript-first, fully typed, and designed to work with the React Native ecosystem and Metro bundler.

Mobile Architects

Add structured document retrieval to your mobile architecture without introducing vector databases, embedding pipelines, or cloud dependencies you don't need.

AI / Mobile Product Teams

Build retrieval-augmented features — document Q&A, knowledge assistants, cited answers — into your mobile product using any LLM provider your team already uses.

CTOs & Engineering Leaders

Evaluate a practical, production-oriented approach to on-device document intelligence and offline-capable retrieval that reduces infrastructure complexity.

Startups Building Retrieval Apps

Ship document-powered features faster. Skip the vector database setup. Use keyword mode for zero-cost retrieval, upgrade to LLM mode when you need semantic depth.

Organizations in Low-Connectivity Regions

Serve users in rural areas, emerging markets, and field environments where reliable internet is not guaranteed. The keyword pipeline works entirely offline.

Core Capabilities

What you can build with it

Multi-Format Indexing

Index PDFs, Word documents (.docx), CSV files, Excel spreadsheets (.xlsx/.xls), and Markdown. One unified API — pageIndexDocument — handles format detection and extraction automatically.
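Extension-based format detection of this kind might look like the sketch below. This is an illustration of the pattern, not the library's internal logic; `detectFormat` and the `DocFormat` union are hypothetical names.

```typescript
// Hypothetical format detection by file extension, in the spirit of a
// unified entry point like pageIndexDocument. Names are illustrative.

type DocFormat = "pdf" | "docx" | "csv" | "xlsx" | "markdown";

function detectFormat(filename: string): DocFormat {
  const ext = filename.toLowerCase().split(".").pop() ?? "";
  switch (ext) {
    case "pdf":      return "pdf";
    case "docx":     return "docx";
    case "csv":      return "csv";
    case "xlsx":
    case "xls":      return "xlsx"; // both Excel variants map to one pipeline
    case "md":
    case "markdown": return "markdown";
    default:
      throw new Error(`Unsupported format: .${ext}`);
  }
}
```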

Hierarchical Tree Structure

The forward index produces a navigable tree: chapters, sections, subsections, each with title, page range, node ID, and optional LLM-generated summary. Not a flat list — a real structure.

Fast Reverse Index Search

buildReverseIndex creates an inverted index mapping terms to tree nodes with TF-IDF scoring. searchReverseIndex returns ranked results with matched terms, scores, and page ranges instantly.
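Ranked search over such an index reduces to summing per-node scores across the query's terms and sorting. The sketch below shows that pattern; the type and function names (`TermIndex`, `searchSketch`, the result shape) are assumptions for illustration and may differ from the library's real `searchReverseIndex` signature.

```typescript
// Sketch of ranked keyword search over a term -> nodeId -> score map.
// Names and result shape are illustrative, not the real API.

type TermIndex = Map<string, Map<string, number>>;

interface SearchResult {
  nodeId: string;
  score: number;
  matchedTerms: string[];
}

function searchSketch(index: TermIndex, query: string, limit = 5): SearchResult[] {
  const scores = new Map<string, { score: number; terms: Set<string> }>();
  for (const term of query.toLowerCase().match(/[a-z0-9]+/g) ?? []) {
    const perNode = index.get(term);
    if (!perNode) continue;
    for (const [nodeId, s] of perNode) {
      const entry = scores.get(nodeId) ?? { score: 0, terms: new Set<string>() };
      entry.score += s; // accumulate TF-IDF across query terms
      entry.terms.add(term);
      scores.set(nodeId, entry);
    }
  }
  return [...scores.entries()]
    .map(([nodeId, { score, terms }]) => ({ nodeId, score, matchedTerms: [...terms] }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
```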

LLM-Reasoned Structure

In full LLM mode, the library uses your LLM provider to reason about document structure — detecting tables of contents, verifying accuracy, resolving large sections, and generating per-node summaries.

Conversational Document Chat

Multi-turn Q&A over indexed documents. Retrieves relevant nodes via the reverse index, builds grounded context from page text, and returns cited answers with source metadata.
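The retrieve-then-ground pattern described above can be sketched as follows. Everything here is hypothetical scaffolding (the `RetrievedNode` shape, `buildGroundedPrompt`, `askSketch`); only the overall flow mirrors the documentation.

```typescript
// Sketch of the retrieve -> ground -> answer loop. Names are hypothetical.

interface RetrievedNode {
  nodeId: string;
  pageText: string;
}

interface LLM {
  complete(prompt: string): Promise<string>;
}

// Assemble a grounded prompt: real page text plus citation hooks (node IDs).
function buildGroundedPrompt(
  question: string,
  nodes: RetrievedNode[],
  history: string[],
): string {
  const context = nodes.map((n) => `[${n.nodeId}] ${n.pageText}`).join("\n");
  return [
    "Answer using ONLY the context below. Cite node IDs in brackets.",
    `Context:\n${context}`,
    history.length ? `Conversation so far:\n${history.join("\n")}` : "",
    `Question: ${question}`,
  ].filter(Boolean).join("\n\n");
}

async function askSketch(
  llm: LLM,
  question: string,
  retrieve: (q: string) => RetrievedNode[], // e.g. backed by the reverse index
  history: string[] = [],
): Promise<string> {
  const prompt = buildGroundedPrompt(question, retrieve(question), history);
  return llm.complete(prompt);
}
```

Because the context is assembled from retrieved page text rather than the model's memory, answers can carry node IDs back as citations.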

Provider-Agnostic LLM Interface

Bring your own LLM. The LLMProvider interface accepts any provider — OpenAI, Anthropic Claude, Google Gemini, Ollama (local), or custom endpoints. One interface, any model.
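A provider interface of roughly this shape is what "one interface, any model" implies; the exact `LLMProvider` type is in the library's README, so treat the interface and both adapters below as illustrative sketches. The Ollama endpoint shown follows Ollama's generate API; adjust the model name to one you have pulled.

```typescript
// Assumed single-method provider interface; check the library's real type.
interface LLMProvider {
  complete(prompt: string): Promise<string>;
}

// A stub provider: anything satisfying the interface plugs in,
// which is also handy for tests.
const echoProvider: LLMProvider = {
  async complete(prompt: string): Promise<string> {
    return `stub response to: ${prompt}`;
  },
};

// Sketch of a local Ollama adapter (no API key, no cloud).
const ollamaProvider: LLMProvider = {
  async complete(prompt: string): Promise<string> {
    const res = await fetch("http://localhost:11434/api/generate", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model: "llama3", prompt, stream: false }),
    });
    const data = await res.json();
    return data.response as string;
  },
};
```

Swapping providers is then a one-line change at the call site, since the rest of the pipeline only ever sees the interface.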

Zero-Dependency Keyword Mode

The keyword pipeline (CSV, Markdown) requires no external dependencies. No pdfjs-dist, no API keys, no network. Pure TypeScript TF-IDF indexing that runs anywhere JavaScript runs.

Fine-Grained Progress Tracking

Both PDF (13 steps) and Markdown (8 steps) pipelines emit real-time progress callbacks with step name, percentage, and detail. Build responsive UIs with accurate loading states.
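A progress callback of roughly this shape is what the pipelines emit; the exact payload fields may differ, so `ProgressEvent` and the simulated pipeline below are illustrative only.

```typescript
// Illustrative progress callback shape; the real payload may differ.
interface ProgressEvent {
  step: string;
  percent: number; // 0-100
  detail?: string;
}

type OnProgress = (event: ProgressEvent) => void;

// A pipeline reports each stage as it completes, so a UI can render
// an accurate loading state instead of an indeterminate spinner.
function runPipelineSketch(steps: string[], onProgress: OnProgress): void {
  steps.forEach((step, i) => {
    // ... do the work for this step ...
    onProgress({ step, percent: Math.round(((i + 1) / steps.length) * 100) });
  });
}
```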

Use Cases

Where it works in the real world

From field advisory tools to enterprise document assistants, react-native-pageindex fits wherever structured document access matters.

Offline Health Assistants

Clinical guides searchable without internet

Agritech Field Advisory

Crop manuals and farming guides for rural workers

Enterprise Document Search

Internal knowledge bases on mobile devices

Educational Content Apps

Textbook and course material indexing

Support & Troubleshooting

Technical manuals with instant local search

Document-Driven Assistants

AI Q&A backed by real document sources

Edge AI Retrieval Patterns

On-device RAG without vector infrastructure

Part of AppScale

How it fits into AppScale's approach

react-native-pageindex is not a demo project. It reflects how AppScale approaches product engineering — building tools that solve real architectural problems, not theoretical ones. We work with mobile systems, AI integration, and offline-first design every day. This library came from that work.

AppScale builds for constrained environments — limited connectivity, edge devices, mobile-first users. react-native-pageindex is a direct expression of that philosophy: practical, lightweight, designed for production environments where infrastructure assumptions don't hold.

Practical architecture over theoretical frameworks
AI on mobile — grounded, not performative
Offline-first as a core design principle
Scalable product engineering for real users
Built for environments where cloud is optional
Developer Trust

Built by someone who ships

Architect-Built

Created by an engineer with hands-on experience across mobile platforms, backend systems, and AI integration — not a weekend experiment, but a considered solution to a recurring architectural problem.

Production-Minded Design

TypeScript throughout, comprehensive type declarations, semantic versioning, and a clean API surface. Designed for teams that review code before they ship it.

Open Source, MIT Licensed

Fully open on GitHub. Inspect the code, read the implementation, fork it, extend it. No black boxes. No vendor lock-in. 100% transparent.

Practical, Not Performative

Every feature exists because a real use case demanded it. No bloat, no buzzword-driven additions. CSV keyword indexing works with zero dependencies in under a second.

FAQs

Common questions

What does react-native-pageindex actually do?

It takes documents — PDFs, Word files, CSVs, Excel spreadsheets, and Markdown — and builds two indexes: a hierarchical forward index (tree structure with chapters, sections, and page ranges) and an inverted reverse index (term-to-node mappings for fast search). You can then search the index locally or run multi-turn conversational Q&A over the indexed content.

Does it work offline?

Yes. The keyword mode builds and searches the reverse index using TF-IDF scoring with zero LLM calls and zero network access. Build the index once, and the entire search and retrieval loop works offline indefinitely. The LLM mode requires API calls during indexing, but once the index is built, keyword search over it is fully offline.

Who is this library for?

React Native developers, mobile architects, AI product teams, and anyone building document-driven features in mobile or edge applications. It's particularly useful for teams serving users in low-connectivity environments — healthtech, agritech, education, enterprise field tools.

How is it different from traditional vector-based RAG?

Traditional RAG converts text into vector embeddings and stores them in a vector database for similarity search. react-native-pageindex skips embeddings entirely. It uses LLM reasoning to build a structured tree, then creates a TF-IDF inverted index for search. No vector database, no embedding model, no similarity computation. The tradeoff is that keyword search is lexical rather than semantic — matches depend on the query's terms appearing in the text — but it's instant, offline, and has zero infrastructure requirements.

Which LLM providers can I use?

Any provider that can respond to a text prompt. The library defines a simple LLMProvider interface — you implement a function that takes a prompt and returns a response. The README includes ready-to-use examples for OpenAI, Anthropic Claude, Google Gemini, and Ollama (fully local). You can wire up any model or endpoint.

Can I build a conversational Q&A feature with it?

Yes. The built-in chat pattern follows a standard RAG loop: retrieve relevant nodes via searchReverseIndex, build grounded context from page text or node summaries, pass it to your LLM with conversation history, and return cited answers with source metadata. The library provides the retrieval and context layers — you bring the LLM.

Does it run in React Native, Node.js, and the browser?

All three. The core library is pure TypeScript with no platform-specific dependencies. It works in React Native, Node.js, and browser environments. The demo app is a Vite + React app that runs entirely in the browser. PDF extraction uses pdfjs-dist, which works in browsers and Node.js.

How fast is indexing and search?

Keyword mode is near-instant — a 100-row CSV indexes in under a second with zero API calls. LLM mode depends on your provider and document size. A 32-page PDF takes roughly 3-4 minutes with gpt-4o-mini due to the multi-step reasoning pipeline (TOC detection, verification, summary generation). Once indexed, search is always instant.

Is it production-ready?

The library is at version 0.1.x, following semantic versioning. The API surface is stable and fully typed. It's designed for production use in architectures where offline search and structured document retrieval are needed. As with any early-version library, it's worth evaluating against your specific requirements.

When should I use this instead of a cloud search service?

When your users don't have reliable internet. When you need search results without network latency. When documents are sensitive and shouldn't leave the device. When you want retrieval without infrastructure overhead. When you're building for mobile-first, offline-first, or edge environments. If your users always have fast, reliable connections and you're already invested in cloud search infrastructure, a cloud service may be more appropriate.

Get Started

Try react-native-pageindex

Install from npm. Index a document. Search it locally. No vector database, no cloud setup, no configuration ceremony.