Open Source · MIT Licensed

Visual document analysis,
on your hardware

Upload PDFs and images, ask questions, and get answers grounded in your documents. ColPali visual embeddings paired with any Ollama model — running entirely on your machine.

Screenshots: RAG Lab interface in light and dark modes.

Everything you need for
document intelligence

From single-page lookups to batch analysis across hundreds of documents, RAG Lab handles the full workflow.

Visual RAG

ColPali embeddings understand tables, figures, and layouts natively. Optional OCR mode for text-preferring LLMs.
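ColPali-style retrieval scores a page by late interaction: each query-token embedding is matched against all of the page's patch embeddings, and the per-token maxima are summed. A minimal sketch with toy 2-D vectors (the vectors and function name are illustrative, not RAG Lab's internals):

```python
def maxsim_score(query_vecs, page_vecs):
    # Late interaction (MaxSim): for each query-token vector, take its
    # best dot-product match among the page's patch vectors, then sum
    # those per-token maxima into a single page score.
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)

# Toy vectors: page_a has a strong match for both query tokens,
# page_b matches neither, so page_a should rank higher.
query = [[1.0, 0.0], [0.0, 1.0]]
page_a = [[0.9, 0.1], [0.0, 0.8], [0.2, 0.2]]
page_b = [[0.1, 0.1], [0.3, 0.2], [0.2, 0.1]]
```

Because matching happens per token rather than against one pooled vector, a page scores well if it contains a good patch for every part of the query, which is why tables and figures are searchable without OCR.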

Hybrid Retrieval

Visual search + keyword matching with weighted score fusion. Citation snippets show exactly which text matched your query.
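Weighted score fusion can be sketched as normalising each retriever's scores, then mixing them with a single weight. The `alpha` value and min-max normalisation below are illustrative assumptions, not RAG Lab's exact defaults:

```python
def fuse_scores(visual, keyword, alpha=0.7):
    # Weighted fusion of two score maps (page_id -> score).
    # Each retriever is min-max normalised first, so `alpha` directly
    # controls the visual/keyword balance.
    def norm(scores):
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {k: (v - lo) / span for k, v in scores.items()}

    v, k = norm(visual), norm(keyword)
    pages = set(v) | set(k)
    return {p: alpha * v.get(p, 0.0) + (1 - alpha) * k.get(p, 0.0)
            for p in pages}

# p1 leads visually, p2 leads on keywords; with alpha=0.7, p1 wins.
fused = fuse_scores({"p1": 0.9, "p2": 0.4}, {"p2": 3.0, "p3": 1.0})
```

Normalising per retriever matters because visual similarity and keyword scores live on different scales; without it, one retriever silently dominates.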

Any Ollama Model

Plug in any local model or connect to Ollama Cloud. Vision, text, and reasoning models all work.

Batch Processing

Feed it a stack of documents. Get per-document streaming responses as they complete.

Full-Document Summarization

Ask “summarize” and every page gets processed in sequential chunks. No page left behind.
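The sequential-chunk pass can be sketched as folding fixed-size page chunks into a running summary. Here `summarize_chunk` stands in for the LLM call RAG Lab would make via Ollama, and `chunk_size=4` is an illustrative value:

```python
def summarize_document(pages, summarize_chunk, chunk_size=4):
    # Walk the document in sequential fixed-size chunks, folding each
    # chunk into a running summary so every page is processed once.
    # summarize_chunk(chunk, prior_summary) -> updated summary.
    summary = ""
    for i in range(0, len(pages), chunk_size):
        summary = summarize_chunk(pages[i:i + chunk_size], summary)
    return summary

# Demo with a recording stub instead of a real model call:
pages = [f"p{i}" for i in range(10)]
covered = []

def record_chunk(chunk, prior):
    covered.extend(chunk)  # track which pages were seen
    return prior + " " + " ".join(chunk)

summarize_document(pages, record_chunk)
```

Carrying the prior summary forward lets each chunk be summarised in the context of everything before it, rather than producing ten disconnected mini-summaries.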

Prompt Templates

Build custom extraction templates — K-1 line items, invoice fields, whatever your workflow needs.

Conversation Memory

Mem0 remembers context across chat sessions. Disabled during RAG to keep answers document-grounded.

Multi-Session

Separate workspaces for separate projects. Switch contexts without losing state.

Adaptive Retrieval

Score-slope analysis finds the right number of pages per query. No manual threshold tuning.
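One common way to do this is an "elbow" heuristic: sort the retrieval scores and cut just after the steepest drop between consecutive scores. This is a sketch of the general technique; RAG Lab's exact slope rule may differ:

```python
def adaptive_cutoff(scores, min_keep=1):
    # Sort scores descending and keep pages up to the largest gap
    # ("elbow") between consecutive scores, instead of a fixed top-k.
    ranked = sorted(scores, reverse=True)
    if len(ranked) <= min_keep:
        return len(ranked)
    drops = [ranked[i] - ranked[i + 1] for i in range(len(ranked) - 1)]
    return max(range(len(drops)), key=drops.__getitem__) + 1

# Three pages score ~0.9, then scores fall off a cliff: keep 3.
k = adaptive_cutoff([0.95, 0.93, 0.90, 0.40, 0.38])
```

The payoff is that narrow questions pull in one or two pages while broad ones pull in many, with no per-corpus threshold to tune.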

Built with modern tools

Frontend: SvelteKit 2, Svelte 5, TypeScript
Backend: FastAPI, Python 3.10+
Embeddings: ColQwen3.5 (ColPali)
Vector Store: LanceDB (multi-vector)
LLM Inference: Ollama (any local model)
Memory: Mem0 + ChromaDB
Auth: FastAPI-Users, JWT, argon2
Storage: SQLite (local, zero-config)

Up and running in minutes

For Linux and WSL2. Requires Python 3.10+, Node.js 18+, an NVIDIA GPU, and Ollama.

terminal
# Clone and install
$ git clone https://github.com/inkind79/rag-lab.git
$ cd rag-lab
$ python -m venv venv && source venv/bin/activate
$ pip install -r requirements.txt
$ cd frontend && npm install && cd ..

# Pull your models
$ ollama pull gemma4
$ ollama pull nomic-embed-text
$ ollama pull gemma3:4b   # conversation memory

# Configure and run (two terminals)
$ cp .env.example .env
$ uvicorn fastapi_app:app --host 127.0.0.1 --port 8000
$ cd frontend && npm run dev

# Open http://localhost:5173 — register an account, then sign in

System requirements

GPU: NVIDIA 8GB+ VRAM (16GB+ recommended for larger models)
RAM: 16GB+ system memory
Storage: 20GB+ for models and dependencies
OS: Linux / WSL2 (requires NVIDIA CUDA support)