Proof of Concept

RAG over PDFs with citations

Upload documents, ask questions, get answers with page-level citations. A minimal implementation of retrieval-augmented generation.

How it works

1. Upload

Drop in your PDFs. Each page is extracted, split into overlapping chunks, and embedded as vectors.

2. Ask

Query your documents in natural language via chat.

3. Cite

Get answers with citations that jump to the exact PDF page.

Own your data

Bring your own API keys. Your documents and embeddings stay under your control.

Your API Keys

Use your own OpenAI, Anthropic, or other LLM provider keys. We never store them unencrypted.

Your Vector Store

Optionally bring your own Pinecone index. Embeddings live in your namespace.

Tenant Isolated

All queries are scoped to your workspace. No cross-tenant data leakage.
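As a sketch of what workspace scoping means in practice, here is a toy in-memory store where every read and write is keyed by a namespace. The class and method names are illustrative, not the actual implementation; a real deployment would rely on Pinecone namespaces as described below.

```python
class NamespacedStore:
    """Toy in-memory store: every read and write is scoped to a namespace.

    Illustrative only -- stands in for a namespaced vector index.
    """

    def __init__(self) -> None:
        self._data: dict[str, list] = {}

    def upsert(self, namespace: str, item) -> None:
        # Writes land only in the caller's namespace.
        self._data.setdefault(namespace, []).append(item)

    def query(self, namespace: str) -> list:
        # Reads can only ever see the caller's own namespace,
        # so one tenant's query cannot return another tenant's data.
        return list(self._data.get(namespace, []))
```

Because every operation takes the namespace as an explicit parameter, there is no code path that spans tenants.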

Under the hood

A straightforward RAG pipeline with no magic.

Ingestion

  • PDF text extraction per page
  • Chunking with overlap
  • Embeddings via OpenAI or a compatible provider
  • Vectors stored in Pinecone (namespaced)
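The chunking-with-overlap step above can be sketched as follows. The chunk size and overlap values are illustrative defaults, not the ones this project uses; each chunk keeps its source page number so citations can point back to the viewer.

```python
def chunk_pages(pages: list[str], size: int = 500, overlap: int = 100) -> list[dict]:
    """Split per-page text into fixed-size chunks with overlapping windows.

    Illustrative sketch: sizes are characters, and each chunk carries its
    page number as metadata for later citation.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    chunks = []
    for page_no, text in enumerate(pages, start=1):
        # Step by (size - overlap) so consecutive chunks share `overlap` chars.
        for i in range(0, max(len(text) - overlap, 1), step):
            chunks.append({"text": text[i:i + size], "page": page_no})
    return chunks
```

The overlap means a sentence falling on a chunk boundary still appears whole in at least one chunk.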

Retrieval

  • Query embedding + similarity search
  • Top-k chunks with metadata
  • LLM generates answer with citations
  • Citations link to page numbers in viewer

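The retrieval step above can be sketched as a brute-force cosine-similarity search. This stands in for the Pinecone query; the function names are illustrative, and the metadata returned with each chunk is what the LLM uses to emit page-level citations.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec: list[float], store: list[tuple], k: int = 3) -> list[dict]:
    """Return metadata for the k chunks most similar to the query.

    `store` is a list of (vector, metadata) pairs -- a stand-in for
    the namespaced vector index.
    """
    scored = sorted(store, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    return [meta for _, meta in scored[:k]]
```

The returned metadata (chunk text plus page number) is passed to the LLM, which answers and cites the pages it drew from.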
Proof of Concept

This is a demonstration of RAG with citations over PDFs. Built to explore the architecture, not for production use. Expect rough edges.
