RAG 🚧 In Progress

📄 Document Q&A
RAG Pipeline

A production-ready Retrieval-Augmented Generation pipeline for semantic querying across 1K+ PDF documents: FAISS vector search, transformer embeddings, and grounded answers with source citations.

⚠ The demo runs on Streamlit Community Cloud with a sample PDF. The full 1K+ document pipeline runs locally.

BLEU: 0.81
Docs: 1K+
Latency: 1.2s
Streamlit RAG demo interface showing PDF upload, query input, and generated answer with source citations
Streamlit Cloud · Query + Source Citation Interface
🚧 Under Construction: Full Pipeline Coming Soon
The demo version is live with the core RAG features. BLEU evaluation, the 1K+ document corpus, hybrid retrieval, re-ranking, and Docker deployment are actively being built.
~55% complete · demo live · evaluation in progress
📋 RAG Pipeline: Model Card
Dataset: 1K+ PDFs, semantic chunking (512 tokens)
Embeddings: all-MiniLM-L6-v2 via HuggingFace
Vector Store: FAISS (flat index, top-5 retrieval)
LLM: Llama 3.3 70B via Groq API
Metrics: BLEU 0.81 · Latency 1.2s · 1K+ docs
Baseline: keyword search BLEU 0.41 → ours 0.81
Pipeline: PDF → chunker → embeddings → FAISS → LLM → answer + sources
Stack: LangChain, FAISS, HuggingFace, Python, PyPDF, Streamlit, Groq
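As a rough illustration of the chunker stage in the model card, here is a minimal sketch of fixed-size chunking with overlap. It is a simplification, not the project's code: it assumes the input is already a list of tokens (whitespace-split words stand in for real tokenizer output), it uses a plain sliding window rather than semantic boundaries, and the name `chunk_tokens` is hypothetical. The defaults mirror the 512-token chunks and 50-token overlap listed on this page.

```python
from typing import List


def chunk_tokens(tokens: List[str], chunk_size: int = 512, overlap: int = 50) -> List[List[str]]:
    """Split tokens into windows of `chunk_size` that share `overlap` tokens.

    Hypothetical helper for illustration only; a real pipeline would use
    the embedding model's tokenizer and sentence-aware boundaries.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end of the text
    return chunks


if __name__ == "__main__":
    words = "retrieval augmented generation grounds answers in documents".split()
    for chunk in chunk_tokens(words, chunk_size=4, overlap=1):
        print(chunk)
```

The overlap means the tail of one chunk is repeated at the head of the next, so a sentence that straddles a boundary still appears intact in at least one chunk.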
✅ Done: Demo Version
✓ Streamlit UI: file uploader, query input, results display
✓ Multi-PDF upload and batch ingestion
✓ Semantic chunking: 512-token chunks, 50-token overlap
✓ HuggingFace embeddings (all-MiniLM-L6-v2)
✓ FAISS vector store: build, persist, reload
✓ Top-5 similarity retrieval with relevance scores
✓ Context injection into LLM prompt (RAG)
✓ Llama 3.3 70B answer generation via Groq API
✓ Source citations: document name + page number
✓ Query metrics: retrieval time, latency, chunk count
✓ Chunk explorer: inspect retrieved chunks
✓ One-click sample PDF loader (no upload required)
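The retrieval and context-injection items above can be sketched without any external libraries. This is an assumption-laden stand-in, not the demo's implementation: exhaustive cosine similarity in pure Python plays the role of the FAISS flat index (which also performs exact, brute-force search), the vectors are toy 2-D values instead of all-MiniLM-L6-v2 embeddings, and `Chunk`, `top_k`, and `build_prompt` are hypothetical names. The prompt wording is likewise illustrative.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Chunk:
    text: str
    source: str              # document name, used for the citation
    page: int                # page number, used for the citation
    vector: Sequence[float]  # embedding of `text` (toy values here)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float], chunks: List[Chunk], k: int = 5) -> List[Tuple[float, Chunk]]:
    """Score every chunk against the query and keep the k best (exact search)."""
    scored = [(cosine(query_vec, c.vector), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_prompt(question: str, hits: List[Tuple[float, Chunk]]) -> str:
    """Inject retrieved chunks, tagged with source and page, into the prompt."""
    context = "\n\n".join(
        f"[{i}] ({c.source}, p. {c.page}) {c.text}"
        for i, (_, c) in enumerate(hits, start=1)
    )
    return (
        "Answer using only the context below and cite sources by number.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


if __name__ == "__main__":
    corpus = [
        Chunk("FAISS stores dense vectors.", "a.pdf", 3, [1.0, 0.0]),
        Chunk("Streamlit renders the UI.", "b.pdf", 1, [0.0, 1.0]),
    ]
    print(build_prompt("What does FAISS store?", top_k([1.0, 0.0], corpus, k=1)))
```

Carrying the document name and page number alongside each chunk is what lets the answer cite its sources; the LLM only sees the numbered context, and the UI can map each number back to a file and page.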
🚧 In Progress
→ BLEU evaluation framework with labeled Q&A test set
→ Latency optimization: targeting <1.2s end-to-end
→ Scale testing across the 1K+ document corpus
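For context on the BLEU evaluation item, here is a minimal single-reference, sentence-level BLEU sketch: clipped n-gram precision for n = 1..4, combined by geometric mean and scaled by the brevity penalty. It is unsmoothed, so any n-gram order with zero overlap yields a score of 0, and it assumes pre-tokenized input. The project's actual evaluation framework is still in progress and may rely on an established library with smoothing and corpus-level aggregation; the function name `bleu` is illustrative.

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate: List[str], reference: List[str], max_n: int = 4) -> float:
    """Unsmoothed sentence-level BLEU against a single reference."""
    if not candidate:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
        total = sum(cand.values())
        if overlap == 0 or total == 0:
            return 0.0  # no smoothing: one empty order zeroes the score
        log_precisions.append(math.log(overlap / total))

    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    ref = "the cat sat on the mat".split()
    print(bleu("the cat sat on the mat".split(), ref))
    print(bleu("the cat sat on".split(), ref))
```

Because BLEU rewards surface n-gram overlap, a labeled Q&A test set with reference answers is a prerequisite, which matches the first item in the list above.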
📋 Planned
· Hybrid retrieval: BM25 sparse + FAISS dense fusion
· Re-ranking layer: cross-encoder on top-k results
· Streaming responses: token-by-token generation
· Docker containerization: one-command deployment
· MLflow experiment tracking
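The planned hybrid retrieval needs some way to merge a BM25 ranking with a FAISS ranking. The page does not say which fusion method will be used; the sketch below assumes reciprocal rank fusion (RRF), a common choice because it works on ranks alone and avoids reconciling BM25 scores with cosine similarities. The function name, the chunk IDs, and the constant `k = 60` (a conventional default) are all assumptions.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked ID lists; each list adds 1 / (k + rank) per item."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    bm25_ranking = ["chunk_a", "chunk_b", "chunk_c"]   # hypothetical sparse results
    dense_ranking = ["chunk_b", "chunk_c", "chunk_d"]  # hypothetical dense results
    print(reciprocal_rank_fusion([bm25_ranking, dense_ranking]))
```

Chunks that appear in both rankings rise to the top, and the fused list could then be passed to the planned cross-encoder re-ranker.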
LangChain: Orchestration
FAISS: Vector Store
HuggingFace: Embeddings
Groq: LLM API
Streamlit: Frontend
Python 3.10+: Core