🔬 Vikash Innovative Tech — RAG Complete Course · Basic to Advanced

Master Retrieval-Augmented Generation

A fully dynamic learning experience where RAG comes alive. Adjust chunk sizes, visualize embeddings, simulate retrieval, and watch the entire pipeline react in real time. From beginner clarity to production-grade mastery, all explained in the simplest possible way. From raw documents to a production-grade AI system that answers with precision: every step explained, every concept interactive, every algorithm visualized live.

▶ Start Learning 🎮 Open Lab
📄
Load
🧹
Clean
✂️
Chunk
🔢
Embed
🗄️
Store
🔍
Retrieve
🤖
Generate
Chapter 00 — Foundation

What is RAG & Why It Exists

LLMs hallucinate, and they know nothing about your private data. RAG addresses both problems at once.

97%
ACCURACY LIFT
~0
HALLUCINATIONS*
KNOWLEDGE FRESHNESS
5ms
RETRIEVAL LATENCY
🧠

Problem 1 — Private Data

LLMs train on public internet data. They have zero knowledge of your company documents, internal policies, Vikash Innovative Tech product specs, customer data, or anything created after their training cutoff. Ask them about your business and they'll confidently fabricate answers.

🎭

Problem 2 — Hallucinations

LLMs generate statistically plausible text, not factually verified text. They will invent citations, make up statistics, and state wrong dates with total confidence. In production systems this is catastrophic, and it can create real legal liability.

Solution — RAG

Think of it as an open-book exam. Before answering any question, the system retrieves the most relevant pages from your document library. The LLM sees those pages and answers only from that retrieved context — grounded, sourced, verifiable.

THE CORE IDEA — ONE SENTENCE
Retrieve the most relevant chunks of your data, inject them as context into the prompt, and let the LLM answer only from that context.
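That one sentence can be sketched in a few lines. This toy uses word overlap in place of real embeddings (those arrive in Step 04) and prints the final prompt instead of calling an LLM; the chunk texts are made up:

```python
import re

def words(s: str) -> set[str]:
    """Lowercase word set, punctuation stripped."""
    return set(re.findall(r"[a-z0-9]+", s.lower()))

def retrieve_toy(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank chunks by shared words with the query; keep top k."""
    q = words(query)
    return sorted(chunks, key=lambda c: len(q & words(c)),
                  reverse=True)[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Inject retrieved chunks; instruct the LLM to stay inside them."""
    joined = "\n".join(f"- {c}" for c in context)
    return (f"Answer ONLY from this context:\n{joined}\n\n"
            f"Question: {query}")

chunks = ["Laptops carry a 1-year limited warranty.",
          "The cafeteria opens at 9am.",
          "The warranty covers manufacturing defects."]
q = "What does the limited warranty cover?"
print(build_prompt(q, retrieve_toy(q, chunks)))
```

The cafeteria chunk never reaches the prompt: retrieval filters it out before the LLM ever sees it. Everything that follows upgrades each piece of this loop.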
Chapter 00.5 — Architecture

End-to-End RAG Architecture

Click any stage to jump straight to its deep-dive section with code and interactive demo.

📄
Doc
Loading
🧹
Text
Preprocessing
✂️
Chunking
🔢
Embeddings
🗄️
Vector
Store
🔍
Retrieval
🤖
LLM +
Generate
Chapter 01 — The Pipeline

7 Steps, Zero Gaps

Every step explained with production-grade Python code, gotchas, and live demos.

Step 01 / Document Loading

Get Text Out
of Your Files

Before anything can happen, we need raw text. PDFs, Word docs, plain text, HTML — all become a standardized Python dict: { "content": text, "metadata": {...} }. That's the atomic unit of RAG.

💡

What is a document in RAG? A Python dict with two keys: content (the raw text) and metadata (source filename, page number, section, etc.). Everything downstream depends on this structure being consistent.

⚠️

Metadata is gold. Always attach source filename and page number. When the LLM answers a Vikash Innovative Tech policy question, the user can see exactly which document page it came from. That's what makes RAG trustworthy.

📂

Supported loaders: pdfplumber / pypdf (PDF), python-docx (Word), beautifulsoup4 (HTML), csv reader, direct open() for .txt and .md. Use the right loader for each file type.

step_01_load.py
# ── Step 01: Document Loading ──────────────────────
# Vikash Innovative Tech — RAG Pipeline
import pdfplumber
import os

def load_pdf(file_path: str) -> list[dict]:
    """Load PDF and extract text page-by-page."""
    documents = []
    with pdfplumber.open(file_path) as pdf:
        for i, page in enumerate(pdf.pages):
            text = page.extract_text()
            if text and len(text.strip()) > 20:
                documents.append({
                    "content": text,
                    "metadata": {
                        "source": file_path,
                        "page": i + 1,
                        "type": "pdf",
                        "company": "Vikash Innovative Tech"
                    }
                })
    return documents

def load_text(path: str) -> list[dict]:
    """Load plain text file as single document."""
    with open(path, "r", encoding="utf-8") as f:
        text = f.read()
    return [{
        "content": text,
        "metadata": {
            "source": path,
            "page": 1,
            "type": "text"
        }
    }]

def load_directory(dir_path: str) -> list[dict]:
    """Load all supported files in a directory."""
    loaders = {".pdf": load_pdf, ".txt": load_text}
    all_docs = []
    for fn in os.listdir(dir_path):
        ext = os.path.splitext(fn)[1]
        if ext in loaders:
            docs = loaders[ext](os.path.join(dir_path, fn))
            all_docs.extend(docs)
    return all_docs

# ── Usage ──────────────────────────────────────────
docs = load_pdf("vik_policy.pdf")
print(f"Loaded {len(docs)} pages")
# → Loaded 4 pages
# docs[0] = {
#   "content": "Vikash Innovative Tech\nWarranty Policy...",
#   "metadata": {"source": "vik_policy.pdf", "page": 1, ...}
# }
Step 02 / Text Preprocessing

Clean the
Noisy Raw Text

Raw PDF text is messy — broken sentences, double spaces, weird Unicode, stray headers. Embeddings work better on clean text. This step is unglamorous but critical.

🧹

What to fix: Multiple whitespace → single space. Mid-sentence line breaks → spaces. Non-printable chars → strip. Page headers/footers → remove. The goal: clean, readable prose.

⚠️

Don't over-clean! Removing too aggressively destroys semantic content. Keep punctuation, numbers, proper nouns, and acronyms intact. Vikash Innovative Tech product names must survive cleaning.

✗ BEFORE
"Vikash Innovative Tech\n policy allows 24 paid\nleaves per year. Contact\nHR@vikash .com"
✓ AFTER
"Vikash Innovative Tech policy allows 24 paid leaves per year. Contact HR@vikash.com"
step_02_preprocess.py
# ── Step 02: Text Preprocessing ──────────────────
import re
import unicodedata

def clean_text(text: str) -> str:
    """Normalize text for embedding quality."""
    # Normalize unicode (e.g. curly quotes → straight)
    text = unicodedata.normalize("NFKC", text)
    # Fix mid-sentence line breaks
    text = re.sub(r'(?<![.!?])\n(?=[a-z])', ' ', text)
    # Collapse multiple spaces / tabs
    text = re.sub(r'[ \t]+', ' ', text)
    # Normalize multiple newlines → double
    text = re.sub(r'\n{3,}', '\n\n', text)
    # Strip control / non-printable characters
    # (note: this regex also drops all non-ASCII; relax it
    #  if your documents contain accented or non-Latin words)
    text = re.sub(r'[^\x20-\x7E\n]', '', text)
    # Remove page numbers / standalone numbers
    text = re.sub(r'^\d+\s*$', '', text, flags=re.MULTILINE)
    return text.strip()

def preprocess_documents(docs: list) -> list:
    """Clean all documents in pipeline."""
    cleaned = []
    for doc in docs:
        clean = clean_text(doc["content"])
        if len(clean) > 30:  # skip near-empty pages
            doc["content"] = clean
            cleaned.append(doc)
    return cleaned

# ── Usage ──────────────────────────────────────────
cleaned_docs = preprocess_documents(docs)
print(f"Cleaned: {len(cleaned_docs)} docs remain")
print(cleaned_docs[0]["content"][:200])
# → Vikash Innovative Tech Warranty Policy
#   All laptops come with a 1-year limited warranty...
Step 03 / Chunking

Split Into
Smart Pieces

Documents are too long to embed directly. We split into "chunks" — this is the single most important decision in RAG. Wrong chunk size = bad retrieval = wrong answers.

⚖️

The Core Trade-off: Chunks too large → retrieval brings back irrelevant noise. Chunks too small → you lose semantic context and answers become shallow. Sweet spot: 200–400 chars for most docs.

🔀

Overlap matters. A 50–80 char overlap between adjacent chunks ensures sentences at boundaries are never cut off. Always use overlap in production.

🌿

3 Strategies: Fixed Size (equal chunks), Sliding Window (overlapping windows), Recursive Splitter (paragraph→sentence→word priority). Full interactive lab below ↓

🎮 Open Full Chunking Lab
step_03_chunk.py
# ── Step 03: Chunking (all 3 strategies) ─────────
import re

# ── Strategy 1: Fixed Size ────────────────────────
def chunk_fixed(text, size=300):
    return [text[i:i+size].strip()
            for i in range(0,len(text),size)
            if text[i:i+size].strip()]

# ── Strategy 2: Sliding Window ────────────────────
def chunk_sliding(text, size=300, overlap=60):
    chunks, i = [], 0
    while i < len(text):
        end = min(i+size, len(text))
        t = text[i:end].strip()
        if t: chunks.append(t)
        if end == len(text): break
        i += size - overlap
    return chunks

# ── Strategy 3: Recursive Splitter ───────────────
def chunk_recursive(text, size=300,
                     seps=['\n\n','\n','. ','! ',' ']):
    if len(text) <= size: return [text.strip()]
    sep = next((s for s in seps if s in text), None)
    if sep is None:  # no separator left → hard split
        return [text[i:i+size].strip()
                for i in range(0, len(text), size)]
    finer = seps[seps.index(sep)+1:]
    parts = text.split(sep)
    chunks, cur = [], ""
    for p in parts:
        if len(p) > size:  # oversized part: recurse deeper
            if cur.strip(): chunks.append(cur.strip())
            cur = ""
            chunks.extend(chunk_recursive(p, size, finer))
            continue
        candidate = cur + (sep if cur else "") + p
        if len(candidate) <= size: cur = candidate
        else:
            if cur.strip(): chunks.append(cur.strip())
            cur = p
    if cur.strip(): chunks.append(cur.strip())
    return [c for c in chunks if len(c) >= 15]

# ── Wrap chunks with metadata ─────────────────────
def chunk_documents(docs, strategy="recursive", **kw):
    fn_map = {
        "fixed": chunk_fixed,
        "sliding": chunk_sliding,
        "recursive": chunk_recursive
    }
    fn = fn_map[strategy]
    all_chunks = []
    for doc in docs:
        for j, txt in enumerate(fn(doc["content"], **kw)):
            all_chunks.append({
                "content": txt,
                "metadata": {**doc["metadata"], "chunk": j}
            })
    return all_chunks

chunks = chunk_documents(cleaned_docs, strategy="recursive", size=300)
print(f"Created {len(chunks)} chunks")
# → Created 22 chunks
Step 04 / Embeddings

Turn Text Into
Meaning-Vectors

Embeddings convert text to a list of numbers — a vector in high-dimensional space. The magic: semantically similar texts get numerically similar vectors. This enables semantic search, not keyword matching.

🧮

What is an embedding? A 768- or 1536-dimensional vector. "Warranty" and "guarantee" sit close together. "Warranty" and "pizza" sit far apart. We retrieve by spatial proximity — finding nearest neighbors.

🔑

Critical rule: Always use the SAME embedding model for both chunks and queries. Mixing models makes similarity scores meaningless — like measuring distance in miles vs kilometers.

💰

Model choices: OpenAI text-embedding-3-small (fast, cheap, great), all-MiniLM-L6-v2 (free, local, good), BAAI/bge-large (best open-source), Cohere embed-v3 (multilingual).

🔮 Open Embedding Lab
step_04_embed.py
# ── Step 04: Embeddings ──────────────────────────
from openai import OpenAI
from sentence_transformers import SentenceTransformer
import numpy as np

# ── Option A: OpenAI (paid, production) ───────────
oai = OpenAI()

def embed_openai(texts: list[str]) -> list:
    resp = oai.embeddings.create(
        input=texts,
        model="text-embedding-3-small"  # 1536 dims
    )
    return [d.embedding for d in resp.data]

# ── Option B: Sentence Transformers (free) ─────────
model = SentenceTransformer("all-MiniLM-L6-v2")

def embed_local(texts: list[str]) -> np.ndarray:
    return model.encode(texts, batch_size=32,
                         show_progress_bar=True)

# ── Embed all chunks in batches ───────────────────
def embed_chunks(chunks, batch_size=100):
    texts = [c["content"] for c in chunks]
    all_emb = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i+batch_size]
        all_emb.extend(embed_openai(batch))
    return all_emb

embeddings = embed_chunks(chunks)
print(f"Shape: {np.array(embeddings).shape}")
# → Shape: (22, 1536)

# ── Manual cosine similarity ──────────────────────
def cosine_sim(a, b):
    a, b = np.array(a), np.array(b)
    return float(np.dot(a,b) / (np.linalg.norm(a)*np.linalg.norm(b)))
Step 05 / Vector Store

Store & Index
All Embeddings

A vector database stores embeddings and lets you query them by similarity — not SQL WHERE clauses, but cosine distance search. Give it a query vector, get back the N closest chunk vectors.

🗄️

ChromaDB — perfect for prototyping. In-memory or disk-persistent, zero config, Python-native. For production: Pinecone (managed cloud), Weaviate, Qdrant (open-source), pgvector (PostgreSQL).

📐

Vector DB vs SQL: SQL searches by exact value match. Vector DB searches by angle between vectors — "find the 3 chunks that mean the most similar thing to this query." Fundamentally different operation.

Index your collection. Without an index (HNSW), vector search is O(n) — it compares every vector. With HNSW index, search is O(log n). Critical for production with millions of chunks.

step_05_vector_store.py
# ── Step 05: Vector Store (ChromaDB) ─────────────
import chromadb
from chromadb.config import Settings

# ── Persistent client (survives restarts) ─────────
client = chromadb.PersistentClient(
    path="./vikash_rag_db"
)

# ── Create or load collection ─────────────────────
collection = client.get_or_create_collection(
    name="vikash_innovative_tech",
    metadata={"hnsw:space": "cosine"}  # cosine dist
)

def store_chunks(chunks, embeddings):
    """Add chunks + embeddings to ChromaDB."""
    collection.add(
        documents=[c["content"] for c in chunks],
        embeddings=embeddings,
        metadatas=[c["metadata"] for c in chunks],
        ids=[str(i) for i in range(len(chunks))]
    )
    print(f"✓ Stored {collection.count()} chunks")

store_chunks(chunks, embeddings)
# → ✓ Stored 22 chunks in vikash_rag_db

# ── Metadata filtering (powerful feature) ─────────
# Find only chunks from warranty policy pages.
# query_emb is an embedded query (same model as Step 04):
results = collection.query(
    query_embeddings=[query_emb],
    n_results=3,
    where={"source": "warranty_policy.pdf"}
)

# ── Check if collection already built ─────────────
if collection.count() > 0:
    print("Loaded existing index — skip rebuild")
else:
    store_chunks(chunks, embeddings)
Step 06 / Retrieval

Find the Right
Chunks at Query Time

A user asks a question. We embed their query using the same model, then find the K nearest chunks in the vector database. This is where RAG's power becomes visible — semantic matching, not keywords.

🎯

Top-K guideline: K=3 is a good default. Too low (K=1) and you miss context. Too high (K=20) and you fill the prompt with noise. For Vikash Innovative Tech docs, K=3–5 works well.

📐

Cosine similarity mathematically ranges from −1 to 1, but text embeddings land between roughly 0 and 1 in practice. Score >0.85 = very relevant. Score 0.6–0.85 = somewhat relevant. Score <0.6 = probably irrelevant; consider filtering these out.

🔍 LIVE RETRIEVAL DEMO — Vikash Innovative Tech KB
step_06_retrieve.py
# ── Step 06: Retrieval ───────────────────────────
import numpy as np

def retrieve(query: str, k=3) -> list[dict]:
    """Find k most relevant chunks for a query."""
    # 1. Embed the query (same model as chunks!)
    q_emb = embed_openai([query])[0]

    # 2. Query vector store
    results = collection.query(
        query_embeddings=[q_emb],
        n_results=k
    )

    # 3. Format results with similarity scores
    retrieved = []
    for i in range(len(results["documents"][0])):
        sim = 1 - results["distances"][0][i]
        if sim > 0.5:  # Filter low-relevance
            retrieved.append({
                "text": results["documents"][0][i],
                "score": round(sim, 4),
                "source": results["metadatas"][0][i],
            })
    return sorted(retrieved,
                   key=lambda x:x["score"],
                   reverse=True)

# ── Example ───────────────────────────────────────
results = retrieve("What does the warranty cover?")
for r in results:
    print(f"Score: {r['score']:.3f}")
    print(r["text"][:100] + "...")
# → Score: 0.940
#   All laptops come with a 1-year limited...
# → Score: 0.872
#   Warranty covers: manufacturing defects...
Step 07 / Generate

Ground the LLM
in Your Context

The final step: combine retrieved chunks into a prompt and send it to the LLM. The key insight: the system prompt must explicitly constrain the LLM to answer only from the retrieved context. That constraint is what suppresses hallucinations.

✍️

Prompt engineering for RAG: (1) Put context before the question. (2) Tell the LLM to say "I don't know" if the answer isn't in the context. (3) Require source citations. These 3 rules eliminate the vast majority of hallucinations.

🌡️

Set temperature=0.1. Lower temperature = more factual, deterministic answers. Higher temperature = more creative but more likely to drift from the context. For Q&A, stay at 0.0–0.2.

💬 VIKASH INNOVATIVE TECH — RAG SUPPORT CHAT
Vikash AI Support (RAG-powered)
You
What does the 1-year warranty cover, and what's excluded?
Vikash AI
📄 vik_policy.pdf, pages 1–2
step_07_generate.py
# ── Step 07: LLM + Prompt Engineering ────────────
from openai import OpenAI

client = OpenAI()

def build_context(retrieved_chunks):
    """Format retrieved chunks as numbered context."""
    return "\n\n".join(
        f"[{i+1}] (Source: {c['source'].get('source','?')}, "
        f"page {c['source'].get('page','?')})\n{c['text']}"
        for i, c in enumerate(retrieved_chunks)
    )

def rag_answer(question: str, k=3) -> str:
    """Full RAG: retrieve → prompt → answer."""
    # Retrieve relevant chunks
    chunks = retrieve(question, k=k)
    context = build_context(chunks)

    # Build system prompt with strict grounding
    system = """You are the AI assistant for Vikash Innovative Tech.
Answer ONLY based on the retrieved context provided.
If the answer is not in the context, say exactly:
'I don't have enough information to answer this.'
Always cite sources as [1], [2], etc.
Be concise and accurate."""

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content":
             f"Context:\n{context}\n\nQuestion: {question}"}
        ],
        temperature=0.1
    )
    return response.choices[0].message.content

# ── Complete pipeline in 3 lines ──────────────────
answer = rag_answer("What does the warranty cover?")
print(answer)
# → Based on [1], Vikash Innovative Tech's 1-year
#   warranty covers manufacturing defects, hardware
#   malfunctions, and faulty components...
Chapter 02 — Interactive Lab

⬛ Chunking Strategy Lab

Drag the sliders, switch strategies, and watch Vikash Innovative Tech policy text re-chunk in real time. Click any colored span to inspect that chunk.

Chunking Strategy
Live Text Visualizer
ANIM
CHUNK SIZE DISTRIBUTION
Chunk Inspector
⚡ PARAMETER SWEEP — Watch chunk count evolve as size changes · Click any bar or press ▶
size: —
Darker = more chunks. Highlighted bar = current setting.

⚖️ Side-by-Side Comparison

Shared Parameters
Chunk Size 280
Overlap 50
Metrics Chart
Chapter 03 — Embedding Visualizer

🔮 Semantic Embedding Space

Each chunk becomes a point in high-dimensional space. We project to 2D here. Change the chunk size and watch the semantic clusters re-form with live physics animation.

2D Semantic Projection
COLOR:
t-SNE projection · hover dots to inspect
Adjust Chunks → Watch Clusters Re-form
Chunk Size 300
Overlap 60
Physics Gravity 0.5
Dots with similar content cluster together. Larger dot = larger chunk.
Chunk Legend
Topic Clusters
Similarity Matrix
hover=score
0.0
1.0
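For readers who want the projection offline: the visualizer above uses t-SNE, but a rough 2D map of an embedding matrix can be sketched with plain-numpy PCA, a simpler linear stand-in. The clusters below are synthetic, not real chunk embeddings:

```python
import numpy as np

def project_2d(embeddings: np.ndarray) -> np.ndarray:
    """Project high-dim embeddings to 2D via PCA (SVD on
    centered data). A linear stand-in for the t-SNE used
    by the page's visualizer."""
    X = embeddings - embeddings.mean(axis=0)
    # Rows of Vt are the principal directions
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:2].T  # coordinates on the top-2 components

rng = np.random.default_rng(0)
# Two fake "topic clusters" in 384-d space: tight noise
# around two different random centers
cluster_a = rng.normal(0.0, 0.05, (10, 384)) + rng.normal(0, 1, 384)
cluster_b = rng.normal(0.0, 0.05, (10, 384)) + rng.normal(0, 1, 384)
pts = project_2d(np.vstack([cluster_a, cluster_b]))
print(pts.shape)  # → (20, 2)
```

Chunks from the same topic land near each other in the 2D map, which is exactly the clustering effect the live visualizer animates.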
Chapter 04 — Advanced Topics

Beyond Basic RAG

Production RAG systems use these techniques to reach 95%+ accuracy on real-world queries.

📐

Cosine Similarity

The math powering semantic search. Measures the angle between two embedding vectors. A score of 1.0 means identical meaning; a score near 0.0 means unrelated. "Warranty" and "guarantee" cluster at ~0.9.

MATH FOUNDATION
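A tiny worked example shows the intuition. These 3-dimensional "embeddings" are made up for illustration; real vectors have hundreds of dimensions:

```python
import numpy as np

def cosine(a, b):
    """cos(θ) = (a·b) / (|a|·|b|)"""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-d "embeddings" (illustrative, not real model output)
warranty  = [0.9, 0.1, 0.0]
guarantee = [0.8, 0.2, 0.1]
pizza     = [0.0, 0.1, 0.9]

print(round(cosine(warranty, guarantee), 3))  # → 0.984 (similar meaning)
print(round(cosine(warranty, pizza), 3))      # → 0.012 (unrelated)
```

The magnitude of the vectors cancels out; only direction matters, which is why cosine is the standard choice for comparing embeddings.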
🏗️

Chunking Strategies

Fixed-size, sliding window, recursive, semantic, and sentence-level chunking — each with distinct trade-offs. Recursive splitting (paragraph → sentence → word) wins on most structured documents.

ARCHITECTURE
🎯

Hybrid Search

Combine dense (semantic) retrieval with sparse (BM25/TF-IDF keyword) retrieval. Dense finds meaning. Sparse finds exact names/codes. Hybrid search + reciprocal rank fusion = state-of-the-art precision.

ADVANCED
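Reciprocal rank fusion itself fits in a few lines. A sketch, with hypothetical chunk ids standing in for real results from a vector store and a BM25 index:

```python
def rrf_fuse(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: each list contributes
    1/(k + rank) per id, so ids ranked highly by BOTH the
    dense and the sparse retriever float to the top."""
    scores: dict[str, float] = {}
    for ranking in ranked_lists:
        for rank, cid in enumerate(ranking, start=1):
            scores[cid] = scores.get(cid, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical chunk ids from the two retrievers
dense_hits  = ["c7", "c2", "c9"]   # vector-store nearest neighbors
sparse_hits = ["c2", "c4", "c7"]   # BM25 keyword matches
print(rrf_fuse([dense_hits, sparse_hits]))
# → ['c2', 'c7', 'c4', 'c9']  (c2 and c7 appear in both lists)
```

The constant k=60 is the conventional default from the original RRF paper; it damps the advantage of rank-1 hits so agreement across retrievers outweighs position in any single list.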
🔄

Re-ranking

Retrieve top-20 chunks, then use a cross-encoder model to re-rank and select top-3. Cross-encoders score query-chunk pairs jointly — 2–3× better precision than bi-encoders at 5× compute cost.

ADVANCED
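The retrieve-then-rerank flow can be sketched with a pluggable scorer. A toy word-overlap scorer stands in for the cross-encoder here, and the candidate texts are invented:

```python
def rerank(query, candidates, scorer, top_n=3):
    """Stage 2 of retrieve-then-rerank: score every
    (query, chunk) pair jointly, keep the best top_n."""
    return sorted(candidates, key=lambda c: scorer(query, c),
                  reverse=True)[:top_n]

# Toy pairwise scorer (word overlap). In production, swap in a
# real cross-encoder, e.g. sentence-transformers'
# CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2").predict(...)
def overlap_scorer(q, c):
    qw, cw = set(q.lower().split()), set(c.lower().split())
    return len(qw & cw) / max(len(qw), 1)

top20 = ["warranty covers defects", "shipping takes 5 days",
         "warranty excludes water damage", "office wifi password"]
print(rerank("what does the warranty cover", top20, overlap_scorer))
```

The design point: the first stage (bi-encoder retrieval) is cheap enough to scan millions of chunks, so the expensive joint scorer only ever sees a shortlist.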
🌐

Metadata Filtering

Pre-filter by date, department, document type, or author before semantic search. For Vikash Innovative Tech: "Find answers from warranty docs created after 2024" combines structured + unstructured search.

PRODUCTION
📊

RAG Evaluation (RAGAS)

Measure: faithfulness (zero hallucination), answer relevancy, context precision, context recall. Never deploy without evals. RAGAS automates this against ground-truth Q&A pairs you define.

QUALITY
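As a rough illustration of two of these metrics, here are hand-rolled token-overlap approximations (not the RAGAS library itself, which uses LLM judges for its scoring):

```python
def token_set(s: str) -> set[str]:
    return set(s.lower().split())

def context_precision(contexts: list[str], ground_truth: str) -> float:
    """Fraction of retrieved contexts that overlap the
    ground-truth answer at all (a crude proxy)."""
    relevant = sum(1 for c in contexts
                   if token_set(c) & token_set(ground_truth))
    return relevant / len(contexts) if contexts else 0.0

def faithfulness(answer: str, contexts: list[str]) -> float:
    """Fraction of answer tokens that appear somewhere in the
    retrieved contexts: a rough grounding check."""
    ctx = token_set(" ".join(contexts))
    ans = token_set(answer)
    return len(ans & ctx) / len(ans) if ans else 0.0

contexts = ["warranty covers manufacturing defects",
            "the cafeteria opens at 9am"]
answer = "warranty covers manufacturing defects"
print(faithfulness(answer, contexts))        # → 1.0 (fully grounded)
print(context_precision(contexts, answer))   # → 0.5 (one chunk was noise)
```

Even this crude version catches the two failure modes that matter most: answers the context doesn't support, and retrieval that wastes prompt space on irrelevant chunks.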
🧠

Query Expansion

Generate multiple paraphrases of the user question before retrieval. "Warranty claim process" → also search "how to file warranty", "warranty repair steps". Dramatically improves recall.

ADVANCED
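The mechanics are simple once the paraphrases exist. In this sketch a hand-written lookup table stands in for the LLM paraphraser, and a toy one-result retriever stands in for the vector store:

```python
def expand_query(question: str) -> list[str]:
    """In production an LLM generates the paraphrases; this
    hand-written table is illustrative only."""
    table = {
        "warranty claim process": ["how to file warranty",
                                   "warranty repair steps"],
    }
    return [question] + table.get(question.lower(), [])

def retrieve_expanded(question, retrieve_fn):
    """Retrieve once per paraphrase, then merge and dedupe."""
    seen, merged = set(), []
    for q in expand_query(question):
        for chunk in retrieve_fn(q):
            if chunk not in seen:
                seen.add(chunk)
                merged.append(chunk)
    return merged

# Toy single-result retriever over a 3-chunk knowledge base
kb = ["to file warranty email support",
      "warranty repair steps: pack the laptop",
      "cafeteria menu"]

def toy_retrieve(q):
    qw = set(q.lower().split())
    return [max(kb, key=lambda c: len(qw & set(c.lower().split())))]

print(retrieve_expanded("warranty claim process", toy_retrieve))
```

The original question alone only surfaces one chunk; the paraphrases pull in a second relevant chunk the first query missed, which is exactly the recall gain the technique is for.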
🗃️

Parent-Child Chunking

Store small child chunks for precise retrieval, but send parent (larger) chunks to the LLM for full context. Best of both worlds — surgical retrieval precision with broad generative context.

PRODUCTION
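A minimal sketch of the match-on-child, return-parent mechanic, with invented policy text and word-overlap matching in place of embeddings:

```python
def build_parent_child(parents: list[str], child_size: int = 60):
    """Split each parent into small children; remember each
    child's parent index so it can be swapped back later."""
    children = []
    for pid, parent in enumerate(parents):
        for i in range(0, len(parent), child_size):
            children.append({"text": parent[i:i+child_size],
                             "parent": pid})
    return children

def retrieve_parent(query, children, parents):
    """Match on the small, precise child; return the big,
    context-rich parent to the LLM."""
    qw = set(query.lower().split())
    best = max(children,
               key=lambda c: len(qw & set(c["text"].lower().split())))
    return parents[best["parent"]]

parents = [
    "Warranty policy. All laptops carry a 1-year limited warranty "
    "covering manufacturing defects and faulty components.",
    "Leave policy. Employees receive 24 paid leaves per year.",
]
children = build_parent_child(parents, child_size=60)
print(retrieve_parent("what does the warranty cover", children, parents))
```

Only the small children are embedded and indexed; the parents live in a plain lookup keyed by id, so the extra context costs nothing at retrieval time.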
Chapter 05 — Hands-On

🎮 RAG Playground

Powered by Vikash Innovative Tech company policy data. Ask any question and watch the full pipeline run — retrieve, contextualize, answer.

Vikash Innovative Tech Knowledge Base
SAMPLE QUERIES
YOUR QUESTION
Pipeline Output
STEP 1 — RETRIEVED CONTEXT
STEP 2 — GENERATED ANSWER