Building a RAG Chatbot

One of the most powerful applications enabled by LLMs is the development of question-answering chatbots. These are applications that can answer questions about specific source information. This tutorial demonstrates how to build a chatbot that can answer questions about AISViz documentation using a technique known as Retrieval-Augmented Generation (RAG).

Overview

A typical RAG application has two main components:

  • Indexing: a pipeline for scraping data from documentation and indexing it.

    • This usually happens offline.

  • Retrieval and generation: the actual RAG chain, which takes the user query at runtime and retrieves the relevant data from the index, then passes it to the model.

The full sequence from raw documentation to a generated answer looks like this:

Indexing

  1. Scrape: First, we need to scrape all documentation pages. This includes the GitBook documentation and related pages.

  2. Split: Text splitters break large documents into smaller chunks. This is useful both for indexing data and passing it into a model, since large chunks are harder to search over and won't fit in a model's finite context window.

  3. Store: We need somewhere to store and index our splits, so that they can be searched over later. This is done using the Chroma vector database and embeddings.

Retrieval and generation

  1. Retrieve: Given a user input, relevant splits are retrieved from Chroma using similarity search.

  2. Generate: An LLM produces an answer from a prompt that combines the question with the retrieved context.

Setup

Installation

This tutorial requires these dependencies:
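A minimal set of packages covering the scraping, indexing, and generation steps used below (versions are left unpinned here; pin them as needed):

```bash
pip install langchain langchain-google-genai sentence-transformers chromadb gradio requests beautifulsoup4
```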

API Keys

You'll need a Google Gemini API key (or a key from another LLM provider). Set it as an environment variable:
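For example, using the GOOGLE_API_KEY variable that langchain-google-genai reads by default:

```bash
export GOOGLE_API_KEY="your-api-key-here"
```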

Components

We need to select three main components (a minimal initialization sketch follows the list):

  • LLM: We'll use Google's Gemini models through LangChain

  • Embeddings: Hugging Face SentenceTransformers for creating document embeddings

  • Vector Store: Chroma for storing and searching document embeddings
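As a sketch, the three components might be wired up like this. The gemini-2.5-flash model name comes from this tutorial; the embedding model all-MiniLM-L6-v2, the storage path, and the collection name are illustrative assumptions:

```python
from langchain_google_genai import ChatGoogleGenerativeAI
from sentence_transformers import SentenceTransformer
import chromadb

# LLM: Google Gemini via LangChain (reads GOOGLE_API_KEY from the environment)
llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash")

# Embeddings: a small, fast SentenceTransformers model (assumed choice)
embedder = SentenceTransformer("all-MiniLM-L6-v2")

# Vector store: a persistent Chroma collection on local disk
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection(name="aisviz_docs")
```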

Preview

We can create a simple indexing pipeline and RAG chain to do this in about 100 lines of code.

Detailed walkthrough

1. Indexing

Scraping Documentation

First, we need to scrape all of the AISViz documentation pages.
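A minimal scraping sketch using requests and BeautifulSoup; the URLs below are placeholders, not the actual AISViz page list:

```python
import requests
from bs4 import BeautifulSoup

# Placeholder list of documentation URLs to index; replace with the real AISViz pages.
DOC_URLS = [
    "https://aisviz.gitbook.io/documentation",
    # ... more pages ...
]

def scrape_page(url: str) -> dict:
    """Fetch one documentation page and return its visible text plus metadata."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    # Drop non-content elements before extracting text.
    for tag in soup(["script", "style", "nav", "footer"]):
        tag.decompose()

    title = soup.title.get_text(strip=True) if soup.title else url
    text = soup.get_text(separator="\n", strip=True)
    return {"url": url, "title": title, "text": text}

documents = [scrape_page(url) for url in DOC_URLS]
```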

Splitting documents

Our scraped documents can be quite long, so we need to split them into smaller chunks. We'll use a simple text splitter that breaks documents into chunks of specified size with some overlap.
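A sketch of such a splitter; the chunk size and overlap values are illustrative defaults, not prescribed by the tutorial:

```python
def split_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into overlapping chunks of roughly chunk_size characters."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step forward, keeping some overlap
    return chunks

# Split every scraped page, recording which page each chunk came from.
chunks, metadatas = [], []
for doc in documents:  # `documents` comes from the scraping step above
    for i, chunk in enumerate(split_text(doc["text"])):
        chunks.append(chunk)
        metadatas.append({"url": doc["url"], "title": doc["title"], "chunk": i})
```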

Storing documents with SentenceTransformers

Now we need to create embeddings for our chunks using Hugging Face SentenceTransformers and store them in the Chroma vector database.
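A sketch of the embedding-and-storage step, continuing from the chunks and metadatas produced above (the model name, storage path, and collection name are the same assumed choices as earlier):

```python
from sentence_transformers import SentenceTransformer
import chromadb

embedder = SentenceTransformer("all-MiniLM-L6-v2")
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection(name="aisviz_docs")

# Embed all chunks in one batch and store them with their metadata.
embeddings = embedder.encode(chunks, show_progress_bar=True)
collection.add(
    ids=[f"chunk-{i}" for i in range(len(chunks))],
    embeddings=embeddings.tolist(),
    documents=chunks,
    metadatas=metadatas,
)
```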

This completes the Indexing portion of the pipeline. At this point, we have a queryable vector store containing the chunked contents of all the documentation with embeddings created by SentenceTransformers. Given a user question, we should be able to return the most relevant snippets.

2. Retrieval and Generation

Now let's write the actual application logic. We want a simple function that takes a user question, retrieves relevant documents using SentenceTransformers embeddings, and generates an answer with Google Gemini. A high-level breakdown (a sketch of the full function follows the list):

  • API Key Setup – Load LLM (Gemini in this example) API key from environment or prompt user.

  • Model Initialization – Wrap Google Gemini (gemini-2.5-flash) with LangChain.

  • Embedding – Convert the user's question into a vector with SentenceTransformers.

  • Retrieval – Query Chroma to fetch the top-k most relevant document chunks.

  • Context Building – Assemble retrieved docs + metadata into a context string.

  • Prompting LLM – Combine system + user prompts and send to LLM.

  • Answer Generation – Return a concise response along with sources and context.
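Putting these steps together, here is a sketch under the assumptions above; the function name answer_question and the prompt wording are illustrative, and embedder and collection carry over from the indexing step:

```python
import os
import getpass
from langchain_google_genai import ChatGoogleGenerativeAI

# Load the API key from the environment, or prompt for it interactively.
if not os.environ.get("GOOGLE_API_KEY"):
    os.environ["GOOGLE_API_KEY"] = getpass.getpass("Google API key: ")

llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash")

def answer_question(question: str, k: int = 4) -> dict:
    """Retrieve the top-k relevant chunks from Chroma and ask Gemini to answer."""
    # Embed the question with the same SentenceTransformers model used for indexing.
    query_embedding = embedder.encode([question]).tolist()
    results = collection.query(query_embeddings=query_embedding, n_results=k)

    # Build a context string from the retrieved chunks and their source pages.
    docs = results["documents"][0]
    metas = results["metadatas"][0]
    context = "\n\n".join(
        f"Source: {meta['url']}\n{doc}" for doc, meta in zip(docs, metas)
    )

    # Combine system and user prompts and send them to the LLM.
    messages = [
        ("system",
         "You answer questions about the AISViz documentation. "
         "Use only the provided context; if the answer is not there, say so.\n\n"
         f"Context:\n{context}"),
        ("human", question),
    ]
    response = llm.invoke(messages)
    return {"answer": response.content, "sources": [m["url"] for m in metas]}
```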

3. Building a Gradio Interface

Now let's create a simple web interface using Gradio so others can interact with our chatbot:
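A minimal Gradio wrapper around the answer_question sketch above:

```python
import gradio as gr

def chat(message, history):
    """Gradio chat handler: delegate to the RAG function defined earlier."""
    result = answer_question(message)
    sources = "\n".join(f"- {url}" for url in sorted(set(result["sources"])))
    return f"{result['answer']}\n\nSources:\n{sources}"

demo = gr.ChatInterface(
    fn=chat,
    title="AISViz Documentation Chatbot",
    description="Ask questions about the AISViz documentation.",
)

if __name__ == "__main__":
    demo.launch()
```

gr.ChatInterface manages the chat history and UI for us; demo.launch() serves the app locally, and the same script can be deployed as a Hugging Face Space.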

The full code can be found here: https://huggingface.co/spaces/mapslab/AISVIZ-BOT/tree/main

