⚡ RAG with LangChain + FastFlowLM + FAISS (Windows, Offline)
This guide walks you through building a Retrieval-Augmented Generation (RAG) system using:
- ✅ LangChain for orchestration
- ✅ FastFlowLM as a local LLM backend (drop-in replacement for Ollama)
- ✅ HuggingFace Embeddings via `sentence-transformers`
- ✅ FAISS for fast vector search
- ✅ Multiple .txt files loaded from a folder
- ✅ Fully offline, privacy-preserving, and Windows-compatible
📦 Prerequisites
✅ System Requirements
- Python 3.10+
- Windows OS
- VS Code (recommended)
- PowerShell
✅ Install FastFlowLM (local server)
Download from GitHub Releases:
👉 https://github.com/FastFlowLM/FastFlowLM
Launch the server:
```powershell
flm serve llama3.2:3b
```
This exposes a REST API at `http://localhost:11434` by default.
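Before moving on, you can optionally verify that the server is reachable from Python. The snippet below is a minimal sketch; it assumes FastFlowLM answers an Ollama-style `/api/tags` request (consistent with its role as a drop-in Ollama replacement), so treat the exact route as an assumption and adjust if your version differs.
```python
# check_server.py -- optional sanity check of the local FastFlowLM server
# Assumption: the server answers an Ollama-style GET /api/tags request.
import json
import urllib.request

BASE_URL = "http://localhost:11434"

try:
    with urllib.request.urlopen(f"{BASE_URL}/api/tags", timeout=5) as resp:
        data = json.loads(resp.read().decode("utf-8"))
        models = [m.get("name") for m in data.get("models", [])]
        print("Server is up. Models reported:", models)
except Exception as exc:
    print(f"Could not reach {BASE_URL}: {exc}")
```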
🛠️ Environment Setup
1. Create a virtual environment
```powershell
mkdir rag_with_flm
cd rag_with_flm
python -m venv rag-env
.\rag-env\Scripts\activate
```
If script execution is blocked, run:
```powershell
Set-ExecutionPolicy RemoteSigned -Scope CurrentUser
```
2. Install Python dependencies
```powershell
pip install -U langchain langchain-community langchain-huggingface sentence-transformers faiss-cpu tiktoken ollama langchain-ollama
```
✅ We still install the `ollama` Python package because LangChain's `OllamaLLM` class can be pointed at any local REST backend such as FastFlowLM.
🔁 Ollama and FastFlowLM use the same default base URL (`base_url="http://localhost:11434"`), so they are interchangeable here.
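As a quick smoke test of that interchangeability, you can call the model directly before wiring up any retrieval. This is a minimal sketch; it assumes the FastFlowLM server from the previous section is already running and serving `llama3.2:3b`.
```python
# smoke_test.py -- call FastFlowLM through LangChain's Ollama wrapper
from langchain_ollama import OllamaLLM

llm = OllamaLLM(model="llama3.2:3b", base_url="http://localhost:11434")
print(llm.invoke("In one sentence, what is retrieval-augmented generation?"))
```
If this prints a response, the RAG pipeline below will reach the model the same way.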
📂 Project Structure
```
rag_with_flm/
├── docs/
│   ├── un_history.txt
│   ├── peacekeeping.txt
│   └── founding_charter.txt
├── rag_demo.py
└── rag-env/
```
📄 Example Documents
`docs/un_history.txt`
```
The United Nations (UN) was officially established on 24 October 1945 after the ratification of its Charter. It replaced the League of Nations.
```
`docs/peacekeeping.txt`
```
UN Peacekeeping began in 1948. Since then, more than 70 operations have been deployed to help maintain peace and security.
```
`docs/founding_charter.txt`
```
The Charter of the United Nations was signed on 26 June 1945 in San Francisco, at the conclusion of the United Nations Conference on International Organization.
```
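If you would rather create these files from code than by hand, here is an optional helper (the script name `make_docs.py` is just a suggestion; the rest of the guide only needs the files to exist under `docs/`):
```python
# make_docs.py -- optional helper that writes the three sample documents
from pathlib import Path

DOCS = {
    "un_history.txt": (
        "The United Nations (UN) was officially established on 24 October 1945 "
        "after the ratification of its Charter. It replaced the League of Nations."
    ),
    "peacekeeping.txt": (
        "UN Peacekeeping began in 1948. Since then, more than 70 operations have "
        "been deployed to help maintain peace and security."
    ),
    "founding_charter.txt": (
        "The Charter of the United Nations was signed on 26 June 1945 in San Francisco, "
        "at the conclusion of the United Nations Conference on International Organization."
    ),
}

docs_dir = Path("docs")
docs_dir.mkdir(exist_ok=True)
for name, text in DOCS.items():
    (docs_dir / name).write_text(text, encoding="utf-8")
    print(f"Wrote {docs_dir / name}")
```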
🧠 rag_demo.py
```python
# rag_demo.py
import os

from langchain_community.document_loaders import TextLoader, DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_ollama import OllamaLLM  # still used to wrap FastFlowLM!
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate

# Load .txt files from folder
loader = DirectoryLoader("docs/", glob="**/*.txt", loader_cls=TextLoader)
docs = loader.load()

# Smart chunking
splitter = RecursiveCharacterTextSplitter(
    chunk_size=350,
    chunk_overlap=50,
    separators=["\n\n", "\n", " ", ""]
)
chunks = splitter.split_documents(docs)

# Embed with sentence-transformers
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vectorstore = FAISS.from_documents(chunks, embeddings)
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 3})

# Prompt template with fallback
prompt_template = """
You are an expert assistant. Answer using ONLY the context provided.
If the answer is not present, reply: "I don't know based on the given information."

Context:
{context}

Question: {question}

Answer:
"""

prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=prompt_template
)

# ✅ FastFlowLM via OllamaLLM (optional: custom base_url)
llm = OllamaLLM(model="llama3.2:3b", base_url="http://localhost:11434")

# Build Retrieval-Augmented QA chain
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    return_source_documents=True,
    chain_type_kwargs={"prompt": prompt}
)

# Ask a question
query = "When was the United Nations founded and what replaced the League of Nations?"
result = qa_chain.invoke(query)

# Display answer and sources
print("\n" + "=" * 80)
print("🧠 Answer:\n", result["result"])
print("=" * 80)
print("📚 Top Sources:")
for i, doc in enumerate(result["source_documents"]):
    print(f"\n--- Source {i+1} ---\n{doc.page_content}")
```
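The script rebuilds the FAISS index on every run, which is fine for three small files. For larger document sets you can persist the index and reload it instead of re-embedding everything; the snippet below is a sketch using FAISS's `save_local`/`load_local` helpers (the folder name `faiss_index` is an arbitrary choice):
```python
# Persist the index once it has been built
vectorstore.save_local("faiss_index")

# Later, reload it instead of re-embedding the documents
reloaded = FAISS.load_local(
    "faiss_index",
    embeddings,
    allow_dangerous_deserialization=True,  # FAISS stores pickled metadata on disk
)
retriever = reloaded.as_retriever(search_type="similarity", search_kwargs={"k": 3})
```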
▶️ Run It
Start your FastFlowLM server in one terminal:
```powershell
flm serve llama3.2:3b
```
Then in another terminal (with the environment activated):
```powershell
python rag_demo.py
```
✅ Output Example
```
🧠 Answer:
The United Nations was founded on 24 October 1945 and it replaced the League of Nations.

📚 Top Sources:

--- Source 1 ---
The United Nations (UN) was officially established on 24 October 1945...
```