🧩 Run Open WebUI with FastFlowLM
This guide walks you through using `docker-compose.yaml` to run Open WebUI connected to a local FastFlowLM instance on Windows.
✅ Prerequisites
- Docker Desktop for Windows
  - During installation, enable the WSL2 backend
  - Reboot if prompted
- FastFlowLM installed (provides the `flm` command)
📁 Step 1: Create Project Folder
Open PowerShell and run:
```powershell
mkdir open-webui; cd open-webui
```
This creates a clean workspace for your Docker setup.
📝 Step 2: Create docker-compose.yaml
Launch Notepad:
```powershell
notepad docker-compose.yaml
```
Paste the following:
```yaml
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    ports:
      - "3000:8080"
    volumes:
      - open-webui-data:/app/backend/data
    environment:
      # Point WebUI to FLM's OpenAI-compatible server
      - OPENAI_API_BASE_URL=http://host.docker.internal:52625/v1
      - OPENAI_API_KEY=dummy-key
      # WebUI settings
      - WEBUI_AUTH=false
      - WEBUI_SECRET_KEY=dummysecretkey
      - ENABLE_TITLE_GENERATION=false
      - ENABLE_FOLLOW_UP_GENERATION=false
      - ENABLE_TAGS_GENERATION=false
      - ENABLE_RETRIEVAL_QUERY_GENERATION=false
      - ENABLE_IMAGE_PROMPT_GENERATION=false
      - ENABLE_WEB_SEARCH=false
      - ENABLE_SEARCH_QUERY_GENERATION=false
    restart: unless-stopped

volumes:
  open-webui-data:
```
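Before starting the stack, you can ask Docker to parse the file and confirm the YAML is valid:
```powershell
# Prints the resolved configuration, or an error if the indentation is off
docker compose config
```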
▶️ Step 3: Launch the Open WebUI Container (in PowerShell)
```powershell
docker compose up -d
```
This starts the container in detached mode. It can take up to a minute before Open WebUI is reachable.
You can check the logs with:
```powershell
docker logs -f open-webui
```
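To confirm the container is up, check its status:
```powershell
docker compose ps
```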
🌐 Step 4: Access the WebUI (in Browser)
Open a browser and go to:
```
http://localhost:3000
```
You should now see the Open WebUI interface.
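If the page doesn't load right away, the container may still be initializing. A quick probe from PowerShell (this uses the `curl.exe` binary bundled with Windows, not the PowerShell `curl` alias):
```powershell
# Prints the HTTP status code; 200 means the UI is ready
curl.exe -s -o NUL -w "%{http_code}" http://localhost:3000
```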
🧪 Step 5: Serve a Model with FastFlowLM
```powershell
flm serve llama3.2:1b
```
You can now use FastFlowLM directly in Open WebUI. When switching models, it may take a moment for the new model to replace the old one in memory.
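To verify that Open WebUI can reach FLM, you can query the OpenAI-compatible endpoint directly from the host. This assumes FLM listens on port 52625 (the port referenced in the compose file) and implements the standard `/v1/models` route:
```powershell
# Lists the models the FLM server currently exposes
Invoke-RestMethod -Uri "http://localhost:52625/v1/models"
```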
🧼 Step 6: Stop or Clean Up (in PowerShell)
```powershell
docker compose stop
```
To remove the container (persistent volume data is kept):
```powershell
docker compose down
```
Or, to remove the container and the persistent volume data:
```powershell
docker compose down -v
```
🧼 Step 7: Update Open WebUI
```powershell
docker compose pull
```
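Pulling only downloads the newer image; the running container keeps using the old one until it is recreated:
```powershell
docker compose up -d
```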
🧠 Notes
- Want login? Set `WEBUI_AUTH=true`
- Keep the FastFlowLM server running while you use Open WebUI
- For persistent chat history, the `open-webui-data` volume stores user data
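Compose prefixes volume names with the project (folder) name, so to see where that data lives on disk, list the volumes and inspect the matching one:
```powershell
docker volume ls
# Assumes the default project name from the open-webui folder created in Step 1
docker volume inspect open-webui_open-webui-data
```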
Note (When using Open WebUI):
Open WebUI sends multiple background requests to the server. The compose file above already disables these via environment variables, but you can also toggle them in Settings → Chat:
- Title Auto-Generation
- Follow-Up Auto-Generation
- Chat Tags Auto-Generation
Toggle them off, then refresh the page.
🧪 More Examples
Well done 🎉 — now let’s explore more apps together!
🤖 Example: Multi-Model Comparison Enabled by FLM Queuing
A step-by-step guide to launching FastFlowLM and interacting with multiple models via Open WebUI.
🌐 Step 1: Run Open WebUI with FastFlowLM
Follow the quick setup guide above.
🧩 Step 2: Select and Add Models
At the top-right corner of the WebUI:
- Choose a model to begin (e.g., `llama3.2:1b`)
- Click ➕ to add other models, e.g. `qwen3:0.6b`, `gemma3:1b`
You'll now see several models listed. That means each one can answer your prompt.
💬 Step 3: Interact with Models
Type anything you’re curious about in the input box.
⚠️ Please note:
- Each model replies in sequence (not all at once).
- The `flm` server dynamically loads each model based on your selection.
🎯 Step 4: Select or Merge
After receiving replies from multiple models, choose how you'd like to continue:
- ✅ Use the Best Response: select the answer that best meets your expectations. That response becomes the active context for your next question.
- 🔗 Merge All Responses: combine insights from all models and continue the conversation using your preferred model. This lets you synthesize multiple perspectives into a unified thread.
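Under the hood, each model is just a separate request to FLM's OpenAI-compatible API, and FLM queues the model loads. A minimal sketch of what the WebUI does, assuming FLM on `localhost:52625` with the standard `/v1/chat/completions` route:
```powershell
# Send the same prompt to two models in sequence; FLM swaps each model into memory as needed
foreach ($model in "llama3.2:1b", "qwen3:0.6b") {
    $body = @{
        model    = $model
        messages = @(@{ role = "user"; content = "Explain NPUs in one sentence." })
    } | ConvertTo-Json -Depth 5
    $reply = Invoke-RestMethod -Uri "http://localhost:52625/v1/chat/completions" `
        -Method Post -ContentType "application/json" -Body $body
    "[$model] " + $reply.choices[0].message.content
}
```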
🌐 Example: Agentic AI Web Search with FastFlowLM
Step-by-step guide to powering Agentic AI web search in Open WebUI — NPU-only, lightning-fast, with Google PSE + FLM.
🛠️ Step 1: Set Up Google PSE
- Go to Google Programmable Search Engine and sign in or create an account. Click `create a search engine` and review the Overview page.
- Visit the Control Panel and click the `Add` button.
- Fill in:
  - A name for your search engine (e.g., flm-search)
  - What to search? (e.g., select `Search the entire web`)
  - Search settings (e.g., enable `Image search`)
  - Verify you're not a robot
- Then click `Create`.
- After creation, click `Customize`.
- Copy and save your Search Engine ID (you'll need it later).
- Scroll down to Programmatic Access → click Get started.
- Find Programmable Search Engine (free edition) users → click Get a Key.
- Select `Create a project` → enter a new project name (e.g., owbui-search) → click next → click `SHOW KEY` to reveal your API key → copy and save it (you'll need it later).
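You can sanity-check the key and engine ID before wiring them into Open WebUI, using Google's Custom Search JSON API (replace the placeholders with your saved values):
```powershell
# YOUR_API_KEY and YOUR_ENGINE_ID are placeholders for the values saved above
Invoke-RestMethod -Uri "https://www.googleapis.com/customsearch/v1?key=YOUR_API_KEY&cx=YOUR_ENGINE_ID&q=FastFlowLM"
```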
🌐 Step 2: Run Open WebUI with FastFlowLM
Follow the quick setup guide above.
🧩 Step 3: Enable Web Search in Open WebUI
With your API Key and Search Engine ID from Step 1, follow these steps:
- In the bottom-left corner, click the `User` icon, then select `Settings`.
- In the bottom panel, open `Admin Settings`.
- From the left sidebar, click `Web Search`.
- Under `General`, toggle `Web Search` on to enable the web search function.
- Set `Web Search Engine` to `google_pse`.
- Enter your saved:
  - Google PSE API Key
  - Google PSE Engine ID
- Under `Loader`, set `Concurrent Requests` to 10 or more (optional).
- Click `Save`.
💬 Step 4: Start Using Web Search
- Start a new chat and select your preferred model (e.g., qwen3-it:4b). ⚠️ Note: not all models handle web search well.
- Under the chat input box, click `Integrations`, then toggle Web Search to activate it. 🔄 You'll need to activate this every time you start a new chat.
- Ask anything you're curious about; real-time search will enhance your answers!
🗄️ Example: Local Private Database with RAG + FastFlowLM
This example walks you through setting up a local, private knowledge base using Retrieval-Augmented Generation (RAG) powered by FastFlowLM.
RAG combines two steps:
- Retrieval – fetch the most relevant information from your knowledge base (e.g., `.md` docs).
- Generation – use an AI model to create accurate, context-aware answers based on that retrieved data.
In this example, the knowledge base is the Open WebUI documentation. With FastFlowLM running on the NPU, you get fast, efficient, and secure responses — all without sending your data to the cloud.
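Open WebUI performs both steps for you (its retrieval is embedding-based), but the flow is easy to see in miniature. A purely illustrative sketch, with naive keyword matching standing in for real retrieval, assuming FLM on `localhost:52625` and the docs extracted to `.\docs-main`:
```powershell
# 1) Retrieval: naive keyword search over the Markdown docs (stand-in for embeddings)
$hits = Get-ChildItem .\docs-main -Recurse -Include *.md, *.mdx |
    Select-String -Pattern "docker" -List | Select-Object -First 3
$context = ($hits | ForEach-Object { Get-Content $_.Path -Raw }) -join "`n---`n"

# 2) Generation: hand the retrieved text to the model as context
$body = @{
    model    = "gemma3:4b"
    messages = @(
        @{ role = "system"; content = "Answer using only this context:`n$context" },
        @{ role = "user"; content = "How do I run Open WebUI with Docker?" }
    )
} | ConvertTo-Json -Depth 5
(Invoke-RestMethod -Uri "http://localhost:52625/v1/chat/completions" `
    -Method Post -ContentType "application/json" -Body $body).choices[0].message.content
```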
📝 Step 1: Download the Documentation
- Download the latest Open WebUI documentation.
- Extract the `docs-main.zip` file to get all documentation files.
- In the extracted folder, locate all files with `.md` and `.mdx` extensions (tip: `Ctrl+F` and search for `*.md*`).
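If the files are scattered across subfolders, a small PowerShell helper can collect them into one flat folder for easy drag-and-drop (the paths here are assumptions; adjust to wherever you extracted the archive):
```powershell
# Copy every .md/.mdx file from .\docs-main into a single .\flat-docs folder
# Note: same-named files in different subfolders will overwrite one another
New-Item -ItemType Directory -Force -Path .\flat-docs | Out-Null
Get-ChildItem .\docs-main -Recurse -Include *.md, *.mdx |
    Copy-Item -Destination .\flat-docs
```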
🌐 Step 2: Run Open WebUI with FastFlowLM
Follow the quick setup guide above.
🧠 Step 3: Create a Knowledge Base
- In Open WebUI, from the top-left menu, navigate to Workspace > Knowledge (top bar) > click the `+` symbol on the right side to create a knowledge base.
- Enter `What are you working on?` → `Open WebUI Documentation`
- Enter `What are you trying to achieve?` → `Assistance`
- Click `Create Knowledge`.
- In the extracted folder, press `Ctrl+A`, then drag and drop the `.md` and `.mdx` files into the `Open WebUI Documentation` knowledge base (159 files in total as of 09/22/2025).
🧩 Step 4: Create and Configure the Model
- From the top-left menu, navigate to Workspace > Models (top bar) > click the `+` symbol on the right side to add a new model.
- Configure the model:
  - Model Name: enter a name, e.g. `FLM_RAG`
  - Base Model: choose from the available list, e.g. gemma3:4b
  - Knowledge: select `Open WebUI Documentation` from the dropdown
  - Capabilities: check the options you need (e.g. enable citation to show sources)
- Save & Create.
💬 Step 5: Examples and Usage
- Start a new chat: navigate to New Chat and select the `FLM_RAG` model.
- Example queries:

🧑 User: “Introduce Open WebUI.”
🤖 Assistant: Based on the knowledge base `Open WebUI Documentation`, here’s an introduction…

🧑 User: “How to use Open WebUI with Docker?”
🤖 Assistant: Here are the steps from the knowledge base `Open WebUI Documentation`…