⚡ FastFlowLM (FLM)

FLM is the only NPU-first runtime built for AMD Ryzen™ AI.

Run LLMs — now with Vision support — in minutes: no GPU required, over 10× more power-efficient, and with context lengths up to 256k tokens.

Think Ollama — but laser-optimized for NPUs.

From idle silicon to instant power: FastFlowLM makes Ryzen™ AI shine.


🧪 Test Drive (Remote Demo)

✨ Don’t have a Ryzen™ AI PC? Instantly try FastFlowLM on a live AMD Ryzen™ AI 5 340 NPU with 96 GB memory (spec) — no setup needed.

🚀 Launch Now: https://open-webui.testdrive-fastflowlm.com/
🔐 Login: guest@flm.npu
🔑 Password: 0000

Note:

  • Alternatively, sign up with your own credentials instead of using the shared guest account.
  • Real-time demo powered by FastFlowLM + Open WebUI — no downloads, no installs.

Also Try:

  • 🖼️ Gemma3:4B — the first NPU-only VLM!
    Choose gemma3:4b, click + → Upload files, and add your PNG/JPG images.

  • 🌐 Web Search — local agentic AI–powered search
    Open Integrations (below the chatbox), toggle on Web Search, and start searching instantly.

  • 🗂️ RAG (Retrieval-Augmented Generation) — your secure, local knowledge system
    Select the FLM-RAG model (powered by Qwen3-Thinking-2507-4B), which uses a knowledge base pre-built from the FLM GitHub repo, and ask anything about FastFlowLM!

📺 Watch this short video to see how to try the remote demo in just a few clicks.

⚠️ Please note:

  • Some universities or companies may block access to the test drive site. If it doesn’t load over Wi-Fi, try switching to a cellular network.
  • FastFlowLM is designed for single-user local use. This remote demo machine may experience short wait times when multiple users access it concurrently — please be patient.
  • Switching models may take extra time while the new model is loaded into memory.
  • Large prompts and the VLM (gemma3:4b) may take longer to respond, but they do work! 🙂

📚 Sections

🚀 Installation

Quick 5‑minute setup guide for Windows.

🛠️ Instructions

Run FastFlowLM using the CLI mode or server mode.

📊 Benchmarks

Real-time performance comparisons vs AMD’s official stack and other tools.

🧩 Models

Supported models, quantization formats, and compatibility details.