⚙️ System Requirements

  • 🧠 Memory: 32 GB RAM or higher recommended
  • CPU/NPU: AMD Ryzen™ AI laptop with XDNA2 NPU
  • 🖥️ OS: Windows 11

While FastFlowLM can run with 16 GB of RAM, larger models (e.g., 3B or 8B) may require 32 GB or more for optimal performance and longer context lengths (a larger KV cache).
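
If you are unsure how much memory a machine has, you can check from PowerShell (standard Windows cmdlet; nothing FastFlowLM-specific):

# Report installed physical memory in GB
[math]::Round((Get-CimInstance Win32_ComputerSystem).TotalPhysicalMemory / 1GB)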


🚨 CRITICAL: NPU Driver Requirement

You must have AMD NPU driver version 32.0.203.258 or later installed for FastFlowLM to work correctly.

  • Check via:
    Task Manager → Performance → NPU
    or
    Device Manager → NPU
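
Alternatively, you can query the installed driver version from PowerShell; a minimal sketch, assuming the NPU's device name contains "NPU" (the exact name may vary by driver):

# List signed drivers whose device name mentions the NPU
Get-CimInstance Win32_PnPSignedDriver |
  Where-Object { $_.DeviceName -like '*NPU*' } |
  Select-Object DeviceName, DriverVersion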

🔗 Download AMD Driver


💾 Installation (Windows)

A packaged Windows installer is available here:
flm-setup.exe

If you see “Windows protected your PC”, click More info, then select Run anyway.

📺 Watch the quick start video

For version history and changelog, see the release notes.


🚀 NPU Power Mode

By default, FLM runs in the performance NPU power mode. You can switch to another mode (powersaver, balanced, or turbo) using the --pmode flag:

CLI mode:

flm run gemma3:4b --pmode balanced

Server mode:

flm serve gemma3:4b --pmode balanced
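
Once the server is running, you can send it a request from a second PowerShell window. This is a sketch only, assuming FastFlowLM exposes an Ollama-compatible API on the default port 11434; check the project docs for the exact endpoint and port:

# Send a one-off prompt to the local server (endpoint and port are assumptions)
Invoke-RestMethod -Uri http://localhost:11434/api/generate -Method Post -ContentType 'application/json' -Body '{"model": "gemma3:4b", "prompt": "Hello", "stream": false}'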

⚠️ Note: Using powersaver or balanced lowers NPU clock speeds and causes a significant drop in inference speed. For more details about NPU power modes, refer to the AMD XRT SMI Documentation.
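
If you want to inspect or change the NPU power mode outside of FLM, AMD's xrt-smi utility (documented at the link above) can do so. A sketch based on the AMD documentation; verify the exact options for your driver version:

# Show current NPU status, including the power mode
xrt-smi examine
# Set the power mode directly (same modes as FLM's --pmode flag)
xrt-smi configure --pmode turbo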


🧪 Quick Test (CLI Mode)

After installation, run a quick test to verify that FastFlowLM is installed correctly. Open PowerShell and run a model in the terminal (CLI mode):

flm run llama3.2:1b

Internet access to HuggingFace is required to pull (download) the optimized model kernels. The model is downloaded automatically to C:\Users\<USER>\Documents\flm\models\.

⚠️ If HuggingFace is not directly accessible in your region, you can download the model manually (e.g., via hf-mirror) and place it in that directory.
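
To confirm the download, list the contents of the model folder from PowerShell:

# Show models pulled so far ($env:USERPROFILE expands to C:\Users\<USER>)
Get-ChildItem "$env:USERPROFILE\Documents\flm\models"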