⚡ Performance and Efficiency Benchmarks
This section reports Gemma 3 performance on the NPU with FastFlowLM (FLM).
Note:
- Results are based on FastFlowLM v0.9.11.
- Tests ran under FLM’s default NPU power mode (Performance).
- Test system spec: AMD Ryzen™ AI 7 350 (Krackan Point) with 32 GB DRAM.
- Newer versions may deliver improved performance.
🚀 Decoding Speed (TPS, or Tokens per Second, @ different context lengths)
Model | Hardware | 1k | 2k | 4k | 8k | 16k | 32k | 64k | 128k |
---|---|---|---|---|---|---|---|---|---|
Gemma 3 1B | NPU (FLM) | 34.2 | 33.7 | 32.6 | 31.4 | 28.3 | 24.1 | OOC | OOC |
Gemma 3 4B | NPU (FLM) | 14.4 | 14.4 | 14.1 | 13.7 | 13.0 | 11.9 | 10.8 | 9.2 |
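OOC: Out of Context Length, meaning the tested length exceeds the model's maximum context window.
Each LLM has a maximum supported context window. For example, the gemma3:1b model supports up to 32k tokens, so its 64k and 128k entries are OOC.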
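To read the decoding table as response latency, generation time for N new tokens is roughly N divided by TPS. The sketch below is one way to do that arithmetic, not part of FLM: the dictionary and function names are illustrative, the values are copied from the table above, and it assumes TPS stays fixed at the table value for the whole generation.

```python
# Decoding speeds in tokens/s, copied from the table above; None marks OOC.
DECODE_TPS = {
    "Gemma 3 1B": {"1k": 34.2, "2k": 33.7, "4k": 32.6, "8k": 31.4,
                   "16k": 28.3, "32k": 24.1, "64k": None, "128k": None},
    "Gemma 3 4B": {"1k": 14.4, "2k": 14.4, "4k": 14.1, "8k": 13.7,
                   "16k": 13.0, "32k": 11.9, "64k": 10.8, "128k": 9.2},
}


def _tps(model: str, context: str) -> float:
    """Look up TPS and reject context lengths the model does not support."""
    tps = DECODE_TPS[model][context]
    if tps is None:
        raise ValueError(f"{model} does not support a {context} context (OOC)")
    return tps


def decode_seconds(model: str, context: str, new_tokens: int) -> float:
    """Estimate seconds needed to generate new_tokens at a context length.

    Assumption: TPS is held constant at the table value, so long generations
    are slightly underestimated, because TPS keeps dropping as context grows.
    """
    return new_tokens / _tps(model, context)


def slowdown_vs_1k(model: str, context: str) -> float:
    """Fractional TPS loss at the given context relative to the 1k column."""
    return 1.0 - _tps(model, context) / _tps(model, "1k")


if __name__ == "__main__":
    print(f"{decode_seconds('Gemma 3 4B', '8k', 256):.1f} s for 256 tokens")
    print(f"{slowdown_vs_1k('Gemma 3 4B', '128k'):.0%} slower than at 1k")
```

With the table values, Gemma 3 4B needs about 18.7 s to generate 256 tokens at an 8k context, and its decoding speed at 128k is about 36% below its 1k speed.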
🚀 Prefill Speed (TTFT, or Time to First Token in Seconds, with different prompt lengths)
Model | Hardware | 1k | 2k | 4k | 8k | 16k | 32k |
---|---|---|---|---|---|---|---|
Gemma 3 1B | NPU (FLM) | 1.02 | 1.64 | 2.70 | 4.90 | 9.74 | 21.03 |
Gemma 3 4B | NPU (FLM) | 1.98 | 3.27 | 5.82 | 11.06 | 22.91 | 50.87 |
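TTFT can also be expressed as an effective prefill rate: prompt tokens divided by TTFT. The sketch below does that conversion for the table above. It assumes that "1k" means 1024 tokens, which the source does not state, and the names are illustrative. Because TTFT may include fixed overhead besides prefill, treat the result as an effective rate rather than a raw prefill throughput.

```python
# TTFT in seconds, copied from the prefill table above.
TTFT_SECONDS = {
    "Gemma 3 1B": {"1k": 1.02, "2k": 1.64, "4k": 2.70,
                   "8k": 4.90, "16k": 9.74, "32k": 21.03},
    "Gemma 3 4B": {"1k": 1.98, "2k": 3.27, "4k": 5.82,
                   "8k": 11.06, "16k": 22.91, "32k": 50.87},
}


def prompt_tokens(label: str) -> int:
    """Convert a column label such as '4k' to a token count.

    Assumption: 1k = 1024 tokens; the source does not say whether it means
    1000 or 1024.
    """
    return int(label.rstrip("k")) * 1024


def prefill_tps(model: str, label: str) -> float:
    """Effective prefill rate in tokens/s for one table cell."""
    return prompt_tokens(label) / TTFT_SECONDS[model][label]


if __name__ == "__main__":
    for model, row in TTFT_SECONDS.items():
        rates = ", ".join(f"{k}: {prefill_tps(model, k):.0f}" for k in row)
        print(f"{model} effective prefill tokens/s -> {rates}")
```

Combined with the decoding table, a rough end-to-end latency estimate for one request is TTFT plus output tokens divided by TPS.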
🚀 Prefill TTFT with image
Model | Hardware | TTFT with image (s) |
---|---|---|
Gemma 3 4B | NPU (FLM) | 4.3 |
This test uses a short prompt: “Describe this image.”