⚡ Performance and Efficiency Benchmarks

This section reports the performance on NPU with FastFlowLM (FLM).

Note:

  • Results are based on FastFlowLM v0.9.20.
  • Under FLM’s default NPU power mode (Performance)
  • Test system spec: AMD Ryzen™ AI 7 350 (Krakan Point) with 32 GB DRAM.
  • Newer versions may deliver improved performance.

🚀 Decoding Speed (TPS, or Tokens per Second, starting @ different context lengths)

Model Hardware 1k 2k 4k 8k 16k 32k 64k 128k Model
gpt-oss-20b NPU (FLM) 18.2 17.9 17.3 16.2 14.4 11.8 8.7 5.7 gpt-oss-20b

🚀 Prefill Speed (TPS, or Tokens per Second, with different prompt lengths)

Model Hardware 1k 2k 4k 8k 16k 32k Model
gpt-oss-20b NPU (FLM) 198 286 354 342 267 173 gpt-oss-20b