⚡ Performance and Efficiency Benchmarks
This section reports the performance of LLaMA 3.x on NPU with FastFlowLM (FLM).
Note:
- Results are based on FastFlowLM v0.9.21.
- All measurements use FLM’s default NPU power mode (Performance).
- Test system: AMD Ryzen™ AI 7 350 (Krackan Point) with 32 GB DRAM.
- Newer versions may deliver improved performance.
🚀 Decoding Speed (TPS, tokens per second, starting at different context lengths)
| Model | Hardware | 1k | 2k | 4k | 8k | 16k | 32k | 64k | 128k |
|---|---|---|---|---|---|---|---|---|---|
| LLaMA 3.2 1B | NPU (FLM) | 62.7 | 58.8 | 52.7 | 44.8 | 33.7 | 23.6 | 14.6 | 10.6 |
| LLaMA 3.2 3B | NPU (FLM) | 26.2 | 24.6 | 22.1 | 18.3 | 13.7 | 9.1 | 6.8 | OOM |
| LLaMA 3.1 8B | NPU (FLM) | 12.7 | 12.4 | 11.6 | 10.4 | 8.6 | 6.3 | OOM | OOM |
OOM: out of memory.
The NPU can access less than 50% of system DRAM.
On systems with more than 32 GB DRAM, longer context lengths are supported. FLM supports the full context length available for each model.
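
As a rough illustration of why long contexts run out of memory, the sketch below estimates the key/value cache size at different context lengths. It is a back-of-the-envelope calculation, not a description of FLM's internals: the layer counts, KV-head counts, and head dimensions are taken from the public model configurations, and the 2-byte (fp16) cache element size, the helper name `kv_cache_bytes`, and the 16 GiB budget are assumptions made for this example. FLM's actual cache precision and layout may differ.

```python
GIB = 1024 ** 3

def kv_cache_bytes(layers, kv_heads, head_dim, tokens, bytes_per_elem=2):
    """Bytes needed to store keys and values for `tokens` of context.

    The leading factor of 2 accounts for one key and one value tensor per layer.
    bytes_per_elem=2 assumes an fp16 cache (an assumption, not an FLM fact).
    """
    return 2 * layers * kv_heads * head_dim * tokens * bytes_per_elem

# Architecture values from the public model configs
# (assumption: FLM runs the models with the same shapes).
MODELS = {
    "LLaMA 3.2 1B": dict(layers=16, kv_heads=8, head_dim=64),
    "LLaMA 3.2 3B": dict(layers=28, kv_heads=8, head_dim=128),
    "LLaMA 3.1 8B": dict(layers=32, kv_heads=8, head_dim=128),
}

# Upper bound on NPU-visible memory: under 50% of 32 GB system DRAM.
# GB and GiB are treated as equal here, which is fine for a rough bound.
NPU_BUDGET_GIB = 0.5 * 32

for name, cfg in MODELS.items():
    for ctx_k in (32, 64, 128):
        gib = kv_cache_bytes(tokens=ctx_k * 1024, **cfg) / GIB
        print(f"{name} @ {ctx_k}k: ~{gib:.1f} GiB KV cache "
              f"(budget < {NPU_BUDGET_GIB:.0f} GiB)")
```

Model weights and activations have to fit in the same budget, so the usable context is smaller than the cache figure alone suggests. Under these assumptions the estimate points in the same direction as the OOM entries in the table, but it does not reproduce them exactly.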
🚀 Prefill Speed (TPS, tokens per second, at different prompt lengths)
| Model | Hardware | 1k | 2k | 4k | 8k | 16k | 32k |
|---|---|---|---|---|---|---|---|
| LLaMA 3.2 1B | NPU (FLM) | 1442 | 1766 | 1750 | 1473 | 967 | 577 |
| LLaMA 3.2 3B | NPU (FLM) | 678 | 797 | 738 | 583 | 373 | 214 |
| LLaMA 3.1 8B | NPU (FLM) | 384 | 447 | 426 | 376 | 267 | 167 |
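
The prefill and decoding figures can be combined into a rough end-to-end latency estimate. The sketch below is only an approximation under stated assumptions: `estimate_latency` is a hypothetical helper, "4k" is taken to mean 4096 tokens, the prefill TPS is treated as an average over the whole prompt, and the decoding TPS is treated as constant over a short generation (in practice it falls as the context grows).

```python
def estimate_latency(prompt_tokens, prefill_tps, new_tokens, decode_tps):
    """Return (prefill_seconds, decode_seconds, total_seconds).

    Assumes both throughput figures stay constant for the whole request.
    """
    prefill_s = prompt_tokens / prefill_tps
    decode_s = new_tokens / decode_tps
    return prefill_s, decode_s, prefill_s + decode_s

# Example: LLaMA 3.1 8B with a 4k prompt, generating 256 new tokens.
# 426 TPS is the 4k prefill figure and 11.6 TPS the 4k decoding figure
# from the tables in this section.
prefill_s, decode_s, total_s = estimate_latency(
    prompt_tokens=4 * 1024, prefill_tps=426, new_tokens=256, decode_tps=11.6
)
print(f"prefill: {prefill_s:.1f} s, decode: {decode_s:.1f} s, total: {total_s:.1f} s")
```

For short generations, decoding usually dominates the total even though prefill handles far more tokens, because prefill throughput is one to two orders of magnitude higher.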