## ⚡ Performance and Efficiency Benchmarks

This section reports the decoding speed and power consumption of LLMs on four hardware backends: NPU (FastFlowLM), NPU (Ryzen™ AI Software), iGPU (LM Studio), and CPU (LM Studio).

🔎 Note: Results are based on FastFlowLM v0.1.3. Newer versions may deliver up to 20% higher performance.
### 🚀 Decoding Speed (Tokens per Second, TPS, at Different Sequence Lengths)
| Model | Hardware | 1k | 2k | 4k | 8k | 16k | 32k | 64k | 128k |
|---|---|---|---|---|---|---|---|---|---|
| LLaMA 3.2 1B | NPU (FastFlowLM) | 36.7 | 35.8 | 33.2 | 29.6 | 24.0 | 17.7 | 11.5 | 6.8 |
| | NPU (Ryzen™ AI SW) | 18.6 | 14.9 | NA | NA | NA | NA | NA | NA |
| | iGPU | 28.7 | 19.0 | 10.9 | 6.0 | 3.2 | 1.6 | 0.8 | 0.4 |
| | CPU | 54.6 | 52.6 | 42.3 | 34.1 | 24.4 | 14.8 | 8.4 | 4.5 |
| LLaMA 3.2 3B | NPU (FastFlowLM) | 16.1 | 15.4 | 14.3 | 12.4 | 9.9 | 7.0 | 4.4 | 2.6 |
| | NPU (Ryzen™ AI SW) | 9.0 | 6.1 | NA | NA | NA | NA | NA | NA |
| | iGPU | 23.2 | 18.8 | 14.0 | 9.2 | 5.5 | 3.0 | 1.6 | 0.8 |
| | CPU | 22.6 | 21.3 | 17.5 | 14.1 | 9.4 | 6.1 | 3.5 | 1.9 |
| LLaMA 3.1 8B | NPU (FastFlowLM) | 7.6 | 7.4 | 7.1 | 6.5 | 5.7 | 4.4 | 3.1 | 2.0 |
| | NPU (Ryzen™ AI SW) | 6.3 | 4.6 | NA | NA | NA | NA | NA | NA |
| | iGPU | 11.3 | 9.9 | 7.7 | 5.4 | 3.4 | 1.9 | 1.0 | 0.5 |
| | CPU | 10.3 | 7.7 | 7.6 | 6.7 | 5.8 | 3.3 | 2.0 | 1.1 |
🔎 Note: The official Ryzen™ AI Software 1.4 release limits context length to 2,048 tokens in NPU-only mode, hence the "NA" entries above. Its hybrid mode offloads decoding to the iGPU, so its performance is similar to iGPU (LM Studio); since hybrid mode is also limited to a 2,048-token context, it is not included in the comparison.
### 🔋 Power Consumption (Watts) During Decoding

| Model | Method | CPU | NPU | iGPU | Total Power (W) | FastFlowLM Efficiency Gain |
|---|---|---|---|---|---|---|
| LLaMA 3.2 1B | NPU (FastFlowLM) | 0.07 | 1.57 | 0 | 1.64 | – |
| | NPU (Ryzen™ AI SW) | 0.85 | 2.05 | 0 | 2.90 | 1.77× |
| | iGPU | 0.12 | 0 | 14.00 | 14.12 | 8.61× |
| | CPU | 4.90 | 0 | 0 | 4.90 | 2.99× |
| LLaMA 3.2 3B | NPU (FastFlowLM) | 0.06 | 1.33 | 0 | 1.39 | – |
| | NPU (Ryzen™ AI SW) | 0.95 | 2.05 | 0 | 3.00 | 2.16× |
| | iGPU | 0.11 | 0 | 13.00 | 13.11 | 9.43× |
| | CPU | 4.50 | 0 | 0 | 4.50 | 3.24× |
| LLaMA 3.1 8B | NPU (FastFlowLM) | 0.07 | 1.17 | 0 | 1.24 | – |
| | NPU (Ryzen™ AI SW) | 0.80 | 2.50 | 0 | 3.30 | 2.66× |
| | iGPU | 0.11 | 0 | 14.00 | 14.11 | 11.38× |
| | CPU | 4.50 | 0 | 0 | 4.50 | 3.63× |
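The efficiency-gain figures are simple arithmetic: each method's total power (CPU + NPU + iGPU watts) divided by FastFlowLM's total power for the same model. A minimal sketch reproducing the LLaMA 3.2 1B decoding column from the component wattages above:

```python
# Reproduce the decoding "Efficiency Gain" column for LLaMA 3.2 1B.
# Each entry is (CPU W, NPU W, iGPU W) as measured during decoding.
power = {
    "NPU (FastFlowLM)":  (0.07, 1.57, 0.00),
    "NPU (Ryzen AI SW)": (0.85, 2.05, 0.00),
    "iGPU":              (0.12, 0.00, 14.00),
    "CPU":               (4.90, 0.00, 0.00),
}

baseline = sum(power["NPU (FastFlowLM)"])  # 1.64 W total

for method, parts in power.items():
    total = sum(parts)
    # Gain = how many times more power this method draws than FastFlowLM
    gain = total / baseline
    print(f"{method:20s} {total:5.2f} W  {gain:.2f}x")
```

Rounded to two decimals, this yields the 1.77×, 8.61×, and 2.99× values shown in the table.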
### 🔋 Power Consumption (Watts) During Prefill

| Model | Method | CPU | NPU | iGPU | Total Power (W) | FastFlowLM Efficiency Gain |
|---|---|---|---|---|---|---|
| LLaMA 3.2 1B | NPU (FastFlowLM) | 0.31 | 0.90 | 0 | 1.21 | – |
| | NPU (Ryzen™ AI SW) | 0.96 | 2.05 | 0 | 3.01 | 2.49× |
| | iGPU | 2.70 | 0 | 10.00 | 12.70 | 10.50× |
| LLaMA 3.2 3B | NPU (FastFlowLM) | 0.20 | 0.90 | 0 | 1.10 | – |
| | NPU (Ryzen™ AI SW) | 1.06 | 2.10 | 0 | 3.16 | 2.87× |
| | iGPU | 2.10 | 0 | 11.00 | 13.10 | 11.91× |
| LLaMA 3.1 8B | NPU (FastFlowLM) | 0.23 | 0.86 | 0 | 1.09 | – |
| | NPU (Ryzen™ AI SW) | 1.20 | 2.50 | 0 | 3.70 | 3.39× |
| | iGPU | 1.40 | 0 | 14.00 | 15.40 | 14.13× |
🔎 Note: CPU is not commonly used for prefill and is excluded from this table.