⚡ Performance and Efficiency Benchmarks
This section reports the decoding speed and power consumption of LLMs on four backends: NPU (FastFlowLM), NPU (Ryzen AI Software), iGPU (LM Studio), and CPU (LM Studio).
🚀 Decoding Speed (Tokens per Second, or TPS, at different sequence lengths)
Model | Hardware | 1k | 2k | 4k | 8k | 16k | 32k | 64k | 128k |
---|---|---|---|---|---|---|---|---|---|
LLaMA 3.2 1B | NPU (FastFlowLM) | 36.7 | 35.8 | 33.2 | 29.6 | 24.0 | 17.7 | 11.5 | 6.8 |
LLaMA 3.2 1B | NPU (Ryzen AI SW) | 18.6 | 14.9 | NA | NA | NA | NA | NA | NA |
LLaMA 3.2 1B | iGPU | 28.7 | 19.0 | 10.9 | 6.0 | 3.2 | 1.6 | 0.8 | 0.4 |
LLaMA 3.2 1B | CPU | 54.6 | 52.6 | 42.3 | 34.1 | 24.4 | 14.8 | 8.4 | 4.5 |
LLaMA 3.2 3B | NPU (FastFlowLM) | 16.1 | 15.4 | 14.3 | 12.4 | 9.9 | 7.0 | 4.4 | 2.6 |
LLaMA 3.2 3B | NPU (Ryzen AI SW) | 9.0 | 6.1 | NA | NA | NA | NA | NA | NA |
LLaMA 3.2 3B | iGPU | 23.2 | 18.8 | 14.0 | 9.2 | 5.5 | 3.0 | 1.6 | 0.8 |
LLaMA 3.2 3B | CPU | 22.6 | 21.3 | 17.5 | 14.1 | 9.4 | 6.1 | 3.5 | 1.9 |
LLaMA 3.1 8B | NPU (FastFlowLM) | 7.6 | 7.4 | 7.1 | 6.5 | 5.7 | 4.4 | 3.1 | 2.0 |
LLaMA 3.1 8B | NPU (Ryzen AI SW) | 6.3 | 4.6 | NA | NA | NA | NA | NA | NA |
LLaMA 3.1 8B | iGPU | 11.3 | 9.9 | 7.7 | 5.4 | 3.4 | 1.9 | 1.0 | 0.5 |
LLaMA 3.1 8B | CPU | 10.3 | 7.7 | 7.6 | 6.7 | 5.8 | 3.3 | 2.0 | 1.1 |
🔎 Note: The official release of Ryzen AI Software 1.4 limits context length to 2,048 tokens, so longer sequence lengths are marked “NA” in the table (NPU-only mode). The hybrid mode of Ryzen AI Software 1.4 uses the iGPU for decoding, so its performance is similar to iGPU (LM Studio). It is also limited to a 2,048-token context, so we did not include hybrid mode in the comparison.
🔋 Power Consumption (Watts) During Decoding
Model | Method | CPU | NPU | iGPU | Total Power (W) | Efficiency Gain |
---|---|---|---|---|---|---|
LLaMA 3.2 1B | NPU (FastFlowLM) | 0.07 | 1.57 | 0 | 1.64 | – |
LLaMA 3.2 1B | NPU (Ryzen AI SW) | 0.85 | 2.05 | 0 | 2.90 | 1.77× |
LLaMA 3.2 1B | iGPU | 0.12 | 0 | 14.00 | 14.12 | 8.61× |
LLaMA 3.2 1B | CPU | 4.90 | 0 | 0 | 4.90 | 2.99× |
LLaMA 3.2 3B | NPU (FastFlowLM) | 0.06 | 1.33 | 0 | 1.39 | – |
LLaMA 3.2 3B | NPU (Ryzen AI SW) | 0.95 | 2.05 | 0 | 3.00 | 2.16× |
LLaMA 3.2 3B | iGPU | 0.11 | 0 | 13.00 | 13.11 | 9.43× |
LLaMA 3.2 3B | CPU | 4.50 | 0 | 0 | 4.50 | 3.24× |
LLaMA 3.1 8B | NPU (FastFlowLM) | 0.07 | 1.17 | 0 | 1.24 | – |
LLaMA 3.1 8B | NPU (Ryzen AI SW) | 0.80 | 2.50 | 0 | 3.30 | 2.66× |
LLaMA 3.1 8B | iGPU | 0.11 | 0 | 14.00 | 14.11 | 11.38× |
LLaMA 3.1 8B | CPU | 4.50 | 0 | 0 | 4.50 | 3.63× |
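
The Efficiency Gain column is not defined explicitly, but its values match the ratio of each method's total power to the FastFlowLM total for the same model. The sketch below reproduces the LLaMA 3.2 1B decoding rows under that assumption. The function name `power_ratio` and the dictionary layout are illustrative only and are not part of any tool used in these benchmarks.

```python
# Total decoding power (W) for LLaMA 3.2 1B, copied from the table above.
TOTAL_POWER_W = {
    "NPU (FastFlowLM)": 1.64,
    "NPU (Ryzen AI SW)": 2.90,
    "iGPU": 14.12,
    "CPU": 4.90,
}


def power_ratio(method_w: float, baseline_w: float) -> float:
    """Return how many times more power a method draws than the baseline.

    Assumption: this is how the Efficiency Gain column is derived.
    """
    if baseline_w <= 0:
        raise ValueError("baseline power must be positive")
    return method_w / baseline_w


baseline = TOTAL_POWER_W["NPU (FastFlowLM)"]

# Ratio of every non-baseline method to FastFlowLM, rounded like the table.
gains = {
    method: round(power_ratio(watts, baseline), 2)
    for method, watts in TOTAL_POWER_W.items()
    if method != "NPU (FastFlowLM)"
}

for method, gain in gains.items():
    print(f"{method}: {gain:.2f}x")
```

Read this way, the column compares power draw only. It does not account for differences in decoding speed, so energy per token would also depend on the TPS figures reported earlier.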
🔋 Power Consumption (Watts) During Prefill
Model | Method | CPU | NPU | iGPU | Total Power (W) | Efficiency Gain |
---|---|---|---|---|---|---|
LLaMA 3.2 1B | NPU (FastFlowLM) | 0.31 | 0.90 | 0 | 1.21 | – |
LLaMA 3.2 1B | NPU (Ryzen AI SW) | 0.96 | 2.05 | 0 | 3.01 | 2.49× |
LLaMA 3.2 1B | iGPU | 2.70 | 0 | 10.00 | 12.70 | 10.50× |
LLaMA 3.2 3B | NPU (FastFlowLM) | 0.20 | 0.90 | 0 | 1.10 | – |
LLaMA 3.2 3B | NPU (Ryzen AI SW) | 1.06 | 2.10 | 0 | 3.16 | 2.87× |
LLaMA 3.2 3B | iGPU | 2.10 | 0 | 11.00 | 13.10 | 11.91× |
LLaMA 3.1 8B | NPU (FastFlowLM) | 0.23 | 0.86 | 0 | 1.09 | – |
LLaMA 3.1 8B | NPU (Ryzen AI SW) | 1.20 | 2.50 | 0 | 3.70 | 3.39× |
LLaMA 3.1 8B | iGPU | 1.40 | 0 | 14.00 | 15.40 | 14.13× |
🔎 Note: CPU is not commonly used for prefill and is excluded from this table.