⚡ Performance and Efficiency Benchmarks

This section reports LLM decoding speed and power consumption across four hardware/runtime configurations: NPU (FastFlowLM), NPU (Ryzen AI Software), iGPU (LM Studio), and CPU (LM Studio).


🚀 Decoding Speed (Tokens per Second, or TPS, at different sequence lengths)

| Model | Hardware | 1k | 2k | 4k | 8k | 16k | 32k | 64k | 128k |
|-------|----------|-----|-----|-----|-----|------|------|------|------|
| LLaMA 3.2 1B | NPU (FastFlowLM) | 36.7 | 35.8 | 33.2 | 29.6 | 24.0 | 17.7 | 11.5 | 6.8 |
| | NPU (Ryzen AI SW) | 18.6 | 14.9 | NA | NA | NA | NA | NA | NA |
| | iGPU | 28.7 | 19.0 | 10.9 | 6.0 | 3.2 | 1.6 | 0.8 | 0.4 |
| | CPU | 54.6 | 52.6 | 42.3 | 34.1 | 24.4 | 14.8 | 8.4 | 4.5 |
| LLaMA 3.2 3B | NPU (FastFlowLM) | 16.1 | 15.4 | 14.3 | 12.4 | 9.9 | 7.0 | 4.4 | 2.6 |
| | NPU (Ryzen AI SW) | 9.0 | 6.1 | NA | NA | NA | NA | NA | NA |
| | iGPU | 23.2 | 18.8 | 14.0 | 9.2 | 5.5 | 3.0 | 1.6 | 0.8 |
| | CPU | 22.6 | 21.3 | 17.5 | 14.1 | 9.4 | 6.1 | 3.5 | 1.9 |
| LLaMA 3.1 8B | NPU (FastFlowLM) | 7.6 | 7.4 | 7.1 | 6.5 | 5.7 | 4.4 | 3.1 | 2.0 |
| | NPU (Ryzen AI SW) | 6.3 | 4.6 | NA | NA | NA | NA | NA | NA |
| | iGPU | 11.3 | 9.9 | 7.7 | 5.4 | 3.4 | 1.9 | 1.0 | 0.5 |
| | CPU | 10.3 | 7.7 | 7.6 | 6.7 | 5.8 | 3.3 | 2.0 | 1.1 |

🔎 Note: The official release of Ryzen AI Software 1.4 limits context length to 2,048 tokens in NPU-only mode, so longer sequence lengths are marked "NA". Its hybrid mode uses the iGPU for decoding, with performance similar to iGPU (LM Studio), and is subject to the same 2,048-token limit, so hybrid mode is not included in the comparison.
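For context on how TPS figures like those above are obtained: decode throughput is typically computed as the number of generated tokens divided by wall-clock decode time. A minimal sketch, where `generate_fn` is a hypothetical stand-in for the runtime under test (FastFlowLM, LM Studio, etc.), not any specific API:

```python
import time

def decode_tps(generate_fn, prompt_tokens, n_new_tokens):
    """Measure decode throughput in tokens per second.

    `generate_fn(prompt_tokens, n_new_tokens)` is assumed to block
    until all `n_new_tokens` tokens have been produced.
    """
    start = time.perf_counter()
    generate_fn(prompt_tokens, n_new_tokens)
    elapsed = time.perf_counter() - start
    return n_new_tokens / elapsed
```

In practice, harnesses run several warm-up passes first and report an average over repeated runs at each sequence length.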


🔋 Power Consumption (Watts) During Decoding

| Model | Method | CPU (W) | NPU (W) | iGPU (W) | Total (W) | Efficiency Gain (vs. FastFlowLM) |
|-------|--------|---------|---------|----------|-----------|----------------------------------|
| LLaMA 3.2 1B | NPU (FastFlowLM) | 0.07 | 1.57 | 0 | 1.64 | (baseline) |
| | NPU (Ryzen AI SW) | 0.85 | 2.05 | 0 | 2.90 | 1.77× |
| | iGPU | 0.12 | 0 | 14.00 | 14.12 | 8.61× |
| | CPU | 4.90 | 0 | 0 | 4.90 | 2.99× |
| LLaMA 3.2 3B | NPU (FastFlowLM) | 0.06 | 1.33 | 0 | 1.39 | (baseline) |
| | NPU (Ryzen AI SW) | 0.95 | 2.05 | 0 | 3.00 | 2.16× |
| | iGPU | 0.11 | 0 | 13.00 | 13.11 | 9.43× |
| | CPU | 4.50 | 0 | 0 | 4.50 | 3.24× |
| LLaMA 3.1 8B | NPU (FastFlowLM) | 0.07 | 1.17 | 0 | 1.24 | (baseline) |
| | NPU (Ryzen AI SW) | 0.80 | 2.50 | 0 | 3.30 | 2.66× |
| | iGPU | 0.11 | 0 | 14.00 | 14.11 | 11.38× |
| | CPU | 4.50 | 0 | 0 | 4.50 | 3.63× |
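The "Efficiency Gain" column is each configuration's total power divided by the NPU (FastFlowLM) total for the same model, and can be reproduced directly from the table. A quick sketch (power values copied from the decoding table above):

```python
# Total decode power (W) per model and method, from the table above.
decode_power = {
    "LLaMA 3.2 1B": {"NPU (FastFlowLM)": 1.64, "NPU (Ryzen AI SW)": 2.90,
                     "iGPU": 14.12, "CPU": 4.90},
    "LLaMA 3.2 3B": {"NPU (FastFlowLM)": 1.39, "NPU (Ryzen AI SW)": 3.00,
                     "iGPU": 13.11, "CPU": 4.50},
    "LLaMA 3.1 8B": {"NPU (FastFlowLM)": 1.24, "NPU (Ryzen AI SW)": 3.30,
                     "iGPU": 14.11, "CPU": 4.50},
}

def efficiency_gains(power_by_method, baseline="NPU (FastFlowLM)"):
    """Ratio of each method's total power to the baseline's total power."""
    base = power_by_method[baseline]
    return {method: round(power / base, 2)
            for method, power in power_by_method.items()
            if method != baseline}

for model, powers in decode_power.items():
    print(model, efficiency_gains(powers))
```

Running this reproduces the table's gain column, e.g. 14.12 W ÷ 1.64 W ≈ 8.61× for the 1B model on iGPU.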

🔋 Power Consumption (Watts) During Prefill

| Model | Method | CPU (W) | NPU (W) | iGPU (W) | Total (W) | Efficiency Gain (vs. FastFlowLM) |
|-------|--------|---------|---------|----------|-----------|----------------------------------|
| LLaMA 3.2 1B | NPU (FastFlowLM) | 0.31 | 0.90 | 0 | 1.21 | (baseline) |
| | NPU (Ryzen AI SW) | 0.96 | 2.05 | 0 | 3.01 | 2.49× |
| | iGPU | 2.70 | 0 | 10.00 | 12.70 | 10.50× |
| LLaMA 3.2 3B | NPU (FastFlowLM) | 0.20 | 0.90 | 0 | 1.10 | (baseline) |
| | NPU (Ryzen AI SW) | 1.06 | 2.10 | 0 | 3.16 | 2.87× |
| | iGPU | 2.10 | 0 | 11.00 | 13.10 | 11.91× |
| LLaMA 3.1 8B | NPU (FastFlowLM) | 0.23 | 0.86 | 0 | 1.09 | (baseline) |
| | NPU (Ryzen AI SW) | 1.20 | 2.50 | 0 | 3.70 | 3.39× |
| | iGPU | 1.40 | 0 | 14.00 | 15.40 | 14.13× |

🔎 Note: The CPU is not commonly used for prefill, so it is excluded from this table.