⚡ Performance and Efficiency Benchmarks

This section reports the performance of the LLaMA 3.x models running on the NPU with FastFlowLM (FLM).

Note:

  • Results are based on FastFlowLM v0.9.21.
  • Measurements were taken under FLM’s default NPU power mode (Performance).
  • Test system: AMD Ryzen™ AI 7 350 (Krackan Point) with 32 GB DRAM.
  • Newer versions may deliver improved performance.

🚀 Decoding Speed (TPS, or tokens per second, with decoding starting at different context lengths)

| Model | Hardware | 1k | 2k | 4k | 8k | 16k | 32k | 64k | 128k |
|---|---|---|---|---|---|---|---|---|---|
| LLaMA 3.2 1B | NPU (FLM) | 62.7 | 58.8 | 52.7 | 44.8 | 33.7 | 23.6 | 14.6 | 10.6 |
| LLaMA 3.2 3B | NPU (FLM) | 26.2 | 24.6 | 22.1 | 18.3 | 13.7 | 9.1 | 6.8 | OOM |
| LLaMA 3.1 8B | NPU (FLM) | 12.7 | 12.4 | 11.6 | 10.4 | 8.6 | 6.3 | OOM | OOM |

  • OOM: out of memory.
  • The NPU can access less than 50% of system DRAM.
  • Systems with more than 32 GB of DRAM support longer context lengths; FLM supports the full context length available for each model.
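If you want to reproduce decoding numbers like the ones above, the standard convention is to exclude prefill: start the clock at the first generated token and divide the remaining tokens by the elapsed time. A minimal sketch (the function name and timestamp format are illustrative, not part of FLM):

```python
def decode_tps(token_times):
    """Compute decoding speed in tokens per second.

    token_times: arrival timestamps (in seconds) of each generated token.
    Prefill is excluded by measuring from the first token onward, so the
    first token contributes a timestamp but no counted interval.
    """
    if len(token_times) < 2:
        raise ValueError("need at least two generated tokens")
    return (len(token_times) - 1) / (token_times[-1] - token_times[0])
```

For example, four tokens arriving 0.1 s apart yield `decode_tps([0.0, 0.1, 0.2, 0.3])` = 10.0 TPS.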


🚀 Prefill Speed (TPS, or tokens per second, with different prompt lengths)

| Model | Hardware | 1k | 2k | 4k | 8k | 16k | 32k |
|---|---|---|---|---|---|---|---|
| LLaMA 3.2 1B | NPU (FLM) | 1442 | 1766 | 1750 | 1473 | 967 | 577 |
| LLaMA 3.2 3B | NPU (FLM) | 678 | 797 | 738 | 583 | 373 | 214 |
| LLaMA 3.1 8B | NPU (FLM) | 384 | 447 | 426 | 376 | 267 | 167 |
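The prefill and decoding tables can be combined into a rough end-to-end latency estimate: time-to-first-token is approximately the prompt length divided by prefill TPS, and generation time is the output length divided by decoding TPS. A small sketch (the function name is illustrative; the numbers in the usage note come from the tables above):

```python
def estimated_latency_s(prompt_tokens, output_tokens, prefill_tps, decode_tps):
    """Rough end-to-end latency estimate in seconds.

    time-to-first-token ~= prompt_tokens / prefill_tps
    generation time     ~= output_tokens / decode_tps
    """
    return prompt_tokens / prefill_tps + output_tokens / decode_tps
```

For example, LLaMA 3.1 8B with a 4k (4096-token) prompt prefills at about 426 TPS and decodes at about 11.6 TPS, so generating 256 tokens takes roughly 4096 / 426 + 256 / 11.6 ≈ 31.7 s, of which about 9.6 s is time-to-first-token.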