⚡ Performance and Efficiency Benchmarks
This section reports the performance of LLaMA 3.x on NPU with FastFlowLM (FLM).
Note:
- Results are based on FastFlowLM v0.9.8.
- Under FLM’s default NPU power mode (Performance)
- Test system spec: AMD Ryzen™ AI 7 350 (Krakan Point) with 32 GB DRAM.
- Newer versions may deliver improved performance.
🚀 Decoding Speed (TPS, or Tokens per Second, @ different context lengths)
Model | Hardware | 1k | 2k | 4k | 8k | 16k | 32k | 64k | 128k | Model |
---|---|---|---|---|---|---|---|---|---|---|
LLaMA 3.2 1B | NPU (FLM) | 41.5 | 40.6 | 38.1 | 33.2 | 25.6 | 18.6 | 12.2 | 8.9 | LLaMA 3.2 1B |
LLaMA 3.2 3B | NPU (FLM) | 18.3 | 17.8 | 15.9 | 13.6 | 10.5 | 7.3 | 6.3 | OOM | LLaMA 3.2 3B |
LLaMA 3.1 8B | NPU (FLM) | 9.1 | 9.0 | 8.3 | 7.5 | 6.2 | 4.6 | OOM | OOM | LLaMA 3.1 8B |
OOM: Out Of Memory
On systems with more than 32 GB DRAM, longer context lengths are supported. FLM supports the full context length available for each model.
🚀 Prefill Speed (TTFT, or Time to First Token in Seconds, with different prompt lengths)
Model | Hardware | 1k | 2k | 4k | 8k | 16k | 32k | Model |
---|---|---|---|---|---|---|---|---|
LLaMA 3.2 1B | NPU (FLM) | 0.76 | 1.16 | 2.41 | 5.76 | 16.93 | 57.02 | LLaMA 3.2 1B |
LLaMA 3.2 3B | NPU (FLM) | 1.81 | 2.64 | 5.56 | 14.05 | 43.88 | 152.91 | LLaMA 3.2 3B |
LLaMA 3.1 8B | NPU (FLM) | 3.04 | 4.53 | 9.32 | 21.79 | 61.46 | 195.65 | LLaMA 3.1 8B |