⚡ CLI (Interactive Mode)
FastFlowLM offers a terminal-based interactive experience, similar to Ollama, but fully offline and accelerated exclusively on AMD NPUs.
🔧 Pre-Run PowerShell Commands
📖 Show Help
flm help
🚀 Run a Model
Run a model interactively from the terminal:
flm run llama3.2:1B
flm is short for FastFlowLM. If the model isn't available locally, it will be downloaded automatically. This launches FastFlowLM in CLI mode.
⬇️ Pull a Model (Download Only)
Download a model from Hugging Face without launching it:
flm pull llama3.2:3B
📦 List Downloaded Models
Display all locally downloaded models:
flm list
❌ Remove a Downloaded Model
Delete a model from local storage:
flm remove llama3.2:3B
📄 Run with a Text File
Load input from a local text file:
flm run llama3.2:1B "C:\Users\Public\Desktop\alice_in_wonderland.txt"
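A minimal PowerShell workflow might look like this (the file path and prompt text below are illustrative placeholders, not part of the tool):
# Write a prompt to a text file, then pass the file to the model.
Set-Content -Path "C:\Users\Public\Desktop\prompt.txt" -Value "Summarize this story in three sentences."
flm run llama3.2:1B "C:\Users\Public\Desktop\prompt.txt"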
🌐 Start Server Mode
Launch FastFlowLM as a local REST API server (the OpenAI API is also supported):
flm serve llama3.2:1B
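Once the server is running, other applications can query it over HTTP. Here is a minimal PowerShell sketch of an OpenAI-style chat request; the port (11434) and the /v1/chat/completions path are assumptions based on typical OpenAI-compatible servers, so check the address printed by flm serve:
# Build an OpenAI-style chat completion request body.
$body = @{
    model    = "llama3.2:1B"
    messages = @(@{ role = "user"; content = "Hello!" })
} | ConvertTo-Json -Depth 5

# POST it to the local server (port and path are assumptions; verify against the flm serve output).
Invoke-RestMethod -Uri "http://localhost:11434/v1/chat/completions" -Method Post -ContentType "application/json" -Body $body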
🧠 Commands Inside Interactive Mode
Once inside the CLI, use the following commands:
📖 Help
/?
Displays all available interactive commands. Highly recommended for first-time users.
🪪 Model Info
/show
View model architecture, size, cache path, and more.
🔄 Change Model
/load [model_name]
Unload the current model and load a new one. KV cache will be cleared.
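For example, switching to the 3B variant pulled earlier:
/load llama3.2:3B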
💾 Save Conversation
/save
Save the current conversation history to disk.
🧹 Clear Memory
/clear
Clear the KV cache (model memory) for a fresh start.
📊 Show Runtime Stats
/status
Display runtime statistics like token count, throughput, etc.
🕰️ Show History
/history
Review the current sessionโs conversation history.
🔁 Toggle Verbose Mode
/verbose
Enable detailed performance metrics per turn. Run again to disable.
👋 Quit Interactive Mode
/bye
Exit the CLI.
⚙️ Set Hyperparameters
/set
Customize decoding parameters like top_k, top_p, and temperature.
⚠️ Note: Providing invalid or extreme values may cause inference errors.
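As an illustrative sketch only (the exact /set argument format is an assumption here; run /? inside the CLI for the authoritative syntax), adjusting the sampling temperature might look like:
/set temperature 0.7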