⚡ CLI (Interactive Mode)

FastFlowLM offers a terminal-based interactive experience, similar to Ollama, but fully offline and accelerated exclusively on AMD NPUs.


🔧 Pre-Run PowerShell Commands

🆘 Show Help

flm help

🚀 Run a Model

Run a model interactively from the terminal:

flm run llama3.2:1B

flm is short for FastFlowLM. If the model isn't available locally, it will be downloaded automatically. This launches FastFlowLM in CLI mode.


โฌ‡๏ธ Pull a Model (Download Only)

Download a model from Hugging Face without launching it:

flm pull llama3.2:3B

📦 List Downloaded Models

Display all locally downloaded models:

flm list

โŒ Remove a Downloaded Model

Delete a model from local storage:

flm remove llama3.2:3B

📄 Run with a Text File

Load input from a local text file:

flm run llama3.2:1B "C:\Users\Public\Desktop\alice_in_wonderland.txt"

๐ŸŒ Start Server Mode

Launch FastFlowLM as a local REST API server (also support OpenAI API):

flm serve llama3.2:1B
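With the server running, clients can talk to it over HTTP. Below is a minimal sketch of an OpenAI-style chat-completion request in Python. The base URL, port, and endpoint path here are assumptions for illustration (check the address `flm serve` prints on startup); only the payload shape follows the standard OpenAI chat-completions format.

```python
import json
import urllib.request


def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat-completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }


def chat(base_url: str, model: str, prompt: str) -> str:
    """POST the payload and return the assistant's reply text."""
    payload = build_chat_request(model, prompt)
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    # Assumes `flm serve llama3.2:1B` is running on this address (hypothetical port).
    print(chat("http://localhost:11434", "llama3.2:1B", "Hello!"))
```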

🧠 Commands Inside Interactive Mode

Once inside the CLI, use the following commands:


🆘 Help

/?

Displays all available interactive commands. Highly recommended for first-time users.


🪪 Model Info

/show

View model architecture, size, cache path, and more.


🔄 Change Model

/load [model_name]

Unload the current model and load a new one. KV cache will be cleared.


💾 Save Conversation

/save

Save the current conversation history to disk.


🧹 Clear Memory

/clear

Clear the KV cache (model memory) for a fresh start.


📊 Show Runtime Stats

/status

Display runtime statistics like token count, throughput, etc.


๐Ÿ•ฐ๏ธ Show History

/history

Review the current session's conversation history.


๐Ÿ” Toggle Verbose Mode

/verbose

Enable detailed performance metrics per turn. Run again to disable.


👋 Quit Interactive Mode

/bye

Exit the CLI.


โš™๏ธ Set Hyperparameters

/set

Customize decoding parameters like top_k, top_p, temperature, etc.

โš ๏ธ Note: Providing invalid or extreme values may cause inference errors.