🧩 Model Card: embeddinggemma-300m
- Type: Embedding (Sentence Similarity)
- Think: No
- Base Model: google/embeddinggemma-300m
- Max Chunk Size: 2048
- Default Context Length: NA
▶️ Run with FastFlowLM in PowerShell:
The embedding model must be loaded alongside an LLM (loaded concurrently) in Server Mode. Embedding models do not work in CLI Mode.
Server Mode
Start the server with the embedding model enabled:
flm serve gemma3:4b --embed 1 # Loads the embedding model (embed-gemma:300m) in the background alongside the LLM (gemma3:4b).
Send file(s) to POST /v1/embeddings via any OpenAI client or Open WebUI.
See more API details here → /v1/embeddings/
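Because the endpoint follows the OpenAI embeddings API shape, a plain HTTP POST works as well. A minimal sketch using the requests library (port 52625 and the dummy flm key match the examples below; the list-valued input assumes the server accepts batched input the way the OpenAI API does):

import requests

# POST to FastFlowLM's OpenAI-compatible embeddings endpoint.
resp = requests.post(
    "http://localhost:52625/v1/embeddings",
    headers={"Authorization": "Bearer flm"},  # dummy key; FLM doesn't require auth
    json={
        "model": "embed-gemma",
        "input": ["First passage to embed.", "Second passage to embed."],
    },
)
resp.raise_for_status()
vectors = [item["embedding"] for item in resp.json()["data"]]
print(len(vectors), "embeddings,", len(vectors[0]), "dimensions each")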
Example 1: OpenAI Client
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:52625/v1",  # FastFlowLM's local API endpoint
    api_key="flm",  # Dummy key (FastFlowLM doesn't require authentication)
)

resp = client.embeddings.create(
    model="embed-gemma",
    input="Hi, everyone!"
)
print(resp.data[0].embedding)
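Since this is a sentence-similarity model, the typical next step is comparing two embeddings. A minimal sketch (the cosine helper below is illustrative, not part of FLM or the OpenAI client):

import math
from openai import OpenAI

client = OpenAI(base_url="http://localhost:52625/v1", api_key="flm")

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

resp = client.embeddings.create(
    model="embed-gemma",
    input=["The cat sat on the mat.", "A feline rested on a rug."],
)
a, b = (item.embedding for item in resp.data)
print(f"similarity: {cosine(a, b):.3f}")  # closer to 1.0 = more similar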
Example 2: Open WebUI
- Follow the Open WebUI setup guide.
- In the bottom-left corner, click the User icon, then select Settings.
- In the bottom panel, open Admin Settings.
- In the left sidebar, navigate to Documents.
- Set Embedding Model Engine to OpenAI.
- Enter:
  - API Base URL: http://localhost:52625/v1 (Open WebUI Desktop) or http://host.docker.internal:52625/v1 (Open WebUI in Docker)
  - API Key: flm (any value works)
  - Embedding Model: embed-gemma:300m
- Save the settings.
- Follow the RAG + FastFlowLM example to launch your local private database with RAG, all powered by FLM.
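For a rough idea of what that example wires together, here is a minimal, self-contained sketch of the retrieval loop against the same two endpoints (the in-memory corpus and top-1 retrieval are simplifications; the real example uses a proper vector store):

import math
from openai import OpenAI

client = OpenAI(base_url="http://localhost:52625/v1", api_key="flm")

docs = [
    "FastFlowLM serves an OpenAI-compatible API on your own machine.",
    "The embedding model embed-gemma:300m runs alongside the LLM.",
]

def embed(texts):
    resp = client.embeddings.create(model="embed-gemma", input=texts)
    return [item.embedding for item in resp.data]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

doc_vecs = embed(docs)
question = "Which embedding model does FLM load?"
q_vec = embed([question])[0]

# Retrieve the most similar document and pass it as context to the LLM.
best = max(range(len(docs)), key=lambda i: cosine(q_vec, doc_vecs[i]))
answer = client.chat.completions.create(
    model="gemma3:4b",  # the concurrently loaded LLM from `flm serve`
    messages=[
        {"role": "system", "content": f"Answer using this context:\n{docs[best]}"},
        {"role": "user", "content": question},
    ],
)
print(answer.choices[0].message.content)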