#Gemma

1 post with this tag

2026-06-08

Local Gemma was too slow with AIdaemon until I fixed llama.cpp and the prompt size

I wanted AIdaemon on local Gemma 4 26B through llama.cpp, not Ollama. Generation ran at ~45 tok/s on an M4 Pro. Agent turns still felt stuck because prefill on 14k-token prompts took 8 to 9 seconds before the model wrote a single word.

ai, software-development, open-source
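The numbers in the excerpt above imply a prefill rate that is easy to work out. Here is a minimal sketch of that arithmetic, assuming a simple two-phase model where a turn costs prefill time plus generation time. The function names and the 200-token reply length are hypothetical, picked for illustration; only the 14k prompt size, the 8 to 9 second prefill, and the ~45 tok/s generation rate come from the post.

```python
def prefill_rate(prompt_tokens: int, prefill_seconds: float) -> float:
    """Prompt-processing throughput in tokens per second."""
    return prompt_tokens / prefill_seconds


def turn_latency(prompt_tokens: int, prefill_tok_per_s: float,
                 output_tokens: int, gen_tok_per_s: float) -> float:
    """Rough seconds per agent turn: prefill time plus generation time."""
    return prompt_tokens / prefill_tok_per_s + output_tokens / gen_tok_per_s


if __name__ == "__main__":
    prompt_tokens = 14_000   # from the post
    gen_rate = 45.0          # tok/s, from the post
    reply_tokens = 200       # hypothetical reply length, not from the post

    for seconds in (8.0, 9.0):  # prefill range reported in the post
        rate = prefill_rate(prompt_tokens, seconds)
        total = turn_latency(prompt_tokens, rate, reply_tokens, gen_rate)
        print(f"prefill {seconds:.0f}s -> {rate:.0f} tok/s, turn ~{total:.1f}s")
```

Under these assumptions the prefill rate comes out at roughly 1,550 to 1,750 tok/s, so the wait before the first token is about twice as long as generating a short reply, which matches the "stuck" feeling described in the excerpt. Shrinking the prompt reduces that wait proportionally in this simple model.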
