David Loor

#Gemma

1 post with this tag

2026-06-08
13 min read

Local Gemma was too slow with AIdaemon until I fixed llama.cpp and the prompt size

I wanted to run AIdaemon on a local Gemma 4 26B through llama.cpp, not Ollama. Generation ran at ~45 tok/s on an M4 Pro, but agent turns still felt stuck: prefill on 14k-token prompts took 8 to 9 seconds before the model wrote a single word.

Tags: ai, software-development, open-source

David Loor

AI, Cloud & Web Solutions Architect

© 2026 David Loor. All rights reserved.

davo20019@gmail.com