#Local LLM

3 posts with this tag

2026-06-18

My agent kept losing facts it had already saved

AIdaemon could save a fact and then fail to find it when I asked for it. Semantic search ranks by what's on the same topic, so the short fact holding the real answer gets buried under wordier ones nearby. A reranker reads each candidate against the question and puts the right one back on top.

ai, software-development, open-source

2026-06-08

13 min read

Local Gemma was too slow with AIdaemon until I fixed llama.cpp and the prompt size

I wanted AIdaemon on local Gemma 4 26B through llama.cpp, not Ollama. Generation ran at ~45 tok/s on an M4 Pro. Agent turns still felt stuck because prefill on 14k-token prompts took 8 to 9 seconds before the model wrote a single word.

ai, software-development, +1

2025-11-01

9 min read

How to run open source LLMs (AI) on your computer?

Learn how to run open source AI models (LLMs) directly on your computer. No cloud, no subscriptions, complete privacy. A beginner-friendly guide using llama.cpp.

AI, Tutorial, +1
