The shape of it
Your agent calls PMB over MCP (local stdio). Reads go through a hybrid ranker; writes return in under a millisecond and embed in the background. Everything lands in one SQLite file, with vectors next to it.
Read - hybrid recall (~35ms warm)
BM25 (lexical) + dense vectors + an entity graph, fused with
Reciprocal Rank Fusion (RRF) and an optional cross-encoder rerank. One call,
prepare(message), returns project context, surfaced lessons, recent
activity and open goals in 4-16ms.
Write - async (sub-ms return)
The MCP tool returns immediately; the embedding and vector insert happen on a
background thread. By default, neither the read path nor the write path makes an LLM call.
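The fusion step on the Read path can be illustrated with the standard Reciprocal Rank Fusion formula: every candidate scores the sum of 1/(k + rank) over the rankers that returned it. This is a minimal sketch, not PMB's code. The function name, the memory ids, equal weighting of the BM25, dense and graph results, and k = 60 (a commonly used default) are assumptions. The optional cross-encoder rerank is left out; it would re-score the top of the fused list.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of memory ids (best first) into one list.

    Each id scores sum(1 / (k + rank)) over every list it appears in,
    with rank counted from 1, so ids that several rankers agree on rise.
    k=60 is an assumed default; PMB's actual value is not documented.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, memory_id in enumerate(ranking, start=1):
            scores[memory_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical outputs of the three rankers for one query.
bm25_hits = ["m1", "m2", "m3"]
dense_hits = ["m2", "m3", "m4"]
graph_hits = ["m2", "m1"]

fused = reciprocal_rank_fusion([bm25_hits, dense_hits, graph_hits])
print([memory_id for memory_id, _ in fused])  # ['m2', 'm1', 'm3', 'm4']
```

Because RRF only looks at ranks, BM25 scores and cosine similarities never have to be normalized onto a common scale before they are combined.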
Memory that doesn’t wait to be asked
The hard part of agent memory isn’t storing - it’s getting the agent to use what’s stored. Soft instructions get skipped, so PMB wires hooks at the protocol level (on Claude Code):
Auto-recall
Every prompt is classified (sub-ms) and the matching memory is injected
before the model starts reasoning. The agent never has to decide to call
recall.
Ambient write
If the agent forgets to record its work, PMB synthesizes one entry from the
observed actions - outcome-scored, tagged source=autowrite, reversible.
Session restore
After the context window compacts, PMB rebuilds “where you left off” so the
agent picks the thread back up instead of re-asking you.
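The Ambient write behavior above can be sketched as a function that turns one turn's observed actions into a single entry. Only the behavior comes from the text (one entry, outcome-scored, tagged source=autowrite, reversible, and only when the agent did not record its work itself). The Action and MemoryEntry types, scoring as the share of successful actions, and modeling "reversible" as archiving instead of deleting are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Action:
    """One observed tool call (hypothetical shape)."""
    tool: str
    target: str  # file path or command the action touched
    ok: bool     # whether the action succeeded


@dataclass
class MemoryEntry:
    text: str
    score: float
    tags: dict = field(default_factory=dict)
    archived: bool = False


def synthesize_entry(actions: List[Action], agent_wrote: bool = False) -> Optional[MemoryEntry]:
    """Collapse a turn's actions into one entry, unless the agent already wrote one."""
    if agent_wrote or not actions:
        return None
    succeeded = sum(a.ok for a in actions)
    targets = sorted({a.target for a in actions})
    return MemoryEntry(
        text=f"{len(actions)} actions ({succeeded} ok) on: {', '.join(targets)}",
        score=succeeded / len(actions),  # assumed outcome score: success ratio
        tags={"source": "autowrite"},
    )


def revert_autowrites(entries: List[MemoryEntry]) -> int:
    """Undo ambient writes by archiving (not deleting) them; returns how many changed."""
    count = 0
    for entry in entries:
        if entry.tags.get("source") == "autowrite" and not entry.archived:
            entry.archived = True
            count += 1
    return count


observed = [
    Action("Edit", "src/app.py", True),
    Action("Bash", "pytest", False),
    Action("Edit", "src/app.py", True),
]
entry = synthesize_entry(observed)
print(entry.text)  # 3 actions (2 ok) on: pytest, src/app.py
```

Tagging every synthesized entry makes the undo a simple filter on source=autowrite, so hand-written memories are never touched.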
Follow-through
At the end of each turn, PMB checks which surfaced lessons actually showed up in the
work and marks them as followed, without relying on the model to self-report.
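How PMB decides that a lesson "showed up in the work" isn't specified. A crude stand-in is term overlap between each surfaced lesson and the turn's transcript or diff; the function names, the tokenization, and the 0.5 threshold below are all assumptions, meant only to show that the check can run after the turn without asking the model.

```python
import re
from typing import Dict, List


def _terms(text: str) -> set:
    """Lowercase word tokens, dropping very short ones (a crude stand-in for stopwords)."""
    return {t for t in re.findall(r"[a-z0-9_]+", text.lower()) if len(t) > 2}


def followed_lessons(surfaced: Dict[str, str], work: str, threshold: float = 0.5) -> List[str]:
    """Return ids of surfaced lessons whose terms mostly appear in this turn's work."""
    work_terms = _terms(work)
    followed = []
    for lesson_id, lesson in surfaced.items():
        terms = _terms(lesson)
        # Skip lessons with no usable terms; otherwise compare the overlap ratio.
        if terms and len(terms & work_terms) / len(terms) >= threshold:
            followed.append(lesson_id)
    return followed


surfaced = {
    "lesson-1": "use pytest fixtures for database setup",
    "lesson-2": "format code with black before committing",
}
work = "Added pytest fixtures for the database setup in conftest.py"
print(followed_lessons(surfaced, work))  # ['lesson-1']
```

A real implementation would likely match on edited files, commands, or embeddings rather than raw words; the point is that the signal comes from the observed work, not from the model's own report.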
Dedup, four layers
Exact → semantic → borderline → manual
Exact text match → cosine ≥ 0.92 auto-merge → cosine 0.80-0.92 borderline
(verified later) → manual review in the dashboard. Old values are archived,
never deleted, and the full history can be queried as of any point in time.
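The first three dedup layers map onto a small decision function. This sketch takes the thresholds from the text (≥ 0.92 auto-merge, 0.80 to 0.92 borderline); the function and label names, exact string equality for the first layer, and a brute-force cosine scan over all stored vectors are assumptions. The manual-review layer lives in the dashboard and is not modeled here.

```python
import math
from typing import List, Optional, Sequence, Tuple

AUTO_MERGE_AT = 0.92  # from the text: cosine >= 0.92 merges automatically
BORDERLINE_AT = 0.80  # from the text: 0.80 to 0.92 is held for later verification


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def dedup_decision(
    text: str,
    vector: Sequence[float],
    existing: List[Tuple[str, Sequence[float]]],
) -> Tuple[str, Optional[int]]:
    """Classify a new memory against stored ones; returns (layer, index of the match)."""
    # Layer 1: exact text match, checked before any vector math.
    for i, (stored_text, _) in enumerate(existing):
        if stored_text == text:
            return "exact", i
    if not existing:
        return "new", None
    # Layers 2 and 3: nearest stored memory by cosine similarity.
    sims = [cosine(vector, stored_vec) for _, stored_vec in existing]
    best = max(range(len(sims)), key=sims.__getitem__)
    if sims[best] >= AUTO_MERGE_AT:
        return "auto_merge", best
    if sims[best] >= BORDERLINE_AT:
        return "borderline", best  # queued for later verification or manual review
    return "new", None


store = [
    ("deploy with make release", [1.0, 0.0]),
    ("tests run against sqlite", [0.0, 1.0]),
]
print(dedup_decision("deploy with make release", [0.3, 0.7], store))  # ('exact', 0)
print(dedup_decision("ship via make release", [1.0, 0.1], store))     # ('auto_merge', 0)
print(dedup_decision("release process notes", [1.0, 0.6], store))     # ('borderline', 0)
print(dedup_decision("unrelated fact", [1.0, 1.0], store))            # ('new', None)
```

Putting the exact-match layer first means literal repeats are resolved without a similarity scan, and only the ambiguous middle band ever needs a human.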
Multilingual with no language packs: the default embedder covers 50+
languages, so a Russian query finds an English fact. The cold lexical path
self-compiles from your own traffic - a language you use gets faster over
time, zero config.