How it works

The shape of it

Your agent calls PMB over MCP (local stdio). Reads go through a hybrid ranker; writes return in under a millisecond and embed in the background. Everything lands in one SQLite file, with vectors next to it.

Read - hybrid recall (~35ms warm)

BM25 (lexical) + dense vectors + an entity graph, fused with Reciprocal-Rank-Fusion and an optional cross-encoder rerank. One call - prepare(message) - returns project context, surfaced lessons, recent activity and open goals in 4-16ms.

Write - async (sub-ms return)

The MCP tool returns immediately; the embed + vector insert happen on a background thread. No LLM call on the read or write path, ever, by default.

Store - files on disk

Every event lives in SQLite; vectors live in LanceDB beside it. Both are files - copy them anywhere with cp, commit them, put them on a USB stick.

Memory that doesn’t wait to be asked

The hard part of agent memory isn’t storing - it’s getting the agent to use what’s stored. Soft instructions get skipped, so PMB wires hooks at the protocol level (on Claude Code):

Auto-recall

Every prompt is classified (sub-ms) and the matching memory is injected before the model thinks. The agent never decides to call recall.

Ambient write

If the agent forgets to record its work, PMB synthesizes one entry from the observed actions - outcome-scored, tagged source=autowrite, reversible.

Session restore

After the context window compacts, PMB rebuilds “where you left off” so the agent picks the thread back up instead of re-asking you.

Follow-through

At turn end PMB checks which surfaced lessons actually showed up in the work, and marks them followed - without the model self-reporting.

Dedup, four layers

Exact → semantic → borderline → manual

Exact text match → cosine ≥ 0.92 auto-merge → cosine 0.80-0.92 borderline (verified later) → manual review in the dashboard. Old values are archived, never deleted; full history is queryable as-of any point in time.

Multilingual with no language packs: the default embedder covers 50+ languages, so a Russian query finds an English fact. The cold lexical path self-compiles from your own traffic - a language you use gets faster over time, zero config.

Troubleshooting Memory model

⌘I

​The shape of it

​Memory that doesn’t wait to be asked

Auto-recall

Ambient write

Session restore

Follow-through

​Dedup, four layers

The shape of it

Memory that doesn’t wait to be asked

Dedup, four layers