
The big picture

  • Storage - every event in SQLite; vectors in LanceDB beside it. Both are files on disk; copy them anywhere with cp.
  • Recall - BM25 (lexical) + dense vector + entity graph, fused with Reciprocal Rank Fusion (RRF), with an optional cross-encoder rerank on top.
  • Writes - async; the MCP tool returns in under a millisecond, and embedding happens on a background thread.
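
To make the fusion step concrete, here is a minimal sketch of Reciprocal Rank Fusion over three ranked lists. It is an illustration only, not PMB's actual code: the function name, the event IDs, and the constant k = 60 (a commonly used default for RRF) are assumptions, and the optional cross-encoder rerank is left out.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of IDs into one list, best first.

    Each list contributes 1 / (k + rank) for every ID it contains,
    where rank starts at 1. k = 60 is an assumed default, not a
    value taken from PMB.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from the three retrievers named above.
bm25_hits = ["e3", "e1", "e7"]    # lexical
vector_hits = ["e1", "e4", "e3"]  # dense vector
graph_hits = ["e1", "e7"]         # entity graph

fused = reciprocal_rank_fusion([bm25_hits, vector_hits, graph_hits])
print(fused)
```

An event that ranks reasonably well in several lists (here `e1`) rises above one that tops a single list. That is the point of fusing by rank instead of by raw score, since BM25 and vector scores are not on comparable scales.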

One warm daemon, many clients

Rather than each agent loading its own model and vector store, PMB runs one warm daemon (Engine + model + LanceDB) that N clients share for instant recall.
Mode | Transport | What runs
Local (default) | daemon + HTTP, stdio proxy where needed | One warm runtime; Claude Code/Cursor point at the daemon, Codex via pmb mcp proxy.
Local stdio | stdio | One per-client process - compatible, but more cold-start cost.
Team / multi-machine | streamable HTTP + bearer token | One shared server for remote agents (behind a private network).
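
The "load once, share with N clients" idea can be sketched in a few lines. This is a toy, in-process model under stated assumptions: `Engine` and `Daemon` are hypothetical stand-ins, the substring match stands in for real recall, and no HTTP or stdio transport is involved.

```python
import threading


class Engine:
    """Hypothetical stand-in for the expensive runtime (model + vector store)."""

    load_count = 0  # how many times the costly load has run

    def __init__(self):
        Engine.load_count += 1  # pretend this is the slow cold start
        self._notes = []
        self._lock = threading.Lock()

    def record(self, text):
        with self._lock:
            self._notes.append(text)

    def recall(self, query):
        # Naive substring match, standing in for real retrieval.
        with self._lock:
            return [n for n in self._notes if query.lower() in n.lower()]


class Daemon:
    """Loads the engine on first use and hands the same instance to every client."""

    def __init__(self):
        self._engine = None
        self._lock = threading.Lock()

    def engine(self):
        with self._lock:
            if self._engine is None:
                self._engine = Engine()
            return self._engine


daemon = Daemon()
for client in ["claude-code", "cursor", "codex"]:
    daemon.engine().record(f"{client} connected")

print(Engine.load_count)                       # the load ran once
print(len(daemon.engine().recall("connected")))  # all clients see shared state
```

In the per-client stdio mode, each client would construct its own `Engine`, so the cold-start cost is paid once per client instead of once overall.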

Two surfaces, one engine

PMB reaches the agent two ways, and they’re complementary:

MCP tools

The deliberate ceiling - prepare, recall, record_batch… the agent calls these on purpose.

Lifecycle hooks

The involuntary floor - auto-recall, ambient write, session restore, follow-through. Memory works without the model remembering to call a tool.
On hook-enabled hosts (Claude Code, Codex), MCP and hooks are kept together