Dates are intentionally absent - this is a one-maintainer project. Order
signals priority, not a schedule.
Shipped - v0.9 “Anchor Engine”
Language packs are no longer required. The RU/UK packs were deleted; many languages now ride one mechanism instead of a pack each:
- Semantic anchors - English exemplars drive intent classification and keyed-fact extraction; the multilingual embedder transfers them cross-lingually.
- ALD (anchor→lexicon distillation) - the cold lexical path self-compiles
  from your own traffic into $PMB_HOME/lang/auto.yaml.
- Measured - RU/UK recall top-1 = 1.00; 101-query multilingual eval top-3 ≈ 0.91; a blocking CI gate runs the eval with packs off so recall can’t regress.
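For readers unfamiliar with the top-1 / top-3 numbers above, here is a minimal sketch of how a top-k recall figure and a blocking gate could be computed. The function names, the data shape (one ranked result list per query), and the threshold are assumptions for illustration; this is not the project's actual eval harness.

```python
from typing import Sequence


def top_k_recall(
    ranked: Sequence[Sequence[str]],
    expected: Sequence[str],
    k: int,
) -> float:
    """Fraction of queries whose expected item is among the first k results."""
    if len(ranked) != len(expected):
        raise ValueError("ranked and expected must have the same length")
    if not expected:
        raise ValueError("eval set is empty")
    hits = sum(1 for results, want in zip(ranked, expected) if want in results[:k])
    return hits / len(expected)


def gate_passes(recall: float, threshold: float) -> bool:
    """Hypothetical CI gate: fail the build when recall drops below threshold."""
    return recall >= threshold


# Toy eval set: four queries, each with a ranked list of memory ids.
ranked_results = [
    ["a", "b", "c"],
    ["x", "y", "z"],
    ["p", "q", "r"],
    ["m", "n", "o"],
]
expected_ids = ["a", "z", "missing", "n"]

recall_at_1 = top_k_recall(ranked_results, expected_ids, k=1)
recall_at_3 = top_k_recall(ranked_results, expected_ids, k=3)
```

In a CI job, a failing `gate_passes` result would translate into a non-zero exit code so the merge is blocked.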
Next
Validate ALD on real traffic
Add a cold-path-coverage-over-time metric and report how fast it self-heals
a language in real use.
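One way such a metric could look, as a rough sketch: log for every query whether the cold lexical path resolved it on its own, then track the daily hit fraction per language. The `QueryEvent` record, its fields, and the helper names below are hypothetical and not part of the current codebase.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, NamedTuple, Optional


class QueryEvent(NamedTuple):
    day: date
    lang: str
    cold_hit: bool  # assumed flag: True if the cold lexical path resolved the query alone


def coverage_by_day(events: Iterable[QueryEvent], lang: str) -> Dict[date, float]:
    """Daily fraction of queries in `lang` that the cold path handled."""
    totals: Dict[date, int] = defaultdict(int)
    hits: Dict[date, int] = defaultdict(int)
    for event in events:
        if event.lang != lang:
            continue
        totals[event.day] += 1
        if event.cold_hit:
            hits[event.day] += 1
    return {day: hits[day] / totals[day] for day in sorted(totals)}


def days_to_reach(coverage: Dict[date, float], target: float) -> Optional[int]:
    """Days from the first observed day until coverage first meets `target`."""
    if not coverage:
        return None
    days = sorted(coverage)
    for day in days:
        if coverage[day] >= target:
            return (day - days[0]).days
    return None


events = [
    QueryEvent(date(2025, 1, 1), "ru", False),
    QueryEvent(date(2025, 1, 1), "ru", False),
    QueryEvent(date(2025, 1, 1), "uk", True),
    QueryEvent(date(2025, 1, 2), "ru", True),
    QueryEvent(date(2025, 1, 2), "ru", False),
    QueryEvent(date(2025, 1, 3), "ru", True),
    QueryEvent(date(2025, 1, 3), "ru", True),
]
ru_coverage = coverage_by_day(events, "ru")
```

`days_to_reach` gives a single "time to self-heal" number per language, which is the figure this roadmap item proposes to report.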
Latency on commodity hardware
Re-measure recall + anchor p95 on a normal machine and tighten the gate.
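As an illustration of the p95 half of that measurement, the sketch below times repeated calls and takes a nearest-rank 95th percentile. The helper names are hypothetical, and the choice of the nearest-rank method is an assumption; the project may compute its p95 differently.

```python
import time
from typing import Callable, List, Sequence


def measure_ms(fn: Callable[[], object], runs: int) -> List[float]:
    """Wall-clock latency of `fn` in milliseconds, one sample per run."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def p95(samples_ms: Sequence[float]) -> float:
    """Nearest-rank 95th percentile: the smallest sample >= 95% of all samples."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    rank = (95 * len(ordered) + 99) // 100  # integer ceiling of 0.95 * n
    return ordered[rank - 1]


# Stand-in workload; a real run would call the anchor classification path.
latencies = measure_ms(lambda: sum(range(1000)), runs=50)
anchor_p95_ms = p95(latencies)
```

Running the same script on the target commodity machine and on the CI runner would show how far the gate threshold can be tightened.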
Default-on keyed extraction
Promote v0.9 keyed extraction to default only once field false-positive rate
is measured.
Stronger embedder path
Make bge-m3 a first-class documented upgrade for CJK / lower-resource
languages.
Ingest & storage
- Backup via litestream (continuous SQLite replication to a bucket you own).
- Optional cloud-sync - bring-your-own bucket, never a PMB-hosted service.
- tree-sitter project indexing for Rust / TypeScript.
- Image OCR so screenshots and scanned PDFs become searchable memory.