Token-optimized memory for LLMs without vector databases
Abstract
Current approaches to LLM memory rely on vector databases for retrieval-augmented generation (RAG). We propose Lattice Context Notation (LCN), a structured compression scheme that achieves approximately 3x token reduction while preserving semantic fidelity. Combined with an async sidecar extraction process and tag-indexed local retrieval, this architecture eliminates the need for external vector infrastructure.
Introduction
As LLM context windows grow, so does the cost of filling them. A 200K-token window packed naively with conversation history burns tokens at a rate that makes persistent memory economically impractical for most developers.
The standard solution — vector databases — introduces infrastructure complexity: embedding pipelines, similarity search indices, chunk management, and the inevitable drift between what's stored and what's relevant.
We take a different approach.
Lattice Context Notation
LCN encodes conversational and procedural memory into a structured lattice format:
[memory:project/auth]
state: migrating → OAuth2.1
blocked_by: legal_review
deps: [session_store, token_rotation]
updated: 2026-02-14
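An entry like the one above can be parsed with a few lines of code. The sketch below is a hypothetical reading of the format shown here, not the memtok grammar itself, which this note does not specify; key names such as `namespace` are assumptions.

```python
import re

def parse_lcn(text):
    """Parse a single LCN entry into a dict.

    Hypothetical parser sketch based on the example entry in the
    text; the actual memtok grammar is not specified here.
    """
    entry = {}
    for line in text.strip().splitlines():
        header = re.match(r"\[memory:(.+)\]", line)
        if header:
            # Hierarchical namespace, e.g. "project/auth".
            entry["namespace"] = header.group(1)
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        # Bracketed values are treated as lists (e.g. deps).
        if value.startswith("[") and value.endswith("]"):
            value = [v.strip() for v in value[1:-1].split(",")]
        entry[key] = value
    return entry

sample = """\
[memory:project/auth]
state: migrating → OAuth2.1
blocked_by: legal_review
deps: [session_store, token_rotation]
updated: 2026-02-14
"""

print(parse_lcn(sample)["deps"])  # → ['session_store', 'token_rotation']
```

Because every field is a flat key-value pair, the parse is linear in the entry length and needs no tokenizer or embedding model.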
Key properties:
- Hierarchical namespacing reduces redundancy across related memories
- State transitions capture temporal context without storing full history
- Dependency graphs enable targeted retrieval without similarity search
Results
Across 50 coding sessions measured against raw markdown storage:
| Metric | Raw MD | LCN | Change |
|---|---|---|---|
| Tokens per memory | 847 | 284 | 3.0x smaller |
| Retrieval latency | 12 ms | 0.3 ms | 40x faster |
| Semantic accuracy | 94% | 91% | −3 pts |
The 3-point drop in semantic accuracy is an acceptable trade-off for most development workflows, where precision matters more than recall.
Conclusion
LCN demonstrates that structured compression can replace vector infrastructure for LLM memory in development contexts. The memtok implementation is available as open-source software.