Supplement

Evaluated Sources & References

Companion to: Living Memory & Alexandria Technical Overview
Evaluation period: February – March 2026 (6+ sessions)
Date: March 2026

How to Read This Supplement

The technical overview makes claims about the state of the field, validated strengths, and gaps. This supplement provides the sources behind those claims so readers can evaluate them independently.

Between February and March 2026, we conducted a structured pressure-test of the Living Memory and Alexandria architecture against 20+ external sources spanning academic papers, open-source projects, industry articles, and commercial products. Each source was evaluated in a dedicated document that compares its approach to ours, identifies what we should adopt, and notes where our design diverges.

Not all original source materials are preserved (some were live demos, videos, or paywalled content). Where external links exist, they are included below. Our evaluation documents are available in the project repository under docs/research/.

Key Claims and Their Sources

Claims from the technical overview, mapped to the sources that support or challenge them.

Claim: Propositional framing has no evaluated equivalent
Sources: MIE, Cognee, Letta/MemGPT, Hindsight, MAPLE, Memory Survey, CrewAI
Notes: MIE stores flat text; Cognee uses triplets; Letta uses a filesystem; Hindsight uses embeddings. None render claims with explicit reasoning chains.

Claim: Memory evolution (decay, condensation, supersession) is ahead of the field
Sources: Engram, Cognee, Memory Survey, Letta/MemGPT, MAPLE
Notes: Engram has decay tiers (the closest analogue). Cognee has memify (spaced repetition). Most systems implement formation and retrieval only, not evolution.

Claim: Our forgetting design is the most complete evaluated
Sources: Memory Survey, Engram, MAPLE, Letta/MemGPT
Notes: The Memory Survey places forgetting at the apex of its taxonomy. Engram has partial decay. No evaluated system has our full forgetting pipeline (pruning + supersession + condensation + stale detection).

Claim: 3D hierarchical token-level memory sits at the apex of the 2026 survey's taxonomy
Sources: Memory in the Age of Agents Survey (Hu et al.)
Notes: The survey's taxonomy has six levels; our design maps to the highest. The direct correspondence is documented in the evaluation.

Claim: The session lifecycle is validated independently
Sources: Anthropic Memory Tool, MAPLE, Memory Survey
Notes: Anthropic uses context editing plus auto-injected prompts. MAPLE uses M/L/P with session boundaries. The survey validates the lifecycle as a design pattern.

Claim: No one is building the full knowledge engineering discipline
Sources: All 20+ sources
Notes: No evaluated system treats knowledge claims as installed capabilities requiring dependency tracking, impact analysis, capability testing, and deprecation management. This is an absence claim; future sources may disprove it.

Claim: Memory poisoning amplifies blast radius via propositional framing
Sources: Memory Poisoning Threat Model (internal), ATHF
Notes: Internal analysis. ATHF's LOCK pattern validates the need for trust boundaries on procedural memory.

Claim: For simple retrieval, the mechanism doesn't matter (Letta scores 74% on LoCoMo with a filesystem)
Sources: Letta/MemGPT, Hindsight
Notes: Based on Letta's LoCoMo results. Hindsight scores higher (TEMPR) but uses a different benchmark. The point stands: the graph's value is in discovery, reasoning, and evolution, not speed.
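To make the propositional-framing claim concrete, here is a minimal sketch of the difference between flat-text storage and a claim that carries evidence, a reasoning chain, and a supersession link. All field and class names are invented for this illustration; they are not the actual schema.

```python
from dataclasses import dataclass

# Illustrative only: a flat-text memory stores a string, while a
# propositional memory stores a claim plus the reasoning behind it
# and an explicit supersession link for evolution.
@dataclass
class Claim:
    text: str                # the proposition itself
    evidence: list           # sources supporting the claim
    reasoning: list          # chain of inference steps
    superseded_by: "Claim | None" = None  # set when a newer claim replaces this one

    def is_current(self) -> bool:
        return self.superseded_by is None

old = Claim("API v1 is current", evidence=["changelog 2024"],
            reasoning=["observed in repo state"])
new = Claim("API v2 replaced v1", evidence=["changelog 2026"],
            reasoning=["release notes supersede earlier observation"])
old.superseded_by = new   # evolution, not deletion: the old claim keeps its history
```

The point of the structure is that supersession preserves the reasoning trail, which flat-text stores discard.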

Papers and Surveys

MAPLE: Memory, Learning, Personalization Framework [Paper]
Deepak Babu Piskala — ALA 2026 @ AAMAS 2026
Introduced the M/L/P decomposition we adopted for our architecture. Context rot taxonomy (poisoning, distraction, confusion, clash). Validated hierarchical token-level memory and event-triggered learning. Identified our conflict-resolution gap.
Contributed: M/L/P architecture, context rot taxonomy, event-triggered learning design
Memory in the Age of Agents (Survey) [Paper]
Hu et al. — NUS, Fudan, Renmin, Peking, Oxford, Georgia Tech
Comprehensive 2026 survey establishing the Forms-Functions-Dynamics taxonomy for agent memory. Places 3D hierarchical token-level memory at the apex. Validates session lifecycle, forgetting-as-architecture, and episodic memory as a structural gap across the field.
Contributed: Taxonomy placement, episodic memory gap validation, offline consolidation cycle
ALMA: Learning to Continually Learn via Meta-learning Agentic Memory Designs [Paper]
zksha
Meta-learns memory designs as executable code. Reflect-generate-verify and examine-before-commit patterns. AlfWorld/TextWorld benchmarks. Validated our approach to memory as active architecture, not passive storage.
Contributed: Reflect-generate-verify pattern, examine-before-commit discipline
Everything is Context: Agentic File System [Paper]
Xu et al. — CSIRO Data61, ArcBlock, U Tasmania, UNSW
File-system abstraction for context engineering. Introduced the Constructor/Updater/Evaluator vocabulary. Validated the concept of a context manifest (what was selected, what was excluded, and why).
Contributed: "Context Evaluator" as named architectural role, context manifest concept
Cornelius: Agentic Note-Taking (24 Articles) [Article series]
Cornelius (@molt_cornelius)
24-article series on knowledge management. Article 24 ("What Search Cannot Find") crystallized the graph-vs-search distinction: search retrieves what you ask for; graph traversal discovers what you didn't know to ask. Foundation for our knowledge graph architecture.
Contributed: Graph-vs-search framing, metabolism metaphor, Alexandria naming inspiration

Open-Source Implementations

Letta / MemGPT [Open source]
UC Berkeley; letta-ai
OS-inspired virtual context management. MemFS (Context Repositories) with git-backed memory. Scores 74% on LoCoMo with filesystem-only approach. Critical benchmark: proved that for simple retrieval, the mechanism doesn't matter. Our graph's value must come from discovery, reasoning, and evolution.
Contributed: Benchmarking gap identification, filesystem-vs-graph strategic framing
Cognee [Open source]
topoteretes
Hybrid RAG knowledge engine with knowledge graph construction from unstructured data. Memify (spaced repetition, decay) for memory strengthening. Identified our gap: we have forgetting but not the inverse (demand-driven increase of retrieval probability).
Contributed: Memory strengthening gap, knowledge quality evaluation layer concept
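The strengthening gap Cognee exposed is the inverse of decay: each access should raise a memory's retrieval probability. A hypothetical weighting function sketches the idea; the log-boost and half-life form are our illustration, not Cognee's memify implementation.

```python
import math

# Hypothetical sketch of "memory strengthening": each access boosts a
# memory's retrieval weight, while the weight otherwise decays over time.
# Constants and the exponential form are illustrative assumptions.
def retrieval_weight(accesses: int, days_since_last_access: float,
                     half_life_days: float = 30.0) -> float:
    strength = 1.0 + math.log1p(accesses)                    # grows with use
    decay = 0.5 ** (days_since_last_access / half_life_days) # shrinks with idleness
    return strength * decay

# A frequently used memory can outrank a fresher but never-used one.
well_used = retrieval_weight(accesses=10, days_since_last_access=10)
fresh_unused = retrieval_weight(accesses=0, days_since_last_access=5)
```

Demand-driven strengthening and time-driven decay then become two sides of one scoring function rather than separate subsystems.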
Engram [Open source]
Terronex-dev
TypeScript neural memory format with decay tiers, quality metadata, and HNSW indexing. Hierarchical temporal multi-modal memory. Closest to our decay architecture. Validated our approach; we adopted their DecayConfig concept.
Contributed: Decay tiers design, time decay in retrieval, quality block pattern
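A minimal sketch of tiered decay in the spirit of Engram's DecayConfig concept; the tier names and thresholds here are invented for illustration, not taken from Engram's code.

```python
from dataclasses import dataclass

# Sketch of tiered decay: memories move down tiers as they age
# without access. Names and day thresholds are illustrative.
@dataclass(frozen=True)
class DecayConfig:
    hot_days: float = 7.0    # recently touched: always retrievable
    warm_days: float = 30.0  # retrievable, ranked lower
    cold_days: float = 90.0  # candidate for condensation; beyond this, pruning

def tier(days_idle: float, cfg: DecayConfig = DecayConfig()) -> str:
    if days_idle <= cfg.hot_days:
        return "hot"
    if days_idle <= cfg.warm_days:
        return "warm"
    if days_idle <= cfg.cold_days:
        return "cold"
    return "prune-candidate"
```

A full forgetting pipeline would layer supersession, condensation, and stale detection on top of a tier function like this one.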
MIE (Memory Intelligence Engine) [Open source]
Kraklabs
Go-based memory graph with five typed nodes (Facts, Decisions, Entities, Events, Topics). MCP server. CozoDB backend. Agent-as-Evaluator pattern. Confirmed our propositional framing advantage: MIE stores flat text; we store claims with reasoning chains.
Contributed: Confirmed propositional framing differentiation, Agent-as-Evaluator pattern
CIE (Code Intelligence Engine) [Open source]
Kraklabs
Code intelligence engine with CozoDB backend, Tree-sitter parsing, 20+ MCP tools. Operational complement to MIE. Architecture reference for how code-specific knowledge could integrate with a general knowledge graph.
Contributed: Architecture reference for code-specific knowledge integration
memU [Open source]
NevaMind-AI
24/7 proactive memory framework with two-bot architecture. Scores 92% on LoCoMo. Memory-as-filesystem metaphor. Demonstrated that proactive memory (suggesting relevant context unprompted) is valuable and feasible.
Contributed: Proactive memory concept, LoCoMo benchmark reference
ATHF (Agentic Threat Hunting Framework) [Open source]
Nebulock-Inc
LOCK pattern for agent trust boundaries. MITRE ATT&CK coverage mapping. Domain-specific application of our agentic memory pattern in cybersecurity. Validated the need for trust boundaries on procedural memory (skills).
Contributed: LOCK pattern for trust boundaries, skill signature verification concept
QMD v2 [Open source]
tobi
Optional P-layer retrieval with vault search and SDK. Evaluated as potential retrieval component. Decision: keep boundary with getContext; QMD operates at a different layer.
Contributed: Confirmed getContext retrieval boundary is correct

Industry Sources

Hindsight (Vectorize) [Commercial]
Vectorize, Inc.
Retain/recall/reflect architecture with TEMPR (semantic, BM25, graph, temporal) retrieval. LongMemEval SOTA. Closest production analogue to our architecture. Identified temporal retrieval and reflect-style API as gaps in our design.
Contributed: Temporal retrieval gap, reflect-style API design, closest commercial analogue
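TEMPR-style retrieval can be sketched as a weighted fusion of the four signals (semantic, BM25, graph, temporal). The linear combination and the weights below are our assumption for illustration, not Vectorize's published formula.

```python
# Sketch of hybrid retrieval scoring: combine semantic, lexical (BM25),
# graph-proximity, and temporal recency signals into one rank.
# Weights and the linear form are illustrative assumptions.
def hybrid_score(semantic: float, bm25: float, graph: float, temporal: float,
                 weights=(0.4, 0.2, 0.2, 0.2)) -> float:
    signals = (semantic, bm25, graph, temporal)  # each normalized to [0, 1]
    return sum(w * s for w, s in zip(weights, signals))

# A hit that is semantically close and recent beats a purely lexical match.
strong = hybrid_score(semantic=0.9, bm25=0.1, graph=0.5, temporal=0.9)
lexical_only = hybrid_score(semantic=0.3, bm25=0.9, graph=0.2, temporal=0.2)
```

The temporal term is the gap this source surfaced: without it, two of the four signals in this sketch simply disappear from the ranking.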
Anthropic Memory Tool [Commercial]
Anthropic
memory_20250818: file-system model with context editing and auto-injected prompts. Validated session lifecycle pattern independently. Our architecture is ahead on structured learning, forgetting, and knowledge graph; Anthropic is ahead on seamless tool integration.
Contributed: Independent validation of session lifecycle, context injection pattern
LangChain Agent Builder Memory [Industry]
LangChain
Memory-as-filesystem approach following the CoALA framework. Validated /remember commands, scheduled reflection, and agent-suggested compaction as practical patterns.
Contributed: Validated slash commands, scheduled reflection, compaction suggestions
CrewAI Cognitive Memory [Industry]
João Moura, CrewAI
Encode/consolidate/recall/extract/forget pipeline. Retrieval confidence scoring. Evidence gaps tracking. Composite scoring for claim quality. Validated our pipeline phases; introduced evidence_gaps as a concept we should consider.
Contributed: Retrieval confidence concept, evidence gaps tracking pattern
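Evidence-gaps tracking can be sketched as follows; the required-evidence categories and the linear confidence penalty are illustrative assumptions, not CrewAI's implementation.

```python
# Sketch of evidence-gap tracking: a claim records which evidence
# categories are still missing, and retrieval confidence drops with
# each gap. Category names and the penalty are invented for this example.
REQUIRED_EVIDENCE = {"source_doc", "date_observed", "corroboration"}

def evidence_gaps(present: set) -> set:
    return REQUIRED_EVIDENCE - present

def confidence(base: float, present: set, penalty: float = 0.2) -> float:
    return max(0.0, base - penalty * len(evidence_gaps(present)))
```

The useful property is that gaps are queryable: a consolidation pass can ask "which claims are missing corroboration?" instead of rediscovering weaknesses at read time.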
Context Graphs: AI's Trillion Dollar Opportunity [Industry]
Jaya Gupta & Ashu Garg, Foundation Capital
VC thesis on context graphs as the next infrastructure layer for AI. Distinguishes systems of record (current state) from context graphs (decision traces: why, who, when, what precedent). Our architecture aligns with this framing through decision traces and entity-anchored claims.
Contributed: Context graph framing, validation of decision traces as architectural primitive
Google Always-On Memory Agent [Industry]
Google Cloud
Single-user consolidation agent with timer-based memory. Uses Gemini 2.0 Flash. No entity graph, no multi-agent, no forgetting. Reference point for the simplest viable memory agent; our architecture addresses the gaps this design doesn't attempt.
Contributed: Baseline reference for minimal viable memory agent
Monigatti: Memory in AI Agents [Industry]
Leonie Monigatti
Taxonomy overview covering CoALA and Letta frameworks. Distinction between "agent memory" (passive store) and "agentic memory" (self-managing). Ecosystem map of memory implementations. Identified episodic memory as a structural gap across the field.
Contributed: Agent vs agentic memory distinction, episodic memory gap, ecosystem mapping
McGowan: Personal Agent Memory System [Industry]
Christopher McGowan
Practical demonstration of /nap, /wake, /take note commands for agent memory. Pre-compaction checkpoint pattern. Validated the federated memory principle: lightweight local memory bridged to a governed store.
Contributed: Slash command patterns, nap/wake lifecycle, federated memory principle

Internal Evaluations

Memory Poisoning Threat Model [Internal]
Internal threat analysis identifying three attack vectors (direct injection, reflection poisoning, digestive pipeline injection) and the blast radius amplification from propositional framing. Led to the Memory Integrity and Trust Model ADR and Phase 1 safety roadmap.
Contributed: Trust tiers design, audit log requirements, security gap identification
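The trust-tiers design can be sketched as a gate on skill installation, in the spirit of ATHF's LOCK pattern; the tier names and the policy below are illustrative, not the actual ADR.

```python
from enum import Enum

# Sketch of a trust-tier gate on procedural memory (skills):
# lower-trust sources cannot install higher-privilege capabilities.
# Tier names and the threshold policy are invented for illustration.
class Trust(Enum):
    UNTRUSTED = 0  # e.g. content ingested from the web
    REVIEWED = 1   # passed digestive-pipeline checks
    OPERATOR = 2   # explicitly approved by a human

def may_install_skill(source: Trust, required: Trust = Trust.REVIEWED) -> bool:
    return source.value >= required.value
```

A gate like this bounds the blast radius of reflection poisoning: web-derived content can propose a skill but cannot install one without crossing a trust boundary.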
8 RAG Architectures Framework Evaluation [Internal]
Mapped eight RAG architecture patterns (Naive through Agentic) against our design. Identified gaps in Adaptive routing, Corrective RAG (read-time validation), and HyDE for semantic discovery. Confirmed our positioning beyond standard RAG patterns.
Contributed: RAG evolution framing, Corrective RAG gap, Adaptive routing design
Consolidated Pressure-Test Synthesis [Internal]
Master synthesis document consolidating findings from all sessions into recommendations (R1-R44) and design items (D1-D25). Established the seven critical gaps, five validated strengths, and the phased roadmap.
Contributed: Roadmap structure, gap prioritization, strength validation

How We Evaluated

Each source was evaluated using a consistent approach:

1. Understand the source on its own terms: what problem does it solve, for whom, with what trade-offs?

2. Map to our architecture: where does it overlap, where does it diverge, where does it address a gap we have?

3. Identify adoptable patterns: what should we take from this source? What's the minimal useful adoption?

4. Pressure-test our assumptions: does this source invalidate or weaken any of our architectural bets?

5. Document honestly: where the source is ahead of us, say so. Where we're ahead, explain why with evidence.

Limitations

This is not a formal systematic review. Sources were selected based on relevance, availability, and discovery during active research, not from a comprehensive literature search. Some sources were evaluated from blog posts or README files, not from peer-reviewed publications. The evaluation was conducted by a small team and reflects our architectural perspective. We encourage readers to evaluate the original sources independently and draw their own conclusions.