2. Architecture Patterns
Three dominant design philosophies currently shape agent memory systems in production:
2.1 Vector Store Approach (Memory as Semantic Retrieval)
Philosophy: Store past interactions as dense vector embeddings; retrieve via cosine similarity
How It Works
- Conversations are chunked and embedded into vector space using models like OpenAI's text-embedding-3
- When queried, the agent retrieves the most semantically similar fragments by cosine similarity over the embeddings
- Retrieved fragments are concatenated into the prompt context window
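The chunk-embed-retrieve loop above can be sketched in a few lines. The toy embed() below (an ord-sum hash into a small bag-of-words space) is a stand-in for a real embedding model such as text-embedding-3, and VectorMemory is an illustrative name, not any particular library's API:

```python
import numpy as np

DIM = 64  # toy embedding dimensionality

def embed(text: str) -> np.ndarray:
    # Stand-in for a real embedding model: hash each word into a
    # fixed-size bag-of-words vector, then L2-normalize it.
    vec = np.zeros(DIM)
    for word in text.lower().split():
        vec[sum(ord(c) for c in word) % DIM] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

class VectorMemory:
    def __init__(self):
        self.fragments: list[str] = []
        self.vectors: list[np.ndarray] = []

    def store(self, fragment: str) -> None:
        self.fragments.append(fragment)
        self.vectors.append(embed(fragment))

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        # On unit vectors, cosine similarity reduces to a dot product.
        q = embed(query)
        scores = [float(q @ v) for v in self.vectors]
        top = sorted(range(len(scores)), key=lambda i: scores[i],
                     reverse=True)[:k]
        return [self.fragments[i] for i in top]

memory = VectorMemory()
memory.store("John left the project in March")
memory.store("The deployment pipeline uses Kubernetes")
memory.store("Alice joined the project as tech lead")
context = memory.retrieve("project team members", k=2)
```

A production system would swap the hash embedding for a learned model and the linear scan for an ANN index (HNSW, DiskANN); the retrieval logic is otherwise the same shape.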
Strengths
- Conceptually simple: one embedding model, one vector database
- Fast: ANN indices enable sub-millisecond retrieval at scale
- Schema-free: any text-like data can be ingested without upfront modeling
- Proven at scale: platforms like Pinecone, Weaviate, and Milvus serve millions of queries
Limitations
- Surface-level recall: Cosine similarity doesn't understand temporal relationships or structural reasoning
- Retrieval noise: Top-k may return fragments that share vocabulary but not intent
- Lost relationships: A fact about "who worked with whom on what project" is flattened into a vector
- No temporal reasoning: Cannot answer "what happened before X?" reliably
- Recall decay: Empirically, recall drops below 60% in extended, noisy contexts (HaluMem benchmark, 2025)
Memory Vector Store Pattern:
┌───────────────────┐
│ Conversation      │
│ "John left        │
│  the project"     │
└─────────┬─────────┘
          │
          ▼
┌───────────────────┐
│ Embed to Vector   │
│ [0.23, 0.89]      │
└─────────┬─────────┘
          │
          ▼
┌───────────────────┐
│ Store in Index    │
│ (HNSW, DiskANN)   │
└─────────┬─────────┘
          │
  Query: "Team members"
          │
          ▼
┌───────────────────┐
│ Retrieve Top-K    │
│ Fragments by      │
│ Cosine Similarity │
└───────────────────┘
2.2 Summarization Approach (Memory as Compression)
Philosophy: Periodically compress raw transcripts into rolling summaries, discarding original detail
How It Works
- Agent runs for N turns, accumulating a transcript
- LLM is prompted to summarize key facts, decisions, and context into a condensed form
- Summary replaces the original transcript in the prompt
- On next query, agent works with summary + recent history
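The steps above can be sketched as a rolling-summary loop. summarize_with_llm is a stub (it keeps only flagged lines) standing in for a real LLM call, and the turn threshold and class name are illustrative:

```python
SUMMARIZE_EVERY_N_TURNS = 4  # illustrative threshold

def summarize_with_llm(transcript: list[str]) -> str:
    # Stand-in: a real system would prompt an LLM to condense the
    # transcript into key facts and decisions.
    key_lines = [t for t in transcript if "DECISION:" in t or "FACT:" in t]
    return " | ".join(key_lines)

class SummarizingMemory:
    def __init__(self):
        self.summary = ""
        self.recent: list[str] = []

    def add_turn(self, turn: str) -> None:
        self.recent.append(turn)
        if len(self.recent) >= SUMMARIZE_EVERY_N_TURNS:
            # Fold recent turns into the rolling summary, then discard them.
            merged = ([self.summary] if self.summary else []) + self.recent
            self.summary = summarize_with_llm(merged)
            self.recent = []

    def prompt_context(self) -> str:
        return f"Summary: {self.summary}\nRecent: {' / '.join(self.recent)}"

mem = SummarizingMemory()
for turn in ["Hi", "FACT: deadline is Friday", "ok", "DECISION: ship v2"]:
    mem.add_turn(turn)
mem.add_turn("What was the deadline?")
```

Note that the original turns "Hi" and "ok" are gone after the fold: the lossiness described in the limitations below is visible even in this toy version.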
Strengths
- Token-efficient: rolling summaries can reduce prompt token usage by roughly 70-90%
- Unified context: single, coherent narrative rather than fragmented retrieval
- No specialized infrastructure: runs on any LLM via prompting
- Adaptive: human-readable summaries can be manually edited or curated
Limitations
- Information loss: Summarization is lossy; details discarded may matter later
- Latency and cost: Summarizing a 10K-token transcript adds both delay and API expense on every pass
- Hallucination risk: LLM-generated summaries can misrepresent facts
- Timing: When to summarize? Too frequent = expensive; too infrequent = context grows anyway
- Conflict resolution: If summary says "X" but recent history says "not X," which wins?
2.3 Knowledge Graph Approach (Memory as Structured Relationships)
Philosophy: Organize memories as a dynamic, temporally-aware graph of entities, relationships, and events
How It Works
- Conversations and data are parsed to extract entities (people, places, events) and relationships
- Graph nodes and edges are stored with timestamps and attributes
- Query traverses the graph to retrieve relevant subgraphs or ego-networks
- Retrieved subgraph is serialized into the prompt
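The ingest-and-traverse flow can be sketched with an in-memory adjacency list. TemporalKG and the entity names are invented for illustration; real systems layer embeddings, attributes, and multi-hop traversal on top:

```python
from collections import defaultdict

class TemporalKG:
    def __init__(self):
        # adjacency: subject -> list of (relation, object, timestamp)
        self.edges = defaultdict(list)

    def add(self, subj: str, rel: str, obj: str, ts: str) -> None:
        self.edges[subj].append((rel, obj, ts))

    def query(self, subj: str, rel: str, start: str, end: str) -> list[str]:
        # Time-bounded one-hop traversal:
        # "what did subj <rel> during [start, end]?"
        return [obj for r, obj, ts in self.edges[subj]
                if r == rel and start <= ts <= end]

kg = TemporalKG()
kg.add("John", "works_on", "Project A", "2025-02-10")
kg.add("John", "works_on", "Project B", "2025-07-01")
kg.add("Alice", "works_on", "Project A", "2025-03-05")
q1_projects = kg.query("John", "works_on", "2025-01-01", "2025-03-31")
```

Because edges carry timestamps, the time-bounded query that defeats a plain vector store ("John's Q1 projects") is a simple filter here.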
Key Innovations in Production Systems (Zep, Graphiti, Mem0+Graph)
- Temporal awareness: Edges carry timestamps; queries can ask "who was involved in Q3?"
- Hybrid indexing: Combines semantic embeddings, keyword search, and graph traversal
- Bitemporal modeling: Tracks both when events occurred (event time) and when they were recorded (assertion time)
- Hierarchical subgraphs: Large graphs are partitioned for efficient retrieval
Strengths
- Structured reasoning: Answers "who did what with whom and when?"
- Complex traversal: Can follow multi-hop paths (A→B→C→answer)
- Temporal reasoning: Supports time-bounded queries
- Strong benchmark accuracy: Zep reports an 18.5% improvement over a MemGPT baseline on deep memory retrieval
- Reduced latency: Structured queries run in near-constant time, largely independent of graph size
Limitations
- Construction cost: Entity extraction, relationship detection, and graph maintenance are expensive
- Consistency challenges: Concurrent updates to graph nodes/edges can introduce conflicts
- Hallucination from updates: Inconsistent node merges or edge deletions can corrupt reasoning
- Operational complexity: Requires careful schema design and continuous validation
- Tooling overhead: Need custom extraction and update pipelines
Knowledge Graph Memory Pattern:
┌──────────────┐
│ Conversation │
└──────┬───────┘
       │
       ▼
┌──────────────────────┐
│ Entity Extraction    │
│ (Entities & Rels)    │
└──────────┬───────────┘
           │
           ▼
┌──────────────────────┐
│ Store as Temporal    │
│ Knowledge Graph (TKG)│
│                      │
│ John ──works──→      │
│ (2025-Q1)  Project A │
│                      │
└──────────┬───────────┘
           │
 Query: "John's Q1 projects"
           │
           ▼
┌──────────────────────┐
│ Graph Traversal +    │
│ Temporal Filtering   │
│ → Subgraph Retrieval │
└──────────────────────┘
2.4 Hybrid & Multi-Tier Architectures
Production systems increasingly combine approaches:
MemGPT-Style (Virtual Context Management)
Inspired by hierarchical memory in operating systems, MemGPT divides memory into tiers:
- Core Memory (In-Context, "RAM"): Current task state, recent events. Limited size (e.g., 2K tokens)
- Archival Memory (Persistent, "Disk"): Full transcript history. Externally stored
- Control Flow: LLM-callable functions to read/write between core and archival
Agent autonomously "pages" relevant information between tiers, mimicking OS virtual memory. This achieves context windows far exceeding the model's native limit.
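A toy sketch of the paging idea, with word counts standing in for token counts and a deliberately tiny core budget. All names and thresholds are illustrative, not MemGPT's actual interface:

```python
CORE_BUDGET = 18  # illustrative "token" budget for core memory

class TieredMemory:
    def __init__(self):
        self.core: list[str] = []     # in-context ("RAM")
        self.archive: list[str] = []  # persistent store ("disk")

    def _core_tokens(self) -> int:
        # Word count approximates token count for this sketch.
        return sum(len(item.split()) for item in self.core)

    def write_core(self, item: str) -> None:
        self.core.append(item)
        # Evict oldest entries to the archive when over budget ("page out").
        while self._core_tokens() > CORE_BUDGET:
            self.archive.append(self.core.pop(0))

    def page_in(self, keyword: str) -> list[str]:
        # "Page in": fetch matching archival entries back into context.
        hits = [a for a in self.archive if keyword.lower() in a.lower()]
        self.core.extend(hits)
        return hits

mem = TieredMemory()
mem.write_core("Discussed Q3 roadmap and launch dates at length today")
mem.write_core("User prefers concise answers")
mem.write_core("Current task: draft the launch announcement email")
recalled = mem.page_in("roadmap")
```

In MemGPT proper, write_core and page_in are exposed as LLM-callable tools, so the model itself decides when to evict and when to recall.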
Mem0 (Structured Summarization + Graph)
Mem0 combines extraction, consolidation, and retrieval:
- Automatically extracts facts, preferences, and relationships from conversations
- Consolidates facts to avoid duplication and resolve conflicts
- Supports both vector and graph-based retrieval
- Reported empirical results: 26% relative accuracy improvement, 91% latency reduction, 90% token savings
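The consolidation step can be sketched as slot-based upserts: deduplicate identical facts, and let a newer value for the same (entity, attribute) slot supersede the old one. The slot keying and version counter are illustrative simplifications, not Mem0's actual schema:

```python
class FactStore:
    def __init__(self):
        # slot (entity, attribute) -> (value, version)
        self.facts: dict[tuple[str, str], tuple[str, int]] = {}

    def consolidate(self, entity: str, attribute: str,
                    value: str, version: int) -> str:
        slot = (entity, attribute)
        if slot not in self.facts:
            self.facts[slot] = (value, version)
            return "added"
        old_value, old_version = self.facts[slot]
        if value == old_value:
            return "duplicate"       # nothing new to store
        if version > old_version:
            self.facts[slot] = (value, version)
            return "updated"         # conflict resolved in favor of newer info
        return "ignored"             # stale update loses to the stored fact

store = FactStore()
store.consolidate("user", "favorite_language", "Python", version=1)
status = store.consolidate("user", "favorite_language", "Rust", version=2)
```

The same upsert discipline is what keeps the store from accumulating the contradictory duplicates that plague naive append-only memories.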
Zep (Temporal Knowledge Graph)
Zep is a memory-as-a-service platform built on temporal knowledge graphs. Key features:
- Auto-extracts entities, relationships, and facts from multiple data sources
- Stores as a time-aware graph; traversal is aware of temporal constraints
- Integrated semantic embeddings for hybrid search
- Benchmarks: Outperforms MemGPT on Deep Memory Retrieval (DMR) and LongMemEval
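A sketch of hybrid retrieval in this spirit: filter candidates by a time window, then rank by a weighted blend of semantic and keyword scores. The scoring functions and weights below are toy stand-ins, not Zep's implementation:

```python
def keyword_score(query: str, text: str) -> float:
    # Fraction of query terms that appear in the candidate text.
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0

def hybrid_rank(query, facts, start, end, semantic_scores, alpha=0.5):
    # facts: list of (text, timestamp); semantic_scores: parallel list of
    # embedding similarities a real system would compute.
    in_window = [(text, sem) for (text, ts), sem in zip(facts, semantic_scores)
                 if start <= ts <= end]
    scored = [(alpha * sem + (1 - alpha) * keyword_score(query, text), text)
              for text, sem in in_window]
    return [text for _, text in sorted(scored, reverse=True)]

facts = [("John joined Project A", "2025-01-10"),
         ("Budget review meeting held", "2025-02-01"),
         ("John left Project A", "2025-08-15")]
ranked = hybrid_rank("john project", facts, "2025-01-01", "2025-06-30",
                     semantic_scores=[0.9, 0.3, 0.95])
```

Note the temporal filter does real work here: the semantically strongest candidate ("John left Project A") is excluded because it falls outside the query window.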
7. References & Sources
This report synthesizes findings from academic research, technical blogs, and production system documentation:
7.1 Foundational Papers (arXiv)
Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory
arXiv:2504.19413 (April 2025)
Proposes Mem0, a scalable memory-centric architecture with 26% accuracy improvement, 91% latency reduction, 90% token savings.
Zep: A Temporal Knowledge Graph Architecture for Agent Memory
arXiv:2501.13956 (January 2025)
Introduces temporal knowledge graphs for agent memory with 18.5% accuracy gains over MemGPT on deep memory retrieval.
MemGPT: Towards LLMs as Operating Systems
arXiv:2310.08560 (October 2023)
Foundational work on virtual context management and hierarchical memory tiers for LLMs.
A Survey on the Memory Mechanism of Large Language Model based Agents
arXiv:2404.13501 (April 2024)
Comprehensive survey covering memory mechanisms, evaluation, and future directions.
MemoriesDB: A Temporal-Semantic-Relational Database for Long-Term Agent Memory
arXiv:2511.06179 (November 2025)
Proposes temporal-semantic-relational database combining SQL, vectors, and temporal reasoning.
HaluMem: Evaluating Hallucinations in Memory Systems of Agents
arXiv:2511.03506 (November 2025)
Comprehensive benchmark revealing memory hallucination patterns: recall <60%, accuracy <62%.
Diagnosing Retrieval vs. Utilization Bottlenecks in LLM Agent Memory
arXiv:2603.02473 (March 2026)
Shows that performance breakdowns manifest at the retrieval stage rather than at utilization.
LLM-based Agents Suffer from Hallucinations: A Survey of Taxonomy, Methods, and Directions
arXiv:2509.18970 (September 2025)
Taxonomy of hallucinations in agent systems, including memory forgetting failures.
Memory in the Age of AI Agents: A Survey
arXiv:2512.13564 (December 2025)
Latest comprehensive survey outlining memory automation, RL integration, multimodal memory, and trustworthiness as research frontiers.
A-Mem: Agentic Memory for LLM Agents
arXiv:2502.12110 (February 2025)
Proposes agentic memory mechanisms with specialized extraction and consolidation.
QSAF: A Novel Mitigation Framework for Cognitive Degradation in Agentic AI
arXiv:2507.15330 (July 2025)
Addresses cognitive degradation in agents: reasoning breakdown, memory retrieval failure, planning loss, output decay.
AI Agents Need Memory Control Over More Context
arXiv:2601.11653 (January 2026)
Demonstrates multi-turn failures driven by weak memory control, not missing knowledge.
Agent Memory Below the Prompt: Persistent KV Cache for Multi-Agent LLM Inference
arXiv:2603.04428 (March 2026)
Addresses edge device constraints and KV cache persistence for multi-agent workflows.
Memory OS of AI Agent
arXiv:2506.06326 (June 2025)
Introduces MemoryOS, a systematic operating system for agent memory management.
EverMemOS: A Self-Organizing Memory Operating System for Structured Long-Horizon Reasoning
arXiv:2601.xxxx (January 2026)
Self-organizing memory system for long-horizon reasoning tasks.
7.2 Technical Blog Posts & Industry Articles
Memory for AI Agents: A New Paradigm of Context Engineering
The New Stack (January 16, 2026)
In-depth exploration of memory architectures, context rot, and three design philosophies.
Building smarter AI agents: AgentCore long-term memory deep dive
AWS Machine Learning Blog (October 15, 2025)
Amazon Bedrock AgentCore memory system for extraction, consolidation, and retrieval.
AI Agent Memory Storage: SQL vs Vector Databases - Complete Guide
BSWEN Documentation (March 6, 2026)
Comparison of SQL, vector databases, and hybrid approaches for agent memory.
Building AI Agents with Persistent Memory Using Google ADK and Milvus
Milvus Blog
Integration patterns for production-ready agents with long-term memory.
Building AI Agents with Redis Memory Management
Redis Blog (April 29, 2025)
Short-term and long-term memory using Redis, LangGraph integration.
Building AI Agents with Persistent Memory: A Unified Database Approach
Tiger Data Blog (January 20, 2026)
PostgreSQL + TimescaleDB + pgvectorscale for unified agent memory.
Postgres for Agents
Tiger Data Blog (October 29, 2025)
PostgreSQL ecosystem for agent memory: pgvectorscale, pgai, full-text search.
How Do Vector Databases Power Agentic AI's Memory and Knowledge Systems?
Monetizely (August 30, 2025)
Vector databases and knowledge graphs in agentic AI memory systems.
Graphiti: Knowledge Graph Memory for an Agentic World
Neo4j Developer Blog (August 7, 2025)
Graphiti framework for real-time knowledge graphs in Neo4j.
Building AI Agents with Knowledge Graph Memory: A Comprehensive Guide to Graphiti
Medium by Saeed Hajebi (June 20, 2025)
Detailed guide to knowledge graph memory systems for agents.
Why LLM Memory Still Fails - A Field Guide for Builders
DEV Community (July 29, 2025)
Practical analysis of RAG limitations and agentic RAG approaches.
How to Make AI Agents Accurate: Stop Treating Memory Like Chat History
Medium by Dinand Tinholt (December 17, 2025)
Signal-to-noise ratio issues in naive memory approaches.
Design Patterns for Long-Term Memory in LLM-Powered Architectures
Serokell Blog (December 9, 2025)
Architectural patterns for persistent agent memory.
Top 6 Reasons Why AI Agents Fail in Production and How to Fix Them
Maxim AI (October 17, 2025)
Six failure modes including hallucination, prompt injection, latency, context window limitations.
Agent State Management: Redis vs Postgres for AI Memory
SitePoint (February 2026)
Comparison of Redis and PostgreSQL for agent memory tiers.
Agentic AI: Implementing Long-Term Memory
Towards Data Science (June 24, 2025)
Overview of vectors and knowledge graphs for agent memory.
Vector Database Use Cases: RAG, Search & More
Redis Blog (February 2026)
Vector databases as production-ready infrastructure for AI agents.
PgVector for AI Memory in Production Applications
Ivan Turkovic Blog (November 16, 2025)
PostgreSQL pgvector for production agent memory systems.
How We Made PostgreSQL a Better Vector Database
Tiger Data Blog (December 9, 2025)
pgvectorscale performance improvements and deployment patterns.
7.3 GitHub Repositories & Open Source Projects
7.4 Preprints & Emerging Work
Memory in LLM-based Multi-agent Systems: Mechanisms, Challenges, and Collective
TechRxiv Preprint
Early-stage research on multi-agent memory challenges.
Intrinsic Memory Agents: Heterogeneous Multi-Agent LLM Systems through Structured Contextual Memory
OpenReview (October 8, 2025)
Structured memory for multi-agent LLM systems.
Knowledge Graph-Guided Retrieval Augmented Generation
arXiv:2502.06864 (February 2025)
KG-guided RAG addressing hallucination issues.
Simple is Effective: The Roles of Graphs and LLMs in Knowledge-Graph-Based RAG
OpenReview (October 4, 2024)
Analysis of graph-based vs vector RAG tradeoffs.
MemGUI-Bench: Benchmarking Memory of Mobile GUI Agents in Dynamic Environments
arXiv:2602.06075 (February 2026)
Benchmark for agent memory in mobile and GUI scenarios.
ICLR 2026 Workshop Proposal: MemAgents - Memory for LLM-Based Agentic Systems
OpenReview
Emerging workshop establishing agent memory as a research focus area.
7.5 Industry Resources & Databases