Memory System

How Edward remembers everything — types, retrieval, extraction, and background enrichment.

Overview

Edward's memory system is what makes him different from a stateless chatbot. Every conversation is mined for memorable information — facts, preferences, context, instructions — and stored in PostgreSQL with vector embeddings. On future turns, relevant memories are retrieved and injected into the LLM context so Edward can reference things you told him weeks ago.

Memory Types

Each memory is classified into one of four types during extraction:

| Type | Description | Example |
|------|-------------|---------|
| fact | Objective information about the user or world | "User's dog is named Luna" |
| preference | User likes, dislikes, or style preferences | "Prefers dark mode in all apps" |
| context | Situational or temporal context | "Starting a new job at Acme Corp next Monday" |
| instruction | Explicit directives from the user | "Always respond in bullet points" |

Temporal Nature

Memories also carry a temporal nature that affects how they're weighted over time:

| Temporal | Description | Behavior |
|----------|-------------|----------|
| timeless | Permanently relevant facts | No decay — always full weight |
| temporary | Short-lived context | Decays over time, eventually irrelevant |
| evolving | Facts that may change | Boosted when recently updated, decays otherwise |
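The decay behavior above can be sketched as a weighting function. This is a minimal illustration, not Edward's actual implementation: the function name, 30-day half-life, and 1.2x boost factor are all assumptions.

```python
from datetime import datetime, timedelta, timezone

def temporal_weight(nature: str, updated_at: datetime,
                    half_life_days: float = 30.0) -> float:
    """Illustrative decay weighting; the exact half-life and boost
    factor are assumptions, not Edward's real constants."""
    age_days = (datetime.now(timezone.utc) - updated_at).total_seconds() / 86400
    decay = 0.5 ** (age_days / half_life_days)
    if nature == "timeless":
        return 1.0                      # always full weight
    if nature == "temporary":
        return decay                    # fades toward irrelevance
    if nature == "evolving":
        return min(1.0, 1.2 * decay)    # recent updates get a boost
    raise ValueError(f"unknown temporal nature: {nature}")
```

An exponential half-life keeps every memory retrievable in principle while letting stale temporary context sink below more relevant results.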

Memory Tiers

Each memory is assigned a confidence tier:

| Tier | Description |
|------|-------------|
| observation | Inferred from conversation — may not be explicitly stated |
| belief | Reasonably confident based on context |
| knowledge | Explicitly stated by the user — high confidence |

Hybrid Retrieval

When Edward needs to recall memories, he uses a hybrid scoring approach:

  • 70% vector similarity — pgvector cosine distance using all-MiniLM-L6-v2 embeddings (384 dimensions)
  • 30% BM25 keyword matching — traditional text search for exact term hits

This combination catches both semantically similar memories and ones that share specific keywords. The context budget is capped at 8,000 characters to avoid overwhelming the LLM.
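The blend and budget described above can be sketched as follows. This is a hedged illustration: normalizing BM25 by the batch maximum and greedy budget packing are assumptions about how the scales are reconciled, not Edward's confirmed internals.

```python
def hybrid_score(vector_sim: float, bm25_raw: float, bm25_max: float) -> float:
    """70/30 blend of cosine similarity (already 0..1) and BM25.
    Normalizing BM25 by the batch maximum is an assumption."""
    bm25_norm = bm25_raw / bm25_max if bm25_max > 0 else 0.0
    return 0.7 * vector_sim + 0.3 * bm25_norm

def pack_context(scored_memories, budget_chars: int = 8000):
    """Greedily keep the best-scoring memory texts until the
    8,000-character context budget is spent."""
    picked, used = [], 0
    for text, score in sorted(scored_memories, key=lambda m: m[1], reverse=True):
        if used + len(text) <= budget_chars:
            picked.append(text)
            used += len(text)
    return picked
```

Keyword matching rescues queries where the user repeats an exact term (a name, an ID) that embeddings place only loosely, while the vector side covers paraphrases.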

Memory Extraction

After every conversation turn, Edward runs a memory extraction step using Claude Haiku 4.5. The extractor analyzes the conversation and identifies any new memorable information. For each extracted memory, it assigns:

  • Memory type (fact, preference, context, instruction)
  • Importance score (0-10)
  • Temporal nature (timeless, temporary, evolving)
  • Confidence tier (observation, belief, knowledge)

Duplicate detection prevents the same information from being stored multiple times. Existing memories are updated rather than duplicated.
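Putting the four assigned attributes together, one extracted memory might look like the record below. The field names are illustrative guesses, not Edward's actual schema.

```python
# Hypothetical shape of one extracted memory; the field names are
# illustrative, not Edward's actual schema.
extracted_memory = {
    "content": "User's dog is named Luna",
    "type": "fact",             # fact | preference | context | instruction
    "importance": 7,            # 0-10 score from the extractor
    "temporal": "timeless",     # timeless | temporary | evolving
    "tier": "knowledge",        # observation | belief | knowledge
}
```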

Deep Retrieval

For complex conversations, Edward activates deep retrieval — a pre-turn gate that runs when the message is short or the conversation has reached 3+ turns. It fires 4 parallel memory queries:

  • The original user message
  • Three Haiku-rewritten query variations targeting different angles

Results are deduplicated and merged, giving the LLM a richer context window than a single query would provide.
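The fan-out-and-merge step could be sketched like this. A hedged sketch: `rewrite_queries` and `search` are stand-ins for the Haiku rewriter and the hybrid search call, and deduplication by memory id is an assumption.

```python
import asyncio

async def deep_retrieve(message, rewrite_queries, search):
    """Fire the original message plus three rewritten variants in
    parallel, then merge with id-based deduplication. `rewrite_queries`
    and `search` are stand-ins for the Haiku rewriter and hybrid search."""
    queries = [message] + await rewrite_queries(message, n=3)
    batches = await asyncio.gather(*(search(q) for q in queries))
    seen, merged = set(), []
    for batch in batches:
        for memory in batch:
            if memory["id"] not in seen:
                seen.add(memory["id"])
                merged.append(memory)
    return merged
```

Because the four searches run concurrently via `asyncio.gather`, the wall-clock cost is roughly one search plus one rewrite call, not four sequential round trips.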

Reflection

After each turn, a fire-and-forget reflection step generates 3-5 Haiku-crafted queries to find memories related to the current conversation. The results are stored in the memory_enrichments table and loaded on the next turn to provide deeper context. This runs asynchronously and adds zero latency to the current response.
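The fire-and-forget pattern can be sketched as below. The callables are hypothetical stand-ins for Haiku query generation, hybrid search, and the `memory_enrichments` insert; the point is that the work is spawned, not awaited.

```python
import asyncio

async def reflect_after_turn(conversation, generate_queries, search, store):
    """Fire-and-forget enrichment: spawn the work and return immediately,
    so the current response pays no extra latency. The callables are
    stand-ins for Haiku query generation, hybrid search, and the
    memory_enrichments insert."""
    async def _reflect():
        for query in await generate_queries(conversation):  # 3-5 queries
            await store(await search(query))                # read next turn
    asyncio.create_task(_reflect())
```

Since the task is never awaited by the response path, its results only become visible when the next turn loads the enrichments table.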

Consolidation

An hourly background loop clusters related memories via Haiku. It creates:

  • Memory connections — links between related memories
  • Memory flags — quality and staleness markers

Consolidation is disabled by default and can be enabled via the REST API or settings UI.
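A minimal shape for such a loop, assuming an asyncio runtime: `consolidate` stands in for the Haiku clustering call that writes connections and flags, and polling `is_enabled` each cycle is one way the runtime toggle could take effect.

```python
import asyncio

async def consolidation_loop(is_enabled, consolidate, interval_s: float = 3600):
    """Hourly background pass; `consolidate` stands in for the Haiku
    clustering call that writes memory connections and flags. Polling
    `is_enabled` each cycle lets the REST API / settings toggle take
    effect without a restart."""
    while True:
        if is_enabled():
            await consolidate()
        await asyncio.sleep(interval_s)
```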

Memory Tools

Edward has direct access to memory management tools (always available, not gated by skills):

| Tool | Description |
|------|-------------|
| remember_update | Create or update a memory |
| remember_forget | Delete a specific memory |
| remember_search | Search memories by query |
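A toy dispatcher shows how the three tools map to store operations. The argument shapes and the in-memory `store` are illustrative assumptions, not Edward's real backend.

```python
def dispatch_memory_tool(name: str, args: dict, store: dict):
    """Toy dispatcher over the three tools; the argument shapes and the
    in-memory `store` are illustrative, not Edward's real backend."""
    if name == "remember_update":
        store[args["key"]] = args["content"]        # create or overwrite
        return {"ok": True}
    if name == "remember_forget":
        return {"ok": store.pop(args["key"], None) is not None}
    if name == "remember_search":
        return [m for m in store.values()
                if args["query"].lower() in m.lower()]
    raise ValueError(f"unknown memory tool: {name}")
```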