Memory System

How Edward remembers everything — types, retrieval, extraction, and background enrichment.

Overview

Edward's memory system is what makes him different from a stateless chatbot. Every conversation is mined for memorable information — facts, preferences, context, instructions — and stored in PostgreSQL with vector embeddings. On future turns, relevant memories are retrieved and injected into the LLM context so Edward can reference things you told him weeks ago.

Memory Types

Each memory is classified into one of four types during extraction:

| Type | Description | Example |
|------|-------------|---------|
| fact | Objective information about the user or world | "User's dog is named Luna" |
| preference | User likes, dislikes, or style preferences | "Prefers dark mode in all apps" |
| context | Situational or temporal context | "Starting a new job at Acme Corp next Monday" |
| instruction | Explicit directives from the user | "Always respond in bullet points" |

Temporal Nature

Memories also carry a temporal nature that affects how they're weighted over time:

| Temporal | Description | Behavior |
|----------|-------------|----------|
| timeless | Permanently relevant facts | No decay — always full weight |
| temporary | Short-lived context | Decays over time, eventually irrelevant |
| evolving | Facts that may change | Boosted when recently updated, decays otherwise |
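The decay behavior above can be sketched as a weighting function. This is a minimal illustration, not Edward's actual implementation: the function name, 30-day half-life, and 1.2x boost factor are all assumptions.

```python
from datetime import datetime, timedelta, timezone

def temporal_weight(nature: str, updated_at: datetime,
                    half_life_days: float = 30.0) -> float:
    """Illustrative decay weighting; the exact half-life and boost
    factor are assumptions, not Edward's real constants."""
    age_days = (datetime.now(timezone.utc) - updated_at).total_seconds() / 86400
    decay = 0.5 ** (age_days / half_life_days)
    if nature == "timeless":
        return 1.0                      # always full weight
    if nature == "temporary":
        return decay                    # fades toward irrelevance
    if nature == "evolving":
        return min(1.0, 1.2 * decay)    # recent updates get a boost
    raise ValueError(f"unknown temporal nature: {nature}")
```

An exponential half-life keeps every memory retrievable in principle while letting stale temporary context sink below more relevant results.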

Memory Tiers

Each memory is assigned a confidence tier:

| Tier | Description |
|------|-------------|
| observation | Inferred from conversation — may not be explicitly stated |
| belief | Reasonably confident based on context |
| knowledge | Explicitly stated by the user — high confidence |

Hybrid Retrieval

When Edward needs to recall memories, he uses a hybrid scoring approach:

  • 70% vector similarity — pgvector cosine distance using all-MiniLM-L6-v2 embeddings (384 dimensions)
  • 30% BM25 keyword matching — traditional text search for exact term hits

This combination catches both semantically similar memories and ones that share specific keywords. The context budget is capped at 8,000 characters to avoid overwhelming the LLM.
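The blend and budget described above can be sketched as follows. This is a hedged illustration: normalizing BM25 by the batch maximum and greedy budget packing are assumptions about how the scales are reconciled, not Edward's confirmed internals.

```python
def hybrid_score(vector_sim: float, bm25_raw: float, bm25_max: float) -> float:
    """70/30 blend of cosine similarity (already 0..1) and BM25.
    Normalizing BM25 by the batch maximum is an assumption."""
    bm25_norm = bm25_raw / bm25_max if bm25_max > 0 else 0.0
    return 0.7 * vector_sim + 0.3 * bm25_norm

def pack_context(scored_memories, budget_chars: int = 8000):
    """Greedily keep the best-scoring memory texts until the
    8,000-character context budget is spent."""
    picked, used = [], 0
    for text, score in sorted(scored_memories, key=lambda m: m[1], reverse=True):
        if used + len(text) <= budget_chars:
            picked.append(text)
            used += len(text)
    return picked
```

Keyword matching rescues queries where the user repeats an exact term (a name, an ID) that embeddings place only loosely, while the vector side covers paraphrases.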

Memory Extraction

After every conversation turn, Edward runs a memory extraction step using Claude Haiku 4.5. The extractor analyzes the conversation and identifies any new memorable information. For each extracted memory, it assigns:

  • Memory type (fact, preference, context, instruction)
  • Importance score (0-10)
  • Temporal nature (timeless, temporary, evolving)
  • Confidence tier (observation, belief, knowledge)

Duplicate detection prevents the same information from being stored multiple times. Existing memories are updated rather than duplicated.
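Putting the four assigned attributes together, one extracted memory might look like the record below. The field names are illustrative guesses, not Edward's actual schema.

```python
# Hypothetical shape of one extracted memory; the field names are
# illustrative, not Edward's actual schema.
extracted_memory = {
    "content": "User's dog is named Luna",
    "type": "fact",             # fact | preference | context | instruction
    "importance": 7,            # 0-10 score from the extractor
    "temporal": "timeless",     # timeless | temporary | evolving
    "tier": "knowledge",        # observation | belief | knowledge
}
```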

Deep Retrieval

For complex conversations, Edward activates deep retrieval — a pre-turn gate that runs when the message is short or the conversation has reached 3+ turns. It fires 4 parallel memory queries:

  • The original user message
  • Three Haiku-rewritten query variations targeting different angles

Results are deduplicated and merged, giving the LLM a richer context window than a single query would provide.
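The fan-out-and-merge step could be sketched like this. A hedged sketch: `rewrite_queries` and `search` are stand-ins for the Haiku rewriter and the hybrid search call, and deduplication by memory id is an assumption.

```python
import asyncio

async def deep_retrieve(message, rewrite_queries, search):
    """Fire the original message plus three rewritten variants in
    parallel, then merge with id-based deduplication. `rewrite_queries`
    and `search` are stand-ins for the Haiku rewriter and hybrid search."""
    queries = [message] + await rewrite_queries(message, n=3)
    batches = await asyncio.gather(*(search(q) for q in queries))
    seen, merged = set(), []
    for batch in batches:
        for memory in batch:
            if memory["id"] not in seen:
                seen.add(memory["id"])
                merged.append(memory)
    return merged
```

Because the four searches run concurrently via `asyncio.gather`, the wall-clock cost is roughly one search plus one rewrite call, not four sequential round trips.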

Reflection

After each turn, a fire-and-forget reflection step generates 3-5 Haiku-crafted queries to find memories related to the current conversation. The results are stored in the memory_enrichments table and loaded on the next turn to provide deeper context. This runs asynchronously and adds zero latency to the current response.
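The fire-and-forget pattern can be sketched as below. The callables are hypothetical stand-ins for Haiku query generation, hybrid search, and the `memory_enrichments` insert; the point is that the work is spawned, not awaited.

```python
import asyncio

async def reflect_after_turn(conversation, generate_queries, search, store):
    """Fire-and-forget enrichment: spawn the work and return immediately,
    so the current response pays no extra latency. The callables are
    stand-ins for Haiku query generation, hybrid search, and the
    memory_enrichments insert."""
    async def _reflect():
        for query in await generate_queries(conversation):  # 3-5 queries
            await store(await search(query))                # read next turn
    asyncio.create_task(_reflect())
```

Since the task is never awaited by the response path, its results only become visible when the next turn loads the enrichments table.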

Consolidation

An hourly background loop clusters related memories via Haiku. It creates:

  • Memory connections — links between related memories
  • Memory flags — quality and staleness markers

Consolidation is disabled by default and can be enabled via the REST API or settings UI.
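A minimal shape for such a loop, assuming an asyncio runtime: `consolidate` stands in for the Haiku clustering call that writes connections and flags, and polling `is_enabled` each cycle is one way the runtime toggle could take effect.

```python
import asyncio

async def consolidation_loop(is_enabled, consolidate, interval_s: float = 3600):
    """Hourly background pass; `consolidate` stands in for the Haiku
    clustering call that writes memory connections and flags. Polling
    `is_enabled` each cycle lets the REST API / settings toggle take
    effect without a restart."""
    while True:
        if is_enabled():
            await consolidate()
        await asyncio.sleep(interval_s)
```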

Memory Tools

Edward has direct access to memory management tools (always available, not gated by skills):

| Tool | Description |
|------|-------------|
| remember_update | Create or update a memory |
| remember_forget | Delete a specific memory |
| remember_search | Search memories by query |
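A toy dispatcher shows how the three tools map to store operations. The argument shapes and the in-memory `store` are illustrative assumptions, not Edward's real backend.

```python
def dispatch_memory_tool(name: str, args: dict, store: dict):
    """Toy dispatcher over the three tools; the argument shapes and the
    in-memory `store` are illustrative, not Edward's real backend."""
    if name == "remember_update":
        store[args["key"]] = args["content"]        # create or overwrite
        return {"ok": True}
    if name == "remember_forget":
        return {"ok": store.pop(args["key"], None) is not None}
    if name == "remember_search":
        return [m for m in store.values()
                if args["query"].lower() in m.lower()]
    raise ValueError(f"unknown memory tool: {name}")
```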