
LLM Memory

2 posts with the tag “LLM Memory”

Claude Memory: A Different Philosophy

The two leading AI assistants, Claude and ChatGPT, have adopted opposite strategies for implementing "memory." The difference reflects their product positioning, target user bases, and design philosophies.

Claude's Memory System: An Explicit, Controllable Tool

Claude's memory function is designed as a tool that users actively invoke, rather than a continuously running background service. Its main characteristics are:

  1. Blank Slate: Each conversation starts from a clean state without preloading any user profiles or history.
  2. Explicit Invocation: The memory function activates only when the user asks for it explicitly, e.g. "What did we discuss last time?"
  3. Raw History Search: It doesn't create AI-generated user summaries or compressed profiles, but instead recalls information by performing real-time searches through users' raw chat history.
  4. Two Main Search Tools:
    • conversation_search: Searches through all historical records based on keywords or topics.
    • recent_chats: Retrieves conversations based on time ranges (e.g., "the last 10 conversations" or "the last week of November last year").
ChatGPT's Memory System: An Implicit, Automatic Experience

In contrast to Claude, ChatGPT's memory function is designed for the mass consumer market, characterized by:

  1. Always-On: The memory function loads automatically without user intervention, providing instant personalized experiences.
  2. User Profiling: The system continuously learns user preferences and patterns to build detailed user profiles.
  3. Pursuit of a "Magical" Experience: The goal is to make the product feel intelligent, thoughtful, and seamless, so users don't need to think about how it works.

This design divergence stems from the two companies' different market strategies:

  • Claude Targets Professional Users: Its user base consists mainly of technical professionals like developers and researchers. These users understand how LLMs work, prefer precise control, and accept the additional latency that comes with invoking memory. For them, memory is a powerful, predictable professional tool where privacy and controllability are crucial.

  • ChatGPT Targets the Mass Market: Its user base includes various ordinary consumers like students and parents. They want a product that works out-of-the-box and is easy to use, automatically remembering their information. This is a typical consumer tech strategy: first attract and retain massive users through a "magical" experience, then explore monetization models later.

The author believes that the two giants taking completely opposite paths indicates that the design space for AI memory functions is extremely vast, with no single correct answer. The optimal solution depends on the product's target users and specific needs. Currently, this field is still in its early exploratory stages ("Cambrian explosion"), with major companies trying different approaches, far from establishing industry standards.

Latest Update: Shortly after the article was published, Anthropic (the company behind Claude) announced a new memory feature for its Team and Enterprise accounts that appears closer to ChatGPT's automatic profiling model — a sign of how quickly AI memory design is evolving.

ChatGPT Memory and the Bitter Lesson

The author of this article reverse-engineered ChatGPT's memory system by directly questioning it, revealing its operational principles and internal structure.

The Four Key Components of ChatGPT's Memory System

ChatGPT's memory system primarily consists of four components, all provided to the model during each interaction:

  1. Interaction Metadata:

    • Includes user device information (screen size, browser/OS), usage patterns (topic preferences, message length, activity levels), etc.
    • The model can leverage this data to implicitly infer the user's context (e.g., automatically recognizing iPhone usage), thereby delivering more targeted responses.
  2. Recent Conversation Content:

    • Contains summaries of the user's messages from the last several dozen conversations (excluding AI responses).
    • This helps establish connections across different conversations, allowing the model to better understand context. For instance, after multiple consecutive conversations about travel to Japan, it can infer that "there" refers to Japan.
  3. Model Set Context:

    • Facts explicitly provided by the user, which can be viewed and deleted anytime in the settings—e.g., "I am allergic to shellfish."
    • This is the highest-priority, fully user-controlled "source of truth" that can override information from other memory modules.
  4. User Knowledge Memories:

    • This is the newest and most central component. It consists of highly condensed AI-generated summaries periodically created by OpenAI from the user's extensive conversation history.
    • These memories are invisible and not directly editable by the user. They contain extremely detailed information about the user's profession, interests, projects, technical stack, brand preferences, etc.
    • While incredibly information-dense, they may include outdated or inaccurate content (e.g., a trip the user planned but never took).

The article points out that ChatGPT's memory system does not use complex techniques like Retrieval-Augmented Generation (RAG) or vector databases to filter relevant memories.

Instead, it adopts a "brute force" yet effective approach: during each interaction, it packs all four types of memory information into the model's context window.
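Conceptually, the "pack everything" approach amounts to concatenating all four blocks into the prompt on every turn, with no retrieval step. A sketch under the assumption that each component arrives as plain text (the section titles mirror the article; the function and its layout are hypothetical):

```python
def build_system_prompt(interaction_metadata: str,
                        recent_conversation_content: str,
                        model_set_context: str,
                        user_knowledge_memories: str,
                        base_instructions: str = "You are a helpful assistant.") -> str:
    """Pack all four memory components into the context window verbatim.

    No RAG, no vector search, no relevance filtering: deciding what
    matters for the current turn is left entirely to the model.
    """
    sections = [
        ("Interaction Metadata", interaction_metadata),
        ("Recent Conversation Content", recent_conversation_content),
        ("Model Set Context", model_set_context),
        ("User Knowledge Memories", user_knowledge_memories),
    ]
    blocks = [base_instructions]
    blocks += [f"## {title}\n{body}" for title, body in sections if body]
    return "\n\n".join(blocks)
```

The trade-off is plain: the prompt grows with every component on every turn, which is wasteful by retrieval-system standards but trivially simple — exactly the bet on model capability and cheap context described below.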

This reflects OpenAI's core bets:

  1. The model is sufficiently intelligent: Powerful models can inherently discern and utilize relevant information within massive contexts while ignoring the irrelevant.
  2. Compute and context windows will become increasingly cheaper: As technology advances, the cost of sending all this information will become negligible.

This reaffirms the lesson articulated by reinforcement learning pioneer Rich Sutton in his 2019 essay "The Bitter Lesson"—rather than building complex engineered solutions, it's more effective to dedicate resources to enhancing the model's inherent capabilities and computational power.

ChatGPT's memory functionality resembles the training process of an LLM: "User Knowledge Memories" act like a large but slow-to-update base model, while the other three components function as steering layers for real-time adjustment and correction (similar to RLHF and in-context learning).

  1. User Knowledge Memories: Act like a pre-trained model, condensing long-term information but prone to becoming outdated.
  2. Model Set Context: Equivalent to the user's RLHF, holding the highest priority.
  3. Recent Conversation Content: Analogous to immediate in-context learning.
  4. Interaction Metadata: Functions like system default parameters, providing environmental signals.

Future challenges lie not only in technology (e.g., updating "User Knowledge Memories" more frequently) but also at the product level: how to handle outdated information, how to validate facts, and the privacy and ethical concerns arising from AI building detailed profiles of users.