

What kind of education should children receive to avoid being replaced by AI?

The ability to engage in continuous learning, adapt quickly, demonstrate resilience, understand human nature, and collaborate globally will be key to remaining irreplaceable by AI in the future.

  1. Learning Ability:
    It involves not only acquiring knowledge but also mastering the methods of learning. Fostering critical thinking, problem-solving skills, and the habit of self-directed learning enables children to continuously grow and evolve in an era of rapidly changing information.

  2. Adaptability:
    The capacity to flexibly adjust thinking and behavior in response to rapidly evolving technologies, industries, and social environments. This includes embracing new technologies, coping with uncertainty, and quickly finding one’s place in new contexts.

  3. Resilience:
    The mental strength to recover from failure and keep moving forward. It involves not only withstanding pressure and challenges but also transforming setbacks into opportunities for growth, while maintaining a positive mindset and motivation over the long term.

  4. Understanding Human Needs:
    Cultivating empathy and insight to genuinely understand others’ problems and expectations. This is not only the foundation for creating valuable products and services but also key to demonstrating the irreplaceable value of humans in an era of human-machine coexistence.

  5. Engaging with the World:
    Possessing a global perspective and cross-cultural communication skills to collaborate effectively with people from diverse backgrounds. At the same time, it involves understanding the relationship between society, technology, and ethics, and actively participating in building a responsible and sustainable future.

The Opportunities, Advantages, and Business Models of AI Startups

According to Aaron Levie, co-founder and CEO of Box, the current AI wave presents a historic window of opportunity for startups. He believes that the true disruptive power of AI lies in its ability to solve problems that traditional software cannot handle, particularly those related to the vast amounts of "unstructured data" within enterprises, thereby creating entirely new markets and business models.

  • Core Opportunity: Unlocking the Value of "Unstructured Data." 80% of corporate data (such as contracts, documents, emails, and presentations) is unstructured and was previously impossible to automate. AI agents now enable computers to "read" and manipulate this data, allowing businesses to transform this information into queryable, automatable knowledge bases. This represents a massive blue-ocean market.
  • Finding New "Nouns and Verbs." Traditional enterprise software markets (e.g., CRM, HR systems) are becoming saturated. The opportunity for AI startups lies in identifying specialized areas that have historically relied entirely on human effort and lacked mature software solutions (e.g., specific legal tasks, niche market research) and leveraging AI agents to "productize" and "software-ize" them for the first time.
  • Enabling Economically Infeasible Work. Many valuable tasks that were not executed due to high labor costs (e.g., translating marketing materials into 100 languages) can now be accomplished with low-cost AI agents. This creates new growth pathways for businesses and opportunities for startups serving these emerging needs.
  • Historic Window of Opportunity. Levie emphasizes that the next 2–3 years are critical for birthing the next wave of billion-dollar companies. Once this window closes, the market landscape will stabilize, and the cost of disruption will rise significantly.
  • Gaining Asymmetric Leverage. Large companies (e.g., Amazon) may use AI primarily to improve efficiency and reduce costs. For startups, however, AI serves as a powerful lever, enabling a 50-person team to achieve the output of a 500-person team, thereby accelerating growth in product development, market expansion, and customer service.
  • Agility and Focus on Emerging Markets. Established giants like Workday will prioritize providing AI services to their existing tens of thousands of large enterprise customers. This leaves tens of millions of small and medium-sized businesses globally, as well as niche markets not yet covered by incumbents, as "uncharted territory" for startups to capture.
  • Focus on "Core" Business, Avoiding Internal Competition. Most companies will not develop all their internal software (e.g., HR systems) in-house, as it falls outside their "core" business. They prefer to purchase specialized third-party solutions. Therefore, startups need not worry about customers using AI to "DIY" and replicate their products, as long as they can deliver stable, professional services.
  • Shifting from "Per-Seat Pricing" to "Consumption/Value-Based Pricing." Traditional SaaS models charge based on the number of users (seats), which has a limited market ceiling. AI agents break this mold, allowing startups to charge based on the volume of work performed (e.g., the number of contracts reviewed, reports generated) rather than per user.
  • Value-Based Pricing, Not Cost-Based. The marginal cost of AI tasks (e.g., token fees) may be extremely low (e.g., $0.10), but startups can charge significantly more (e.g., $2.00) because this price remains highly attractive compared to the original labor cost (e.g., $10.00). Profit margins depend on the value-added software, workflows, and unique context built on top of the underlying AI models. (A small worked example follows this list.)
  • Hybrid Subscription and Consumption Models. Pure consumption-based models can lead to revenue volatility. A better approach is a hybrid model: a base subscription fee plus overage charges based on usage. This ensures recurring revenue while allowing startups to benefit from increased customer usage.
  • Leveraging Industry "Deflationary Economics." The underlying costs of AI and cloud computing (e.g., computing power, storage) will continue to decline, but software service prices typically remain stable. This means that, as long as the product continues to innovate, the company's profit margins will naturally increase over time, creating a highly favorable business environment.
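
To make the pricing logic concrete, here is a small worked example in Python using the article's own illustrative numbers ($0.10 marginal cost, $2.00 price, $10.00 labor cost). The hybrid-pricing figures (base fee, included volume) are purely hypothetical and only sketch the "subscription plus overage" idea.

```python
# Illustrative unit economics built from the article's example figures.
marginal_cost = 0.10   # e.g., token fees per AI-completed task
price = 2.00           # what the startup charges per task
labor_cost = 10.00     # what the same task used to cost in human labor

gross_margin = (price - marginal_cost) / price        # 0.95 -> 95% margin
customer_saving = (labor_cost - price) / labor_cost   # 0.80 -> customer saves 80%

# Hypothetical hybrid model: base subscription plus overage beyond included usage.
def monthly_bill(tasks_done: int, base_fee: float = 500.0,
                 included_tasks: int = 400, per_task: float = 2.00) -> float:
    overage = max(0, tasks_done - included_tasks)
    return base_fee + overage * per_task

print(gross_margin, customer_saving, monthly_bill(650))  # ~0.95  0.8  1000.0
```
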

Claude Memory: A Different Philosophy

The two leading AI assistants, Claude and ChatGPT, have adopted completely opposite strategies in implementing their "memory" functions. This difference profoundly reflects their respective product positioning, target user bases, and design philosophies.

Claude's Memory System: An Explicit, Controllable Tool

Claude's memory function is designed as a tool that users actively invoke, rather than a continuously running background service. Its main characteristics are:

  1. Blank Slate: Each conversation starts from a clean state without preloading any user profiles or history.
  2. Explicit Invocation: Memory activates only when the user explicitly asks for it, with prompts like "What did we discuss last time?"
  3. Raw History Search: It doesn't create AI-generated user summaries or compressed profiles, but instead recalls information by performing real-time searches through users' raw chat history.
  4. Two Main Search Tools (a hypothetical invocation sketch follows this list):
    • conversation_search: Searches through all historical records based on keywords or topics.
    • recent_chats: Retrieves conversations based on time ranges (e.g., "the last 10 conversations" or "the last week of November last year").
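
A minimal sketch of what explicit invocation of these two tools might look like. The tool names come from the article; the argument names and JSON shapes are illustrative assumptions, not Anthropic's documented schema.

```python
# Hypothetical shapes of the two memory tool calls described above.
# Tool names come from the article; argument names and values are
# illustrative assumptions, not Anthropic's documented schema.

# "What did we discuss about batch invariance last time?"
conversation_search_call = {
    "name": "conversation_search",
    "arguments": {"query": "batch invariance"},   # keyword / topic search
}

# "Summarize my last 10 conversations."
recent_chats_call = {
    "name": "recent_chats",
    "arguments": {"n": 10},                       # time- or count-based range
}

# Design point: nothing is preloaded. The model starts each conversation from
# a blank slate and only issues these calls when the user explicitly asks.
```
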
ChatGPT's Memory System: An Implicit, Automatic Experience

In contrast to Claude, ChatGPT's memory function is designed for the mass consumer market, characterized by:

  1. Always-On: The memory function loads automatically without user intervention, providing instant personalized experiences.
  2. User Profiling: The system continuously learns user preferences and patterns to build detailed user profiles.
  3. Pursuit of a "Magical" Experience: The goal is to make the product feel intelligent, thoughtful, and seamless, so users don't need to think about how it works.

This design divergence stems from the two companies' different market strategies:

  • Claude Targets Professional Users: Its user base consists mainly of technical professionals like developers and researchers. These users understand how LLMs work, prefer precise control, and accept the additional latency that comes with invoking memory. For them, memory is a powerful, predictable professional tool where privacy and controllability are crucial.

  • ChatGPT Targets the Mass Market: Its user base spans ordinary consumers of all kinds, such as students and parents. They want a product that works out of the box and is easy to use, remembering their information automatically. This is a typical consumer-tech strategy: first attract and retain a massive user base through a "magical" experience, then explore monetization later.

The author believes that the two giants taking completely opposite paths indicates that the design space for AI memory functions is extremely vast, with no single correct answer. The optimal solution depends on the product's target users and specific needs. Currently, this field is still in its early exploratory stages ("Cambrian explosion"), with major companies trying different approaches, far from establishing industry standards.

Latest Update: Shortly after the article was published, Anthropic (Claude's parent company) announced a new memory feature for its Team and Enterprise accounts that appears closer to ChatGPT's automatic profiling model. This indicates that the development and evolution of AI memory is progressing at an extremely rapid pace.

Defeating Nondeterminism in LLM Inference

[Figure: a user's requests are batched with other users' requests before reaching the model; the model itself is deterministic, yet the output the user sees is nondeterministic.]

The nondeterminism of LLM inference is a systemic problem. It originates from the conflict between underlying computational libraries (designed for maximum performance and sensitive to batch size) and the dynamic, real-world server loads they run under. A solution exists: enforce the use of batch-invariant computational kernels, though this typically costs some performance.

The non-reproducibility (nondeterminism) of LLM (Large Language Model) inference results is not, as commonly believed, a simple combination of the randomness of GPU parallel computing and floating-point calculation errors. The true culprits are: the lack of "Batch Invariance" in core computational operations (kernels), combined with the constantly changing load on the server (i.e., varying batch sizes).

  1. Common Misconception vs. The Facts

    • Common Misconception ("Concurrency + Floating Point" Hypothesis): It is widely believed that because floating-point addition is non-associative (i.e., (a+b)+c ≠ a+(b+c)), and GPUs execute these additions in a non-deterministic parallel order, the results become random.
    • The Facts Pointed Out by the Article: This hypothesis is incomplete. While floating-point non-associativity is the root cause of numerical differences, the vast majority of computational kernels used in LLM inference (the forward pass), such as matrix multiplication, are themselves "run-to-run deterministic." That is, for a fixed input batch, repeated runs produce exactly the same result.
  2. The True Source of Nondeterminism

    • Lack of "Batch Invariance": Although a single computational kernel is deterministic, its result is affected by the batch size. For example, the numerical result for a given vector will differ slightly depending on whether it is processed alone (batch size 1) or together with thousands of other vectors (batch size 1000). This is because, to optimize performance for different batch sizes, the underlying system uses different computational strategies and instructions, which in turn changes the accumulation order of floating-point numbers. (A short numerical sketch follows this list.)
    • Variable Server Load: From a user's perspective, their requests are dynamically grouped with other users' requests into a batch by the inference server. The server's load changes in real-time, meaning a user's same request might be processed in a batch of size 8 this time, and a batch of size 128 the next time.
    • The Result of the Combination: A computational kernel that lacks "batch invariance" is applied in a system with "non-deterministic batch sizes," ultimately leading to the nondeterminism perceived by the user.
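
Both points are easy to verify numerically. The minimal NumPy sketch below shows (1) that floating-point addition is non-associative and (2) that the same data reduced with different chunkings, standing in for the different kernel strategies chosen at different batch sizes, can produce slightly different sums. This illustrates the principle only; it is not actual GPU kernel code.

```python
import numpy as np

# (1) Floating-point addition is non-associative.
a, b, c = np.float32(1e8), np.float32(-1e8), np.float32(0.1)
print((a + b) + c)   # 0.1 -- the large values cancel first
print(a + (b + c))   # 0.0 -- the 0.1 is absorbed into -1e8 before the cancellation

# (2) Summing the same data with different chunkings, mimicking the different
#     reduction strategies a kernel might pick at different batch sizes.
rng = np.random.default_rng(0)
x = rng.standard_normal(4096).astype(np.float32)

acc = np.float32(0.0)
for v in x:                       # strictly sequential accumulation
    acc += v

sum_chunks_8 = x.reshape(8, -1).sum(axis=1).sum()      # "small batch" style split
sum_chunks_128 = x.reshape(128, -1).sum(axis=1).sum()  # "large batch" style split

# All three totals are "correct" but may disagree in the last few bits --
# exactly the kind of difference that surfaces as user-visible nondeterminism
# once batch size varies from request to request.
print(acc, sum_chunks_8, sum_chunks_128)
```
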
How to Achieve Deterministic Inference (i.e., Achieve "Batch Invariance")

The article points out that to achieve fully reproducible inference, every computational step in the model must be made batch-invariant, primarily involving these three parts:

  • RMSNorm: Relatively easy to implement. It only requires sticking to one parallelization strategy and avoiding switching to strategies that would change the order of operations, even if it means slightly worse performance on small batches. (A conceptual sketch follows this list.)
  • Matrix Multiplication: More challenging. High-performance matrix multiplication libraries select different Tensor Core instructions or parallel strategies (like Split-K) based on input dimensions. To achieve determinism, one must enforce the use of a single kernel configuration, which sacrifices peak performance at certain dimensions.
  • Attention Mechanism: The most complex. It must be invariant not only to batch size but also to how sequences are processed (e.g., chunked prefill, decoding with a KV Cache). This means that when a token computes its attention, the internal order of operations must be identical regardless of how much context (KV Cache) it has.
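
As a conceptual illustration of the RMSNorm case, the sketch below normalizes each row with one fixed, sequential accumulation order that never depends on how many rows arrive in the batch. A real batch-invariant kernel enforces this ordering inside the GPU kernel itself; this NumPy version only demonstrates the invariance property.

```python
import numpy as np

def rmsnorm_batch_invariant(x: np.ndarray, weight: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """RMSNorm with a fixed per-row accumulation order, independent of batch size.

    x: (batch, hidden) activations; weight: (hidden,) learned scale.
    Conceptual only: a real batch-invariant kernel fixes this ordering
    inside the GPU kernel, not in a Python loop.
    """
    out = np.empty_like(x)
    for i in range(x.shape[0]):
        row = x[i]
        acc = np.float32(0.0)
        for v in row:            # one fixed, sequential reduction per row
            acc += v * v
        rms = np.sqrt(acc / np.float32(row.shape[0]) + np.float32(eps))
        out[i] = (row / rms) * weight
    return out

# The same row yields bit-identical output whether it is normalized alone
# (batch size 1) or together with 999 other rows.
rng = np.random.default_rng(1)
row = rng.standard_normal((1, 64)).astype(np.float32)
batch = np.concatenate([row, rng.standard_normal((999, 64)).astype(np.float32)])
w = np.ones(64, dtype=np.float32)
assert np.array_equal(rmsnorm_batch_invariant(row, w)[0], rmsnorm_batch_invariant(batch, w)[0])
```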

ChatGPT Memory and the Bitter Lesson

The author of this article reverse-engineered ChatGPT's memory system by directly questioning it, revealing its operational principles and internal structure.

The Four Key Components of ChatGPT's Memory System

ChatGPT's memory system primarily consists of four components, all provided to the model during each interaction:

  1. Interaction Metadata:

    • Includes user device information (screen size, browser/OS), usage patterns (topic preferences, message length, activity levels), etc.
    • The model can leverage this data to implicitly infer the user's context (e.g., automatically recognizing iPhone usage), thereby delivering more targeted responses.
  2. Recent Conversation Content:

    • Contains summaries of the user's messages from the last several dozen conversations (excluding AI responses).
    • This helps establish connections across different conversations, allowing the model to better understand context. For instance, after multiple consecutive conversations about travel to Japan, it can infer that "there" refers to Japan.
  3. Model Set Context:

    • Facts explicitly provided by the user, which can be viewed and deleted anytime in the settings—e.g., "I am allergic to shellfish."
    • This is the highest-priority, fully user-controlled "source of truth" that can override information from other memory modules.
  4. User Knowledge Memories:

    • This is the newest and most central component. It consists of highly condensed, AI-generated summaries that OpenAI periodically creates from the user's extensive conversation history.
    • These memories are invisible and not directly editable by the user. They contain extremely detailed information about the user's profession, interests, projects, technical stack, brand preferences, etc.
    • While incredibly information-dense, they may include outdated or inaccurate content (e.g., a trip the user planned but never took).

The article points out that ChatGPT's memory system does not use complex techniques like Retrieval-Augmented Generation (RAG) or vector databases to filter relevant memories.

Instead, it adopts a "brute force" yet effective approach: during each interaction, it packs all four types of memory information into the model's context window.
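
A minimal sketch of that brute-force assembly, assuming a simple string-concatenation format: all four components are injected into the prompt on every turn, with no retrieval step. The section names follow the article; the field contents and the `call_model` placeholder are hypothetical.

```python
# Hypothetical assembly of the four memory components into one prompt.
# No RAG and no vector search: everything is sent on every turn and the
# model is trusted to pick out what is relevant.

interaction_metadata = "Device: iPhone / Safari. Messages: short. Active: evenings."
recent_conversation_content = (
    "- Asked about Tokyo hotels\n"
    "- Compared rail passes for a Japan trip\n"
    "- Drafted a packing list"
)
model_set_context = "I am allergic to shellfish."   # user-editable, highest priority
user_knowledge_memories = "Software engineer; planning a Japan trip; prefers budget airlines."

def build_system_prompt() -> str:
    # Brute force: concatenate all four components in full.
    sections = {
        "Interaction Metadata": interaction_metadata,
        "Recent Conversation Content": recent_conversation_content,
        "Model Set Context": model_set_context,
        "User Knowledge Memories": user_knowledge_memories,
    }
    return "\n\n".join(f"## {name}\n{body}" for name, body in sections.items())

# reply = call_model(system=build_system_prompt(), user="Book a hotel there for next month")
```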

This reflects OpenAI's core bets:

  1. The model is sufficiently intelligent: Powerful models can inherently discern and utilize relevant information within massive contexts while ignoring the irrelevant.
  2. Compute and context windows will become increasingly cheaper: As technology advances, the cost of sending all this information will become negligible.

This reaffirms the lesson articulated by reinforcement learning pioneer Rich Sutton in his 2019 essay "The Bitter Lesson"—rather than building complex engineered solutions, it's more effective to dedicate resources to enhancing the model's inherent capabilities and computational power.

ChatGPT's memory functionality resembles the training process of an LLM: "User Knowledge Memories" act like a large but slow-to-update base model, while the other three components function as steering layers for real-time adjustment and correction (similar to RLHF and in-context learning).

  1. User Knowledge Memories: Act like a pre-trained model, condensing long-term information but prone to becoming outdated.
  2. Model Set Context: Equivalent to the user's RLHF, holding the highest priority.
  3. Recent Conversation Content: Analogous to immediate in-context learning.
  4. Interaction Metadata: Functions like system default parameters, providing environmental signals.

Future challenges lie not only in technology (e.g., updating "User Knowledge Memories" more frequently) but also at the product level: how to handle outdated information, how to validate facts, and the privacy and ethical concerns arising from AI building detailed profiles of users.