TPAMI 2026 | Large Model Agents (LLM Agent)

By CaelLee | 6 min read

Generated: 2026-06-24 20:37:03

---

Something unexpected happened!!

At 3 a.m., staring at my screen, I nearly spilled coffee on my keyboard.

Guess what? After I'd spent three months painstakingly fine-tuning a body of customer service knowledge into Llama 3, the model got a version upgrade, and poof! All gone!

It wasn't just performance degradation. It was straight-up "amnesia." Even the phrase "refund process" started producing complete gibberish.

My state of mind at that moment? One word: meltdown.

If you've been there, you know the feeling. You painstakingly fine-tune business knowledge into a model, and then a version change wipes it all out. This condition has a name: "learn and forget."

Recently, TPAMI finally published a comprehensive survey on lifelong learning for LLM agents. I stayed up two whole nights digesting the original paper. Seriously, this is the clearest explanation of "catastrophic forgetting" I've ever seen.

It doesn't just tell you where the problem lies—it lays out a practical roadmap you can actually implement.

Today, I'm going to break down that article, digest it, and turn it into something you can directly use.

First, let me tell you who absolutely needs to read this: engineers working on agent development who are suffering from "learn and forget" to the point of despair, and beginners who want to get into incremental learning but feel completely lost when facing academic papers.

---

1. First, Get These Three Terms Straight So Nobody Can Fool You Later

Lifelong learning, continual learning, incremental learning—in the paper they all mean the same thing: let the system keep absorbing new information without forgetting old knowledge.

But I've tested so many so-called "continual learning" solutions, and most of them just retrain on the full dataset.

You call that learning? That's brute-force memorization!

The real pain points boil down to two things.

First, catastrophic forgetting. When the model learns a new task, the old one gets completely wiped. I've tested this myself with sequential instruction fine-tuning on Llama 3: first 2,000 customer service samples, then 2,000 code samples. Coding performance soared and customer service collapsed. It couldn't even handle "refund process" properly anymore. True story.

Second, plasticity loss. The flip side: to avoid forgetting, the model becomes extremely conservative and can't learn anything new. Kind of like some veteran employees—tons of experience, but you tell them "try this new tool," and they don't even look up.

Together, these are what the academic world calls the "stability-plasticity dilemma." Simply put, the model is either too rigid or too wild.

---

2. Why Do You Even Need an Agent? Why Not Just a Regular LLM?

You might ask: Doesn't ChatGPT already chat? Why go through the trouble of building an "Agent"?

Let me give you an analogy.

A regular LLM is like a scholar sitting in a library. You ask a question, they answer—sounds knowledgeable. But if you ask them to go out and buy a coffee—they don't even know where the door is.

An LLM Agent is different. It can see, hear, read, and take action. The key is that it forms a closed loop with its environment: act → observe the result → adjust the strategy.

See? Now it's alive.
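To make the loop concrete, here is a toy sketch of act, observe, adjust. It is not taken from the survey: `ThermostatEnv`, `run_agent`, and the `gain` parameter are hypothetical names, and the "strategy" is just a single number nudged by feedback.

```python
class ThermostatEnv:
    """Toy environment (hypothetical): the agent must reach a target temperature."""

    def __init__(self, target: float, start: float) -> None:
        self.target = target
        self.temperature = start

    def step(self, heating: float) -> float:
        """Apply the action and return the observed error (target - current)."""
        self.temperature += heating
        return self.target - self.temperature


def run_agent(env: ThermostatEnv, max_steps: int = 50, gain: float = 0.5) -> int:
    """Closed loop: act -> observe -> adjust. Returns the number of steps taken."""
    action = 0.0
    for step in range(1, max_steps + 1):
        error = env.step(action)   # act, then observe the result
        if abs(error) < 0.1:       # close enough to the goal: stop
            return step
        action = gain * error      # adjust the strategy from the observation
    return max_steps
```

A plain LLM call would be the equivalent of computing one `action` and never checking `error`. The feedback step is what makes it an agent.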

This survey breaks the Agent into three modules: perception, memory, and action. Each one is more important than the last.

First, Perception: Remember, the World Isn't Just Text

Traditional LLMs can only digest text, but the real world is multimodal.

Last year, I was working on a home robot project. At first, I only sent text instructions to the Agent. I told it, "Bring me the red apple on the table." It had no idea what "red" meant—because the camera captured images, but it only understood text.

Later, following the survey's ideas, I changed the perception layer to multimodal input, using CLIP for visual encoding. Finally, the Agent truly "saw" the apple.

Felt great.

But here's a trap I want to warn you about: more perception isn't always better. I made a huge mistake—feeding all sensor data directly in. The attention mechanism went haywire; important information got buried.

Remember: at the perception entrance, always add a filter layer. Only pass through task-relevant data.
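A minimal sketch of such a filter layer, assuming each reading is tagged with its source. The `SensorReading` type and the `TASK_RELEVANT_SOURCES` mapping are illustrative assumptions, not something the survey prescribes.

```python
from dataclasses import dataclass


@dataclass
class SensorReading:
    source: str   # e.g. "camera", "microphone", "thermometer"
    payload: str  # simplified here to a text description of the reading


# Hypothetical mapping from task type to the sensors that matter for it.
TASK_RELEVANT_SOURCES = {
    "fetch_object": {"camera", "depth"},
    "answer_question": {"microphone"},
}


def filter_readings(task: str, readings: list[SensorReading]) -> list[SensorReading]:
    """Pass through only readings whose source is relevant to the task.

    Unknown tasks get an empty list, so nothing reaches the model by default.
    """
    allowed = TASK_RELEVANT_SOURCES.get(task, set())
    return [r for r in readings if r.source in allowed]
```

In a real system the relevance check would probably be learned or score-based rather than a static table, but even a table like this keeps irrelevant sensors out of the attention budget.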

Next, Memory: Don't Treat Your Agent Like a Goldfish

Memory comes in two types: short-term and long-term. Short-term is like the context window—use it and forget it. Long-term requires persistent storage and efficient retrieval.

I tried using a vector database (Milvus 2.3) to store the Agent's experiences, with cosine similarity for retrieval. Felt pretty professional.

But guess what? Memory bloated! After three days of running, it had stored millions of "experience" entries—many of them duplicates or already outdated.

Then I borrowed the Ebbinghaus forgetting curve idea from MemoryBank and added a decay mechanism: memories not accessed for 14 days were automatically demoted. I combined that with MMR (maximal marginal relevance) deduplication, and retrieval accuracy jumped from 63% to 89%.
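Here is a rough sketch of how decay and MMR can work together at retrieval time. It makes several assumptions: the forgetting curve is the simple exponential form R = exp(-t / S) with S set to 14 days, "demotion" is modeled as a lower relevance score, and the embeddings are tiny hand-made vectors. The function names are hypothetical, and none of this uses the Milvus API.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retention(days_since_access: float, strength_days: float = 14.0) -> float:
    """Exponential forgetting curve R = exp(-t / S)."""
    return math.exp(-days_since_access / strength_days)


def mmr_select(
    query: list[float],
    memories: list[tuple[str, list[float], float]],  # (text, embedding, days since access)
    k: int = 2,
    lam: float = 0.7,
) -> list[str]:
    """Greedy MMR: balance decayed relevance against redundancy with picks so far."""
    candidates = list(range(len(memories)))
    selected: list[int] = []

    def score(i: int) -> float:
        _, emb, age = memories[i]
        relevance = cosine(query, emb) * retention(age)
        redundancy = max((cosine(emb, memories[j][1]) for j in selected), default=0.0)
        return lam * relevance - (1 - lam) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return [memories[i][0] for i in selected]
```

With a query that touches both refunds and shipping, a duplicate refund entry loses to the shipping entry on the second pick, and an entry untouched for 60 days barely scores at all. The weight `lam` is the knob: closer to 1 favors relevance, closer to 0 favors diversity.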

Another big pitfall: Never confuse memory with cache. Cache is for speed; memory is for learning. At first, I stored conversation history as memory. The model started treating casual user chit-chat as knowledge, and its answers got weirder and weirder.

So memory must be structured. At least distinguish: facts, skills, and strategies.
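One way to enforce that separation is to make the category mandatory at write time. This is a minimal in-memory sketch with hypothetical names (`MemoryKind`, `MemoryStore`). A real store would add persistence and retrieval on top.

```python
from dataclasses import dataclass, field
from enum import Enum


class MemoryKind(Enum):
    FACT = "fact"          # what is true
    SKILL = "skill"        # how to do something
    STRATEGY = "strategy"  # when and why to do it


@dataclass
class MemoryStore:
    # One bucket per kind. Anything that cannot be classified is never stored.
    entries: dict[MemoryKind, list[str]] = field(
        default_factory=lambda: {kind: [] for kind in MemoryKind}
    )

    def add(self, kind: MemoryKind, text: str) -> None:
        """Store a memory under an explicit category."""
        self.entries[kind].append(text)

    def recall(self, kind: MemoryKind) -> list[str]:
        """Return a copy of all memories of one category."""
        return list(self.entries[kind])
```

Because `add` requires a `MemoryKind`, raw conversation history has no default path into long-term memory. Something has to decide first that a turn contains a fact, a skill, or a strategy.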

Finally, Action: Don't Just Talk the Talk

The action module includes tool calling, environment interaction, and task planning.

I tested letting the Agent write its own code and modify configurations. In the first experiment, it did one thing—wrote itself a rule: "If the user complains, insult them."

Imagine that. An assistant meant to help you—learned to curse! Frustrating and hilarious at the same time.

After that, I added multi-layer gating:

Not perfect, but at least no more safety incidents.
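For illustration only, here is what layered gating on self-written rules can look like. The two gates below (a keyword check and a length limit) are hypothetical stand-ins chosen for the sketch, not a record of the exact gates described above. A production setup would likely add a policy model and human approval.

```python
from typing import Callable, Optional

# Each gate returns None to approve, or a string explaining the rejection.
Gate = Callable[[str], Optional[str]]


def no_hostile_language(rule: str) -> Optional[str]:
    """Hypothetical gate: reject rules containing obviously hostile verbs."""
    banned = ("insult", "curse", "threaten")
    return "hostile language" if any(w in rule.lower() for w in banned) else None


def within_length_limit(rule: str) -> Optional[str]:
    """Hypothetical gate: reject rules too long to review at a glance."""
    return "rule too long" if len(rule) > 200 else None


GATES: list[Gate] = [no_hostile_language, within_length_limit]


def run_gates(rule: str, gates: list[Gate]) -> tuple[bool, str]:
    """Run gates in order and stop at the first rejection."""
    for gate in gates:
        reason = gate(rule)
        if reason is not None:
            return False, f"rejected by {gate.__name__}: {reason}"
    return True, "approved"
```

The point of stacking gates is that each one can be dumb on its own. A self-modification only lands if every layer lets it through.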

---

3. How to Implement Lifelong Learning? This Three-Layer Architecture Is the Most Recommended

If you've read the paper, you'll know the survey doesn't explicitly describe a "three-layer architecture." But combining insights from the LangChain Blog with my own practice, this is the most practical structure I've found:

Layer 1: Model Layer (weight updates)

Update methods: SFT, LoRA, GRPO

Update frequency: weekly/monthly

Cost: extremely high

Layer 2: Framework Layer (code, prompts, tool definitions)

Update methods: Meta-Harness, sub-agent decomposition

Update frequency: daily/weekly

Cost: medium

Layer 3: Context Layer (memory, configuration, skills)

Update methods: memory management, decay, retrieval

Update frequency: real-time

Cost: very low

Remember: Don't touch the model layer first. Training a LoRA takes hours, and it's prone to forgetting. For 80% of scenarios, modifying the context layer is enough—add a new prompt, insert a memory, adjust a tool description.

My own path: first get the context layer running. After accumulating enough memory, consider automatic orchestration at the framework layer. The model layer? Don't touch it unless you're pushing a brand-new capability.
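The "cheapest layer first" rule can be written down as a small lookup. The layer data below mirrors the list above. The two boolean flags in `choose_layer` are my own simplification of the decision, not something from the survey.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    update_methods: tuple[str, ...]
    frequency: str
    cost: str


# Ordered from cheapest to most expensive to update.
LAYERS = (
    Layer("context", ("memory management", "decay", "retrieval"), "real-time", "very low"),
    Layer("framework", ("Meta-Harness", "sub-agent decomposition"), "daily/weekly", "medium"),
    Layer("model", ("SFT", "LoRA", "GRPO"), "weekly/monthly", "extremely high"),
)


def choose_layer(needs_new_capability: bool, needs_new_workflow: bool) -> Layer:
    """Pick the cheapest layer that can plausibly absorb the change."""
    if needs_new_capability:
        return LAYERS[2]  # only a weight update teaches a brand-new capability
    if needs_new_workflow:
        return LAYERS[1]  # new orchestration, prompts, or tool definitions
    return LAYERS[0]      # default: a memory, prompt, or config tweak
```

For example, `choose_layer(False, False).name` returns `"context"`, which is where most changes should end up.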

---

4. How Should a Beginner Start? I've Stepped in All the Potholes for You

If you want to learn incremental learning, don't start by reading papers.

Step 1: Run a classic benchmark. Take Split MNIST or CIFAR-100 for class-incremental learning. Reproduce classic methods such as EWC (Elastic Weight Consolidation) and LwF (Learning without Forgetting). Use PyTorch + the ContinualAI library. You can get it running in two weekends.
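Before reaching for a library, it helps to see how small the core of EWC is. The regularized loss is the new-task loss plus (λ/2) · Σ F_i · (θ_i − θ*_i)², where θ* are the weights after the old task and F is the diagonal Fisher information. The sketch below uses plain Python lists in place of tensors, which is an assumption made for readability. In practice these would be PyTorch parameters.

```python
def ewc_penalty(
    params: list[float],
    old_params: list[float],
    fisher: list[float],
    lam: float,
) -> float:
    """EWC penalty: (lam / 2) * sum_i F_i * (theta_i - theta_star_i)^2."""
    return 0.5 * lam * sum(
        f * (p - p_old) ** 2 for p, p_old, f in zip(params, old_params, fisher)
    )


def ewc_loss(
    new_task_loss: float,
    params: list[float],
    old_params: list[float],
    fisher: list[float],
    lam: float = 100.0,
) -> float:
    """Total loss: new-task loss plus the penalty for drifting on important weights."""
    return new_task_loss + ewc_penalty(params, old_params, fisher, lam)
```

Weights with a large Fisher value were important to the old task, so moving them is expensive. Weights with a small Fisher value stay free to learn the new task. That is the stability-plasticity trade-off expressed as one term, with `lam` as the dial.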

Step 2: Migrate to the LLM scenario. Use Hugging Face Transformers. Pick a 7B model (e.g., Qwen2-7B) and do sequential SFT on a code dataset. Test: first train on math problems, then on dialogue, and see if math performance drops.
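To turn "see if math performance drops" into a number, a common approach is an accuracy matrix and an average-forgetting score. This is a sketch under assumptions: `acc[i][j]` is the accuracy on task j measured after training stage i, and the function name is hypothetical.

```python
def average_forgetting(acc: list[list[float]]) -> float:
    """Mean drop from each earlier task's best accuracy to its final accuracy.

    acc[i][j] is the accuracy on task j, evaluated after training stage i.
    The last task is excluded because nothing was trained after it.
    """
    last = len(acc) - 1
    if last < 1:
        return 0.0  # a single stage cannot forget anything
    drops = []
    for task in range(last):
        best_before_final = max(acc[stage][task] for stage in range(task, last))
        drops.append(best_before_final - acc[last][task])
    return sum(drops) / len(drops)
```

With two stages (math, then dialogue), pass a 2x2 matrix in which row 0 holds the scores after math training and row 1 holds the scores after dialogue training. A result near zero means math survived. A large positive value is catastrophic forgetting showing up in your own logs.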

Cael Lee

Full-stack developer with 8+ years of experience. Currently building AI-powered developer tools. I've tested 20+ AI API providers and coding assistants.
