AI paper index
From Untrusted Input to Trusted Memory: A Systematic Study of Memory Poisoning Attacks in LLM Agents
One-line summary
An AI research paper on From Untrusted Input to Trusted Memory: A Systematic Study of Memory Poisoning Attacks in LLM Agents.
Engineering notes
Engineering notes will be added by the aipentium editorial team.
Chinese explanation / 中文解读
中文解读待补充:本站会优先为大语言模型、生成式AI、ChatGPT相关技术、计算机视觉、深度学习等高价值论文补充中文说明。
Original abstract
Memory is a core component of AI agents, enabling them to accumulate knowledge across interactions and improve performance. However, persistent memory introduces the risk of memory poisoning, where a single adversarial memory write can exert long-term influence over agent behavior. We present a systematic study of memory poisoning in LLM-based agents. We identify four memory write channels and nine structural vulnerabilities in model capabilities, system prompt design, and agent system architecture that make these channels exploitable. Based on these vulnerabilities, we develop a taxonomy of six classes of memory poisoning attacks. Furthermore, we design MPBench -- a benchmark for evaluating memory poisoning attacks, and show that agents designed to write and retrieve memory more aggressively are more exploitable. We also show that existing prompt injection defenses fail to cover memory poisoning attacks. Our findings provide a foundation for understanding and mitigating memory poisoning attacks against AI agents.
Links and sources
Need this topic turned into a technical roadmap?
aipentium can prepare a custom AI literature review, code map, dataset map, and B2B technology assessment.
Request B2B AI research
Comments