
What is LLM Context Rot?
"LLM context rot" is a phenomenon in which the performance of a large language model degrades as the length of its input context increases.
A recent study from Chroma evaluated 18 large language models, including state-of-the-art models such as GPT-4.1, Claude 4, Gemini 2.5, and Qwen3.
The researchers ran a set of controlled experiments to isolate the effect of context length:
1. Extended Needle in a Haystack (NIAH): To go beyond simple lexical matching, they created variations of the NIAH task. These tested semantic matches (where the "needle" was semantically similar to the question but not an exact match) and varied the "haystack" content with different distractors.
2. LongMemEval: This evaluation used long conversational chat histories to test the models' ability to retrieve information.
3. Repeated Words Task: A simple synthetic task tested how well models handle basic text replication as the context length increases.
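As a rough illustration of how prompts for the first and third tasks might be put together, here is a minimal Python sketch. It is not Chroma's actual test harness: the function names (build_niah_prompt, build_repeated_words_prompt), the prompt wording, and the example sentences are all assumptions made for illustration.

```python
import random


def build_niah_prompt(filler_sentences, needle, question, depth=0.5, distractors=()):
    """Hide a needle (plus optional distractors) in filler text and append a question."""
    sentences = list(filler_sentences)
    # depth is relative: 0.0 puts the needle at the start, 1.0 at the end.
    sentences.insert(int(len(sentences) * depth), needle)
    # Distractors land at random positions; a fixed seed keeps runs reproducible.
    rng = random.Random(0)
    for distractor in distractors:
        sentences.insert(rng.randrange(len(sentences) + 1), distractor)
    haystack = " ".join(sentences)
    return f"{haystack}\n\nQuestion: {question}\nAnswer:"


def build_repeated_words_prompt(common_word, unique_word, total_words, unique_index):
    """Build a replication task: one word repeated, with a single odd word inside."""
    words = [common_word] * total_words
    words[unique_index] = unique_word
    expected = " ".join(words)
    # The instruction wording is an assumption, not the study's exact prompt.
    prompt = f"Repeat the following text exactly:\n{expected}"
    return prompt, expected


# Hypothetical example data.
filler = [f"Filler sentence {i}." for i in range(10)]
niah_prompt = build_niah_prompt(
    filler,
    needle="The secret code is 4217.",
    question="What is the secret code?",
    depth=0.5,
    distractors=["The old code was 9900."],
)
repeat_prompt, expected_output = build_repeated_words_prompt("apple", "apples", 20, 7)
```

Growing the filler list or total_words is what lengthens the context, so the same task can be replayed at many input lengths while everything else stays fixed.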
The research showed that the assumption of uniform context processing is incorrect: model performance degrades in surprising, non-uniform ways as the input length increases.
Other key findings:
• Degradation with Length: Performance consistently declined across all experiments as the input length grew.
• Semantic vs. Lexical Matching: Models struggled more with tasks that required semantic matching than with tasks that relied on direct lexical retrieval.
• Impact of Distractors: Distractor content had a significant, non-uniform impact on performance, and the effect became more pronounced at longer context lengths.
• Haystack Structure: Surprisingly, models performed better when the haystack's sentences were randomly shuffled than when they were presented in a logically coherent order. This suggests that a model's attention mechanisms can be misled by the surface coherence of the input.
• Needle-Question Similarity: Performance degraded faster with input length when the similarity between the "needle" (the target information) and the question was lower.
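To make findings like "degradation with length" and "haystack structure" concrete, here is a hedged Python sketch of a small evaluation loop that reports accuracy per haystack size, with an option to shuffle the haystack. Everything in it is an assumption for illustration: ask_model is a hypothetical callable that takes a prompt string and returns the model's reply, the substring-based scoring is deliberately crude, and toy_model is only a stand-in so the sketch runs, not a claim about how real models fail.

```python
import random


def shuffle_haystack(sentences, seed=0):
    """Return a shuffled copy, which destroys the text's logical flow."""
    shuffled = list(sentences)
    random.Random(seed).shuffle(shuffled)
    return shuffled


def accuracy_by_length(ask_model, filler, needle, question, answer, lengths,
                       depths=(0.0, 0.25, 0.5, 0.75, 1.0), shuffle=False):
    """Map each haystack size (in sentences) to the fraction of correct replies."""
    results = {}
    for n in lengths:
        base = shuffle_haystack(filler[:n]) if shuffle else list(filler[:n])
        correct = 0
        for depth in depths:
            haystack = list(base)
            haystack.insert(int(len(haystack) * depth), needle)
            prompt = " ".join(haystack) + f"\n\nQuestion: {question}\nAnswer:"
            reply = ask_model(prompt)
            # Crude scoring: a reply counts as correct if it contains the answer.
            correct += answer.lower() in reply.lower()
        results[n] = correct / len(depths)
    return results


def toy_model(prompt):
    """Stand-in for a real model call: it only 'reads' the last 400 characters."""
    return "4217" if "4217" in prompt[-400:] else "I don't know"


filler = [f"Filler sentence number {i}." for i in range(200)]
results = accuracy_by_length(
    toy_model,
    filler,
    needle="The secret code is 4217.",
    question="What is the secret code?",
    answer="4217",
    lengths=[5, 50, 200],
)
```

With a real model behind ask_model, running the loop once with shuffle=False and once with shuffle=True gives a rough way to compare coherent and shuffled haystacks at the same lengths.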
This is yet another result that underscores the importance of context engineering.
To learn more about context engineering, see https://lnkd.in/egmhgHsa
Chroma Research: https://lnkd.in/eBwv_v_h