Accumulating Context Changes the Beliefs of Language Models

1Carnegie Mellon University, 2Princeton University, 3Stanford University
*Corresponding authors. +Co-first authors.
{ogeng, gneubig}@andrew.cmu.edu; tomg@princeton.edu
Teaser figure showing belief shifts in language models

Accumulating context causes language model beliefs to shift. As LM assistants engage in extended conversations (left) or read longer texts (right), their stated beliefs and behaviors change substantially. GPT-5 exhibits a 54.7% shift in stated beliefs after 10 rounds of discussion about moral dilemmas and safety queries, while Grok-4 shows a 27.2% shift on political issues after reading texts from opposing viewpoints. These shifts occur both through intentional persuasion (debate, targeted arguments) and non-intentional exposure (passive reading, research), revealing a hidden risk as models accumulate experience in persistent AI systems.

Overview

As language model assistants become more autonomous with improved memory and context capabilities, they increasingly accumulate text in their context windows without explicit user intervention. This paper reveals a critical hidden risk: the belief profiles of models—their understanding of the world as manifested in their responses and actions—may silently change as context accumulates.

We systematically investigate how accumulating context through two primary mechanisms—talking (engaging in multi-turn conversations) and reading (processing extended texts)—can shift model beliefs. Our framework evaluates both stated beliefs (explicit responses to questions) and behaviors (actions taken through tool use in agentic systems).
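To make the setup concrete, the sketch below shows one way such an evaluation loop could be structured. It is illustrative only, not the paper's implementation: the chat, probe_belief, and run_agent_task callables are hypothetical stand-ins for an LM chat API, a stated-belief probe, and an agentic tool-use harness.

# Minimal sketch of the talking-based evaluation loop (illustrative only).
# `chat`, `probe_belief`, and `run_agent_task` are hypothetical stand-ins
# for an LM chat API, a stated-belief probe, and an agentic harness.
def evaluate_belief_shift(chat, probe_belief, run_agent_task,
                          topic, interlocutor_turns, n_rounds=10):
    history = []

    # Baseline: stated belief and behavior before any context accumulates.
    belief_before = probe_belief(topic, history)
    behavior_before = run_agent_task(topic, history)

    # "Talking": accumulate context over a multi-round conversation.
    for user_msg in interlocutor_turns[:n_rounds]:
        reply = chat(history + [("user", user_msg)])
        history += [("user", user_msg), ("assistant", reply)]

    # Probe again with the accumulated conversation in the context window.
    belief_after = probe_belief(topic, history)
    behavior_after = run_agent_task(topic, history)

    return {"stated_shift": belief_before != belief_after,
            "behavior_shift": behavior_before != behavior_after}

The reading setting follows the same before/after structure, with the multi-round conversation replaced by extended texts placed in the context window.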

Key findings:

  • Highly malleable beliefs: GPT-5 exhibits a 54.7% shift in stated beliefs after 10 rounds of discussion about moral dilemmas and safety queries
  • Exposure effects: Grok-4 shows a 27.2% shift on political issues after reading texts from opposing positions
  • Behavior alignment: Belief shifts are reflected in actual behaviors in agentic systems, though with partial misalignment
  • Cumulative impact: Longer conversations and deeper reading lead to more pronounced shifts, with effects varying by model and content type

These findings expose fundamental concerns about the reliability of LMs in long-term, real-world deployments, where user trust grows with continued interaction even as hidden belief drift accumulates. The malleability we document suggests that models' opinions and actions can become unreliable after extended use—a critical challenge for persistent AI systems.

Main Results

Do LM assistants change their beliefs with accumulating context?

Yes, systematically and substantially. Our experiments reveal that LM assistants exhibit significant changes in both stated beliefs and behaviors across multiple models and contexts. The table below shows aggregate shift percentages across different tasks and models:

Main results table showing belief shift percentages
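The exact aggregation behind these percentages is defined in the paper; purely as an assumed illustration, a shift score of this kind can be computed as the share of probe items whose answer changes between the pre- and post-context probes:

# Assumed illustration of a shift score (not necessarily the paper's metric):
# the percentage of probe items whose answer changes after context accumulates.
def shift_percentage(answers_before, answers_after):
    assert len(answers_before) == len(answers_after)
    changed = sum(a != b for a, b in zip(answers_before, answers_after))
    return 100.0 * changed / len(answers_before)

# Example: 3 of 4 answers flip, so the shift is 75.0%.
shift_percentage(["agree", "agree", "disagree", "neutral"],
                 ["disagree", "agree", "agree", "agree"])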

Intentional vs. Non-Intentional Shifts

Intentional tasks (debate and persuasion) produce larger immediate shifts:

  • GPT-5 shows the highest susceptibility to persuasion, with 72.7% belief shift when exposed to information-based arguments
  • Claude-4-Sonnet exhibits more moderate shifts (24.9-27.2%) in intentional settings
  • Persuasion techniques matter: information and empathy-based approaches are most effective

Non-intentional tasks (reading and research) show smaller but meaningful shifts:

  • Grok-4 is most susceptible to passive exposure, showing 27.2% shift after in-depth reading
  • Research tasks produce smaller shifts (1.7-10.8%) due to more diverse information gathering
  • Open-source models (GPT-OSS-120B, DeepSeek-V3.1) show uniformly low sensitivity in non-intentional settings

Stated Belief vs. Behavior

We observe partial misalignment between belief shifts and behavioral changes:

  • Stated beliefs shift more readily than behaviors (e.g., GPT-5: 54.7% vs. 40.6% in debate)
  • Behavioral changes grow with longer interactions, even when stated beliefs stabilize early
  • This divergence suggests models may state belief changes without fully enacting them, or vice versa

Effect of Context Length

In conversations: Stated beliefs change early (within 2-4 rounds), while behavioral changes accumulate over longer interactions (up to 10 rounds).

In reading:

  • Conservative topics show cumulative effects—longer reading (5k → 80k tokens) leads to progressively larger shifts
  • Progressive topics show early emergence—shifts appear at 5k tokens and plateau, remaining stable even with 80k tokens
  • This asymmetry suggests different mechanisms for belief formation depending on content alignment with initial positions

Model Differences

Different models show distinct vulnerability patterns:

  • GPT-5: Most affected by explicit persuasion; moderately affected by passive exposure
  • Claude-4-Sonnet: More vulnerable to prolonged exposure than persuasion; shows largest shifts in reading tasks
  • Grok-4: Highest overall susceptibility to passive reading (27.2%)
  • Open-source models: Generally more robust but still show moderate shifts under debate (24.4-44.4%)

Information vs. Exposure Effects

Our embedding analysis reveals that belief shifts are not driven primarily by access to specific topic-relevant information. When we mask the most semantically relevant sentences, shifts remain largely unchanged. This suggests that shifts emerge from broader contextual framing accumulated throughout reading, rather than from exposure to particular facts—consistent with findings that narrow behavioral conditioning can lead to wider alignment drift beyond the intended domain.
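As a rough sketch of how such a masking check can be run (under assumptions; the embedding model, similarity measure, and k are not taken from the paper): embed each sentence of the reading material, rank sentences by similarity to the belief probe, remove the top-k, and re-measure the shift on the masked text.

import numpy as np

def mask_most_relevant(sentences, probe_question, embed, k=10):
    # `embed` is a placeholder for any sentence-embedding model that maps a
    # list of strings to an (n, d) array; model choice and k are assumptions.
    sent_vecs = embed(sentences)                  # (n, d)
    probe_vec = embed([probe_question])[0]        # (d,)

    # Cosine similarity between each sentence and the belief probe.
    sims = sent_vecs @ probe_vec / (
        np.linalg.norm(sent_vecs, axis=1) * np.linalg.norm(probe_vec) + 1e-9
    )

    # Drop the k most similar sentences; keep the rest in original order.
    keep = np.argsort(sims)[:-k] if k < len(sentences) else []
    return " ".join(sentences[i] for i in sorted(keep))

Re-running the belief probe on the masked text then separates the contribution of specific topic-relevant facts from that of the surrounding contextual framing.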

Author Contributions

Jiayi Geng is responsible for the overall planning and execution of the project, including core idea formation, data collection, evaluation protocol design and implementation, experimentation on the intentional tasks, analysis, and core paper writing.

Howard Chen contributed to the core idea, data collection, experimentation on the non-intentional tasks, analysis, and core paper writing.

Ryan Liu contributed to discussions and experimental exploration, and assisted in paper writing (Related Work).

Manoel Horta Ribeiro provided feedback on the idea and reviewed the manuscript.

Robb Willer contributed to the experiment design of the intentional tasks and the evaluation protocol design.

Graham Neubig contributed to the overall framing of the project, advised on the design of the experiments and evaluation protocols, and contributed to core paper writing.

Thomas L. Griffiths contributed to the early conceptual development of the project, helping shape the core idea, and advised on the experimental and evaluation protocol design as well as the paper writing.

Acknowledgments

This paper was supported by grants from Fujitsu, the Microsoft AFMR, and the NOMIS Foundation. We also thank Izzy Benjamin Gainsburg, Amanda Bertsch, Lindia Tjuatja, Lintang Sutawika, Yueqi Song, and Emily Xiao for their valuable feedback and discussion.

BibTeX


@article{geng2025accumulating,
  title={Accumulating Context Changes the Beliefs of Language Models},
  author={Geng, Jiayi and Chen, Howard and Liu, Ryan and Horta Ribeiro, Manoel and Willer, Robb and Neubig, Graham and Griffiths, Thomas L.},
  journal={arXiv preprint arXiv:2511.01805},
  year={2025}
}