ChatGPT Needs Therapy
An HCI Case Study on Context Loss in Large Language Models by Omar Mnfy
This project investigates a core breakdown in AI-assisted work: when ChatGPT forgets the conversation’s context. Across 12 weeks, I ran pilot studies, think-aloud sessions, contextual inquiries, surveys, and prototype tests to understand why context loss happens, how users cope, and what design features might meaningfully improve long, multi-step interactions.
My full work includes research, low-fi prototyping, user testing, data analysis, and final design recommendations. This portfolio page presents the entire project as a coherent story.
1. Project Overview
Large Language Models excel at single-turn tasks, but students often rely on ChatGPT for multi-step academic workflows that require memory across turns. When ChatGPT drifts or forgets details, users repair context manually, lose trust, or abandon the task.
My research explores both the experience of context loss and design interventions that could improve reliability:
– Clarifying Questions
– Editable Conversation Memory
– Context Anchors
– A hybrid combined system
2. Research Goals
Across all studies, my goals were to:
Understand how students detect and react to context loss.
Identify emotional, behavioral, and cognitive impacts of drift.
Observe real-time strategies for repairing lost context.
Evaluate user interest in memory-aware features such as visible memory panels, fact-locking, and contextual anchors.
Test multiple prototype concepts to determine what design patterns best support trust, continuity, and usability.
3. Methods Used
This portfolio highlights all research methods I applied and how each contributed unique insights.
• Contextual Inquiry (Think-Aloud Protocol)
Used to observe how users behave as ChatGPT begins drifting. This method let me see frustration points, how users rebuild context, and which cues they rely on.
• Pilot Study
To refine the tasks, prompts, and observational structure before running the final sessions.
• Directed Storytelling
Used to collect past experiences of confusion, repair attempts, and coping strategies.
• Survey Research
Designed to measure frequency of context loss, disruption level, coping mechanisms, and interest in proposed features like fact-locking and visible memory.
• Low-Fidelity Paper Prototyping
A physical “paper phone” with interchangeable screens allowed me to test concepts cheaply while capturing honest reactions.
• Usability Testing with SUS and Likert Scales
Used to assess clarity, usability, cognitive load, and trust of the three prototype concepts.
• Semi-Structured Interviews
Post-task interviews captured emotional reactions and design preferences.
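For readers unfamiliar with SUS, the standard questionnaire converts ten 1–5 Likert responses into a single 0–100 score. A minimal sketch of that standard formula (generic reference code, not this project's analysis script):

```python
def sus_score(responses):
    """Compute a System Usability Scale score (0-100) from ten 1-5 Likert items.

    Odd-numbered items are positively worded (score contributes r - 1);
    even-numbered items are negatively worded (contributes 5 - r).
    The summed contributions are scaled by 2.5 to reach a 0-100 range.
    """
    assert len(responses) == 10, "SUS uses exactly ten items"
    total = 0
    for i, r in enumerate(responses, start=1):
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5
```

For example, all-neutral responses (`[3] * 10`) yield the midpoint score of 50.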
4. Participants
Across all stages, I tested with:
• 7 participants in the contextual study (in-person + virtual)
• 5 participants in the prototype test
• Survey respondents (students who use ChatGPT for academic work)
Participants represented a range of majors (Computer Science, Psychology, Biology, Business, English), ensuring diverse workflows and expectations.
5. Study 1 – Contextual Inquiry
Observing Users When ChatGPT Loses Context
Participants completed three tasks designed to trigger drift:
– A “roast me” prompt requiring recall of all previously shared details
– Identifying an album image, then answering follow-up questions about it
– Writing a structured three-scene play and checking consistency details (a ceramic duck, an unseen assistant)
Key Observations
• Users attempted to patch context by rewriting bios, adding hint blocks, or resetting threads.
• Participants used creative repair techniques:
– Creating a “Context Log”
– Using checklists
– Forcing the model to echo its assumptions back
• Trust dropped whenever ChatGPT invented details, contradicted itself, or ignored constraints.
• Most participants said a visible memory panel or constraint-locking system would significantly reduce mental load.
6. Study 2 – Survey Research
Measuring Frequency, Disruption, and Desired Features
The survey measured:
• How often ChatGPT loses context
• How disruptive it is
• What repair strategies users employ
• Interest in potential design features
• Preferences for transparency, rule-following, and reasoning visibility
Key Takeaways
• Most students encounter context loss “sometimes” to “often.”
• The most common coping strategy is rephrasing or starting a new chat.
• Students strongly support:
– Visible memory panel
– Fact-locking
– Reasoning or assumption visibility
• Trust increases when ChatGPT follows constraints rigorously and avoids inventing facts.
7. Study 3 – Prototype Design and Testing
Three Concepts to Prevent Context Loss
I prototyped three low-fidelity interface concepts and evaluated them through think-aloud testing plus SUS and Likert ratings.
Concept A: Clarifying Questions
ChatGPT asks targeted follow-up questions whenever the request is ambiguous.
Concept B: Editable Memory Panel
A small box displays what ChatGPT believes the conversation is about. Users can edit or correct it.
Concept C: Context Anchors
Users can set a persistent “anchor” summarizing the session’s purpose.
8. Prototype Findings
Average usability ratings (7-point Likert scale)
Clarifying Questions: 5.8
Editable Memory: 5.8
Context Anchors: 6.0
What Users Said
• Clarifying questions help but can feel repetitive.
• Memory panel provides transparency but can be mentally taxing.
• Anchors feel natural, clean, and low-effort.
• No single solution fits all tasks; preferences vary by discipline.
9. Synthesis and Insights
Across all methods, four major themes emerged.
Insight 1: Users Want Control Over Context
Whether through clarifying questions or a memory panel, users want visibility and agency.
Insight 2: Cognitive Load Must Stay Low
Too many prompts or too much manual editing creates friction, even if it increases accuracy.
Insight 3: Contextual Drift Happens Across Tasks
Creative tasks, academic work, and coding all drift differently, suggesting multiple solutions rather than one.
Insight 4: Transparency Increases Trust
When the system shows assumptions or constraints, users forgive errors more easily.
10. The Hybrid Solution
The Most Promising Design Direction
Participants consistently gravitated toward a combined system:
• A small, unobtrusive context summary (always visible)
• Occasional clarifying questions triggered only by ambiguity
• An optional anchor for long conversations
• “Constraint-locking” or “fact-locking” for high-precision tasks
• A 1-click reset to prevent contamination across tasks
This hybrid approach balances transparency, speed, and cognitive effort.
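The hybrid behavior described above can be sketched as a small data model. Every name here is illustrative only, invented for this sketch rather than drawn from any real ChatGPT API:

```python
from dataclasses import dataclass, field

@dataclass
class SessionContext:
    """Illustrative model of the hybrid design: anchor + locked facts +
    always-visible summary + ambiguity-triggered clarification + reset."""
    anchor: str = ""                # persistent summary of the session's purpose
    working_summary: str = ""       # small, unobtrusive always-visible summary
    locked_facts: list[str] = field(default_factory=list)  # fact-locking store

    def lock_fact(self, fact: str) -> None:
        """Pin a fact so it survives the rest of the conversation."""
        if fact not in self.locked_facts:
            self.locked_facts.append(fact)

    def needs_clarification(self, prompt: str) -> bool:
        """Toy ambiguity heuristic: very short prompt with no anchor set."""
        return len(prompt.split()) < 4 and not self.anchor

    def reset(self) -> None:
        """1-click reset to prevent contamination across tasks."""
        self.anchor = ""
        self.working_summary = ""
        self.locked_facts.clear()
```

The point of the sketch is the separation of concerns: anchors and locked facts are user-controlled state, while clarifying questions fire only when an ambiguity check trips, keeping cognitive load low by default.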
11. Final Design Direction
If developed further, I would evolve this into:
• A modular context-management dashboard
• Visual layers showing what the model is using as its memory
• User-editable anchors and assumptions
• Auto-detection of drift with proactive correction prompts
• Modes based on the task type (creative vs precise)
• Error-checking features that cite earlier context before answering
This aligns with both user feedback and the needs uncovered in all studies.
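As a toy illustration of the auto-detection idea, drift could be flagged by measuring word overlap between the session anchor and the model's latest reply. This is a deliberately crude heuristic for illustration; a real system would likely compare embeddings rather than raw words:

```python
def drift_score(anchor: str, reply: str) -> float:
    """Return 1 - Jaccard word overlap between anchor and reply.

    0.0 means full overlap (on topic); 1.0 means no shared words (likely drift).
    """
    a = set(anchor.lower().split())
    b = set(reply.lower().split())
    if not a or not b:
        return 1.0
    return 1 - len(a & b) / len(a | b)

def flag_drift(anchor: str, reply: str, threshold: float = 0.9) -> bool:
    """Trigger a proactive correction prompt when drift exceeds the threshold."""
    return drift_score(anchor, reply) > threshold
```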
12. Reflection on the Project
This project taught me that complexity reveals itself only when you use the right methods. A simple complaint (“ChatGPT forgets things”) turned into a multi-layered human-computer interaction issue touching memory, trust, cognitive load, and interface design.
Choosing the Think-Aloud Protocol was crucial. Watching breakdowns happen live revealed behaviors that surveys alone would never show. The prototype phase showed that even lightweight sketches can clarify which ideas resonate.
If I extended this as an independent study, I would build a working mid-fi prototype and test it on longer academic workflows, such as essay writing or code debugging.
13. Acknowledgments
Thank you to all participants from Pitzer College and Pomona College who volunteered their time, shared their frustrations, and pushed this project in meaningful directions.
Gratitude to my classmates, collaborators, and instructors in CS120 Human-Centered Computing for critiques, discussions, and support throughout the project.
