Using LLMs to score Hierarchical Organization of Narrative Recall

Date

2024

Abstract

Research has suggested that the ability to comprehend narratives relies on the hierarchical organization of memory. Disorders such as dementia can be detected with measures of narrative recall; however, these measures often rely on human graders. Here we explored how large language models (LLMs) can be leveraged to measure hierarchical organization of narrative recall (HONR). Participants (n=4) each recalled two stories of 45-50 sentences. We then submitted the recalls (n=8) to a two-part process. First, we asked ChatGPT to link the sentences in each recall to the sentences in the original stories. Next, we used ChatGPT's output to derive two measures of HONR, Event Bunching and Linearity, as proxies for Local and Global Coherence, respectively. We compared a sample of human-scored responses with the semi-automated scores and found high interrater reliability, suggesting that LLMs can reliably answer questions about narrative structure.

Rights

Attribution-NonCommercial-NoDerivs 3.0 United States
http://creativecommons.org/licenses/by-nc-nd/3.0/us/