Investigating Conversational Access to Historical Content

Advisor

Oard, Douglas W.

Abstract

Since the inception of writing systems, people have left behind written traces of their lives. Modern media systems further extend these traces to include spoken and digital records, raising the possibility that future systems could reconstruct aspects of a person’s life. This dissertation supports that vision by investigating conversational interaction with representations of historical figures that are grounded in historical content, an idea commonly referred to as virtual immortality. It draws on recent advances in retrieval methods for conversational search, large language models (LLMs), and prompting techniques to propose a conceptual framework for a conversational agent comprising three stages: retrieval, contextualization, and style transfer. Rather than building a full end-to-end system, the dissertation begins by investigating each component individually.

First, foundational techniques for conversational search are developed through participation in TREC CAsT 2021, using web-scale document collections. Experiments demonstrate that automatic query rewriting improves retrieval performance and user-perceived utility. Second, the work examines a domain-specific setting using a curated Ronald Reagan collection of diaries, interview transcripts, and public papers. A retrieval-based prototype for single-turn interactions supports semi-structured interviews with experts from libraries, archives, and museums (LAM). The study reveals implications for immersive experiences, archival assurance, and engagement of inquiring visitors. To help bridge this gap, the third part of the dissertation focuses on text rewriting. Specifically, the task of contextualization aims to generate clause-level elaborations for uncontextualized mentions. Reagan’s letters are used as input due to their brevity, and human annotations are collected to evaluate system performance. Results show that prompting LLMs with detailed task instructions and few-shot examples can achieve near-human performance, a finding confirmed by manual inspection. Finally, the dissertation investigates text style transfer: rewriting historical content to match a conversational style reflective of Reagan’s public representation. This is framed as a data-driven task in which LLMs are prompted using comparable examples drawn from texts with topical overlap. Experiments across four sets of paired corpora show that single-turn prompting, rather than chain-of-thought prompting, combined with short comparable examples, yields the best results. Automatic measures of entailment align well with human assessment of content preservation, while style classifiers struggle to capture the subtleties of stylistic variation when used to assess style strength.

In sum, this dissertation offers an empirical investigation into components that could support conversational interaction with historical figures. It concludes with a discussion of the opportunities and limitations of such systems and aims to inspire future interdisciplinary work with born-digital historical collections.