Modeling Language Development: How Machine Learning can Enhance Analysis of the Language Environment

Harvey, James

Files

Honors Thesis Final.pdf (1.06 MB)

Date

2024-12-18

Authors

Harvey, James

Advisor

Huang, Yi Ting
Newman, Rochelle
Domanski, Sophie

DRUM DOI

https://doi.org/10.13016/dqfi-fjzy

Abstract

Language sampling elicits a representative picture of a child’s language and provides methods for assessing functional communication beyond what standardized tests offer. Naturalistic sampling reduces time costs and offers an ideal way to assess differences in home language associated with differences in socioeconomic status (SES). Unfortunately, dense naturalistic recordings are difficult to analyze at scale and to mine for meaningful information. This study investigates the use of the Language ENvironment Analysis system (LENA) for sampling home language, combined with technology-assisted transcription and topic modeling. To evaluate the efficacy of transcription, segments were selected according to the amount of meaningful speech measured by LENA and transcribed by Whisper, OpenAI’s automatic speech recognition software. Research assistants trimmed the text files to retain the available adult language, separated by utterance. Results suggest that this method of sampling, technology-assisted transcription, and automated analysis of traditional language metrics reproduces the expected associations among parental input, SES, and standardized child vocabulary size. Topic models did not identify activity contexts, likely due to the nature of the input. This research presents a validated pipeline that produces dense, representative data and uses modern approaches to reduce traditional time costs.
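
As a rough illustration of the "automated analysis of traditional language metrics" step, the sketch below computes a few commonly used language-sample measures from utterance-separated adult speech. It is not the thesis's code: the abstract does not name the metrics used, so the choice of measures (utterance count, word tokens, word types, mean length of utterance in words), the simple regular-expression tokenizer, and the function name `language_metrics` are all assumptions made for illustration.

```python
import re


def language_metrics(utterances):
    """Compute simple input metrics from a list of utterance strings.

    Assumption: one string per utterance, as in a trimmed transcript file.
    The metrics here are common language-sample measures, not necessarily
    the ones reported in the thesis.
    """
    # Split each utterance into lowercase word tokens (letters and apostrophes).
    tokenized = [re.findall(r"[a-z']+", u.lower()) for u in utterances]
    # Drop utterances that contain no recognizable words.
    tokenized = [t for t in tokenized if t]
    tokens = [w for t in tokenized for w in t]
    n_utts = len(tokenized)
    return {
        "utterances": n_utts,
        "word_tokens": len(tokens),
        "word_types": len(set(tokens)),
        # Mean length of utterance, counted in words rather than morphemes.
        "mlu_words": len(tokens) / n_utts if n_utts else 0.0,
    }


# Hypothetical transcript lines, one adult utterance per entry.
sample = ["Do you want the ball?", "Look at the ball!", "..."]
metrics = language_metrics(sample)
print(metrics)
```

Counting words rather than morphemes keeps the sketch free of external dependencies; a morpheme-based measure would need a morphological analyzer.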

URI (handle)

http://hdl.handle.net/1903/33554

Rights

Attribution-NoDerivs 3.0 United States
http://creativecommons.org/licenses/by-nd/3.0/us/

Collections

Hearing & Speech Sciences Undergraduate Honors Theses
