Using a high-dimensional model of semantic space to predict neural activity
Jackson, Alice Freeman
Bolger, Donald J
This dissertation research developed the GOLD model (Graph Of Language Distribution), a graph-structured semantic space model constructed from co-occurrence in a large corpus of natural language, with the aim of exploring what information such a model captures about relationships between words and the degree to which that information can be used to predict brain responses and behavior in language tasks. The present study employed GOLD to examine general relatedness as well as two specific types of relationship between words: semantic similarity, the degree of overlap in meaning between two words, and associative relatedness, the degree to which two words occur in the same schematic context. It was hypothesized that a graph-structured model constructed from co-occurrence should readily capture associative relatedness, because this type of relationship is thought to be present directly in lexical co-occurrence. It was further hypothesized that semantic similarity could be extracted from the intersection of the sets of first-order connections, because two words that are semantically similar may occupy similar thematic or syntactic roles across contexts and would thus co-occur lexically with the same set of nodes. Based on these hypotheses, a set of relationship metrics was extracted from the GOLD model, and machine learning techniques were used to explore the predictive properties of these metrics. GOLD successfully predicted behavioral data as well as neural activity in response to words with varying relationships, and its predictions outperformed those of certain competing models. These results suggest that a single-mechanism account of learning word meaning from context may suffice to account for a variety of relationships between words.
Further benefits of graph models of language are discussed, including their transparent record of language experience, easy interpretability, and greater psychological plausibility than models that perform complex transformations of meaning representation.
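The two hypothesized relationship types can be illustrated with a minimal sketch of a co-occurrence graph. This is not the GOLD implementation or its actual metrics; it is a hypothetical illustration in which associative relatedness is read off a direct (first-order) edge weight, and semantic similarity is approximated by the overlap (here, Jaccard overlap) of two words' first-order neighbor sets:

```python
from collections import defaultdict

def build_cooccurrence_graph(sentences, window=2):
    """Build an undirected co-occurrence graph: nodes are words,
    edge weights count co-occurrences within a sliding window."""
    graph = defaultdict(lambda: defaultdict(int))
    for tokens in sentences:
        for i, w in enumerate(tokens):
            for v in tokens[i + 1:i + 1 + window]:
                if v != w:
                    graph[w][v] += 1
                    graph[v][w] += 1
    return graph

def associative_strength(graph, a, b):
    """Direct first-order link weight: a stand-in for
    associative relatedness, present in co-occurrence itself."""
    return graph[a][b]

def neighbor_similarity(graph, a, b):
    """Jaccard overlap of first-order neighbor sets: a stand-in
    for semantic similarity, extracted from the intersection of
    the words' first-order connections."""
    na, nb = set(graph[a]), set(graph[b])
    if not na or not nb:
        return 0.0
    return len(na & nb) / len(na | nb)

# Toy corpus: "doctor" and "nurse" rarely co-occur directly but
# share contexts, so they get high neighbor overlap.
corpus = [
    "the doctor treated the patient".split(),
    "the nurse treated the patient".split(),
    "the doctor examined the nurse".split(),
]
g = build_cooccurrence_graph(corpus)
print(associative_strength(g, "doctor", "treated"))  # direct edge
print(neighbor_similarity(g, "doctor", "nurse"))     # shared contexts
```

The design point mirrors the abstract's hypotheses: the same co-occurrence record supports both measures, so no separate learning mechanism is needed for the two relationship types.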