University of Maryland DRUM  

Please use this identifier to cite or link to this item: http://hdl.handle.net/1903/12383

Title: Topic Modeling for Wikipedia Link Disambiguation
Authors: Skaggs, Bradley Alan
Advisors: Getoor, Lise C
Department/Program: Computer Science
Type: Thesis
Sponsors: Digital Repository at the University of Maryland
University of Maryland (College Park, Md.)
Subjects: Computer science
Keywords: disambiguation; link prediction; topic modeling; Wikipedia
Issue Date: 2011
Abstract: Many articles in the online encyclopedia Wikipedia have hyperlinks to ambiguous article titles. To improve the reader experience, any link to an ambiguous title should be replaced with a link to one of the unambiguous meanings. We propose a novel statistical topic model, which we refer to as the Link Text Topic Model (LTTM), that can suggest new link targets for existing ambiguous links in Wikipedia articles. For evaluation, we develop a method for extracting ground truth from snapshots of Wikipedia at different points in time. We evaluate LTTM on this ground truth, and demonstrate its superiority over existing link- and content-based approaches. Finally, we build a web service that uses LTTM to suggest unambiguous articles for human editors wanting to fix ambiguous links.
URI: http://hdl.handle.net/1903/12383
Appears in Collections: UMD Theses and Dissertations; Computer Science Theses and Dissertations

Files in This Item:

File                        Description  Size       Format     No. of Downloads
Skaggs_umd_0117N_12844.pdf               803.07 kB  Adobe PDF  1101

All items in DRUM are protected by copyright, with all rights reserved.

DRUM is brought to you by the University of Maryland Libraries
University of Maryland, College Park, MD 20742-7011 (301)314-1328.