VISUAL ANALYTICS FOR OPEN-ENDED TASKS IN TEXT MINING

Park, Deokgun

dc.contributor.advisor	Elmqvist, Niklas	en_US
dc.contributor.author	Park, Deokgun	en_US
dc.contributor.department	Computer Science	en_US
dc.contributor.publisher	Digital Repository at the University of Maryland	en_US
dc.contributor.publisher	University of Maryland (College Park, Md.)	en_US
dc.date.accessioned	2018-07-17T06:20:02Z
dc.date.available	2018-07-17T06:20:02Z
dc.date.issued	2018	en_US
dc.description.abstract	An overview of documents produced with topic modeling and multidimensional scaling helps in understanding topic distribution. While we can spot clusters visually, it is challenging to characterize them. My research investigates an interactive method to identify clusters by assigning attributes and examining the resulting distributions. ParallelSpaces examines the understanding of topic modeling applied to Yelp business reviews, where businesses and their reviews each constitute a separate visual space. Exploring these spaces enables the characterization of each space using the other. However, the scatterplot-based approach in ParallelSpaces does not generalize to categorical variables because of overplotting. My research proposes an improved layout algorithm for those cases in our follow-up work, Gatherplots, which eliminates overplotting in scatterplots while maintaining individual objects. Another limitation of clustering methods is that the number of clusters is fixed as a hyperparameter. TopicLens is a Magic Lens-type interaction technique in which the documents under the lens are clustered according to topics in real time. While ParallelSpaces helps characterize the clusters, the available attributes are sometimes limited. CommentIQ extends the analysis with a custom mixture of attributes: it is a comment moderation tool in which moderators can adjust model parameters according to the context or goals. To help users analyze documents semantically, we develop a technique for user-driven text mining that builds a dictionary for topics or concepts. This follow-up study, ConceptVector, uses word embedding to generate dictionaries interactively and uses those dictionaries to analyze the documents. My dissertation contributes interactive methods for overviewing documents that integrate the user into text mining loops that are currently non-interactive.
The case studies we present in this dissertation provide concrete and operational techniques for directly improving several state-of-the-art text mining algorithms. We summarize those generalizable lessons and discuss the limitations of the visual analytics approach.	en_US
dc.identifier	https://doi.org/10.13016/M26T0H03F
dc.identifier.uri	http://hdl.handle.net/1903/21003
dc.language.iso	en	en_US
dc.subject.pqcontrolled	Computer science	en_US
dc.subject.pquncontrolled	information visualization	en_US
dc.subject.pquncontrolled	open-ended tasks	en_US
dc.subject.pquncontrolled	text mining	en_US
dc.subject.pquncontrolled	visual analytics	en_US
dc.title	VISUAL ANALYTICS FOR OPEN-ENDED TASKS IN TEXT MINING	en_US
dc.type	Dissertation	en_US

Files

Original bundle

Name: Park_umd_0117E_19010.pdf
Size: 43.07 MB
Format: Adobe Portable Document Format

Collections

UMD Theses and Dissertations
Computer Science Theses and Dissertations