University of Maryland DRUM  
University of Maryland Digital Repository at the University of Maryland

DRUM >
Theses and Dissertations from UM >
UM Theses and Dissertations >

Please use this identifier to cite or link to this item: http://hdl.handle.net/1903/11999

Title: A distributional and syntactic approach to fine-grained opinion mining
Authors: Sayeed, Asad Basheer
Advisors: Weinberg, Amy S
Department/Program: Computer Science
Type: Dissertation
Sponsors: Digital Repository at the University of Maryland
University of Maryland (College Park, Md.)
Keywords: 0984 Computer science
0290 Linguistics
0800 Artificial intelligence
computational linguistics, crowdsourcing, machine learning, natural language processing, opinion mining, sentiment analysis
Issue Date: 2011
Abstract: This thesis contributes to a larger social science research program of analyzing the diffusion of IT innovations. We show how to automatically discriminate portions of text dealing with opinions about innovations by finding {source, target, opinion} triples in text. In this context, we can discern a list of innovations as targets from the domain itself. We can then use this list as an anchor for finding the other two members of the triple at a ``fine-grained'' level---paragraph contexts or less. We first demonstrate a vector space model for finding opinionated contexts in which the innovation targets are mentioned. We can find paragraph-level contexts by searching for an ``expresses-an-opinion-about'' relation between sources and targets using a supervised model with an SVM that uses features derived from a general-purpose subjectivity lexicon and a corpus indexing tool. We show that our algorithm correctly filters the domain relevant subset of subjectivity terms so that they are more highly valued. We then turn to identifying the opinion. Typically, opinions in opinion mining are taken to be positive or negative. We discuss a crowd sourcing technique developed to create the seed data describing human perception of opinion bearing language needed for our supervised learning algorithm. Our user interface successfully limited the meta-subjectivity inherent in the task (``What is an opinion?'') while reliably retrieving relevant opinionated words using labour not expert in the domain. Finally, we developed a new data structure and modeling technique for connecting targets with the correct within-sentence opinionated language. Syntactic relatedness tries (SRTs) contain all paths from a dependency graph of a sentence that connect a target expression to a candidate opinionated word. We use factor graphs to model how far a path through the SRT must be followed in order to connect the right targets to the right words. It turns out that we can correctly label significant portions of these tries with very rudimentary features such as part-of-speech tags and dependency labels with minimal processing. This technique uses the data from the crowdsourcing technique we developed as training data. We conclude by placing our work in the context of a larger sentiment classification pipeline and by describing a model for learning from the data structures produced by our work. This work contributes to computational linguistics by proposing and verifying new data gathering techniques and applying recent developments in machine learning to inference over grammatical structures for highly subjective purposes. It applies a suffix tree-based data structure to model opinion in a specific domain by imposing a restriction on the order in which the data is stored in the structure.
URI: http://hdl.handle.net/1903/11999
Appears in Collections:Computer Science Theses and Dissertations
UM Theses and Dissertations

Files in This Item:

File Description SizeFormatNo. of Downloads
Sayeed_umd_0117E_12565.pdf842.09 kBAdobe PDF188View/Open

All items in DRUM are protected by copyright, with all rights reserved.

 

DRUM is brought to you by the University of Maryland Libraries
University of Maryland, College Park, MD 20742-7011 (301)314-1328.
Please send us your comments. -
All Contents