A distributional and syntactic approach to fine-grained opinion mining

Sayeed, Asad Basheer

dc.contributor.advisor	Weinberg, Amy S	en_US
dc.contributor.author	Sayeed, Asad Basheer	en_US
dc.contributor.department	Computer Science	en_US
dc.contributor.publisher	Digital Repository at the University of Maryland	en_US
dc.contributor.publisher	University of Maryland (College Park, Md.)	en_US
dc.date.accessioned	2011-10-08T06:12:30Z
dc.date.available	2011-10-08T06:12:30Z
dc.date.issued	2011	en_US
dc.description.abstract	This thesis contributes to a larger social science research program of analyzing the diffusion of IT innovations. We show how to automatically discriminate portions of text dealing with opinions about innovations by finding {source, target, opinion} triples in text. In this context, we can discern a list of innovations as targets from the domain itself. We can then use this list as an anchor for finding the other two members of the triple at a "fine-grained" level: paragraph contexts or smaller. We first demonstrate a vector space model for finding opinionated contexts in which the innovation targets are mentioned. We can find paragraph-level contexts by searching for an "expresses-an-opinion-about" relation between sources and targets using a supervised model with an SVM that uses features derived from a general-purpose subjectivity lexicon and a corpus indexing tool. We show that our algorithm correctly filters the domain-relevant subset of subjectivity terms so that they are more highly valued. We then turn to identifying the opinion. Typically, opinions in opinion mining are taken to be positive or negative. We discuss a crowdsourcing technique developed to create the seed data describing human perception of opinion-bearing language needed for our supervised learning algorithm. Our user interface successfully limited the meta-subjectivity inherent in the task ("What is an opinion?") while reliably retrieving relevant opinionated words using labour not expert in the domain. Finally, we developed a new data structure and modeling technique for connecting targets with the correct within-sentence opinionated language. Syntactic relatedness tries (SRTs) contain all paths from a dependency graph of a sentence that connect a target expression to a candidate opinionated word. We use factor graphs to model how far a path through the SRT must be followed in order to connect the right targets to the right words. It turns out that we can correctly label significant portions of these tries with very rudimentary features such as part-of-speech tags and dependency labels with minimal processing. This technique uses the data from the crowdsourcing technique we developed as training data. We conclude by placing our work in the context of a larger sentiment classification pipeline and by describing a model for learning from the data structures produced by our work. This work contributes to computational linguistics by proposing and verifying new data gathering techniques and applying recent developments in machine learning to inference over grammatical structures for highly subjective purposes. It applies a suffix tree-based data structure to model opinion in a specific domain by imposing a restriction on the order in which the data is stored in the structure.	en_US
dc.identifier.uri	http://hdl.handle.net/1903/11999
dc.subject.pqcontrolled	Computer science	en_US
dc.subject.pqcontrolled	Linguistics	en_US
dc.subject.pqcontrolled	Artificial intelligence	en_US
dc.subject.pquncontrolled	computational linguistics	en_US
dc.subject.pquncontrolled	crowdsourcing	en_US
dc.subject.pquncontrolled	machine learning	en_US
dc.subject.pquncontrolled	natural language processing	en_US
dc.subject.pquncontrolled	opinion mining	en_US
dc.subject.pquncontrolled	sentiment analysis	en_US
dc.title	A distributional and syntactic approach to fine-grained opinion mining	en_US
dc.type	Dissertation	en_US
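
The abstract describes syntactic relatedness tries (SRTs) only in outline: all dependency-graph paths that connect a target expression to candidate opinionated words. The following is a minimal sketch of one possible reading of that description, not the dissertation's implementation. The function names, the (head, label, dependent) edge format, the toy sentence, and the choice to key trie nodes by (dependency label, word) are all assumptions made for illustration; tokens are identified by their word form, so the sketch assumes each word occurs once in the sentence.

```python
from collections import deque


def find_path(edges, start, goal):
    """Breadth-first search over the dependency graph, treated as undirected.

    Returns the list of (label, token) steps leading from start to goal,
    or None if the two tokens are not connected.
    """
    adjacency = {}
    for head, label, dependent in edges:
        adjacency.setdefault(head, []).append((label, dependent))
        adjacency.setdefault(dependent, []).append((label, head))

    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for label, neighbour in adjacency.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, path + [(label, neighbour)]))
    return None


def build_trie(edges, target, candidates):
    """Store every target-to-candidate path in a nested-dict trie.

    Each path starts at the target, so paths that share their first
    steps share the corresponding trie nodes.
    """
    trie = {}
    for candidate in candidates:
        path = find_path(edges, target, candidate)
        if path is None:
            continue  # candidate is not connected to the target
        node = trie
        for step in path:
            node = node.setdefault(step, {})
    return trie


# Hypothetical toy parse of "Analysts praise the new platform".
edges = [
    ("praise", "nsubj", "Analysts"),
    ("praise", "dobj", "platform"),
    ("platform", "amod", "new"),
    ("platform", "det", "the"),
]
trie = build_trie(edges, "platform", ["praise", "new", "Analysts"])
```

In this toy case the paths to "praise" and "Analysts" share their first step, so they occupy a single branch of the trie. Deciding how far along such a branch to go before the words stop being relevant to the target is the labelling problem the abstract assigns to factor graphs; that part is not sketched here.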

Files

Original bundle

Name: Sayeed_umd_0117E_12565.pdf
Size: 842.09 KB
Format: Adobe Portable Document Format

Collections

UMD Theses and Dissertations
Computer Science Theses and Dissertations