Cost-sensitive Information Acquisition in Structured Domains

dc.contributor.advisor: Getoor, Lise C [en_US]
dc.contributor.author: Bilgic, Mustafa [en_US]
dc.contributor.department: Computer Science [en_US]
dc.contributor.publisher: Digital Repository at the University of Maryland [en_US]
dc.contributor.publisher: University of Maryland (College Park, Md.) [en_US]
dc.date.accessioned: 2010-10-07T06:02:35Z
dc.date.available: 2010-10-07T06:02:35Z
dc.date.issued: 2010 [en_US]
dc.description.abstract: Many real-world prediction tasks require collecting information about the domain entities to achieve better predictive performance. Collecting this additional information is often a costly process that involves acquiring the features describing the entities and annotating the entities with target labels. For example, document collections need to be manually annotated for classification, and lab tests need to be ordered for medical diagnosis. Annotating the whole document collection and ordering all possible lab tests might be infeasible due to limited resources. In this thesis, I explore effective and efficient ways of choosing the right features and labels to acquire under limited resources. For the problem of feature acquisition, we are given entities with missing features and the task is to classify them with minimum cost. The likelihood of misclassification can be reduced by acquiring features, but each acquisition incurs a cost as well. The objective is to acquire a set of features that balances acquisition cost against misclassification cost. I introduce a technique that reduces the space of possible feature sets to consider for acquisition by exploiting the conditional independence properties in the underlying probability distribution. For the problem of label acquisition, I consider two real-world scenarios. In the first one, we are given a previously trained model and a budget determining how many labels we can acquire, and the objective is to determine the right set of labels to acquire so that the accuracy on the remaining ones is maximized. I describe a system that automatically learns to predict the entities on which the underlying classifier is likely to make mistakes, and that suggests acquiring the labels of the entities that lie in a high-density, potentially misclassified region. In the second scenario, we are given a network of entities that are unlabeled, and our objective is to learn a classification model that will have the least future expected error while acquiring the minimum number of labels. I describe an active learning technique that exploits the relationships in the network both to select informative entities to label and to learn a collective classifier that utilizes the label correlations in the network. [en_US]
dc.identifier.uri: http://hdl.handle.net/1903/10907
dc.subject.pqcontrolled: Computer Science [en_US]
dc.subject.pqcontrolled: Artificial Intelligence [en_US]
dc.subject.pquncontrolled: active inference [en_US]
dc.subject.pquncontrolled: active learning [en_US]
dc.subject.pquncontrolled: classification [en_US]
dc.subject.pquncontrolled: feature acquisition [en_US]
dc.subject.pquncontrolled: probabilistic graphical models [en_US]
dc.subject.pquncontrolled: statistical relational learning [en_US]
dc.title: Cost-sensitive Information Acquisition in Structured Domains [en_US]
dc.type: Dissertation [en_US]

Files

Original bundle

Name: Bilgic_umd_0117E_11566.pdf
Size: 4.68 MB
Format: Adobe Portable Document Format