Quality-Aware Data Source Management

dc.contributor.advisor: Deshpande, Amol (en_US)
dc.contributor.advisor: Getoor, Lise (en_US)
dc.contributor.author: Rekatsinas, Theodoros (en_US)
dc.contributor.department: Computer Science (en_US)
dc.contributor.publisher: Digital Repository at the University of Maryland (en_US)
dc.contributor.publisher: University of Maryland (College Park, Md.) (en_US)
dc.date.accessioned: 2015-09-18T05:36:49Z
dc.date.available: 2015-09-18T05:36:49Z
dc.date.issued: 2015 (en_US)
dc.description.abstract: Data is becoming a commodity of tremendous value in many domains. The ease of collecting and publishing data has led to an upsurge in the number of available data sources --- sources that are highly heterogeneous in the domains they cover, the quality of data they provide, and the fees they charge for accessing their data. However, most existing data integration approaches, for combining information from a collection of sources, focus on facilitating integration itself but are agnostic to the actual utility or the quality of the integration result. These approaches do not optimize for the trade-off between the utility and the cost of integration to determine which sources are worth integrating. In this dissertation, I introduce a framework for quality-aware data source management. I define a collection of formal quality metrics for different types of data sources, including sources that provide both structured and unstructured data. I develop techniques to efficiently detect the content focus of a large number of diverse sources, to reason about their content changes over time and to formally compute the utility obtained when integrating subsets of them. I also design efficient algorithms with constant factor approximation guarantees for finding a set of sources that maximizes the utility of the integration result given a cost budget. Finally, I develop a prototype quality-aware data source management system and demonstrate the effectiveness of the developed techniques on real-world applications. (en_US)
dc.identifier: https://doi.org/10.13016/M2H06K
dc.identifier.uri: http://hdl.handle.net/1903/16928
dc.language.iso: en (en_US)
dc.subject.pqcontrolled: Computer science (en_US)
dc.subject.pquncontrolled: Data Integration (en_US)
dc.subject.pquncontrolled: Data Source Management (en_US)
dc.subject.pquncontrolled: Data Sources (en_US)
dc.subject.pquncontrolled: Quality (en_US)
dc.title: Quality-Aware Data Source Management (en_US)
dc.type: Dissertation (en_US)

Files

Original bundle
Name: Rekatsinas_umd_0117E_16354.pdf
Size: 4.88 MB
Format: Adobe Portable Document Format