A Latent Dirichlet Model for Unsupervised Entity Resolution

dc.contributor.author: Bhattacharya, Indrajit
dc.contributor.author: Getoor, Lise
dc.date.accessioned: 2006-12-15T21:49:27Z
dc.date.available: 2006-12-15T21:49:27Z
dc.date.issued: 2005-08-19
dc.description.abstract: In this paper, we address the problem of entity resolution, where given many references to underlying objects, the task is to predict which references correspond to the same object. We propose a probabilistic model for collective entity resolution. Our approach differs from other recently proposed entity resolution approaches in that it is a) unsupervised, b) generative and c) introduces a hidden 'group' variable to capture collections of entities which are commonly observed together. The entity resolution decisions are not considered on an independent pairwise basis, but instead decisions are made collectively. We focus on how the use of relational links among the references can be exploited. We show how we can use Gibbs Sampling to infer the collaboration groups and the entities jointly from the observed co-author relationships among entity references and how this improves entity resolution performance. We demonstrate the utility of our approach on two real-world bibliographic datasets. In addition, we present preliminary results on characterizing conditions under which collaborative information is useful. [en]
dc.format.extent: 209060 bytes
dc.format.mimetype: application/pdf
dc.identifier.uri: http://hdl.handle.net/1903/4022
dc.language.iso: en_US [en]
dc.relation.ispartofseries: UM Computer Science Department [en]
dc.relation.ispartofseries: CS-TR-4740 [en]
dc.title: A Latent Dirichlet Model for Unsupervised Entity Resolution [en]
dc.type: Technical Report [en]

Files

Original bundle
Name: cs-tr-4740.pdf
Size: 204.16 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.81 KB
Format: Item-specific license agreed to upon submission