Geometric Methods in Machine Learning and Data Mining

dc.contributor.advisor: Daume III, Hal
dc.contributor.author: Agarwal, Arvind
dc.contributor.department: Computer Science
dc.contributor.publisher: Digital Repository at the University of Maryland
dc.contributor.publisher: University of Maryland (College Park, Md.)
dc.description.abstract: In machine learning, the standard goal is to find an appropriate statistical model from a <italic>model space</italic> based on training data from a <italic>data space</italic>; in data mining, the goal is to find interesting patterns in data from a <italic>data space</italic>. In both fields, these spaces carry geometric structures that can be exploited by methods designed around them (we shall call these geometric methods), or the problems themselves can be formulated in a way that naturally appeals to such methods. In such cases, studying these geometric structures and then applying appropriate geometric methods not only gives insight into existing algorithms, but also helps build new and better ones. In my research, I develop methods that exploit the geometric structure of a variety of machine learning and data mining problems, and provide strong theoretical and empirical evidence in favor of using them. My dissertation is divided into two parts. In the first part, I develop algorithms to solve a well-known problem in data mining, the distance embedding problem. In particular, I use tools from <italic>computational geometry</italic> to build a unified framework for solving a distance embedding problem known as <italic>multidimensional scaling</italic> (MDS). This geometry-inspired framework yields algorithms that solve different variants of MDS better than previous state-of-the-art methods. These algorithms also come with many other attractive properties: they are simple, intuitive, easily parallelizable, scalable, and can handle missing data. Furthermore, I extend the unified MDS framework to build scalable algorithms for dimensionality reduction, and to solve a sensor network localization problem for mobile sensors. Experimental results show the effectiveness of this framework across all problems.
In the second part of my dissertation, I turn to problems in machine learning: I use geometry to reason about conjugate priors, develop a model that hybridizes discriminative and generative frameworks, and build a new set of generative-process-driven kernels. More specifically, this part of the dissertation is devoted to the geometry of the space of probabilistic models associated with statistical generative processes. This study --- grounded in the theory of <italic>information geometry</italic> --- allows me to reason about the appropriateness of <italic>conjugate</italic> priors from a <italic>geometric</italic> perspective, and hence gain insight into the large number of existing models that rely on these priors. Furthermore, I use this study to build hybrid models more naturally, i.e., by combining discriminative and generative methods using the <italic>geometry underlying them</italic>, and to build a family of kernels called <italic>generative kernels</italic> that can be used as an off-the-shelf tool in any kernel learning method, such as support vector machines. My experiments with generative kernels demonstrate their effectiveness, providing further evidence in favor of geometric methods.
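To make the distance embedding problem concrete: the classical variant of MDS recovers point coordinates from a matrix of pairwise distances via double-centering and an eigendecomposition. The sketch below shows that textbook baseline, not the dissertation's unified computational-geometry framework; the function name `classical_mds` is hypothetical.

```python
import numpy as np

def classical_mds(D, dim=2):
    """Classical MDS: embed n points in R^dim from an (n, n) matrix
    of pairwise Euclidean distances D."""
    n = D.shape[0]
    # Double-center the squared distances: B = -1/2 * J D^2 J
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    # The top eigenpairs of B give the embedding coordinates
    w, V = np.linalg.eigh(B)            # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:dim]     # keep the `dim` largest
    scale = np.sqrt(np.maximum(w[idx], 0.0))
    return V[:, idx] * scale            # (n, dim) coordinates

# Three points in the plane; their pairwise distances should be
# recovered exactly (up to rotation/reflection) by the embedding.
X = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]])
D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
Y = classical_mds(D, dim=2)
D_hat = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1)
print(np.allclose(D, D_hat))
```

Classical MDS assumes a complete, exactly Euclidean distance matrix; handling missing entries and other MDS variants is where the dissertation's geometric framework goes beyond this baseline.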
dc.subject.pqcontrolled: Computer science
dc.subject.pquncontrolled: Conjugate Prior
dc.subject.pquncontrolled: Generative Kernel
dc.subject.pquncontrolled: Information Geometry
dc.subject.pquncontrolled: Machine Learning
dc.subject.pquncontrolled: Multidimensional Scaling
dc.title: Geometric Methods in Machine Learning and Data Mining


Original bundle: 1 file (Adobe Portable Document Format, 1.35 MB)