Geometric Methods in Machine Learning and Data Mining
Daume III, Hal
MetadataShow full item record
In machine learning, the standard goal of is to find an appropriate statistical model from a <italic> model space</italic> based on the training data from a <italic> data space</italic>; while in data mining, the goal is to find interesting patterns in the data from a <italic> data space</italic>. In both fields, these spaces carry geometric structures that can be exploited using methods that make use of these geometric structures (we shall call them geometric methods), or the problems themselves can be formulated in a way that naturally appeal to these methods. In such cases, studying these geometric structures and then using appropriate geometric methods not only gives insight into existing algorithms, but also helps build new and better algorithms. In my research, I develop methods that exploit geometric structure of problems for a variety of machine learning and data mining problems, and provide strong theoretical and empirical evidence in favor of using them. My dissertation is divided into two parts. In the first part, I develop algorithms to solve a well known problem in data mining i.e. distance embedding problem. In particular, I use tools from <italic> computational geometry</italic> to build a unified framework for solving a distance embedding problem known as <italic> multidimensional scaling</italic> (MDS). This geometry-inspired framework results in algorithms that can solve different variants of MDS better than previous state-of-the-art methods. In addition, these algorithms come with many other attractive properties: they are simple, intuitive, easily parallelizable, scalable, and can handle missing data. Furthermore, I extend my unified MDS framework to build scalable algorithms for dimensionality reduction, and also to solve a sensor network localization problem for mobile sensors. Experimental results show the effectiveness of this framework across all problems. In the second part of my dissertation, I turn to problems in machine learning, in particular, use geometry to reason about conjugate priors, develop a model that hybridizes between discriminative and generative frameworks, and build a new set of generative-process-driven kernels. More specifically, this part of my dissertation is devoted to the study of the geometry of the space of probabilistic models associated with statistical generative processes. This study --- based on the theory well grounded in <italic> information geometry</italic> --- allows me to reason about the appropriateness of <italic> conjugate</italic> priors from a <italic> geometric</italic> perspective, and hence gain insight into the large number of existing models that rely on these priors. Furthermore, I use this study to build hybrid models more naturally i.e., by combining discriminative and generative methods using the <italic> geometry underlying them</italic>, and also to build a family of kernels called <italic> generative kernels</italic> that can be used as off-the-shelf tool in any kernel learning method such as support vector machines. My experiments of generative kernels demonstrate their effectiveness providing further evidence in favor of using geometric methods.