CUR Matrix Approximation Through Convex Optimization

dc.contributor.advisorBalan, Radu Ven_US
dc.contributor.authorLinehan, Kathrynen_US
dc.contributor.departmentApplied Mathematics and Scientific Computationen_US
dc.contributor.publisherDigital Repository at the University of Marylanden_US
dc.contributor.publisherUniversity of Maryland (College Park, Md.)en_US
dc.date.accessioned2024-06-29T06:00:30Z
dc.date.available2024-06-29T06:00:30Z
dc.date.issued2024en_US
dc.description.abstractIn this dissertation we present work on the CUR matrix approximation. Specifically, we present 1) an approximation of the proximal operator of the L-infinity norm using a neural network, 2) a novel deterministic CUR formulation and algorithm, and 3) a novel application of CUR as a feature selection method to determine discriminant proteins when clustering protein expression data in a self-organizing map (SOM). The proximal operator of the L-infinity norm arises in our CUR algorithm. Since the computation of the proximal operator of the L-infinity norm requires a sort of the input data (or at least a partial sort similar to quicksort), we present a neural network to approximate the proximal operator. A novel aspect of the network is that it is able to accept vectors of varying lengths due to a feature selection process that uses moments of the input data. We present results on the accuracy of the approximation, feature importance, and computational efficiency of the approach, and present an algorithm to calculate the proximal operator of the L-infinity norm exactly, relate it to the Moreau decomposition, and compare its computational efficiency to that of the approximation. Next, we present a novel deterministic CUR formulation that uses convex optimization to form the matrices C and R, and a corresponding algorithm that uses bisection to ensure that the user selected number of columns appear in C and the user selected number of rows appear in R. We implement the algorithm using the surrogate functional technique of Daubechies et al. [Communications on Pure and Applied Mathematics, 57.11 (2004)] and extend the theory of this approach to apply to our CUR formulation. Numerical results are presented that demonstrate the effectiveness of our CUR algorithm as compared to the singular value decomposition (SVD) and other CUR algorithms. Last, we use our CUR approximation as a feature selection method in the application by Higuera et al. [PLOS ONE, 10(6) (2015)] to determine discriminant proteins when clustering protein expression data in an SOM. This is a novel application of CUR and to the best of our knowledge, this is the first use of CUR on protein expression data. We compare the performance of our CUR algorithm to other CUR algorithms and the Wilcoxon rank-sum test (the original feature selection method in the work).en_US
dc.identifierhttps://doi.org/10.13016/tuk7-1ihu
dc.identifier.urihttp://hdl.handle.net/1903/32931
dc.language.isoenen_US
dc.subject.pqcontrolledApplied mathematicsen_US
dc.subject.pquncontrolledconvex optimizationen_US
dc.subject.pquncontrolledCUR matrix approximationen_US
dc.subject.pquncontrolledfeature selectionen_US
dc.subject.pquncontrolledproximal operatoren_US
dc.titleCUR Matrix Approximation Through Convex Optimizationen_US
dc.typeDissertationen_US

Files

Original bundle

Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
Linehan_umd_0117E_24208.pdf
Size:
3.1 MB
Format:
Adobe Portable Document Format