Mathematics Theses and Dissertations
Permanent URI for this collection: http://hdl.handle.net/1903/2793
Search Results (4 items)
Item: Causal Survival Analysis – Machine Learning Assisted Models: Structural Nested Accelerated Failure Time Model and Threshold Regression (2022)
Chen, Yiming; Lee, Mei-Ling; Mathematical Statistics; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)

Time-varying confounding of an intervention complicates causal survival analysis when the data are collected longitudinally. Traditional survival models that only adjust for time-dependent covariates yield biased causal conclusions about the intervention effect. Some techniques have been developed to address this challenge; nevertheless, these existing methods may still lack power and suffer a heavy computational burden on high-dimensional, temporally connected data. The first part of this dissertation focuses on one of the methods that deal with time-varying confounding: the Structural Nested Model and its associated G-estimation. Two neural networks (GE-SCORE and GE-MIMIC) were proposed to estimate the Structural Nested Accelerated Failure Time Model. The proposed algorithms can provide less biased, individualized estimates of the causal intervention effect. The second part explored the causal interpretations and applications of the first-hitting-time-based Threshold Regression model built on a Wiener process (a standard parameterization is sketched below). Moreover, a neural-network extension of this specific type of Threshold Regression (TRNN) was explored for the first time.
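The first-hitting-time construction behind Threshold Regression admits a compact standard description. The sketch below uses the usual Wiener-process parameterization (it is included for orientation and is not taken from the dissertation itself): a latent health process with drift, an event time defined as its first passage to zero, the resulting inverse Gaussian density, and the covariate links commonly used in threshold regression.

    % Standard first-hitting-time setup for threshold regression (illustrative).
    % Latent health process: Wiener process with drift \mu and variance \sigma^2,
    % starting at y_0 > 0; the event time S is the first time the process hits 0.
    \[
      Y(t) = y_0 + \mu t + \sigma W(t), \qquad
      S = \inf\{\, t > 0 : Y(t) \le 0 \,\}.
    \]
    % S follows an inverse Gaussian law with density
    \[
      f(s \mid y_0, \mu, \sigma^2)
        = \frac{y_0}{\sqrt{2\pi \sigma^2 s^{3}}}
          \exp\!\left[-\frac{(y_0 + \mu s)^2}{2\sigma^2 s}\right],
      \qquad s > 0,
    \]
    % and covariates x enter through links such as
    % \ln y_0 = x^\top \beta_1 and \mu = x^\top \beta_2
    % (with \sigma typically fixed at 1 for identifiability).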
Item: Innovations In Time Series Forecasting: New Validation Procedures to Improve Forecasting Accuracy and A Novel Machine Learning Strategy for Model Selection (2021)
Varela Alvarenga, Gustavo; Kedem, Benjamin; Mathematics; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)

This dissertation is divided into two parts. The first part introduces the p-Holdout family of validation schemes for minimizing the generalization error rate and improving forecasting accuracy. More specifically, if one wants to compare different forecasting methods, or models, based on their performance, one may use "out-of-sample tests" based on formal hypothesis tests, or "out-of-sample tests" based on data-driven procedures that directly compare the models using an error measure (e.g., MSE, MASE). To distinguish between the two "out-of-sample tests" terminologies seen in the literature, we use the term "out-of-sample tests" for the former and "out-of-sample validation" for the latter. Both methods rely on some form of data split; we call these data-partition methods "validation schemes." We provide a history of their use with time-series data, along with their formulas and the formulas for the associated out-of-sample generalization errors, and we attempt to organize the different terminologies used in the statistics, econometrics, and machine learning literature into one set of terms. Moreover, we noticed that the schemes used in a time-series context overlook one crucial characteristic of this type of data: its seasonality. We also observed that deseasonalizing is not often done in the machine learning literature. With this in mind, we introduce the p-Holdout family of validation schemes, comprising three new procedures developed specifically to account for a series' periodicity.

Our results show that, when applied to benchmark data and compared to state-of-the-art schemes, the new procedures are computationally inexpensive, improve forecast accuracy, and greatly reduce, on average, the forecast error bias, especially when applied to non-stationary time series.

In the second part of this dissertation, we introduce a new machine learning strategy for selecting forecasting models, which we call the GEARS (generalized and rolling sample) strategy. The "generalized" part of the name refers to the use of generalized linear models combined with partial likelihood inference to estimate the parameters; it has been shown that partial likelihood inference enables very flexible conditions that allow for correct time-series analysis using GLMs. With this, it becomes easy for users to estimate multivariate (or univariate) time-series models: all they have to do is provide the response (left-hand-side) variable, the covariates that should enter the right-hand side of the model, and their lags. GLMs also allow for the inclusion of interactions and all sorts of nonlinear links. This easy setup is an advantage over more complicated models such as state-space and GARCH, and the ability to include covariates and interactions is an advantage over ARIMA, the Theta method, and other univariate methods. The "rolling sample" part refers to estimating the parameters over a sample of fixed size that "moves forward" at different "rounds" of estimation (also known as "folds"); this resembles the "rolling window" validation scheme, but ours does not start at T = 1. The "best" model is taken from the set of all possible combinations of covariates, and their respective lags, included in the right-hand side of the forecasting model; its selection is based on minimizing the average error measure over all folds. Once this is done, the best model's estimated coefficients are used to obtain the out-of-sample forecasts (a schematic sketch of this rolling-sample selection follows this abstract).

We applied the GEARS method to all 100,000 time series used in the M4 Forecasting Competition (the 2018 M-Competition). We produced one-step-ahead forecasts for each series and compared our results with the submitted approaches and the benchmark methods. The GEARS strategy yielded the best results, in terms of the smallest overall weighted average of the forecast errors, more often than any of the twenty-five top methods in that competition: we had the best results in 8,750 of the 100,000 cases, while the procedure that won the competition had better results in fewer than 7,300 series. Moreover, the GEARS strategy shows promise when dealing with multivariate time series. Here, we estimated several forecasting models based on a complex formulation that includes covariates with variable and fixed lags, quadratic terms, and interaction terms; the accuracy of the forecasts obtained with GEARS was far superior to that observed for the predictions from an ARIMA model. This result, together with the fact that our strategy for dealing with multivariate series is far simpler than VAR, state-space, or cointegration approaches, points to a promising future for our procedure. An R package was written for the GEARS strategy, and a prototype web application, using the R package Shiny, was also developed to disseminate this method.
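Since GEARS itself is distributed as an R package that is not reproduced here, the following Python sketch only illustrates the rolling-sample idea described in the abstract: each candidate covariate/lag specification is refit on a fixed-size window that moves forward one fold at a time, and the specification with the smallest average one-step-ahead squared error is retained. The helper names (make_design, rolling_sample_select), the choice of window and fold counts, and the use of ordinary least squares in place of a GLM with partial likelihood are illustrative simplifications, not part of the original work.

    # Illustrative sketch (not the GEARS R package): rolling-sample selection of a
    # lag specification by average one-step-ahead error over forward-moving folds.
    import numpy as np

    def make_design(y, lags):
        """Stack lagged copies of y into a design matrix with an intercept."""
        p = max(lags)
        X = np.column_stack([y[p - lag: len(y) - lag] for lag in lags])
        X = np.column_stack([np.ones(len(X)), X])
        return X, y[p:]

    def rolling_sample_select(y, candidate_lag_sets, window=60, folds=10):
        """Pick the lag set with the smallest average squared one-step-ahead error."""
        y = np.asarray(y, dtype=float)
        scores = {}
        for lags in candidate_lag_sets:
            X, target = make_design(y, lags)
            start = len(target) - folds - window  # fixed-size window, moved forward per fold
            errors = []
            for k in range(folds):
                lo, hi = start + k, start + k + window
                beta, *_ = np.linalg.lstsq(X[lo:hi], target[lo:hi], rcond=None)
                forecast = X[hi] @ beta           # one-step-ahead forecast
                errors.append((target[hi] - forecast) ** 2)
            scores[lags] = float(np.mean(errors))
        return min(scores, key=scores.get), scores

    # Example: choose among a few autoregressive lag sets for a toy seasonal series.
    rng = np.random.default_rng(0)
    t = np.arange(300)
    y = 10 + np.sin(2 * np.pi * t / 12) + rng.normal(scale=0.3, size=t.size)
    candidates = [(1,), (1, 2), (1, 12), (1, 2, 12)]
    best, all_scores = rolling_sample_select(y, candidates)
    print("selected lag set:", best)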
Item: Harmonic Analysis and Machine Learning (2018)
Pekala, Michael; Czaja, Wojciech; Levy, Doron; Applied Mathematics and Scientific Computation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)

This dissertation considers data representations that lie at the intersection of harmonic analysis and neural networks. The unifying theme of this work is the goal of robust and reliable machine learning. Our specific contributions include a new variant of scattering transforms based on a Haar-type directional wavelet, a new study of deep neural network instability in the context of remote sensing problems, and new empirical studies of biomedical applications of neural networks.

Item: Feature extraction in image processing and deep learning (2018)
Li, Yiran; Czaja, Wojciech; Applied Mathematics and Scientific Computation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)

This thesis develops theoretical analysis of the approximation properties of neural networks, together with algorithms for extracting useful features of images in deep learning, quantum energy regression, and cancer image analysis. The separate applications are connected through representation systems from harmonic analysis; in this thesis we focus on deriving proper representations of data using the Gabor transform (a generic illustration of the transform is sketched after this abstract). A novel neural network with proven approximation properties dependent on its size is developed using a Gabor system. In quantum energy regression, an invariant representation of chemical molecules based on electron densities is obtained from the Gabor transform. Additionally, we study pooling functions, the feature extractors in deep neural networks, and develop a novel pooling strategy derived from the maximal function, with a stability property and stable performance. Anisotropic representation of data using the Shearlet transform is also explored for its ability to detect regions of interest containing nuclei in cancer images.
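As a point of reference for the Gabor-transform representations mentioned in the last item, here is a minimal, generic sketch of Gabor (Gaussian-windowed short-time Fourier) magnitude features for a 1-D signal. It is not code from the dissertation, and the window length, hop size, and Gaussian width are arbitrary illustrative choices.

    # Minimal sketch of Gabor-transform (Gaussian-windowed STFT) magnitude features.
    # Generic illustration only; window and hop parameters are arbitrary choices.
    import numpy as np

    def gabor_features(signal, win_len=64, hop=16, sigma=8.0):
        """Return |Gabor transform| of a 1-D signal: shape (frames, frequencies)."""
        n = np.arange(win_len)
        window = np.exp(-0.5 * ((n - win_len / 2) / sigma) ** 2)  # Gaussian window
        frames = []
        for start in range(0, len(signal) - win_len + 1, hop):
            segment = signal[start:start + win_len] * window
            frames.append(np.abs(np.fft.rfft(segment)))           # magnitude spectrum
        return np.vstack(frames)

    # Example: a chirp-like test signal and its time-frequency feature matrix.
    t = np.linspace(0.0, 1.0, 1024)
    x = np.sin(2 * np.pi * (50 * t + 30 * t ** 2))
    features = gabor_features(x)
    print(features.shape)   # (number of frames, win_len // 2 + 1)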