Anomaly Detection in Time Series: Theoretical and Practical Improvements for Disease Outbreak Detection

dc.contributor.advisor: Shmueli, Galit [en_US]
dc.contributor.author: Lotze, Thomas Harvey [en_US]
dc.contributor.department: Applied Mathematics and Scientific Computation [en_US]
dc.contributor.publisher: Digital Repository at the University of Maryland [en_US]
dc.contributor.publisher: University of Maryland (College Park, Md.) [en_US]
dc.date.accessioned: 2010-02-19T06:42:17Z
dc.date.available: 2010-02-19T06:42:17Z
dc.date.issued: 2009 [en_US]
dc.description.abstract: The automatic collection and increasing availability of health data provide a new opportunity for techniques that monitor this information. By monitoring pre-diagnostic data sources, such as over-the-counter cough medicine sales or emergency room chief complaints of cough, it may be possible to detect disease outbreaks earlier than traditional laboratory confirmation of disease allows. This research is particularly important for a modern, highly connected society, where the onset of a disease outbreak can be swift and deadly, whether it is caused by a naturally occurring global pandemic such as swine flu or by a targeted act of bioterrorism. In this dissertation, we first describe the problem and the current state of research in disease outbreak detection, then present four main contributions to the field. First, we formalize a framework for analyzing health series data and detecting anomalies: a forecasting method predicts the next day's value, the forecast is subtracted from the observed value to create residuals, and a detection algorithm is then applied to the residuals. The formalized framework makes explicit the link between the accuracy of the forecasting method and the performance of the detector, and it can be used to quantify and analyze the performance of a variety of heuristic methods. Second, we describe improvements to the forecasting of health data series: using weather as a predictor, using cross-series covariates, and using ensemble forecasting each improve forecasts of health data. Third, we describe improvements to detection, including the use of multivariate statistics for anomaly detection and additional day-of-week preprocessing to aid detection. Most significantly, we provide a new method, based on the CuScore, for optimizing detection when the impact of the disease outbreak is known. This method can yield a detector that is optimal either for rapid detection or for the probability of detection within a given time frame.
Finally, we describe methods for improving the comparison of detection methods. We provide tools for evaluating how well a simulated data set captures the characteristics of the authentic series, and we introduce time-lag heatmaps, a new and more informative way of visualizing daily detection rates or displaying the comparison between two methods. [en_US]
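The three-step framework described in the abstract (forecast the next day's value, subtract the forecast to form residuals, run a detector on the residuals) can be illustrated with a short Python sketch. Everything specific here is an assumption for illustration, not the dissertation's actual method or data: the forecaster is a simple 7-day moving average, the detector is a windowed CuScore-style matched filter against a hypothesized outbreak signature, and the function names `moving_average_forecast` and `cuscore_alarms`, the signature, the threshold, and the synthetic series are all hypothetical.

```python
import numpy as np


def moving_average_forecast(y, window=7):
    """Step 1 (assumed forecaster): predict each day as the mean of the previous
    `window` days. Days without enough history get NaN."""
    y = np.asarray(y, dtype=float)
    forecast = np.full(y.shape, np.nan)
    for t in range(window, len(y)):
        forecast[t] = y[t - window:t].mean()
    return forecast


def cuscore_alarms(residuals, signature, sigma, threshold):
    """Step 3 (assumed detector): a windowed CuScore-style statistic that
    correlates the most recent len(signature) residuals with a hypothesized
    outbreak signature, i.e. a matched filter for a known outbreak shape."""
    e = np.asarray(residuals, dtype=float)
    s = np.asarray(signature, dtype=float)
    n_sig = len(s)
    # Standard deviation of sum(e_i * s_i) if residuals were iid N(0, sigma^2).
    scale = sigma * np.sqrt(np.sum(s ** 2))
    stat = np.full(e.shape, np.nan)
    for t in range(n_sig - 1, len(e)):
        window = e[t - n_sig + 1:t + 1]
        if not np.isnan(window).any():
            stat[t] = np.dot(window, s) / scale
    # Treat undefined (NaN) days as non-alarms.
    filled = np.where(np.isnan(stat), -np.inf, stat)
    return stat, np.flatnonzero(filled > threshold)


# Usage on synthetic daily counts with an injected outbreak (illustrative only).
rng = np.random.default_rng(0)
y = 100 + rng.normal(0, 5, 120)
signature = np.array([2.0, 4.0, 8.0, 12.0, 16.0])  # hypothetical outbreak shape
y[90:95] += signature

forecast = moving_average_forecast(y, window=7)
residuals = y - forecast                 # Step 2: observed minus forecast
sigma = np.nanstd(residuals[:80])        # spread estimated from outbreak-free days
stat, alarms = cuscore_alarms(residuals, signature, sigma, threshold=3.0)
print("alarm days:", alarms)
```

Because moving-average residuals are autocorrelated (consecutive forecasts share most of their window), the scaled statistic is only approximately standard normal under no outbreak, and the threshold of 3.0 is an arbitrary illustrative choice. Swapping in a better forecaster shrinks the residual spread `sigma`, which is one concrete way to see the link between forecast accuracy and detector performance.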
dc.identifier.uri: http://hdl.handle.net/1903/9857
dc.subject.pqcontrolled: Statistics [en_US]
dc.subject.pqcontrolled: Biology, Bioinformatics [en_US]
dc.subject.pqcontrolled: Biology, Biostatistics [en_US]
dc.subject.pquncontrolled: anomaly detection [en_US]
dc.subject.pquncontrolled: biosurveillance [en_US]
dc.subject.pquncontrolled: control charts [en_US]
dc.subject.pquncontrolled: epidemiology [en_US]
dc.subject.pquncontrolled: forecasting [en_US]
dc.subject.pquncontrolled: time series [en_US]
dc.title: Anomaly Detection in Time Series: Theoretical and Practical Improvements for Disease Outbreak Detection [en_US]
dc.type: Dissertation [en_US]

Files

Original bundle

Name: Lotze_umd_0117E_10808.pdf
Size: 3.01 MB
Format: Adobe Portable Document Format