Tackling Uncertainties and Errors in the Satellite Monitoring of Forest Cover Change

This study aims to improve the reliability of automatic forest change detection. Forest change detection is vital for understanding global land cover and the carbon cycle. Remote sensing and machine learning have been widely adopted for such studies with increasing success. However, contemporary global studies still suffer from less-than-satisfactory accuracy and from robustness problems whose causes are largely unknown.

Global geographical observations are complex because they result from hidden, interwoven geographical processes. Is it possible that contemporary machine learning did not anticipate some of these geographical complexities? Could they cause uncertainties and errors when contemporary machine learning theories are applied to remote sensing?

This dissertation adopts the philosophy of error elimination. In chapter two, we explain the mathematical origins of possible geographic uncertainties and errors. Uncertainties are unavoidable but might be mitigated. Errors are hidden but might be found and corrected. In chapter three, experiments are designed specifically to assess whether contemporary machine learning theories can handle these geographic uncertainties and errors. In chapter four, we identify an unreported systemic error source: the proportional distribution of classes in the training set. A Bayesian optimal solution is then designed that combines Support Vector Machine and Maximum Likelihood. In chapter five, we demonstrate that this type of error is widespread: it arises not just in classification algorithms but also in the conceptual definition of geographic classes before classification begins. Finally, in chapter six, the sources of errors and uncertainties and their solutions are summarized, with theoretical implications for future studies.
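To illustrate why the class proportions of a training set matter, the following is a minimal sketch of the standard Bayes-rule prior correction. It is not the dissertation's method for combining Support Vector Machine and Maximum Likelihood. The function name `adjust_for_class_prior`, the class labels, and all numbers are hypothetical, chosen only to show how a posterior learned from a balanced training set can flip once it is re-weighted to a map-wide class proportion.

```python
def adjust_for_class_prior(posteriors, train_priors, target_priors):
    """Re-weight class posteriors learned under train_priors to target_priors.

    Standard prior-shift correction: p_target(c | x) is proportional to
    p_train(c | x) * target_prior(c) / train_prior(c), then renormalized.
    """
    weighted = [
        p * (target / train)
        for p, train, target in zip(posteriors, train_priors, target_priors)
    ]
    total = sum(weighted)
    return [w / total for w in weighted]


# Hypothetical pixel, classes ordered as (forest change, no change).
# Assumption: the classifier was trained on a balanced 50/50 sample.
posterior_from_balanced_training = [0.6, 0.4]
train_priors = [0.5, 0.5]

# Assumption: forest change covers only 2% of the mapped area.
target_priors = [0.02, 0.98]

adjusted = adjust_for_class_prior(
    posterior_from_balanced_training, train_priors, target_priors
)

# Label before and after the correction (index 0 = change, 1 = no change).
label_before = max(range(2), key=lambda i: posterior_from_balanced_training[i])
label_after = max(range(2), key=lambda i: adjusted[i])
print(label_before, label_after, adjusted)
```

Under these assumed numbers the pixel is labeled as change before the correction and as no change after it, which shows how the sampling design of the training set can decide the outcome before any classifier is run.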

The most important finding is that how we design a classification largely predetermines what we eventually get out of it. This applies to many popular contemporary classifiers, including various types of neural networks, decision trees, and support vector machines. This is one cause of the so-called overfitting problem in contemporary machine learning. Therefore, we propose that the emphasis of classification work be shifted to the planning stage before the actual classification. Geography should be not just the analysis of collected observations but also the planning of observation collection. This is where geography, machine learning, and survey statistics meet.