A COMPARATIVE ANALYSIS OF RANDOM FOREST AND LOGISTIC REGRESSION FOR WEED RISK ASSESSMENT
Invasive species have largely negative impacts on the environment and the economy. The management and regulation of invasive plants are facilitated by screening tools, such as weed risk assessments (WRAs), which predict the invasive potential of non-native plants. Identifying these species and subsequently regulating their importation helps reduce the risk of future ecosystem and economic costs. Globally, many useful WRAs are already available. However, in an era of big data and powerful predictive analytics, there is increasing demand for new and more robust screening tools. In this thesis, I use the machine learning algorithm random forests to develop a new WRA. I show that the random forest model achieves higher predictive accuracy than an existing logistic regression model and that random forests are the better learner for this task. In addition, variable importance analysis was performed to identify factors associated with the invasive-status classification of non-native plants. The study suggests that random forests make powerful weed risk screening tools and should be used alongside other WRAs to assess invasive potential. An integrative approach to evaluating weed risk can greatly facilitate the WRA process.
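The comparison described above can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the thesis's actual pipeline: the data are synthetic (generated with `make_classification`), the feature names `trait_0` to `trait_7` are hypothetical stand-ins for WRA questions or plant traits, and 10-fold cross-validated accuracy is assumed as the comparison metric. For variable importance it uses permutation importance, which is one common option; the thesis may have used a different importance measure (for example, the forest's impurity-based `feature_importances_`).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for a WRA dataset: each row is a non-native plant,
# each column a trait or question score; y = 1 if invasive, 0 if not.
X, y = make_classification(n_samples=300, n_features=8, n_informative=4, random_state=0)
feature_names = [f"trait_{i}" for i in range(X.shape[1])]  # hypothetical names

models = {
    # Logistic regression on standardized features (the baseline model).
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    # Random forest classifier (the candidate screening model).
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

# Mean 10-fold (stratified) cross-validated accuracy for each model on the same data.
cv_accuracy = {
    name: cross_val_score(model, X, y, cv=10, scoring="accuracy").mean()
    for name, model in models.items()
}
for name, acc in cv_accuracy.items():
    print(f"{name}: mean CV accuracy = {acc:.3f}")

# Variable importance: permutation importance of the random forest on a held-out split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)
rf = models["random_forest"].fit(X_train, y_train)
result = permutation_importance(rf, X_test, y_test, n_repeats=20, random_state=0)

# Print features from most to least important (mean drop in accuracy when shuffled).
for idx in np.argsort(result.importances_mean)[::-1]:
    print(f"{feature_names[idx]}: {result.importances_mean[idx]:.3f}")
```

Both models are scored on identical folds of the same data, so their accuracies are directly comparable; with real WRA data, the feature matrix would hold the assessment scores and the labels the known invasive status of each species.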