Plant, Emily (2019) Statistical comparison of additive regression tree methods on ecological grassland data. Honours thesis, University of Southern Queensland. (Unpublished)
Text (Whole Thesis): available under License Creative Commons Attribution Non-commercial No Derivatives 4.0.
Abstract
Boosted regression tree (BRT) and Bayesian additive regression tree (BART) models are both additive tree models with well-defined theoretical foundations. However, BART is a relatively new technique in ecology, whereas BRTs are widely used. By exploring the differences between the two methods, the range of results each can produce, and their relative limitations, this project aims to fill a gap in ecologists' collective knowledge. The goals are to make both methods easier for ecologists to use in future and to determine whether BART offers any benefits over the widely used BRT method.
A total of 100 BRT and 729 BART models were fitted to each of two grassland datasets. One dataset contained data from a period of drought, and the other represented the recovery phase following the drought.
Both grassland datasets had 13 hydroclimatic and land-use predictor variables. The response variable for both was the Enhanced Vegetation Index (EVI) trend, interpreted as a measure of grassland degradation and recovery. The user-settable (tuning) parameters of both methods (BRT and BART) were varied to compare the performance of each method.
The models for each method were evaluated using three prediction error statistics: root mean square error (RMSE), mean absolute error (MAE), and the coefficient of determination (R²). The best models from each method were then compared by inspecting the relative importance of predictor variables and two-way interactions, together with the prediction error statistics. All analyses were conducted in R, using the dismo package to fit boosted regression trees and the bartMachine package to fit Bayesian additive regression trees.
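The three error statistics named above can be illustrated with a short sketch. This is a minimal Python illustration, not the thesis's R code; the function name `prediction_errors` is hypothetical, and R² is assumed to be the common definition 1 − SS_res/SS_tot (the thesis may have computed it differently, for example as a squared correlation between observed and predicted values). Lower RMSE and MAE indicate better predictions, while a higher R² indicates that more of the variation in the response is explained.

```python
import math
from typing import Dict, Sequence


def prediction_errors(observed: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Hypothetical helper: RMSE, MAE and R^2 for paired observed/predicted values.

    Assumes R^2 = 1 - SS_res / SS_tot, which is one common definition.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    n = len(observed)
    residuals = [o - p for o, p in zip(observed, predicted)]

    # Root mean square error: penalises large residuals more heavily.
    ss_res = sum(r * r for r in residuals)
    rmse = math.sqrt(ss_res / n)

    # Mean absolute error: average magnitude of the residuals.
    mae = sum(abs(r) for r in residuals) / n

    # Coefficient of determination: share of variance around the mean explained.
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    r2 = 1.0 - ss_res / ss_tot

    return {"rmse": rmse, "mae": mae, "r2": r2}


if __name__ == "__main__":
    # Toy values for illustration only; these are not data from the thesis.
    print(prediction_errors([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]))
```

For the toy values above the residuals are ±0.1 and ±0.2, giving an MAE of 0.15, an RMSE of about 0.158 and an R² of 0.98.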
BRT and BART models showed similar abilities to identify important variables and interactions, but the BART method produced models with similar or better prediction error statistics than the BRT method. BART explained an additional 10.17% of the variation compared with BRT on the drought dataset, and an additional 11.92% on the wetting dataset, indicating that BART may be more effective for modelling ecological data. BART also had other benefits, including shorter run times, more reasonable defaults in its software implementation, and broader functionality in that software beyond model building and prediction.
There are some limitations to this study. Most notably, all models were fitted to only two datasets from a single ecological scenario (grassland decline and recovery).
Additionally, these datasets contained no missing data, and missing data were not simulated, so the relative abilities of BRT and BART to fit models and make predictions with missing data were not investigated. Future work in this area should therefore compare BRT and BART models on multiple datasets, some containing missing data, drawn from a diverse range of ecological scenarios.