Statistical comparison of additive regression tree methods on ecological grassland data

Plant, Emily (2019) Statistical comparison of additive regression tree methods on ecological grassland data. Honours thesis, University of Southern Queensland. (Unpublished)

Available under License Creative Commons Attribution Non-commercial No Derivatives 4.0.


Boosted regression tree (BRT) and Bayesian additive regression tree (BART) models are both additive tree models that are theoretically well defined. However, BART is relatively new to the field of ecology, while BRTs are widely used. By exploring the differences, the range of obtainable results and the relative limitations of both methods, this project aims to fill a gap in ecologists’ collective knowledge, to facilitate the future use of both methods by ecologists, and to determine whether BART has benefits over the widely used BRT method.

On each of two grassland datasets, 100 BRT and 729 BART models were fitted. One dataset contained data from a period of drought, and the other represented the recovery phase following the drought.

Both grassland datasets had 13 hydroclimatic and land-use predictor variables. The response variable for both datasets was Enhanced Vegetation Index (EVI) trend, which is interpreted as a measure of grassland degradation and recovery. The tunable parameters of each method (BRT and BART) were varied so that the performance of the two methods could be compared.

The models for each method were evaluated using three prediction error statistics: root mean square error (RMSE), mean absolute error (MAE), and the coefficient of determination (R²). The best models across the two methods were assessed by inspecting the relative importance of predictor variables and two-way interactions, and the prediction error statistics. All analysis was conducted in R, using the dismo package to fit boosted regression trees and the bartMachine package to fit Bayesian additive regression trees.
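
The three prediction error statistics named above can be illustrated with a minimal sketch. It is written in Python for illustration only, since the thesis analysis was conducted in R; the function name `prediction_error_stats` and the toy values are hypothetical, and R² is computed here as 1 − SS_res/SS_tot, which is one common definition and may differ from the one used in the thesis.

```python
import math


def prediction_error_stats(observed, predicted):
    """Return RMSE, MAE and R^2 for paired observed and predicted values.

    Illustrative helper only (not taken from the thesis). R^2 is computed
    as 1 - SS_res / SS_tot, which is an assumption about the definition.
    """
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(observed)
    residuals = [o - p for o, p in zip(observed, predicted)]

    # Residual sum of squares, shared by RMSE and R^2.
    ss_res = sum(r * r for r in residuals)

    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n

    # Total sum of squares around the mean of the observed values.
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when the observed values are constant")
    r2 = 1.0 - ss_res / ss_tot

    return {"rmse": rmse, "mae": mae, "r2": r2}


# Toy values for demonstration; these are not data from the thesis.
stats = prediction_error_stats([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
print(stats)
```

Lower RMSE and MAE and higher R² indicate better predictive performance, which is the sense in which one method's statistics are described as "more favourable" than the other's.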

BRT and BART models exhibited similar abilities to identify important variables and interactions, but the BART method generated models with similar or more favourable prediction error statistics than the BRT method (BART explained an additional 10.17% of the variation compared with BRT on the drought dataset, and an additional 11.92% on the wetting dataset), indicating that BART may be more effective at modelling ecological data. BART also had other benefits, including shorter run times, more reasonable defaults in its software implementation, and greater functionality in that implementation beyond model building and prediction.

There are some limitations to this study. Most notably, all models were fit to only two datasets from a single ecological scenario (grassland decline and recovery). Additionally, these datasets contained no missing data (and missing data were not simulated), so the relative abilities of BRT and BART to fit models to, and predict from, incomplete data were not investigated. Therefore, future work in this area should include studies comparing BRT and BART models on multiple datasets (some of which should contain missing data) from a diverse range of ecological scenarios.

Statistics for USQ ePrint 45472
Item Type: Thesis (Non-Research) (Honours)
Item Status: Live Archive
Faculty/School / Institute/Centre: Historic - Faculty of Health, Engineering and Sciences - School of Sciences (6 Sep 2019 - 31 Dec 2021)
Supervisors: King, Rachel; Kath, Jarrod
Qualification: Bachelor of Science (Honours)
Date Deposited: 08 Dec 2021 00:25
Last Modified: 26 Jun 2023 05:31
Uncontrolled Keywords: additive regression tree methods; ecological grassland data
Fields of Research (2008): 01 Mathematical Sciences > 0104 Statistics > 010401 Applied Statistics
05 Environmental Sciences > 0501 Ecological Applications > 050199 Ecological Applications not elsewhere classified
Fields of Research (2020): 41 ENVIRONMENTAL SCIENCES > 4102 Ecological applications > 410299 Ecological applications not elsewhere classified
49 MATHEMATICAL SCIENCES > 4905 Statistics > 490501 Applied statistics
Socio-Economic Objectives (2008): D Environment > 96 Environment > 9605 Ecosystem Assessment and Management > 960510 Ecosystem Assessment and Management of Sparseland, Permanent Grassland and Arid Zone Environments
Socio-Economic Objectives (2020): 10 ANIMAL PRODUCTION AND ANIMAL PRIMARY PRODUCTS > 1005 Pasture, browse and fodder crops > 100503 Native and residual pastures
