Jackson, Lachlan (2024) Automated Stuttering Detection to Assist Speech Pathologists. [USQ Project]
Text (Project – redacted)
Jackson_L_Dissertation_Redacted.pdf Download (4MB)
Abstract
Stuttering is a complex speech disorder affecting millions of children and adults worldwide. Currently, speech pathologists are the primary health care professionals who provide assessment and treatment to reduce the severity of stuttering behaviours. Such behaviours are sporadic throughout the lifespan, and there is no known cure or defined therapeutic agent to date. Moreover, existing techniques for assessing stuttering are manual, inconsistent, and costly for patients. This highlights the need for innovative solutions that improve the efficacy and efficiency of clinical practice. This dissertation aimed to address these challenges by developing a novel, automated approach to detecting stuttering events.
A rigorous review of the literature on automated stuttering detection identified the top-performing deep learning architectures. Three of these were developed for this research and evaluated on the SEP-28K dataset: a Convolutional Neural Network (CNN), a Long Short-Term Memory (LSTM) network, and a Convolutional Long Short-Term Memory (Conv-LSTM) network. Optimisation of these architectures was guided by performance metrics, with a strong focus on class weights to address dataset imbalance. The models were then compared against the innovative pre-trained Audio Spectrogram Transformer (AST) model on a binary classification task: detecting interjection disfluencies.
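As an illustration of how class weights can offset an imbalanced binary task such as this one, the following is a minimal sketch of the common inverse-frequency heuristic (weight = N / (K × n_c)). It is an assumption, not the weighting scheme used in the dissertation, which the abstract does not specify; the function name `balanced_class_weights` and the toy labels are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: weight} using weight = N / (K * n_c).

    N is the number of samples, K the number of distinct classes,
    and n_c the number of samples in class c. Rarer classes receive
    larger weights, so their errors count more in a weighted loss.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * n) for c, n in counts.items()}


# Hypothetical toy labels (not drawn from SEP-28K):
# 0 = fluent clip, 1 = clip containing an interjection.
labels = [0] * 8 + [1] * 2
weights = balanced_class_weights(labels)
print(weights)
```

With these toy labels the minority class receives four times the weight of the majority class, which is the effect a weighted loss relies on.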
Findings revealed that the hybrid Conv-LSTM outperformed the traditional CNN and LSTM models, achieving an 8.72% increase in performance. The AST model improved on the Conv-LSTM model by a further 4.22%. The AST was then used to detect all types of stuttering, achieving an average F1 score of 0.5370 across the five disfluency classes. These results are competitive with the existing literature; however, they are not directly comparable because of differences in the validation protocols used.
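For readers unfamiliar with the metric, the sketch below shows one way an average F1 score over several disfluency classes could be computed: a binary F1 per class, followed by an unweighted mean. The abstract does not state how the average was taken, so the unweighted mean is an assumption, and the function names and toy predictions are hypothetical.

```python
def f1_score(y_true, y_pred):
    """Binary F1 for the positive class (label 1): 2TP / (2TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    # Convention: return 0.0 when there are no positives at all.
    return 2 * tp / denom if denom else 0.0


def mean_f1(per_class):
    """Unweighted mean of per-class binary F1 scores.

    per_class is a list of (y_true, y_pred) pairs, one per class.
    """
    scores = [f1_score(t, p) for t, p in per_class]
    return sum(scores) / len(scores)


# Hypothetical toy predictions for two classes (not results from the study).
class_a = ([1, 1, 0, 0], [1, 0, 0, 1])  # one TP, one FP, one FN
class_b = ([1, 0, 0, 0], [1, 0, 0, 0])  # perfect prediction
average = mean_f1([class_a, class_b])
print(average)
```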
The AST model performed capably with a shallow MLP network as its classification head. Visualisation of the latent codes from the MLP head further supports this observation, showing separation and grouping between fluent and disfluent samples. These results indicate that the pre-trained transformer generalised effectively to stuttering-specific features. While additional research is necessary to refine the model’s efficacy, the results point to a promising avenue for further research in the stuttering detection field.
| Item Type: | USQ Project |
|---|---|
| Item Status: | Live Archive |
| Faculty/School / Institute/Centre: | Current – Faculty of Health, Engineering and Sciences - School of Engineering (1 Jan 2022 -) |
| Supervisors: | Leis, John |
| Qualification: | Bachelor of Engineering (Honours) (Electrical and Electronic) |
| Date Deposited: | 09 Mar 2026 03:05 |
| Last Modified: | 09 Mar 2026 03:05 |
| Uncontrolled Keywords: | stuttering; speech pathologists |
| URI: | https://sear.unisq.edu.au/id/eprint/53052 |
