Woolf, Rodney J. (2005) Data mining using Matlab. [USQ Project]
Abstract
Data mining is a relatively new field emerging in many disciplines. It is becoming more
popular as technology advances and the need for efficient data analysis grows.
The aim of data mining is not to provide strict rules by analysing the full data
set; rather, it is used to predict with some certainty while analysing only a small
portion of the data. This project seeks to compare the efficiency of a decision tree
induction method with that of the neural network method.
MATLAB has inbuilt data mining toolboxes. However, the decision tree induction
method is not yet implemented among them. Decision tree induction has been
implemented in several forms in the past. The greatest contribution to this method
has been made by Dr John Ross Quinlan, who developed it in the form of the ID3, C4.5
and C5 algorithms. The methodologies used within ID3 and C4.5 are well documented
and therefore provide a strong platform for the implementation of this method within
a higher-level language.
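As a rough illustration of the attribute selection step that ID3 is known for, the following minimal sketch computes entropy and information gain and picks the attribute with the highest gain. It is written in Python rather than MATLAB and is not the project's implementation; the function names, the dictionary-based row format and the attribute names are assumptions made for this example only.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Reduction in entropy obtained by splitting the rows on one attribute.

    rows is a list of dicts mapping attribute name to value (assumed format).
    """
    total = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    remainder = sum(len(part) / total * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attributes):
    """Pick the attribute with the highest information gain (the ID3 criterion)."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))


# Invented toy data, used only to exercise the functions above.
toy_rows = [
    {"issue_a": "y", "issue_b": "n"},
    {"issue_a": "y", "issue_b": "y"},
    {"issue_a": "n", "issue_b": "n"},
    {"issue_a": "n", "issue_b": "y"},
]
toy_labels = ["republican", "republican", "democrat", "democrat"]
chosen = best_attribute(toy_rows, toy_labels, ["issue_a", "issue_b"])
```

In this toy data the hypothetical attribute issue_a separates the two classes completely, so it is the one selected. A full ID3 tree would apply the same selection recursively to each partition.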
The objectives of this study are to fully comprehend two methods of data mining,
namely decision tree induction and neural networks. The decision tree induction
method is to be implemented within the mathematical computer language MATLAB.
The results found when analysing some suitable data will be compared with the results
from the neural network toolbox already implemented in MATLAB.
The data used to compare and contrast the two methods included voting records from
the US House of Representatives, which consist of yes, no and undecided votes on sixteen
separate issues. The voters are grouped into categories according to their political
party, which is either Republican or Democrat. The objective of using this data
set is to predict what party a congressman is affiliated with by analysing their voting
trends.
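To make the prediction task concrete, the sketch below shows one possible way to represent such records and to predict party from the vote on a single issue, in effect a one-level decision tree. It is in Python, not MATLAB, and is not the method used in the project. The record layout, the 'y'/'n'/'?' encoding, the function names and the toy records (three issues instead of sixteen) are all invented for illustration.

```python
from collections import Counter, defaultdict


def train_stump(records, issue):
    """Map each vote value on one issue to the majority party among the records.

    records is a list of (votes, party) pairs, where votes is a tuple of
    'y', 'n' or '?' (an assumed encoding of yes, no and undecided).
    """
    by_vote = defaultdict(Counter)
    for votes, party in records:
        by_vote[votes[issue]][party] += 1
    return {vote: counts.most_common(1)[0][0] for vote, counts in by_vote.items()}


def predict(stump, votes, issue, default=None):
    """Predict a party from a single vote; unseen vote values give the default."""
    return stump.get(votes[issue], default)


# Invented toy records, not taken from the real data set.
toy_records = [
    (("y", "n", "?"), "republican"),
    (("y", "y", "n"), "republican"),
    (("n", "y", "y"), "democrat"),
    (("n", "?", "y"), "democrat"),
]
stump = train_stump(toy_records, issue=0)
guess = predict(stump, ("y", "?", "?"), issue=0)
```

A vote value that never occurred in the training records, such as an undecided vote on the chosen issue here, falls back to the default. This is a small-scale instance of the unexpected data that a tree built from a limited data set has to handle.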
The findings of this study reveal that the decision tree method can accurately predict
outcomes if an ideal data set is used for building the tree. The neural network method
is less accurate in some situations; however, it is more robust to unexpected data.