CIS 730
Due:
Extended deadline (request on Fri if needed):
This short programming assignment asks you to apply your theoretical
understanding of supervised inductive learning to some simple experimental
data sets. Refer to the course intro handout for guidelines on working with
other students.
Note: Remember to submit your solutions in electronic form using the course
Yahoo! Group, ksu-cis730-fall_2003. Produce them only from your personal
source code and scripts and from documents produced with the machine learning
applications used in this MP. Do not use common work or sources other than the
textbook or properly cited references.
Problems
First, log in to your course accounts on the KDD Core (Ringil, Fingolfin,
Nienna, Frodo, Samwise, Merry, Pippin) and make sure your home directory is in
order. Notify admin@www.kddresearch.org (and cc: cis730ta@www.kddresearch.org)
if you have any problems at this stage.
On KDD group systems, MLC++ 2.01 is installed in /usr. The documentation for
this package can be found at http://www.sgi.com/tech/mlc.
To use it, set your path environment variable in your .tcshrc or .cshrc and
MLCDIR in your .login, then run Inducer.
1. (35 points total) Comparing Inducers: ID3, Simple Bayes, C4.5
Your solution to this problem must be submitted in MS Excel, PostScript, or
PDF format. Record your results in a spreadsheet (I recommend Gnumeric or
Excel 2000/XP).
a) (15 points) Follow the instructions in the MLC++ Utilities 2.0 User Guide
(http://www.sgi.com/tech/mlc/util/util.ps) to create a table comparing the ID3
results on the following data sets (Pima, CRX, and Mushroom) with those of
Discrete Naïve Bayes. For each inducer and data set, show the following:
training error, test set error, generalization error, and confusion matrix
(predicted vs. actual class labels). You may use percentages for the first
three of these, and show the variance (+/- x%) as well whenever it is given.
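To make the quantities in the table concrete, here is a minimal Python sketch of how an error rate with a +/- figure and a confusion matrix can be computed from lists of actual and predicted class labels. It is not part of MLC++ and does not reproduce its output format; the function names error_rate and confusion_matrix are hypothetical, the +/- figure is assumed to be a binomial standard error (the tool may compute it differently), and the labels in the demo are made up rather than taken from any of the data sets.

```python
from collections import Counter
from math import sqrt


def error_rate(actual, predicted):
    """Return (error, std_err) as fractions of the number of instances."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(actual)
    wrong = sum(1 for a, p in zip(actual, predicted) if a != p)
    err = wrong / n
    # Binomial standard error of the error estimate. This is an assumption;
    # the tool you use may compute its +/- figure differently.
    std_err = sqrt(err * (1.0 - err) / n)
    return err, std_err


def confusion_matrix(actual, predicted):
    """Return (labels, rows); rows[i][j] counts actual labels[i] predicted as labels[j]."""
    labels = sorted(set(actual) | set(predicted))
    counts = Counter(zip(actual, predicted))
    rows = [[counts[(a, p)] for p in labels] for a in labels]
    return labels, rows


if __name__ == "__main__":
    # Made-up labels for illustration only; not results from any data set.
    actual = ["pos", "pos", "neg", "neg", "neg", "pos", "neg", "neg"]
    predicted = ["pos", "neg", "neg", "neg", "pos", "pos", "neg", "neg"]
    err, se = error_rate(actual, predicted)
    print(f"error: {100 * err:.1f}% +/- {100 * se:.1f}%")
    labels, rows = confusion_matrix(actual, predicted)
    print("actual \\ predicted", *labels)
    for label, row in zip(labels, rows):
        print(label, *row)
```

Rows of the matrix correspond to actual classes and columns to predicted classes, so the diagonal holds the correctly classified instances. The same two functions can be applied to training-set and test-set predictions to fill in the corresponding columns of the table.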
b) (10 points) Plot an example learning curve for Vote, using ID3 and
Naïve Bayes.
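A learning curve plots test set error against the number of training examples. The Python sketch below illustrates the idea only: it trains a small discrete Naïve Bayes classifier (with Laplace smoothing, an assumption of this sketch) on growing prefixes of a training set and prints one (size, error) row per prefix, which can be pasted into a spreadsheet for plotting. It does not use MLC++ or the Vote data; the names train_naive_bayes and learning_curve are hypothetical, and the demo data are synthetic.

```python
import random
from collections import Counter, defaultdict
from math import log


def train_naive_bayes(examples):
    """Fit a discrete Naive Bayes model from (features, label) pairs."""
    class_counts = Counter(label for _, label in examples)
    value_counts = defaultdict(Counter)  # (feature index, label) -> value counts
    values_seen = defaultdict(set)       # feature index -> values seen in training
    for features, label in examples:
        for i, v in enumerate(features):
            value_counts[(i, label)][v] += 1
            values_seen[i].add(v)
    total = len(examples)

    def predict(features):
        best_label, best_score = None, float("-inf")
        for label, c in class_counts.items():
            score = log(c / total)
            for i, v in enumerate(features):
                # Laplace smoothing; the extra +1 in k leaves room for a
                # value that never appeared in the training data.
                k = len(values_seen[i]) + 1
                score += log((value_counts[(i, label)][v] + 1) / (c + k))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    return predict


def learning_curve(train, test, sizes, inducer=train_naive_bayes):
    """Return [(n, test error)] for models trained on the first n examples."""
    points = []
    for n in sizes:
        predict = inducer(train[:n])
        wrong = sum(1 for x, y in test if predict(x) != y)
        points.append((n, wrong / len(test)))
    return points


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is repeatable

    def make_example():
        # Synthetic data: the label follows the first feature, with 10% noise.
        a, b = rng.choice("yn"), rng.choice("yn")
        label = "pos" if a == "y" else "neg"
        if rng.random() < 0.1:
            label = "neg" if label == "pos" else "pos"
        return (a, b), label

    data = [make_example() for _ in range(300)]
    train, test = data[:200], data[200:]
    print("train_size\ttest_error")
    for n, err in learning_curve(train, test, [10, 25, 50, 100, 200]):
        print(f"{n}\t{err:.3f}")
```

Passing a different inducer function to learning_curve gives a second curve on the same axes, which is the comparison this part asks for between ID3 and Naïve Bayes.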
2. (15 points) Building Bayesian networks.
Read the Hugin tutorial at www.hugin.com
(as of
Extra credit
(25 points) WEKA 3. Try the Waikato Environment for Knowledge Analysis (WEKA)
v3.2.3 on one of the four data sets above, which come from the UC Irvine
Machine Learning Database Repository (UCI-MLDBR,
http://www.ics.uci.edu/~mlearn/MLRepository.html). Report the same results
for ID3 in WEKA, in the same format as in Problem 1. This package can be
downloaded from: http://www.cs.waikato.ac.nz/~ml/weka/.