CIS 730

Introduction to Artificial Intelligence

Fall, 2003

 

Homework Assignment 4 (Machine Problem)

 

Sunday, 16 November 2003

Due: Mon 01 December 2003 (before midnight Tue, 02 December 2003)

Extended deadline (request on Fri if needed): Wed 03 December 2003

 

This short programming assignment is designed to apply your theoretical understanding of supervised inductive learning to some simple experimental data sets.

 

Refer to the course intro handout for guidelines on working with other students.

 

Note: Remember to submit your solutions in electronic form using the course Yahoo! Group, ksu-cis730-fall_2003. Produce them only from your personal source code, scripts, and documents, and from the machine learning applications used in this MP. Do not use common work or sources other than the textbook or properly cited references.

 

Problems

 

First, log into your course accounts on the KDD Core (Ringil, Fingolfin, Nienna, Frodo, Samwise, Merry, Pippin) and make sure your home directory is in order.  Notify admin@www.kddresearch.org (and cc: cis730ta@www.kddresearch.org) if you have any problems at this stage.

 

On KDD group systems, MLC++ 2.01 is installed in /usr.  The documentation for this package can be found at http://www.sgi.com/tech/mlc.  To use it, set your path environment variable in your .tcshrc or .cshrc and the MLCDIR environment variable in your .login, then run Inducer.
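Before running any experiments, it may help to confirm that the environment described above is in place. The following is a minimal sketch, not part of MLC++: the function name check_mlc_setup is hypothetical, and it assumes only what the paragraph above states (an MLCDIR variable and an executable named Inducer reachable through your path).

```python
import os
import shutil


def check_mlc_setup(environ=None, which=shutil.which):
    """Return a list of setup problems; an empty list means the checks passed.

    Assumption: the only requirements are an MLCDIR variable and an
    executable called Inducer on the search path, as the handout describes.
    """
    environ = os.environ if environ is None else environ
    problems = []
    if not environ.get("MLCDIR"):
        problems.append("MLCDIR is not set")
    # shutil.which searches the given path string for an executable file.
    if which("Inducer", path=environ.get("PATH", "")) is None:
        problems.append("Inducer is not on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_mlc_setup():
        print("Setup problem:", problem)
```

Running the script after logging in again prints one line per problem it finds and prints nothing when both checks pass.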

 

1. (35 points total) Comparing Inducers: ID3, Simple Bayes, C4.5

 

Record your solution to this problem in a spreadsheet (I recommend Gnumeric or Excel 2000/XP) and submit it in MS Excel, PostScript, or PDF format.

 

a)       (15 points) Follow the instructions in the MLC++ Utilities 2.0 User Guide (http://www.sgi.com/tech/mlc/util/util.ps) to create a table comparing ID3 with Discrete Naïve Bayes on the following data sets: Pima, CRX, and Mushroom.  For each inducer and data set, show the following: training error, test set error, generalization error, and confusion matrix (predicted vs. actual class labels).  You may use percentages for the first three of these, and show the variance (+/- x%) wherever it is given.

b)       (10 points) Plot an example learning curve for Vote, using ID3 and Naïve Bayes.
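
To make the quantities in parts a) and b) concrete, here is a minimal Python sketch of how a confusion matrix, an error rate, and the points of a learning curve can be computed from predicted and actual labels. It does not call MLC++ and is not a substitute for the results the assignment asks for. All function names are hypothetical, majority_inducer is a toy stand-in for a real inducer such as ID3, and the +/- figure is assumed here to be a binomial standard error, which may differ from what MLC++ reports.

```python
from collections import Counter
from math import sqrt


def confusion_matrix(actual, predicted, labels):
    """Count outcomes; rows are actual classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def error_rate(matrix):
    """Fraction of instances that fall off the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return (total - correct) / total


def std_error(err, n):
    """Binomial standard error of an error estimate from n instances (assumed)."""
    return sqrt(err * (1.0 - err) / n)


def majority_inducer(train):
    """Toy inducer: always predicts the most common label in the training pairs."""
    most_common = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda _features: most_common


def learning_curve(train, test, sizes, induce):
    """Return (training set size, test error) pairs for growing training prefixes."""
    labels = sorted({label for _, label in train + test})
    points = []
    for size in sizes:
        classify = induce(train[:size])
        actual = [label for _, label in test]
        predicted = [classify(features) for features, _ in test]
        points.append((size, error_rate(confusion_matrix(actual, predicted, labels))))
    return points


if __name__ == "__main__":
    actual = ["pos", "pos", "neg", "neg", "neg"]
    predicted = ["pos", "neg", "neg", "neg", "pos"]
    matrix = confusion_matrix(actual, predicted, ["pos", "neg"])
    err = error_rate(matrix)
    print(matrix, err, std_error(err, len(actual)))
```

Plotting the pairs returned by learning_curve, with training set size on the horizontal axis and test error on the vertical axis, gives the kind of curve part b) asks for.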

 

2. (15 points)  Building Bayesian networks.  Read the Hugin tutorial at www.hugin.com (as of 16 Nov 2003, the download URL is http://www.hugin.com/Products_Services/Products/Demo/).  Download and install Hugin Lite v6.3 if you wish.  Then download Bayesian Network tools in Java (BNJ) v2.03 from http://bndev.sourceforge.net and install it on your Windows, Solaris, or Linux system.  Walk through the Asia Bayesian belief network (BBN) example at http://developer.hugin.com/Samples/Asia/.  Turn in a saved copy of your BBN as a BNJ XML file.
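
As a cross-check on the network you build, the following Python sketch computes probabilities in the Asia network by brute-force enumeration of the joint distribution. It uses neither Hugin nor BNJ. The variable names are hypothetical, and the conditional probability values are an assumption: they are the figures commonly quoted for this example, so verify them against the tables in the tutorial before relying on the output.

```python
from itertools import product

# Variable order used when enumerating full assignments.
VARS = ["asia", "tub", "smoke", "lung", "bronc", "either", "xray", "dysp"]

# P(dysp = true | either, bronc); assumed values, check against the tutorial.
DYSP_CPT = {(True, True): 0.9, (True, False): 0.7,
            (False, True): 0.8, (False, False): 0.1}


def bern(p_true, value):
    """Probability of a boolean value given P(value is true)."""
    return p_true if value else 1.0 - p_true


def joint(a):
    """Joint probability of one full assignment a (dict of name -> bool)."""
    p = bern(0.01, a["asia"])
    p *= bern(0.05 if a["asia"] else 0.01, a["tub"])
    p *= bern(0.50, a["smoke"])
    p *= bern(0.10 if a["smoke"] else 0.01, a["lung"])
    p *= bern(0.60 if a["smoke"] else 0.30, a["bronc"])
    # "either" is a deterministic OR of tuberculosis and lung cancer.
    p *= bern(1.0 if (a["tub"] or a["lung"]) else 0.0, a["either"])
    p *= bern(0.98 if a["either"] else 0.05, a["xray"])
    p *= bern(DYSP_CPT[(a["either"], a["bronc"])], a["dysp"])
    return p


def query(target, evidence=None):
    """Return P(target = true | evidence) by summing over all assignments."""
    evidence = evidence or {}
    numerator = denominator = 0.0
    for values in product([True, False], repeat=len(VARS)):
        a = dict(zip(VARS, values))
        if any(a[name] != value for name, value in evidence.items()):
            continue
        p = joint(a)
        denominator += p
        if a[target]:
            numerator += p
    return numerator / denominator


if __name__ == "__main__":
    print("P(tub):", query("tub"))
    print("P(lung | positive x-ray):", query("lung", {"xray": True}))
```

Enumeration is practical here only because the network has eight binary variables (256 assignments); tools such as Hugin and BNJ use more efficient inference algorithms.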

 


Extra credit

 

(25 points) WEKA 3. Try the Waikato Environment for Knowledge Analysis (WEKA) v3.2.3 on one of the four data sets used above (Pima, CRX, Mushroom, or Vote) from the UC Irvine Machine Learning Database Repository (UCI-MLDBR, http://www.ics.uci.edu/~mlearn/MLRepository.html).  Report the results that WEKA's ID3 produces, using the same measures and format as in Problem 1.  WEKA can be downloaded from http://www.cs.waikato.ac.nz/~ml/weka/.