shared
Class CatTestResult

java.lang.Object
  |
  +--shared.CatTestResult

public class CatTestResult
extends java.lang.Object

The CatTestResult class provides summaries of running categorizers on test data. This includes the option of loading the test data from a file (or giving an existing InstanceList), running the categorizer on all instances, and storing the results. Information can then be extracted quickly.

The training set and test set (if the test set is passed in rather than loaded here) must not be altered while calls to this class are still being made, because the class keeps references to those structures rather than copies.
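The hazard of keeping references instead of copies can be shown in a few lines. The sketch below uses a hypothetical `ResultHolder` class as a stand-in for CatTestResult and a plain `List<Double>` of instance weights as a stand-in for InstanceList. Neither is part of the library. The point is only that any value recomputed from the shared list changes when the caller mutates that list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a class that, like CatTestResult, keeps a reference
// to the caller's data instead of copying it.
final class ResultHolder {
    private final List<Double> testWeights; // shared reference, not a copy

    ResultHolder(List<Double> testWeights) {
        this.testWeights = testWeights;
    }

    // Recomputed on every call from the shared list.
    double totalTestWeight() {
        double sum = 0.0;
        for (double w : testWeights) {
            sum += w;
        }
        return sum;
    }
}

final class AliasingDemo {
    static double[] run() {
        List<Double> weights = new ArrayList<>(List.of(1.0, 2.0, 3.0));
        ResultHolder holder = new ResultHolder(weights);
        double before = holder.totalTestWeight(); // 6.0
        weights.add(4.0);                         // caller mutates the list after construction
        double after = holder.totalTestWeight();  // 10.0: the reported total silently changed
        return new double[] {before, after};
    }
}
```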

The complexity for construction of the CatTestResult is O(n1 n2), where n1 is the size of the training-set InstanceList and n2 is the size of the test set. All display routines take time proportional to the number of displayed numbers.

The CatTestResult class has been enhanced to compute the log-evidence metric, which equals the total evidence against the correct category.
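As a rough illustration of the log-evidence idea, the sketch below sums, over all test instances, the evidence against the correct category. It assumes that the evidence for one instance is -log2 of the probability the categorizer assigned to the correct category. The base-2 logarithm, the `probs`/`actual` array layout, and the class name `LogEvidence` are assumptions for illustration, not the library's implementation.

```java
// Hypothetical sketch: total log loss, read as the evidence against the correct category.
final class LogEvidence {
    // probs[i][c] is the probability the categorizer gave class c for instance i;
    // actual[i] is the index of the correct class of instance i.
    static double totalLogLoss(double[][] probs, int[] actual) {
        double total = 0.0;
        for (int i = 0; i < probs.length; i++) {
            double p = probs[i][actual[i]];
            // -log2(p): 0 when p == 1, and it grows without bound as p approaches 0.
            total += -Math.log(p) / Math.log(2.0);
        }
        return total;
    }
}
```

For example, probabilities of 0.5 and 0.25 for the correct classes of two instances contribute 1 and 2 bits, for a total of 3. A probability of exactly 0 makes the sum infinite, so a real implementation needs a policy for that case.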


Field Summary
static int Generalized
          The Generalized partition for error reporting.
protected  LogOptions logOptions
          Logging options for this class.
static int Memorized
          The Memorized partition for error reporting.
static int Normal
          The Normal partition for error reporting.
 
Constructor Summary
CatTestResult(Categorizer cat, InstanceList trainILSource, InstanceList testILSource)
          Constructor.
 
Method Summary
static void check_for_unknown_classes(int step, int start, double[][] confusionMatrix)
          Determines whether unknown classes are used.
 java.lang.String display_ascii_confusion_matrix(java.lang.String stream)
          Displays confusion matrices in ascii format for this CatTestResult object.
static java.lang.String display_ascii_confusion_matrix(java.lang.String display, double[][] confusionMatrix, Schema schema)
          Displays confusion matrices in ascii format.
 java.lang.String display_confusion_matrix(java.lang.String stream)
          Displays the confusion matrix.
 void display(java.io.BufferedWriter stream)
          Writes all available statistics (but not the displays) to the given stream.
 double error()
          Returns the ratio of the number of incorrectly categorized test instances to the number of test instances.
 double error(int errType)
          Returns the ratio of the number of incorrectly categorized test instances to the number of test instances, optionally restricted to one partition.
static boolean get_compute_log_loss()
          Returns TRUE if the log loss option is set, or FALSE otherwise.
 double[][] get_confusion_matrix()
          Returns the confusion matrix of the results of testing.
 int get_log_level()
          Returns the logging level for this object.
 LogOptions get_log_options()
          Returns the LogOptions object for this object.
 java.io.Writer get_log_stream()
          Returns the stream to which logs for this object are written.
 ScoringMetrics get_metrics()
          Returns the scoring metrics collected from the test results.
 CatOneTestResult[] get_results()
          Returns the individual results from testing.
 InstanceList get_testing_instance_list()
          Returns the InstanceList used for testing.
 InstanceList get_training_instance_list()
          Returns the InstanceList used for training.
protected  void initialize(Categorizer cat)
          Initializes this CatTestResult by categorizing the test data set with the given Categorizer.
protected  void initializeTrainTable()
          Uses a TableCategorizer as an interface to a hash table for quick lookup of whether a test instance occurs in the training set.
 double normalized_loss()
          Calculates a normalized loss value.
 int num_correct()
          Returns the number of instances in the test InstanceList that were correctly categorized.
 int num_incorrect()
          Returns the number of instances in the test InstanceList that were incorrectly categorized.
 int num_off_train()
          Returns the number of test instances not appearing in the training data.
 int num_on_train()
          Returns the number of test instances appearing in the training data.
 int num_test_instances()
          Returns the number of instances in the testing set.
 int num_train_instances()
          Returns the number of instances in the training set.
static double pessimistic_error_correction(double numErrors, double totalWeight, double zValue)
          Computes the pessimistic number of errors used for C4.5-style pruning.
static void set_compute_log_loss(boolean b)
          Sets the computation of log loss option.
 void set_log_level(int level)
          Sets the logging level for this object.
 void set_log_options(LogOptions opt)
          Sets the LogOptions object for this object.
 void set_log_prefixes(java.lang.String file, int line, int lvl1, int lvl2)
          Sets the logging message prefix for this object.
 void set_log_stream(java.io.Writer strm)
          Sets the stream to which log messages are written.
 java.lang.String toString()
          Converts information in this CatTestResult object to a string for display.
 double total_correct_weight()
          Returns the total weight of instances which were correctly classified.
 double total_incorrect_weight()
          Returns the total weight of instances which were incorrectly classified.
 double total_log_loss()
          Returns the total log loss recorded for this Inducer run.
 double total_loss()
          Returns the total loss value from the scoring metrics.
 double total_test_weight()
          Returns the total weight in the test list.
 double total_train_weight()
          Returns the total weight in the training list.
 double total_weight_off_train()
          Returns the total weight of test instances not appearing in the training data.
 double total_weight_on_train()
          Returns the total weight of test instances appearing in the training data.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

Normal

public static final int Normal
The Normal partition for error reporting.

Generalized

public static final int Generalized
The Generalized partition for error reporting.

Memorized

public static final int Memorized
The Memorized partition for error reporting.

logOptions

protected LogOptions logOptions
Logging options for this class.
Constructor Detail

CatTestResult

public CatTestResult(Categorizer cat,
                     InstanceList trainILSource,
                     InstanceList testILSource)
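Constructs a CatTestResult by categorizing every instance of the test data set with the given Categorizer and storing the results.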
Parameters:
cat - The Categorizer used to create this CatTestResult.
trainILSource - The training data set.
testILSource - The test data set.
Method Detail

set_log_level

public void set_log_level(int level)
Sets the logging level for this object.
Parameters:
level - The new logging level.

get_log_level

public int get_log_level()
Returns the logging level for this object.
Returns:
The log level for this object.

set_log_stream

public void set_log_stream(java.io.Writer strm)
Sets the stream to which log messages are written.
Parameters:
strm - The stream to which logs will be written.

get_log_stream

public java.io.Writer get_log_stream()
Returns the stream to which logs for this object are written.
Returns:
The stream to which logs for this object are written.

get_log_options

public LogOptions get_log_options()
Returns the LogOptions object for this object.
Returns:
The LogOptions object for this object.

set_log_options

public void set_log_options(LogOptions opt)
Sets the LogOptions object for this object.
Parameters:
opt - The new LogOptions object.

set_log_prefixes

public void set_log_prefixes(java.lang.String file,
                             int line,
                             int lvl1,
                             int lvl2)
Sets the logging message prefix for this object.
Parameters:
file - The file name to be displayed in the prefix of log messages.
line - The line number to be displayed in the prefix of log messages.
lvl1 - The log level of the statement being logged.
lvl2 - The level of log messages being displayed.

pessimistic_error_correction

public static double pessimistic_error_correction(double numErrors,
                                                  double totalWeight,
                                                  double zValue)
Computes the pessimistic number of errors used when pruning a tree with the given pruning factor. Pruning is based on C4.5's pruning (Quinlan). The method returns the pessimistic number of errors on the training set, using the standard normal distribution approximation from CatTestResult. The derivation below shows that this happens to coincide with C4.5, at least when the number of errors is at least 1.
err = (2ne+z^2+z*sqrt(4ne+z^2-4ne^2))/(2*(n+z^2))
where n is the number of records, e is the probability of error, and z is the z-value. Let E = ne be the count of errors. Substituting gives:
err = (2E + z^2 + z*sqrt(4E+z^2-4E^2/n))/(2*(n+z^2))
err = (E + z^2/2 + z*sqrt(E-E^2/n+z^2/4))/(n+z^2)
err = (E + z^2/2 + z*sqrt(E(1-E/n)+z^2/4))/(n+z^2)
Parameters:
numErrors - The number of errors produced in a test run of this categorizer.
totalWeight - The total weight of all Instances tested.
zValue - The z-value, i.e., the half-width of the confidence interval measured in standard deviations.
Returns:
The pessimistic number of errors on the training set.
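The last line of the derivation translates directly into code. The sketch below assumes that the method multiplies the upper-bound error rate err by totalWeight to obtain a pessimistic error count, as C4.5 does. The class name `PessimisticError` is hypothetical, and the real implementation may differ in details such as the handling of numErrors < 1.

```java
// Hypothetical sketch of the pessimistic error estimate derived above.
final class PessimisticError {
    // E = numErrors, n = totalWeight, z = zValue.
    static double pessimisticErrors(double numErrors, double totalWeight, double zValue) {
        double e = numErrors;
        double n = totalWeight;
        double z2 = zValue * zValue;
        // err = (E + z^2/2 + z*sqrt(E(1-E/n) + z^2/4)) / (n + z^2)
        double errRate = (e + z2 / 2.0 + zValue * Math.sqrt(e * (1.0 - e / n) + z2 / 4.0)) / (n + z2);
        // Scale the pessimistic error rate back to a count of errors.
        return errRate * n;
    }
}
```

With zValue = 0 the estimate reduces to numErrors itself; any positive zValue pushes it above numErrors, which is what makes it pessimistic.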

error

public double error()
Returns the ratio of the number of incorrectly categorized test instances to the number of test instances. All test instances are considered.
Returns:
The fraction of test instances that were incorrectly classified, without partitioning.

error

public double error(int errType)
Returns the ratio of the number of incorrectly categorized test instances to the number of test instances. By default, all test instances are considered. The errType argument can be used to partition the test cases into those occurring in the training set and those that do not.
Parameters:
errType - The type of error used to partition test cases. Possible values are CatTestResult.Normal, CatTestResult.Generalized, CatTestResult.Memorized.
Returns:
The fraction of incorrectly classified test instances in the selected partition.
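To make the partitioning concrete, the sketch below computes the error ratio over all test instances or over one partition. This page does not spell out the partitions, so the sketch assumes that Normal means all test instances, Generalized means only instances that do not occur in the training set, and Memorized means only instances that do. The `ErrorType` enum and `TestOutcome` class are hypothetical stand-ins for the int constants and for CatOneTestResult.

```java
import java.util.List;

// Hypothetical stand-in for CatTestResult.Normal / Generalized / Memorized.
enum ErrorType { NORMAL, GENERALIZED, MEMORIZED }

// Hypothetical stand-in for one entry of get_results().
final class TestOutcome {
    final boolean correct;
    final boolean onTrain; // true if the instance also occurs in the training set

    TestOutcome(boolean correct, boolean onTrain) {
        this.correct = correct;
        this.onTrain = onTrain;
    }
}

final class ErrorRatio {
    // Fraction of incorrectly categorized instances within the chosen partition.
    static double error(List<TestOutcome> results, ErrorType type) {
        int considered = 0;
        int incorrect = 0;
        for (TestOutcome r : results) {
            boolean include;
            switch (type) {
                case GENERALIZED: include = !r.onTrain; break; // off-train instances only
                case MEMORIZED:   include = r.onTrain;  break; // on-train instances only
                default:          include = true;       break; // NORMAL: every instance
            }
            if (!include) continue;
            considered++;
            if (!r.correct) incorrect++;
        }
        // Empty partition: report 0 rather than dividing by zero (an assumption).
        return considered == 0 ? 0.0 : (double) incorrect / considered;
    }
}
```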

total_weight_on_train

public double total_weight_on_train()
Returns the total weight of test instances appearing in the training data. Initializes the flag for each test instance if not already done.
Returns:
The total weight of the test instances that also appear in the training set.

total_weight_off_train

public double total_weight_off_train()
Returns the total weight of test instances not appearing in the training data. Initializes the flag for each test instance if not already done.
Returns:
The total weight of the test instances that do not appear in the training set.

initializeTrainTable

protected void initializeTrainTable()
Uses a TableCategorizer as an interface to a hash table for quick lookup of whether a test instance occurs in the training set. Only called when the inTrainIL data is needed. Initializes the class variable numOffTrain to the number of test cases not found in the training set.
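The idea behind this lookup can be sketched with a java.util.HashSet in place of TableCategorizer: build the set from the training instances once, then check each test instance with an expected constant-time lookup instead of scanning the training set for every test instance. Instances are represented here as hypothetical String keys with a parallel weight array; real instances would need equals and hashCode defined over their attribute values.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: on-train / off-train counts and weights via a hash set.
final class TrainLookup {
    final int numOnTrain;
    final int numOffTrain;
    final double weightOnTrain;
    final double weightOffTrain;

    TrainLookup(String[] train, String[] test, double[] testWeights) {
        // One pass over the training data to build the lookup table.
        Set<String> trainSet = new HashSet<>(Arrays.asList(train));
        int on = 0, off = 0;
        double wOn = 0.0, wOff = 0.0;
        // One expected O(1) lookup per test instance.
        for (int i = 0; i < test.length; i++) {
            if (trainSet.contains(test[i])) {
                on++;
                wOn += testWeights[i];
            } else {
                off++;
                wOff += testWeights[i];
            }
        }
        numOnTrain = on;
        numOffTrain = off;
        weightOnTrain = wOn;
        weightOffTrain = wOff;
    }
}
```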

total_test_weight

public double total_test_weight()
Returns the total weight in the test list.
Returns:
The total weight of the test data set.

total_incorrect_weight

public double total_incorrect_weight()
Returns the total weight of instances which were incorrectly classified.
Returns:
The total weight of incorrectly classified instances.

initialize

protected void initialize(Categorizer cat)
Initializes this CatTestResult by categorizing the test data set with the given Categorizer.
Parameters:
cat - The categorizer with which the test data set will be categorized.

num_correct

public int num_correct()
Returns the number of instances in the test InstanceList that were correctly categorized.
Returns:
An integer representing the number of instances that were correctly categorized by an inducer during a test run.

num_incorrect

public int num_incorrect()
Returns the number of instances in the test InstanceList that were incorrectly categorized.
Returns:
An integer representing the number of instances that were incorrectly categorized by an inducer during a test run.

num_on_train

public int num_on_train()
Returns the number of test instances appearing in the training data. Initializes the flag for each test instance if not already done.
Returns:
The number of test instances also in the training data.

num_off_train

public int num_off_train()
Returns the number of test instances not appearing in the training data. Initializes the flag for each test instance if not already done.
Returns:
The number of test instances not in the training data.

num_train_instances

public int num_train_instances()
Returns the number of instances in the training set. This function ignores weights of instances.
Returns:
The number of training instances.

num_test_instances

public int num_test_instances()
Returns the number of instances in the testing set. This function ignores weights of instances.
Returns:
The number of test instances.

total_correct_weight

public double total_correct_weight()
Returns the total weight of instances which were correctly classified.
Returns:
The total weight of correctly classified instances.

total_train_weight

public double total_train_weight()
Returns the total weight in the training list.
Returns:
The total weight in the training list.

total_log_loss

public double total_log_loss()
Returns the total log loss recorded for this Inducer run.
Returns:
The total log loss.

check_for_unknown_classes

public static void check_for_unknown_classes(int step,
                                             int start,
                                             double[][] confusionMatrix)
Determines whether unknown classes are used: for example, whether some test instance was classified as 'unknown' or some test instances belong to the 'unknown' class. The result is to set the step and start values for use in display_ascii_confusion_matrix and display_scatterviz_confusion_matrix.
Parameters:
step - The step value of where to begin in the confusion matrix.
start - The start value of where to begin in the confusion matrix.
confusionMatrix - The confusion matrix which is being checked for unknown values.

display_ascii_confusion_matrix

public java.lang.String display_ascii_confusion_matrix(java.lang.String stream)
Displays confusion matrices in ascii format for this CatTestResult object.
Parameters:
stream - Stream to which display is shown.
Returns:
The String containing the display of the confusion matrix.

display_ascii_confusion_matrix

public static java.lang.String display_ascii_confusion_matrix(java.lang.String display,
                                                              double[][] confusionMatrix,
                                                              Schema schema)
Displays confusion matrices in ascii format.
Parameters:
display - The String containing any previous items to be included in the display.
confusionMatrix - The confusion matrix to be displayed.
schema - The Schema of the categories that Instances can be classified as.
Returns:
The String containing the display of the confusion matrix.

display_confusion_matrix

public java.lang.String display_confusion_matrix(java.lang.String stream)
Displays the confusion matrix. For row i and column j, the confusion matrix shows the number of instances classified as j that should have been classified as i.
Parameters:
stream - Stream to which display is shown.
Returns:
String containing the display.
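A minimal sketch of the row/column convention described above: row i is the true class and column j the predicted class, so cell [i][j] accumulates the weight of instances of class i that were classified as j. The class and method names are hypothetical, and the fixed-width layout is not the one produced by display_ascii_confusion_matrix.

```java
import java.util.Locale;

// Hypothetical sketch of building and printing a confusion matrix.
final class ConfusionMatrix {
    // actual[k] and predicted[k] are class indices in [0, numClasses).
    static double[][] build(int numClasses, int[] actual, int[] predicted, double[] weights) {
        double[][] m = new double[numClasses][numClasses];
        for (int k = 0; k < actual.length; k++) {
            m[actual[k]][predicted[k]] += weights[k]; // row = true class, column = predicted class
        }
        return m;
    }

    // Simple fixed-width ASCII rendering, one line per true class.
    static String toAscii(double[][] m) {
        StringBuilder sb = new StringBuilder();
        for (double[] row : m) {
            for (double cell : row) {
                sb.append(String.format(Locale.ROOT, "%8.2f", cell));
            }
            sb.append('\n');
        }
        return sb.toString();
    }
}
```

The diagonal holds the correctly classified weight, so summing it and dividing by the total weight gives one minus the error ratio.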

display

public void display(java.io.BufferedWriter stream)
Writes all available statistics (but not the displays) to the given stream.
Parameters:
stream - The writer to which the statistics will be displayed.

toString

public java.lang.String toString()
Converts information in this CatTestResult object to a string for display.
Overrides:
toString in class java.lang.Object
Returns:
The String containing the display.

get_training_instance_list

public InstanceList get_training_instance_list()
Returns the InstanceList used for training.
Returns:
The InstanceList used for training.

get_testing_instance_list

public InstanceList get_testing_instance_list()
Returns the InstanceList used for testing.
Returns:
The InstanceList used for testing.

get_results

public CatOneTestResult[] get_results()
Returns the individual results from testing.
Returns:
The individual results from testing.

get_metrics

public ScoringMetrics get_metrics()
Returns the scoring metrics collected from the test results.
Returns:
The scoring metrics collected from the test results.

get_confusion_matrix

public double[][] get_confusion_matrix()
Returns the confusion matrix of the results of testing.
Returns:
The confusion matrix of the results of testing.

total_loss

public double total_loss()
Returns the total loss value from the scoring metrics.
Returns:
The total loss value from the scoring metrics.

normalized_loss

public double normalized_loss()
Calculates a normalized loss value.
Returns:
The loss value normalized by the loss value range.

set_compute_log_loss

public static void set_compute_log_loss(boolean b)
Sets the computation of log loss option.
Parameters:
b - The new setting of the log loss option.

get_compute_log_loss

public static boolean get_compute_log_loss()
Returns TRUE if the log loss option is set, or FALSE otherwise.
Returns:
TRUE if the log loss option is set, FALSE otherwise.