shared
Class SplitScore

java.lang.Object
  |
  +--shared.SplitScore
Direct Known Subclasses:
SplitAttr

public class SplitScore
extends java.lang.Object

A class for determining, holding, and returning the information associated with an attribute split. Scores are computed using methods from the Entropy class and cached in an internal cache structure. The cache holds the entropy, conditional entropy, split entropy, mutual information, gain ratio, split distribution, label distribution, and the number of instances on which these scores were evaluated. Access to a cached score takes constant time; a score that needs refreshing takes longer because it must be recalculated.
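
The caching behaviour described above can be pictured with a small sketch. This is an illustration only, not the SplitScore source: the class CachedScoreSketch, its fields, and the invalidate method are hypothetical names, and the real cache structure holds several values rather than one.

```java
import java.util.function.DoubleSupplier;

/** Hypothetical illustration of a lazily cached score; not the SplitScore source. */
final class CachedScoreSketch {
    private final DoubleSupplier compute; // the (possibly expensive) recalculation
    private double value;
    private boolean valid = false;        // false means the value needs refreshing
    int recomputations = 0;               // exposed only to observe the caching

    CachedScoreSketch(DoubleSupplier compute) {
        this.compute = compute;
    }

    /** Constant time when the value is cached; recomputes otherwise. */
    double get() {
        if (!valid) {
            value = compute.getAsDouble();
            valid = true;
            recomputations++;
        }
        return value;
    }

    /** Marks the cached value as stale, e.g. after the distributions change. */
    void invalidate() {
        valid = false;
    }
}
```

Calling get() twice in a row triggers one recalculation; calling invalidate() in between forces a second one.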


Field Summary
static byte defaultSplitScoreCriterion
          Default criterion for determining the score of a particular split.
static byte externalScore
          Value indicating how splits are scored.
static byte gainRatio
          Value indicating how splits are scored.
protected  LogOptions logOptions
          Logging options for this class.
static byte mutualInfo
          Value indicating how splits are scored.
static byte mutualInfoRatio
          Value indicating how splits are scored.
static byte normalizedMutualInfo
          Value indicating how splits are scored.
static java.lang.String[] splitScoreCriterionEnum
          String names for each form of split criterion.
 
Constructor Summary
SplitScore()
          Constructor.
SplitScore(SplitScore source)
          Copy Constructor.
 
Method Summary
 SplitScore assign(SplitScore rhs)
          Assigns the given SplitScore data to this SplitScore.
 void display()
          Produces formatted display of contents of object.
 void display(java.io.Writer stream)
          Produces formatted display of contents of object.
 void display(java.io.Writer stream, DisplayPref dp)
          Produces formatted display of contents of object.
 double get_cond_entropy()
          Returns cache.condEntropy, first checking to see if it has yet been set.
 double get_entropy()
          Returns cache.entropy, first checking to see if it has yet been set.
 double get_external_score()
          Returns the value, set externally, for the score.
 double get_gain_ratio()
          Determines, and returns, cache.gainRatio.
 double[] get_label_dist()
          The label distribution is calculated from the split and label distribution.
 int get_log_level()
          Returns the logging level for this object.
 LogOptions get_log_options()
          Returns the LogOptions object for this object.
 java.io.Writer get_log_stream()
          Returns the stream to which logs for this object are written.
 double get_mutual_info_ratio()
          Returns the mutual information ratio, which is the ratio between the mutual info and entropy.
 double get_mutual_info(boolean normalize)
          Returns the mutual info (information gain) score for the split this SplitScore object represents.
 double[][] get_split_and_label_dist()
          Returns a reference to the requested distribution array.
 double[] get_split_dist()
          The split distribution is calculated from the split and label distribution.
 double get_split_entropy()
          Returns cache.splitEntropy, first checking to see if it has yet been set.
 byte get_split_score_criterion()
          Returns the type of criterion used in scoring splits.
 double get_unnormalized_mutual_info()
          Returns the mutual info (information gain) score for the split this SplitScore object represents.
 boolean has_distribution()
          Checks if there exists a splitAndLabel distribution.
 boolean has_distribution(boolean fatalOnFalse)
          Checks if there exists a splitAndLabel distribution.
 boolean has_external_score()
          Checks if an external score has been set.
 double normalize_by_num_splits(double score)
          Normalize by the number of splits.
 int num_splits()
          Returns the number of splits--not including unknowns.
 double[] release_label_dist()
          Returns the label distribution array and releases ownership.
 double[][] release_split_and_label_dist()
          Returns the split and label distribution array and releases ownership.
 double[] release_split_dist()
          Returns the split distribution array and releases ownership.
 void reset()
          Clear (delete) distribution array data.
 double score()
          The criterion calculation depends on the score criterion.
 double score(double[][] sAndLDist, double[] sDist, double[] lDist, double passedEntropy, double passedWeight)
          Computes the scores and updates the cache when they are being computed many times for the same number of instances and entropy.
 void set_external_score(double extScore)
          Set the external score.
 void set_log_level(int level)
          Sets the logging level for this object.
 void set_log_options(LogOptions opt)
          Sets the LogOptions object for this object.
 void set_log_prefixes(java.lang.String file, int line, int lvl1, int lvl2)
          Sets the logging message prefix for this object.
 void set_log_stream(java.io.Writer strm)
          Sets the stream to which logging options are displayed.
 double[][] set_split_and_label_dist(double[][] sAndLDist)
          Stores the splitAndLabelDist array.
 void set_split_dist(double[] sDist)
          Stores the cache.splitDist array.
 void set_split_score_criterion(byte choice)
          Sets the split score criterion.
 double total_weight()
          Returns the total weight from the cache.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

mutualInfo

public static final byte mutualInfo
Value indicating how splits are scored.

normalizedMutualInfo

public static final byte normalizedMutualInfo
Value indicating how splits are scored.

gainRatio

public static final byte gainRatio
Value indicating how splits are scored.

mutualInfoRatio

public static final byte mutualInfoRatio
Value indicating how splits are scored.

externalScore

public static final byte externalScore
Value indicating how splits are scored.

defaultSplitScoreCriterion

public static byte defaultSplitScoreCriterion
Default criterion for determining the score of a particular split.
See Also:
mutualInfo, normalizedMutualInfo, gainRatio, mutualInfoRatio, externalScore

splitScoreCriterionEnum

public static java.lang.String[] splitScoreCriterionEnum
String names for each form of split criterion.

logOptions

protected LogOptions logOptions
Logging options for this class.
Constructor Detail

SplitScore

public SplitScore(SplitScore source)
Copy Constructor.
Parameters:
source - The SplitScore to be copied.

SplitScore

public SplitScore()
Constructor.
Method Detail

set_log_level

public void set_log_level(int level)
Sets the logging level for this object.
Parameters:
level - The new logging level.

get_log_level

public int get_log_level()
Returns the logging level for this object.
Returns:
The level of logging in this class.

set_log_stream

public void set_log_stream(java.io.Writer strm)
Sets the stream to which logging options are displayed.
Parameters:
strm - The stream to which logs will be written.

get_log_stream

public java.io.Writer get_log_stream()
Returns the stream to which logs for this object are written.
Returns:
The stream to which logs for this object are written.

get_log_options

public LogOptions get_log_options()
Returns the LogOptions object for this object.
Returns:
The LogOptions object for this object.

set_log_options

public void set_log_options(LogOptions opt)
Sets the LogOptions object for this object.
Parameters:
opt - The new LogOptions object.

set_log_prefixes

public void set_log_prefixes(java.lang.String file,
                             int line,
                             int lvl1,
                             int lvl2)
Sets the logging message prefix for this object.
Parameters:
file - The file name to be displayed in the prefix of log messages.
line - The line number to be displayed in the prefix of log messages.
lvl1 - The log level of the statement being logged.
lvl2 - The level of log messages being displayed.

get_unnormalized_mutual_info

public double get_unnormalized_mutual_info()
Returns the mutual info (information gain) score for the split this SplitScore object represents. Created to avoid JVM error.
Returns:
The unnormalized mutual info for this split.

get_mutual_info

public double get_mutual_info(boolean normalize)
Returns the mutual info (information gain) score for the split this SplitScore object represents. This method updates the cache.
Parameters:
normalize - TRUE if normalization is requested, FALSE otherwise.
Returns:
The mutual info value for this split.
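
As a rough guide to what the mutual info score measures, the sketch below computes information gain as H(label) - H(label | split) from a weighted joint distribution. It is not taken from SplitScore or Entropy: the class MutualInfoSketch and its method names are hypothetical, and the layout joint[split][label] is an assumption about how the split and label distribution is indexed.

```java
/** Hypothetical sketch of information gain; assumes joint[split][label] weights. */
final class MutualInfoSketch {
    /** Shannon entropy in bits of non-negative weights (need not sum to 1). */
    static double entropy(double[] weights) {
        double total = 0.0;
        for (double w : weights) total += w;
        if (total <= 0.0) return 0.0;
        double h = 0.0;
        for (double w : weights) {
            if (w > 0.0) {
                double p = w / total;
                h -= p * Math.log(p) / Math.log(2.0);
            }
        }
        return h;
    }

    /** Column sums of the joint array: total weight per label. */
    static double[] labelDist(double[][] joint) {
        int numLabels = joint.length == 0 ? 0 : joint[0].length;
        double[] out = new double[numLabels];
        for (double[] row : joint) {
            for (int l = 0; l < numLabels; l++) out[l] += row[l];
        }
        return out;
    }

    /** H(label | split): entropy of each row, weighted by the row's share. */
    static double conditionalEntropy(double[][] joint) {
        double total = 0.0;
        for (double[] row : joint) for (double w : row) total += w;
        if (total <= 0.0) return 0.0;
        double cond = 0.0;
        for (double[] row : joint) {
            double rowTotal = 0.0;
            for (double w : row) rowTotal += w;
            cond += (rowTotal / total) * entropy(row);
        }
        return cond;
    }

    /** Unnormalized mutual information (information gain) in bits. */
    static double mutualInfo(double[][] joint) {
        return entropy(labelDist(joint)) - conditionalEntropy(joint);
    }
}
```

A split that separates two equally weighted labels perfectly, such as {{5, 0}, {0, 5}}, yields one bit; a split that leaves the label mix unchanged, such as {{5, 5}, {5, 5}}, yields zero.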

normalize_by_num_splits

public double normalize_by_num_splits(double score)
Normalizes the score by the number of splits: the score is divided by the number of bits needed to store the value (number of splits - 1). This method updates the cache.
Parameters:
score - The score to be normalized.
Returns:
The normalized score value.
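
One literal reading of this description is sketched below. It is an assumption, not the library code: NormalizeSketch and its methods are hypothetical, the bit count is taken as the length of the binary representation of (number of splits - 1), and leaving the score unchanged for a single split is a choice made here to avoid dividing by zero.

```java
/** Hypothetical reading of "normalize by number of splits"; not the library code. */
final class NormalizeSketch {
    /** Bits in the binary representation of a non-negative int (0 needs 0 bits here). */
    static int bitsNeeded(int v) {
        return 32 - Integer.numberOfLeadingZeros(v);
    }

    /** Divides the score by the bits needed to store (numSplits - 1). */
    static double normalizeByNumSplits(double score, int numSplits) {
        if (numSplits < 1) {
            throw new IllegalArgumentException("numSplits must be >= 1");
        }
        int bits = bitsNeeded(numSplits - 1);
        // Assumption: with a single split there is nothing to normalize by.
        return bits == 0 ? score : score / bits;
    }
}
```

Under this reading a binary split is divided by 1, a four-way split by 2, and a five-way split by 3.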

num_splits

public int num_splits()
Returns the number of splits--not including unknowns. This method updates the cache.
Returns:
The number of splits.

get_split_dist

public double[] get_split_dist()
The split distribution is calculated from the split and label distribution.
Returns:
The split distribution.
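
Deriving a distribution from the split and label distribution amounts to taking marginal sums. The sketch below shows both marginals under the assumption that the joint array is rectangular and indexed as joint[split][label]; the actual index order is not stated here, and MarginalsSketch is a hypothetical name.

```java
/** Hypothetical sketch of marginal sums; assumes joint[split][label]. */
final class MarginalsSketch {
    /** Row sums: total weight per split value. */
    static double[] splitDist(double[][] joint) {
        double[] out = new double[joint.length];
        for (int s = 0; s < joint.length; s++) {
            for (double w : joint[s]) out[s] += w;
        }
        return out;
    }

    /** Column sums: total weight per label value. */
    static double[] labelDist(double[][] joint) {
        int numLabels = joint.length == 0 ? 0 : joint[0].length;
        double[] out = new double[numLabels];
        for (double[] row : joint) {
            for (int l = 0; l < numLabels; l++) out[l] += row[l];
        }
        return out;
    }
}
```

For the joint weights {{1, 2}, {3, 4}} the split marginal is {3, 7} and the label marginal is {4, 6}.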

has_distribution

public boolean has_distribution()
Checks if there exists a splitAndLabel distribution.
Returns:
TRUE if there is a splitAndLabel distribution, FALSE otherwise.

has_distribution

public boolean has_distribution(boolean fatalOnFalse)
Checks if there exists a splitAndLabel distribution.
Parameters:
fatalOnFalse - TRUE if an error message is to be displayed if there is no splitAndLabel distribution.
Returns:
TRUE if there is a splitAndLabel distribution, FALSE otherwise.

get_cond_entropy

public double get_cond_entropy()
Returns cache.condEntropy, first checking to see if it has yet been set. This method updates the cache.
Returns:
The condEntropy stored in the cache.

total_weight

public double total_weight()
Returns the total weight from the cache. This method updates the cache.
Returns:
The total weight.

get_label_dist

public double[] get_label_dist()
The label distribution is calculated from the split and label distribution. This method updates the cache.
Returns:
The label distribution.

get_split_and_label_dist

public double[][] get_split_and_label_dist()
Returns a reference to the requested distribution array.
Returns:
The splitAndLabel distribution array.

get_entropy

public double get_entropy()
Returns cache.entropy, first checking to see if it has yet been set. This method updates the cache.
Returns:
The entropy stored in the cache.

score

public double score()
The score returned depends on the split score criterion. For gainRatio it is the gain ratio. For mutualInfo and normalizedMutualInfo it is the mutual information. For mutualInfoRatio it is mutualInfo / entropy. This method updates the cache.
Returns:
The score for the split.
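
The selection by criterion can be sketched as a switch over the criterion value. Everything below is illustrative: the numeric constant values are placeholders (the real values of the byte fields are not given here), the quantities are passed in precomputed, and the handling of normalizedMutualInfo, externalScore, and zero entropy is an assumption.

```java
/** Hypothetical dispatch on the scoring criterion; constant values are placeholders. */
final class ScoreDispatchSketch {
    static final byte MUTUAL_INFO = 0;
    static final byte NORMALIZED_MUTUAL_INFO = 1;
    static final byte GAIN_RATIO = 2;
    static final byte MUTUAL_INFO_RATIO = 3;
    static final byte EXTERNAL_SCORE = 4;

    static double score(byte criterion, double mutualInfo, double normalizedMutualInfo,
                        double gainRatio, double entropy, double externalScore) {
        switch (criterion) {
            case MUTUAL_INFO:
                return mutualInfo;
            case NORMALIZED_MUTUAL_INFO:
                // Assumption: the normalized variant of the mutual info is returned.
                return normalizedMutualInfo;
            case GAIN_RATIO:
                return gainRatio;
            case MUTUAL_INFO_RATIO:
                // Assumption: a zero-entropy label distribution scores 0.
                return entropy > 0.0 ? mutualInfo / entropy : 0.0;
            case EXTERNAL_SCORE:
                // Assumption: the externally supplied value is returned as is.
                return externalScore;
            default:
                throw new IllegalArgumentException("unknown criterion: " + criterion);
        }
    }
}
```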

score

public double score(double[][] sAndLDist,
                    double[] sDist,
                    double[] lDist,
                    double passedEntropy,
                    double passedWeight)
Computes the scores and updates the cache when they are being computed many times for the same number of instances and entropy. This would happen, for instance, when determining the best threshold for a split.
Parameters:
sAndLDist - The split and label distribution.
sDist - The split distribution.
lDist - The label distribution.
passedEntropy - The entropy value for this split.
passedWeight - The weight of instances for this split.
Returns:
The score for this split distribution.
See Also:
Entropy.find_best_threshold(shared.RealAndLabelColumn, double, shared.SplitScore, shared.DoubleRef, shared.IntRef, shared.IntRef, int, double)

get_split_score_criterion

public byte get_split_score_criterion()
Returns the type of criterion used in scoring splits.
Returns:
The scoring criterion.
See Also:
mutualInfo, normalizedMutualInfo, gainRatio, mutualInfoRatio, externalScore

get_external_score

public double get_external_score()
Returns the value, set externally, for the score.
Returns:
The externally set score value.

get_mutual_info_ratio

public double get_mutual_info_ratio()
Returns the mutual information ratio, which is the ratio between the mutual info and entropy. The mutual information must be >= 0. Although logically read-only, this method updates the cache.
Returns:
Mutual information ratio.
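
A minimal sketch of the ratio and its precondition follows. MutualInfoRatioSketch is a hypothetical name, and returning 0 when the entropy is zero is an assumption; how the class itself treats that case is not stated here.

```java
/** Hypothetical sketch of mutual info / entropy with the stated precondition. */
final class MutualInfoRatioSketch {
    static double mutualInfoRatio(double mutualInfo, double entropy) {
        if (mutualInfo < 0.0) {
            throw new IllegalArgumentException("mutual information must be >= 0");
        }
        // Assumption: with zero label entropy there is nothing to gain, so return 0.
        if (entropy <= 0.0) {
            return 0.0;
        }
        return mutualInfo / entropy;
    }
}
```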

get_gain_ratio

public double get_gain_ratio()
Determines, and returns, cache.gainRatio. This method updates the cache.
Returns:
The gainRatio stored in the cache.
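
Gain ratio is conventionally the mutual information divided by the split entropy, that is, the entropy of the split distribution itself. The sketch below follows that convention; it is not taken from the class, GainRatioSketch is a hypothetical name, and returning 0 for a zero split entropy is an assumption.

```java
/** Hypothetical sketch of gain ratio = mutual info / split entropy. */
final class GainRatioSketch {
    /** Shannon entropy in bits of non-negative weights (need not sum to 1). */
    static double entropy(double[] weights) {
        double total = 0.0;
        for (double w : weights) total += w;
        if (total <= 0.0) return 0.0;
        double h = 0.0;
        for (double w : weights) {
            if (w > 0.0) {
                double p = w / total;
                h -= p * Math.log(p) / Math.log(2.0);
            }
        }
        return h;
    }

    static double gainRatio(double mutualInfo, double[] splitDist) {
        double splitEntropy = entropy(splitDist);
        // Assumption: a split with a single populated branch scores 0.
        return splitEntropy > 0.0 ? mutualInfo / splitEntropy : 0.0;
    }
}
```

Dividing by the split entropy penalizes splits with many branches: the same mutual information scores lower on an even four-way split (split entropy of 2 bits) than on an even two-way split (1 bit).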

get_split_entropy

public double get_split_entropy()
Returns cache.splitEntropy, first checking to see if it has yet been set. This method updates the cache.
Returns:
The split entropy value stored in the cache.

set_split_score_criterion

public void set_split_score_criterion(byte choice)
Sets the split score criterion.
Parameters:
choice - The chosen split score criterion.
See Also:
mutualInfo, normalizedMutualInfo, gainRatio, mutualInfoRatio, externalScore

reset

public void reset()
Clear (delete) distribution array data.

set_split_dist

public void set_split_dist(double[] sDist)
Stores the cache.splitDist array.
Parameters:
sDist - The split distribution array to be cached.

set_split_and_label_dist

public double[][] set_split_and_label_dist(double[][] sAndLDist)
Stores the splitAndLabelDist array.
Parameters:
sAndLDist - The new split and label distribution.
Returns:
The old split and label distribution.

set_external_score

public void set_external_score(double extScore)
Set the external score. The score must be non-negative, although we could change it to anything but UNDEFINED_REAL.
Parameters:
extScore - The external score value.

has_external_score

public boolean has_external_score()
Checks if an external score has been set.
Returns:
TRUE if the external score is set, FALSE otherwise.

display

public void display()
Produces formatted display of contents of object. This method updates the cache. Does not abort on unset data.

display

public void display(java.io.Writer stream)
Produces formatted display of contents of object. This method updates the cache. Does not abort on unset data.
Parameters:
stream - The Writer to be displayed to.

display

public void display(java.io.Writer stream,
                    DisplayPref dp)
Produces formatted display of contents of object. This method updates the cache. Does not abort on unset data.
Parameters:
stream - The Writer to be displayed to.
dp - The display preferences.

assign

public SplitScore assign(SplitScore rhs)
Assigns the given SplitScore data to this SplitScore.
Parameters:
rhs - The SplitScore to be copied.
Returns:
This SplitScore after assignment.

release_split_and_label_dist

public double[][] release_split_and_label_dist()
Returns the split and label distribution array and releases ownership.
Returns:
The split and label distribution.

release_label_dist

public double[] release_label_dist()
Returns the label distribution array and releases ownership.
Returns:
The label distribution.

release_split_dist

public double[] release_split_dist()
Returns the split distribution array and releases ownership.
Returns:
The split distribution.
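
In Java terms, "releases ownership" most plausibly means the object hands the array to the caller and drops its own reference. The sketch below illustrates that reading for one array; it is an assumption about the semantics, and OwnedDistSketch and its methods are hypothetical names.

```java
/** Hypothetical sketch of "return the array and release ownership". */
final class OwnedDistSketch {
    private double[] splitDist; // null means no distribution is held

    void setSplitDist(double[] dist) {
        splitDist = dist;
    }

    boolean hasSplitDist() {
        return splitDist != null;
    }

    /** Hands the array to the caller and forgets it. */
    double[] releaseSplitDist() {
        double[] released = splitDist;
        splitDist = null;
        return released;
    }
}
```

After a release the holder no longer has a distribution, so a later getter would have to recompute it or report it as unset.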