shared
Class CatDist

java.lang.Object
  |
  +--shared.CatDist

public class CatDist
extends java.lang.Object

The CatDist class represents a probability distribution over categories. A CatDist object is produced by a categorizer during the scoring process. A loss function may optionally be applied to the CatDist.

The distribution is assumed to be normalized; normalization is performed automatically on construction. The internal array dist is indexed by category number, starting with UNKNOWN_CATEGORY_VAL.
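
To illustrate what a normalized distribution means here, the following minimal sketch rescales an array of non-negative weights so that it sums to 1.0. It does not use CatDist itself; the class NormalizeSketch and the method normalize are hypothetical names introduced for this example, and the handling of the unknown category is left out.

```java
import java.util.Arrays;

// Hypothetical helper, not part of the shared package.
class NormalizeSketch {
    // Returns a copy of weights scaled so that the entries sum to 1.0.
    static double[] normalize(double[] weights) {
        double total = 0.0;
        for (double w : weights) {
            if (w < 0.0) {
                throw new IllegalArgumentException("negative weight: " + w);
            }
            total += w;
        }
        if (total <= 0.0) {
            throw new IllegalArgumentException("weights must not sum to zero");
        }
        double[] dist = new double[weights.length];
        for (int i = 0; i < weights.length; i++) {
            dist[i] = weights[i] / total;
        }
        return dist;
    }

    public static void main(String[] args) {
        // prints [0.5, 0.25, 0.25]
        System.out.println(Arrays.toString(normalize(new double[] {2.0, 1.0, 1.0})));
    }
}
```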


Field Summary
static int evidence
          Evidence Correction Type value.
static int laplace
          Laplace Correction Type value.
static LogOptions logOptions
          The options for logging displays.
static int none
          None Correction Type value.
 
Constructor Summary
CatDist(CatDist cDist)
          Copy constructor.
CatDist(Schema aSchema, AugCategory aug)
          Constructor.
CatDist(Schema aSchema, double[] fCounts, int cType)
          Constructor.
CatDist(Schema aSchema, double[] fCounts, int cType, double cParam)
          Constructor.
CatDist(Schema aSchema, DoubleRef unknownProb, double[] aDist)
          Constructor.
CatDist(Schema aSchema, int singleCat)
          Constructor.
 
Method Summary
static void apply_evidence_projection(double[] counts, double eviFactor, boolean firstIsUnknown)
          Applies the evidence projection algorithm.
 AugCategory best_category()
          Returns the best category according to the weight distribution.
 void check_tiebreaking_order(int[] order)
          Checks if the current tiebreaking order is the same as the given tiebreaking order.
 Schema get_schema()
          Returns the Schema stored in this CatDist object.
 double[] get_scores()
          Returns the distribution scores.
 int[] get_tiebreaking_order()
          Returns the tiebreaking order.
static void main(java.lang.String[] args)
          Testing code for the CatDist class.
static int majority_category(double[] weightDistribution, int[] tieBreakingOrder)
          Finds the majority category in the given weight distribution, using the given tie breaking order.
static int[] merge_tiebreaking_order(double[] weightDistribution)
          Merges the tie breaking order with the given weight distribution.
static int[] merge_tiebreaking_order(int[] tieBreakingOrder, double[] weightDistribution)
          Merges a given tie breaking order with the given weight distribution.
static void multiply_losses(double[][] lossMatrix, double[] probDist, double[] lossVector)
          Multiplies the given distribution by the given loss matrix to produce a vector of expected losses.
 void set_default_tiebreaking()
          Sets the tiebreaking order to the default values.
 void set_preferred_category(int cat)
          Specifies a single category to prefer if it is ever involved in a tie.
 void set_scores(double[] fCounts, int cType, double cParam)
          Sets the distribution scores for the current distribution.
 void set_scores(DoubleRef unknownProb, double[] aDist)
          Sets the distribution scores from the given weight distribution.
 void set_scores(int singleCat)
          Allows the results stored in and returned by a CatDist to be changed.
 void set_tiebreaking_order(int[] order)
          Sets the tiebreaking order to the given order.
static double single_evidence_projection(double count, double total, double maxEvidence)
          Returns a single, unnormalized, evidence projection of a count based on the max evidence available.
static int[] tiebreaking_order(double[] weightDistribution)
          Builds a tie breaking order from a weight distribution.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

none

public static final int none
None Correction Type value.

laplace

public static final int laplace
Laplace Correction Type value.

evidence

public static final int evidence
Evidence Correction Type value.

logOptions

public static LogOptions logOptions
The options for logging displays.
Constructor Detail

CatDist

public CatDist(Schema aSchema,
               AugCategory aug)
Constructor. It builds a distribution based on a single category, with a 1.0 probability given to this category, and 0.0 to all others.
Parameters:
aSchema - The Schema for the data in this distribution.
aug - The AugCategory with information on the category on which this distribution is built.

CatDist

public CatDist(Schema aSchema,
               int singleCat)
Constructor. It builds an all-or-nothing distribution based on a single category, with a 1.0 probability given to this category, and 0.0 to all others.
Parameters:
aSchema - The Schema for the data in this distribution.
singleCat - The specific category on which this distribution is built.
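
As an illustration of the all-or-nothing distribution described above, the following minimal sketch builds such an array by hand. It does not use CatDist or Schema; the class OneHotSketch, the method singleCategory, and the convention of one array slot per category are assumptions made for the example.

```java
import java.util.Arrays;

// Hypothetical helper, not part of the shared package.
class OneHotSketch {
    // Returns an array with 1.0 at index singleCat and 0.0 everywhere else.
    static double[] singleCategory(int numCategories, int singleCat) {
        if (singleCat < 0 || singleCat >= numCategories) {
            throw new IllegalArgumentException("category index out of range: " + singleCat);
        }
        double[] dist = new double[numCategories]; // Java zero-initializes the array
        dist[singleCat] = 1.0;
        return dist;
    }

    public static void main(String[] args) {
        // prints [0.0, 0.0, 1.0, 0.0]
        System.out.println(Arrays.toString(singleCategory(4, 2)));
    }
}
```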

CatDist

public CatDist(Schema aSchema,
               double[] fCounts,
               int cType)
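Constructor. Builds a distribution from the frequency counts in fCounts, applying the correction selected by cType.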
Parameters:
aSchema - The Schema for the data in this distribution.
fCounts - The frequency count of categories found as labels.
cType - Type of correction to perform. Must be one of CatDist.none, CatDist.laplace, or CatDist.evidence.

CatDist

public CatDist(Schema aSchema,
               double[] fCounts,
               int cType,
               double cParam)
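Constructor. Builds a distribution from the frequency counts in fCounts, applying the correction selected by cType with the parameter cParam.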
Parameters:
aSchema - The Schema for the data in this distribution.
fCounts - The frequency count of categories found as labels.
cType - Type of correction to perform. Must be one of CatDist.none, CatDist.laplace, or CatDist.evidence.
cParam - Correction parameter. Must be greater than or equal to 0.
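
To give a feel for how a correction type and its parameter can turn frequency counts into probabilities, the sketch below applies textbook Laplace (additive) smoothing. This documentation does not state the exact formula used for CatDist.laplace, nor how the unknown category is treated, so the formula, the class LaplaceSketch, and the method laplace are assumptions made for illustration only.

```java
import java.util.Arrays;

// Hypothetical helper, not part of the shared package.
class LaplaceSketch {
    // Textbook additive smoothing (an assumption, not necessarily CatDist's formula):
    // p[i] = (fCounts[i] + cParam) / (total + cParam * n)
    static double[] laplace(double[] fCounts, double cParam) {
        if (cParam < 0.0) {
            throw new IllegalArgumentException("cParam must be >= 0");
        }
        int n = fCounts.length;
        double total = 0.0;
        for (double c : fCounts) {
            total += c;
        }
        double denom = total + cParam * n;
        if (denom <= 0.0) {
            throw new IllegalArgumentException("no counts and no correction");
        }
        double[] probs = new double[n];
        for (int i = 0; i < n; i++) {
            probs[i] = (fCounts[i] + cParam) / denom;
        }
        return probs;
    }

    public static void main(String[] args) {
        // With cParam = 1.0 the counts {3, 1, 0} become 4/7, 2/7 and 1/7.
        System.out.println(Arrays.toString(laplace(new double[] {3.0, 1.0, 0.0}, 1.0)));
    }
}
```

In this sketch a cParam of 0 reduces to plain normalization of the counts, and larger values pull the result toward the uniform distribution.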

CatDist

public CatDist(Schema aSchema,
               DoubleRef unknownProb,
               double[] aDist)
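Constructor. Builds a distribution from the weights in aDist and the probability weight unknownProb for the unknown category.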
Parameters:
aSchema - The Schema for the data in this distribution.
unknownProb - The desired probability weight for the unknown category.
aDist - A weight distribution for this CatDist object.

CatDist

public CatDist(CatDist cDist)
Copy constructor.
Parameters:
cDist - The CatDist object to be copied.
Method Detail

merge_tiebreaking_order

public static int[] merge_tiebreaking_order(double[] weightDistribution)
Merges the tie breaking order with the given weight distribution.
Parameters:
weightDistribution - The given weight distribution of categories.
Returns:
The tie breaking order.

majority_category

public static int majority_category(double[] weightDistribution,
                                    int[] tieBreakingOrder)
Finds the majority category in the given weight distribution, using the given tie breaking order.
Parameters:
weightDistribution - The weight sums for each category found.
tieBreakingOrder - The order of choices in the event that a tie occurs between categories.
Returns:
The category that appears most often among the labelled instances.
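
The selection described above can be sketched as a single pass over the weights. The sketch assumes that tieBreakingOrder holds one rank per category and that a lower rank wins a tie, and it compares weights for exact equality; the real method may differ on either point. MajoritySketch and majority are hypothetical names.

```java
// Hypothetical helper, not part of the shared package.
class MajoritySketch {
    // Returns the index of the largest weight. Among equal weights, the
    // category with the lower tie-breaking rank is chosen (an assumption).
    static int majority(double[] weights, int[] tieBreakingOrder) {
        if (weights.length == 0 || weights.length != tieBreakingOrder.length) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        int best = 0;
        for (int i = 1; i < weights.length; i++) {
            boolean heavier = weights[i] > weights[best];
            boolean tieWon = weights[i] == weights[best]
                    && tieBreakingOrder[i] < tieBreakingOrder[best];
            if (heavier || tieWon) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] weights = {2.0, 5.0, 5.0};
        int[] order = {0, 2, 1}; // category 2 outranks category 1 in a tie
        System.out.println(majority(weights, order)); // prints 2
    }
}
```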

merge_tiebreaking_order

public static int[] merge_tiebreaking_order(int[] tieBreakingOrder,
                                            double[] weightDistribution)
Merges a given tie breaking order with the given weight distribution.
Parameters:
tieBreakingOrder - The order for choices in the event that a tie occurs between categories.
weightDistribution - The given weight distribution of categories.
Returns:
The tie breaking order.

get_schema

public Schema get_schema()
Returns the Schema stored in this CatDist object.
Returns:
The Schema for data on which this CatDist object contains information.

set_scores

public void set_scores(int singleCat)
Allows the results stored in and returned by a CatDist to be changed. This method takes a single category index and builds an all-or-nothing distribution around it. A probability mass of 1.0 is assigned to the single category and 0.0 to all others.
Parameters:
singleCat - The index for the category that should have a 1.0 probability mass.

set_preferred_category

public void set_preferred_category(int cat)
Specifies a single category to prefer if it is ever involved in a tie. If a tie occurs which does not involve the preferred category, the first category (but never unknown) will be chosen.
Parameters:
cat - The index of the category to be preferred.

tiebreaking_order

public static int[] tiebreaking_order(double[] weightDistribution)
Builds a tie breaking order from a weight distribution. If there is a tie among weights, the first one will have a better (i.e. lower) tie breaking rank.
Parameters:
weightDistribution - The distribution of weights for label categories.
Returns:
The tie breaking order of ranks.
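
One way to build such an order is to sort the category indices by weight and record each category's position as its rank. The sketch below assumes that heavier categories receive better (lower) ranks, which this documentation does not state explicitly; it does follow the stated rule that, among equal weights, the first one gets the lower rank. TieOrderSketch and ranksFromWeights are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical helper, not part of the shared package.
class TieOrderSketch {
    // Returns rank[i] for each category i, where rank 0 is the best.
    static int[] ranksFromWeights(double[] weights) {
        Integer[] idx = new Integer[weights.length];
        for (int i = 0; i < idx.length; i++) {
            idx[i] = i;
        }
        // Sort indices by descending weight. Arrays.sort on objects is stable,
        // so categories with equal weights keep their original order.
        Arrays.sort(idx, (a, b) -> Double.compare(weights[b], weights[a]));
        int[] rank = new int[weights.length];
        for (int pos = 0; pos < idx.length; pos++) {
            rank[idx[pos]] = pos;
        }
        return rank;
    }

    public static void main(String[] args) {
        // prints [1, 0, 2, 3]
        System.out.println(Arrays.toString(ranksFromWeights(new double[] {0.2, 0.5, 0.2, 0.1})));
    }
}
```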

best_category

public AugCategory best_category()
Returns the best category according to the weight distribution. If a loss matrix is defined, the distribution will be multiplied by the loss matrix to produce a vector of expected losses. The best category is the one with the smallest expected loss.
Returns:
An AugCategory containing information about the best category found.

set_scores

public void set_scores(double[] fCounts,
                       int cType,
                       double cParam)
Sets the distribution scores for the current distribution.
Parameters:
fCounts - The frequency counts of categories found.
cType - Type of correction to perform. Must be one of CatDist.none, CatDist.laplace, or CatDist.evidence.
cParam - Correction parameter. Must be greater than or equal to 0.

set_scores

public void set_scores(DoubleRef unknownProb,
                       double[] aDist)
Sets the distribution scores from the given weight distribution.
Parameters:
unknownProb - The probability weight for the unknown category.
aDist - The distribution weights. This will be altered by this method. Must have a length equal to the number of categories.

set_default_tiebreaking

public void set_default_tiebreaking()
Sets the tiebreaking order to the default values.

get_scores

public double[] get_scores()
Returns the distribution scores.
Returns:
The distribution of scores.

multiply_losses

public static void multiply_losses(double[][] lossMatrix,
                                   double[] probDist,
                                   double[] lossVector)
Multiplies the given distribution by the given loss matrix to produce a vector of expected losses.
Parameters:
lossMatrix - The loss matrix.
probDist - The probability distribution for which loss is to be calculated. Its length must equal the number of columns in the loss matrix.
lossVector - Receives the vector of expected losses. Its contents are overwritten by this function. Its length must equal the number of columns in the loss matrix.
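
The multiplication can be sketched as follows. The sketch assumes a square loss matrix in which lossMatrix[i][j] is the loss of predicting category j when the true category is i; the orientation actually used by CatDist is not stated here. ExpectedLossSketch, multiplyLosses, and argMin are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical helper, not part of the shared package.
class ExpectedLossSketch {
    // Fills lossVector[j] with the sum over i of probDist[i] * lossMatrix[i][j].
    static void multiplyLosses(double[][] lossMatrix, double[] probDist, double[] lossVector) {
        int n = probDist.length;
        if (lossMatrix.length != n || lossVector.length != n) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int i = 0; i < n; i++) {
                sum += probDist[i] * lossMatrix[i][j];
            }
            lossVector[j] = sum;
        }
    }

    // Returns the index of the smallest entry (the first one on ties).
    static int argMin(double[] values) {
        int best = 0;
        for (int i = 1; i < values.length; i++) {
            if (values[i] < values[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] loss = {{0.0, 1.0}, {5.0, 0.0}}; // missing category 1 is costly
        double[] probs = {0.7, 0.3};
        double[] expected = new double[2];
        multiplyLosses(loss, probs, expected);
        System.out.println(Arrays.toString(expected));
        System.out.println(argMin(expected)); // prints 1
    }
}
```

In this example category 0 is the more probable one, yet category 1 has the smaller expected loss (about 0.7 against about 1.5), which is why a best-category choice based on expected loss can differ from the majority category.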

apply_evidence_projection

public static void apply_evidence_projection(double[] counts,
                                             double eviFactor,
                                             boolean firstIsUnknown)
Applies the evidence projection algorithm. This function is used by CatDist's auto-correction mode, and also inside NaiveBayesCat when it turns on evidence projection.
Parameters:
counts - An array of frequency counts. The sum of these counts is assumed to be the total weight of information (i.e., the total number of instances). This total weight is scaled by eviFactor. The array is adjusted in place to become a normalized array of corrected probabilities.
eviFactor - Factor for computing the total evidence available.
firstIsUnknown - If true, the first value in the counts array is treated as "unknown": it does not participate in the projection algorithm but reduces the probability weight given to the other counts.

single_evidence_projection

public static double single_evidence_projection(double count,
                                                double total,
                                                double maxEvidence)
Returns a single, unnormalized, evidence projection of a count based on the max evidence available.
Parameters:
count - The count of a particular category.
total - The total count of all categories.
maxEvidence - Projection factor.
Returns:
An evidence projection of the category count.

set_tiebreaking_order

public void set_tiebreaking_order(int[] order)
Sets the tiebreaking order to the given order.
Parameters:
order - The new tiebreaking order. The length of the array should be the same as the number of categories.

check_tiebreaking_order

public void check_tiebreaking_order(int[] order)
Checks if the current tiebreaking order is the same as the given tiebreaking order. If it is not, an error message is displayed.
Parameters:
order - The order to be compared to.

get_tiebreaking_order

public int[] get_tiebreaking_order()
Returns the tiebreaking order.
Returns:
The tiebreaking order.

main

public static void main(java.lang.String[] args)
Testing code for the CatDist class.
Parameters:
args - Command line arguments.