java.lang.Object
  |
  +--shared.InstanceList
The InstanceList class provides basic functions for ordered lists of instances. Instances may be labelled or unlabelled. Depending on usage, the list may or may not keep counts about various data in the list. These counts are kept in the BagCounters class.
Assumptions :
File format follows Quinlan (pp. 81-83) EXCEPT:
1) , : | \ . do not appear in names
2) . at end of lists is optional
3) a word that appears before the labels are enumerated, that is preceded by
\ is interpreted as a modifier. Currently, the only implemented modifier is
"weighted", which indicates that the list will be weighted. This means that
labels are assumed to be nominal type for read_names().
Comments :
Line numbers given are the result of '\n', not wrapping of lines.
"continuous" is not a legal name for the first nominal attribute value; it is reserved to indicate a continuous (RealAttrInfo) attribute.
"discrete" is not a legal name for the first nominal attribute value; it is reserved to indicate a discrete (NominalAttrInfo) attribute but with a dynamic set of values to be specified as they appear in the data file.
"discrete n" is supported, where n is an estimate of the number of values of the attribute.
"nolabel" may be specified as the label field ONLY. If specified, it indicates an unlabelled list.
Enhancements :
Cause fatal_error() if read_names() is called by methods other than the constructor or InstanceList.read_names()
Extend read_attributes to handle AttrInfo other than NominalAttrInfo and RealAttrInfo.
Expand capability of input function to:
1) allow . in name if not followed by a space
2) allow , : | and \ in names if preceded by a backslash
(this would mimic Quinlan)
Use lex to do the lexical analysis of the input file. This will be critical if the syntax becomes more complicated.
Ideally impute_unknown_values would handle both nominal and real values in a single pass. It should accept an array of operators allowing each attribute to handle unknowns in a different way. Obvious operators would be: unique_value, mode, mean...
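The reserved words described in the comments above ("continuous", "discrete", "discrete n") can be illustrated with a small classifier for the value part of an attribute declaration. This is only a sketch: the class AttrDeclSketch, its Kind enum, and the assumption that the value part has already been separated from the attribute name are hypothetical, and the code does not reproduce InstanceList's own reader.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the shared package.
public class AttrDeclSketch {
    public enum Kind { CONTINUOUS, DISCRETE_DYNAMIC, NOMINAL }

    /** Splits a value list such as "red, green, blue." into its names. */
    public static List<String> values(String valuePart) {
        String s = valuePart.trim();
        // The '.' at the end of a list is optional.
        if (s.endsWith(".")) {
            s = s.substring(0, s.length() - 1);
        }
        List<String> out = new ArrayList<>();
        for (String v : s.split(",")) {
            String t = v.trim();
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    /** Classifies a declaration by inspecting its first value. */
    public static Kind classify(String valuePart) {
        List<String> vals = values(valuePart);
        if (vals.isEmpty()) {
            throw new IllegalArgumentException("no values given");
        }
        String first = vals.get(0);
        if (first.equals("continuous")) {
            return Kind.CONTINUOUS;
        }
        // "discrete" alone, or "discrete n" with n an estimated value count.
        if (first.equals("discrete") || first.matches("discrete\\s+\\d+")) {
            return Kind.DISCRETE_DYNAMIC;
        }
        return Kind.NOMINAL;
    }
}
```

Because only the first value is inspected, a nominal attribute may not use "continuous" or "discrete" as its first value, which matches the restriction stated above.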
| Field Summary | |
static LogOptions |
logOptions
LogOptions object containing information for logging purposes. |
| Constructor Summary | |
InstanceList(InstanceList source)
Constructor. |
|
InstanceList(InstanceList source,
boolean preserveCounters)
Copy constructor. |
|
InstanceList(InstanceList trainList,
java.lang.String testName)
Build an instance list which is designed to be a test list for some other training set. |
|
InstanceList(Schema catSchema)
Constructor. |
|
InstanceList(Schema catSchema,
FileSchema names,
java.lang.String testName)
Constructor. |
|
InstanceList(Schema catSchema,
java.lang.String file,
java.lang.String namesExtension,
java.lang.String testExtension)
Constructor. |
|
InstanceList(java.lang.String file)
Constructor. |
|
InstanceList(java.lang.String file,
java.lang.String namesExtension,
java.lang.String dataExtension)
Constructor. |
|
| Method Summary | |
java.util.ListIterator |
add_instance(Instance instance)
Adds the specified Instance to this InstanceList. |
AttrInfo |
attr_info(int attrNum)
Returns the information about a specific attribute stored in this InstanceList. |
java.lang.Object |
clone()
Returns a clone of this InstanceList object. |
java.lang.Object |
clone(boolean preserveCounters)
Returns a clone of this InstanceList object. |
BagCounters |
counters()
Creates and fills bagCounters. |
Instance[] |
create_inst_list_index()
Returns a reference to an array of references to the instances in the list. |
void |
display_names(java.io.Writer stream,
boolean protectChars,
java.lang.String header)
Displays the names file associated with the InstanceList. |
void |
display(boolean normalizeReal)
Displays the Instances stored in this InstanceList. |
void |
drop_counters()
Deletes the counters stored for Instances in this InstanceList. |
void |
ensure_counters()
Fills bagCounters by adding all instances into it. |
int[] |
get_distribution_order()
Returns the tiebreaking distribution order stored in the CatDist object for this InstanceList. |
static int |
get_max_attr_vals()
Returns the maximum number of attributes that can be used for Instances in this InstanceList. |
static int |
get_max_label_vals()
Returns the maximum number of labels that can be used to categorize Instances in this InstanceList. |
FileSchema |
get_original_schema()
Returns the FileSchema loaded into this InstanceList. |
Schema |
get_schema()
Returns the Schema for this InstanceList. |
double |
get_weight(Instance instance)
Returns the weight for the specified Instance. |
boolean |
has_counters()
Checks if this InstanceList has a set of bagcounters yet. |
InstanceList |
independent_sample(int size)
Returns a reference to an InstanceList with "size" instances randomly sampled (without replacement) from this InstanceList. |
InstanceList |
independent_sample(int size,
InstanceList restOfInstList)
Returns a reference to an InstanceList with 'size' instances and another reference to an InstanceList with the rest of the instances. |
InstanceList |
independent_sample(int size,
InstanceList restOfInstList,
java.util.Random mrandom)
Returns a reference to an InstanceList with 'size' instances and another reference to an InstanceList with the rest of the instances. |
InstanceList |
independent_sample(int size,
InstanceList restOfInstList,
java.util.Random mrandom,
Instance[] index)
Returns a reference to an InstanceList with 'size' instances and another reference to an InstanceList with the rest of the instances. |
InstanceList |
independent_sample(int size,
java.util.Random mrandom)
Returns a reference to an InstanceList with "size" instances randomly sampled (without replacement) from this InstanceList. |
InstanceList |
independent_sample(int size,
java.util.Random mrandom,
Instance[] index)
Returns a reference to an InstanceList with "size" instances randomly sampled (without replacement) from this InstanceList. |
void |
init_max_vals()
Sets the maximum values for attributes and labels in this InstanceList according to the MLJ options stored in the MLJ-options file. |
java.util.LinkedList |
instance_list()
Returns the list of Instances stored in this InstanceList. |
boolean |
is_weighted()
Checks if Instances stored in this InstanceList are weighted. |
AttrInfo |
label_info()
Returns the label information contained in this InstanceList's schema. |
java.util.ListIterator |
listIterator()
Returns a ListIterator over the Instances stored in this InstanceList. |
int |
majority_category(int[] tieBreakingOrder)
Returns the Category corresponding to the label that occurs most frequently in the InstanceList. |
boolean |
no_instances()
Checks if this InstanceList contains Instances. |
boolean |
no_weight()
Checks if the total weight of this InstanceList is approximately 0. |
NominalAttrInfo |
nominal_label_info()
Returns the nominal label information contained in this InstanceList's schema. |
void |
normalize_weights()
Normalize all weights by the number of instances in the list. |
void |
normalize_weights(double normFactor)
Normalize all weights by the number of instances in the list, times an optional normalization factor. |
void |
normalize_weights(double normFactor,
boolean allowZeros)
Normalize all weights by the number of instances in the list, times an optional normalization factor. |
int |
num_attr()
Returns the number of attributes in the InstanceList. |
int |
num_categories()
Returns the number of categories that the instances in the List can have. |
int |
num_instances()
Returns the number of instances in the InstanceList. |
void |
OK()
Checks integrity constraints. |
void |
OK(int level)
Checks integrity constraints. |
java.lang.String |
out(boolean normalizeReal)
Returns a String representation of this InstanceList object. |
void |
project_in_place(boolean[] projMask)
This function is very similar to project(), except that the list is projected "in place"--attributes are removed directly from the list and the schema is updated. |
InstanceList |
project(boolean[] attrMask)
This function takes an attribute mask which is an array of booleans indicating whether the corresponding attribute should be included in the projection. |
void |
read_data(java.lang.String file,
boolean isTest)
Reads the data from the supplied file. |
Instance |
reader_add_instance(AttrValue[] vals,
AttrValue labelVal,
double weight,
boolean allowUnknownLabels)
Adds a new instance to the list, using the structures maintained by InstanceReader. |
void |
remove_all_instances()
Removes all Instance objects stored in this InstanceList object. |
Instance |
remove_front()
Removes the first Instance from the list and returns it. |
void |
remove_inst_with_unknown_attr()
Removes all instances that have unknown attributes from the data set. |
void |
remove_instance(java.util.ListIterator pix,
Instance instance)
Removes the specified Instance from the ListIterator of Instances supplied. |
InstanceList |
sample_with_replacement(int size,
InstanceList restOfInstList,
java.util.Random mrandom)
Sample_with_replacement takes an independent sample of the instance list, with replacement. |
static void |
set_max_attr_vals(int maxVals)
Sets the maximum number of attributes for Instances in this InstanceList. |
static void |
set_max_label_vals(int maxVals)
Sets the maximum number of labels that Instances in this InstanceList may be categorized as. |
void |
set_schema(Schema schemaRC)
Clones the supplied Schema and sets this InstanceList object to use it. |
InstanceList |
shuffle()
Returns a reference to an InstanceList that has the same contents as this InstanceList, with a random ordering of the instances. |
InstanceList |
shuffle(java.util.Random mrandom)
Returns a reference to an InstanceList that has the same contents as this InstanceList, with a random ordering of the instances. |
InstanceList |
shuffle(java.util.Random mrandom,
Instance[] index)
Returns a reference to an InstanceList that has the same contents as this InstanceList, with a random ordering of the instances. |
InstanceList |
shuffle(java.util.Random mrandom,
Instance[] index,
boolean keepFileSchema)
Returns a reference to an InstanceList that has the same contents as this InstanceList, with a random ordering of the instances. |
InstanceList[] |
split_by_label()
Split the InstanceList according to the labels. |
InstanceList |
split_prefix(int numInSplit)
Removes the first "numInSplit" instances from this list and returns them as a new list. |
InstanceList |
split_prefix(int numInSplit,
boolean keepFileSchema)
Removes the first "numInSplit" instances from this list and returns them as a new list. |
double |
total_weight()
Returns the sum of the weights of all Instances in the InstanceList. |
double |
total_weight(boolean recalculate)
Returns the sum of the weights of all Instances in the InstanceList. |
RealAndLabelColumn[] |
transpose(boolean[] mask)
Splits the InstanceList into several RealAndLabelColumn structures for the parallel discretization. |
void |
unite(InstanceList instList)
Appends the instances from the given list to this list. |
void |
update_for_overflows(boolean[] projMask)
Updates the list by removing specified attributes. |
| Methods inherited from class java.lang.Object |
equals,
finalize,
getClass,
hashCode,
notify,
notifyAll,
toString,
wait,
wait,
wait |
| Field Detail |
public static LogOptions logOptions
| Constructor Detail |
public InstanceList(java.lang.String file)
file - The root name of the file to be loaded into the InstanceList.
public InstanceList(java.lang.String file,
java.lang.String namesExtension,
java.lang.String dataExtension)
file - The root name of the file to be loaded into the InstanceList.
namesExtension - The file extension for the schema file.
dataExtension - The file extension for the data file.
public InstanceList(Schema catSchema,
java.lang.String file,
java.lang.String namesExtension,
java.lang.String testExtension)
catSchema - The schema of categories for these data sets.
file - The root name of the file to be loaded into the InstanceList.
namesExtension - The file extension for the schema file.
testExtension - The file extension for the test file.
public InstanceList(Schema catSchema)
catSchema - The schema of categories for these data sets.
public InstanceList(Schema catSchema,
FileSchema names,
java.lang.String testName)
catSchema - The schema of categories for these data sets.
names - The schema of attributes for these data sets.
testName - The file name for the test file.
public InstanceList(InstanceList source)
source - The InstanceList that is being copied.
public InstanceList(InstanceList source,
boolean preserveCounters)
source - The InstanceList object to be copied.
preserveCounters - TRUE if counters of values should be copied, FALSE otherwise.
public InstanceList(InstanceList trainList,
java.lang.String testName)
trainList - The training InstanceList that will be used to identify the Schema for the test data set.
testName - The name of the file containing the test data set.
| Method Detail |
public boolean has_counters()
public BagCounters counters()
public void ensure_counters()
public void read_data(java.lang.String file,
boolean isTest)
file - The name of the file containing the data set.
isTest - TRUE if this is a test data set, FALSE otherwise.
public void remove_inst_with_unknown_attr()
public void remove_instance(java.util.ListIterator pix,
Instance instance)
pix - The ListIterator containing the Instance.
instance - The Instance to be removed.
public void remove_all_instances()
public int num_instances()
public int num_categories()
public NominalAttrInfo nominal_label_info()
public AttrInfo label_info()
public boolean no_instances()
public java.util.LinkedList instance_list()
public void project_in_place(boolean[] projMask)
throws java.lang.CloneNotSupportedException
projMask - An array of boolean values indicating which attributes shall be used in this
InstanceList object. Elements of projMask correspond to the attributes in order.
TRUE indicates that the attribute will be used; FALSE indicates that the
attribute will not be used.
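A minimal sketch of how such a mask selects attributes, using a plain list of strings in place of an Instance. The class ProjectionSketch and its project method are hypothetical illustrations, not the library's implementation, which also updates the schema.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in: one "instance" is a list of attribute values.
public class ProjectionSketch {
    /** Keeps values.get(i) exactly when mask[i] is true. */
    public static List<String> project(List<String> values, boolean[] mask) {
        if (values.size() != mask.length) {
            throw new IllegalArgumentException(
                "mask length must equal the number of attributes");
        }
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < mask.length; i++) {
            if (mask[i]) {
                kept.add(values.get(i));
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> row = Arrays.asList("sunny", "85", "high", "false");
        boolean[] mask = {true, false, true, false};
        System.out.println(project(row, mask)); // prints [sunny, high]
    }
}
```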
public void set_schema(Schema schemaRC)
throws java.lang.CloneNotSupportedException
schemaRC - The Schema object to be cloned into this InstanceList object.
public AttrInfo attr_info(int attrNum)
attrNum - The number of the attribute about which information is
requested.
public int num_attr()
public static int get_max_attr_vals()
public static int get_max_label_vals()
public static void set_max_attr_vals(int maxVals)
maxVals - The maximum number of attributes allowed for Instances
in this InstanceList.
public static void set_max_label_vals(int maxVals)
maxVals - The maximum number of labels that instances may be
categorized as.
public Schema get_schema()
public FileSchema get_original_schema()
public void init_max_vals()
public void display(boolean normalizeReal)
normalizeReal - TRUE if the Instances should be normalized according
to the min/max stored for real attributes. If
min equals max, values are normalized to 0.5.
public boolean is_weighted()
public Instance reader_add_instance(AttrValue[] vals,
AttrValue labelVal,
double weight,
boolean allowUnknownLabels)
vals - The values of the instance to be added.
labelVal - The label value of the instance to be added.
weight - The weight of the instance to be added.
allowUnknownLabels - TRUE if unknown label values are allowed for the instance to be added.
public java.util.ListIterator add_instance(Instance instance)
instance - The Instance to be added.
public void update_for_overflows(boolean[] projMask)
projMask - A boolean array with the same number of values as there are
attributes. Each boolean element corresponds to an attribute
in the order they were input. True values represent
attributes that are used.
public int[] get_distribution_order()
public double total_weight()
public double total_weight(boolean recalculate)
recalculate - TRUE if the sum should be recalculated, FALSE if
the cached value should be used.
public double get_weight(Instance instance)
instance - The Instance whose weight is requested.
public void drop_counters()
public void normalize_weights()
public void normalize_weights(double normFactor)
normFactor - The normalization factor.
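One plausible reading of "normalize by the number of instances, times a normalization factor" is that the weights are rescaled so that they sum to normFactor times the number of instances. The sketch below works under that assumption only; the class NormalizeWeightsSketch and its method are hypothetical and operate on a bare array of weights rather than on Instances.

```java
// Hypothetical illustration; the exact rule used by InstanceList is assumed.
public class NormalizeWeightsSketch {
    /**
     * Returns a rescaled copy of weights whose sum is
     * normFactor * weights.length (assumed target).
     */
    public static double[] normalize(double[] weights, double normFactor) {
        double total = 0.0;
        for (double w : weights) {
            total += w;
        }
        if (total <= 0.0) {
            throw new IllegalArgumentException("total weight must be positive");
        }
        double scale = normFactor * weights.length / total;
        double[] out = new double[weights.length];
        for (int i = 0; i < weights.length; i++) {
            out[i] = weights[i] * scale;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] w = normalize(new double[] {1.0, 3.0}, 1.0);
        System.out.println(w[0] + " " + w[1]); // prints 0.5 1.5
    }
}
```

Under this reading, a factor of 1 leaves the average weight at 1, so an unweighted list and a normalized weighted list have the same total weight.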
public void normalize_weights(double normFactor,
boolean allowZeros)
normFactor - The normalization factor.
allowZeros - TRUE if zeros are allowed for Instance weights. If FALSE,
Instance weights that are approximately 0 are automatically
reset to a lower bound.
public RealAndLabelColumn[] transpose(boolean[] mask)
mask - Boolean array of the same length as the number of attributes. TRUE values
indicate that the attribute should have a RealAndLabelColumn object created for it,
FALSE otherwise.
public boolean no_weight()
public int majority_category(int[] tieBreakingOrder)
tieBreakingOrder - Array indicating the order in which ties are broken. The array should be the same
length as the number of attributes and each element corresponds to an attribute.
Lower number elements represent attributes that are more favorable in a tie
than higher number elements.
public InstanceList project(boolean[] attrMask)
attrMask - A boolean array with the same number of values as there are
attributes. Each boolean element corresponds to an attribute
in the order they were input. True values represent
attributes that are used.
public java.lang.String out(boolean normalizeReal)
normalizeReal - TRUE if real values in an Instance object should be normalized.
public java.lang.Object clone(boolean preserveCounters)
preserveCounters - TRUE if counters of values should be copied, FALSE otherwise.
public java.lang.Object clone()
public void OK()
public void OK(int level)
level - Level of checking done.
public void display_names(java.io.Writer stream,
boolean protectChars,
java.lang.String header)
stream - Writer object to which the names file will be displayed.
protectChars - TRUE if protected characters are used, FALSE otherwise.
header - A String to use for the header to the display.
public InstanceList[] split_by_label()
public InstanceList sample_with_replacement(int size,
InstanceList restOfInstList,
java.util.Random mrandom)
size - The size of the list of samples requested.
restOfInstList - An InstanceList object containing the Instances not sampled.
mrandom - Random number generator for randomly sampling Instances.
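The relationship between the returned sample and restOfInstList can be sketched with ordinary Java collections. The class ReplacementSampleSketch below is a hypothetical illustration of sampling with replacement, not the library's code: each draw picks any element of the source, so an element may appear more than once, and the rest list receives the elements that were never drawn.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical illustration using generic lists instead of InstanceList.
public class ReplacementSampleSketch {
    /** Draws size elements with replacement; fills rest with the undrawn ones. */
    public static <T> List<T> sample(List<T> source, int size,
                                     List<T> rest, Random rng) {
        if (size < 0 || (size > 0 && source.isEmpty())) {
            throw new IllegalArgumentException("cannot draw from an empty list");
        }
        boolean[] drawn = new boolean[source.size()];
        List<T> sample = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            int k = rng.nextInt(source.size()); // any index, repeats allowed
            drawn[k] = true;
            sample.add(source.get(k));
        }
        rest.clear();
        for (int i = 0; i < source.size(); i++) {
            if (!drawn[i]) {
                rest.add(source.get(i));
            }
        }
        return sample;
    }
}
```

Passing a seeded java.util.Random makes the draw reproducible, which is presumably why the method accepts the generator as a parameter.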
public InstanceList shuffle(java.util.Random mrandom,
Instance[] index,
boolean keepFileSchema)
mrandom - Random number generator used for shuffling.
index - Array of Instances being shuffled.
keepFileSchema - TRUE if the FileSchema for this InstanceList object should be copied to the
InstanceList with the shuffled Instances.
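As an illustration of what a shuffle driven by a supplied random number generator can look like, the sketch below applies a Fisher-Yates shuffle to a copy of the input and leaves the original order untouched. The class ShuffleSketch is hypothetical, and the choice of Fisher-Yates is an assumption; the documentation does not say which algorithm InstanceList uses.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration; the algorithm choice is an assumption.
public class ShuffleSketch {
    /** Returns a new list with the same contents in random order. */
    public static <T> List<T> shuffled(List<T> source, Random rng) {
        List<T> copy = new ArrayList<>(source);
        // Fisher-Yates: swap position i with a random position in [0, i].
        for (int i = copy.size() - 1; i > 0; i--) {
            int j = rng.nextInt(i + 1);
            Collections.swap(copy, i, j);
        }
        return copy;
    }
}
```

Two generators created with the same seed produce the same ordering, so experiments that depend on the shuffle can be repeated exactly.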
public InstanceList shuffle(java.util.Random mrandom,
Instance[] index)
mrandom - Random number generator used for shuffling.
index - Array of Instances being shuffled.
public InstanceList shuffle(java.util.Random mrandom)
mrandom - Random number generator used for shuffling.
public InstanceList shuffle()
public InstanceList split_prefix(int numInSplit,
boolean keepFileSchema)
numInSplit - The number of Instances to be split from this InstanceList object.
keepFileSchema - TRUE if the FileSchema should be copied to the new InstanceList, FALSE otherwise.
public InstanceList split_prefix(int numInSplit)
numInSplit - The number of Instances to be split from this InstanceList object.
public Instance remove_front()
public InstanceList independent_sample(int size,
java.util.Random mrandom,
Instance[] index)
size - The number of Instances requested in the sample.
mrandom - Random number generator for randomly selecting Instances.
index - Array of Instances to be sampled from.
public InstanceList independent_sample(int size,
java.util.Random mrandom)
size - The number of Instances requested in the sample.
mrandom - Random number generator for randomly selecting Instances.
public InstanceList independent_sample(int size)
size - The number of Instances requested in the sample.
public InstanceList independent_sample(int size,
InstanceList restOfInstList,
java.util.Random mrandom,
Instance[] index)
size - The number of Instances requested in the sample.
restOfInstList - The InstanceList containing Instances not contained in the sample.
mrandom - Random number generator for randomly selecting Instances.
index - Array of Instances to be sampled from.
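Sampling without replacement splits the source into two disjoint parts: the sample and the rest. The sketch below shows one way to do that with a partial Fisher-Yates shuffle over a working copy. The class IndependentSampleSketch is hypothetical and uses generic lists; it is not the library's implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration using generic lists instead of InstanceList.
public class IndependentSampleSketch {
    /** Draws size distinct elements; fills rest with everything else. */
    public static <T> List<T> sample(List<T> source, int size,
                                     List<T> rest, Random rng) {
        if (size < 0 || size > source.size()) {
            throw new IllegalArgumentException(
                "size must be between 0 and the number of elements");
        }
        List<T> pool = new ArrayList<>(source);
        // After step i, positions 0..i hold the elements chosen so far.
        for (int i = 0; i < size; i++) {
            int j = i + rng.nextInt(pool.size() - i);
            Collections.swap(pool, i, j);
        }
        rest.clear();
        rest.addAll(pool.subList(size, pool.size()));
        return new ArrayList<>(pool.subList(0, size));
    }
}
```

Unlike sampling with replacement, the sample and the rest here always have sizes that add up to the size of the source.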
public InstanceList independent_sample(int size,
InstanceList restOfInstList,
java.util.Random mrandom)
size - The number of Instances requested in the sample.
restOfInstList - The InstanceList containing Instances not contained in the sample.
mrandom - Random number generator for randomly selecting Instances.
public InstanceList independent_sample(int size,
InstanceList restOfInstList)
size - The number of Instances requested in the sample.
restOfInstList - The InstanceList containing Instances not contained in the sample.
public Instance[] create_inst_list_index()
public void unite(InstanceList instList)
instList - The supplied InstanceList object to be appended to this object.
public java.util.ListIterator listIterator()