shared
Class InstanceList

java.lang.Object
  |
  +--shared.InstanceList

public class InstanceList
extends java.lang.Object
implements java.lang.Cloneable

The InstanceList class provides basic functions for ordered lists of instances. Instances may be labelled or unlabelled. Depending on usage, the list may or may not keep counts about various data in the list. These counts are kept in the BagCounters class.

Assumptions :

File format follows Quinlan (pp. 81-83) EXCEPT:
1) , : | \ . do not appear in names
2) . at end of lists is optional
3) a word preceded by \ that appears before the labels are enumerated is interpreted as a modifier. Currently, the only implemented modifier is "weighted", which indicates that the list will be weighted. This means that labels are assumed to be of nominal type for read_names().
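
The first two exceptions can be illustrated with a small stand-alone sketch. It is not MLJ code: the class NamesFormatSketch and its methods isLegalName and splitList are hypothetical names introduced here, and rejecting empty names is an assumption.

```java
import java.util.Arrays;

// Hypothetical helper, not part of MLJ: applies exceptions 1) and 2) above.
class NamesFormatSketch {
    // Characters that may not appear in a name: , : | \ .
    private static final String FORBIDDEN = ",:|\\.";

    static boolean isLegalName(String name) {
        if (name.isEmpty()) {
            return false; // assumption: empty names are rejected
        }
        for (char c : name.toCharArray()) {
            if (FORBIDDEN.indexOf(c) >= 0) {
                return false;
            }
        }
        return true;
    }

    // Splits a comma-separated list, ignoring one optional trailing '.'.
    static String[] splitList(String line) {
        String s = line.trim();
        if (s.endsWith(".")) {
            s = s.substring(0, s.length() - 1);
        }
        return s.trim().split("\\s*,\\s*");
    }

    public static void main(String[] args) {
        System.out.println(isLegalName("outlook"));
        System.out.println(Arrays.toString(splitList("yes, no.")));
    }
}
```

With this sketch, "yes, no." and "yes, no" split into the same two names, which is the effect of making the trailing period optional.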

Comments :

Line numbers given are the result of '\n', not wrapping of lines.

"continuous" is not a legal name for the first nominal attribute value; it is reserved to indicate a continuous (RealAttrInfo) attribute.

"discrete" is not a legal name for the first nominal attribute value; it is reserved to indicate a discrete (NominalAttrInfo) attribute but with a dynamic set of values to be specified as they appear in the data file.

"discrete n" is supported, where n is an estimate of the number of values of the attribute.

"nolabel" may be specified as the label field ONLY. If specified, it indicates an unlabelled list.

Enhancements :

Cause fatal_error() if read_names() is called by methods other than the constructor or InstanceList.read_names()

Extend read_attributes to handle AttrInfo other than NominalAttrInfo and RealAttrInfo.

Expand capability of input function to:

1) allow . in name if not followed by a space

2) allow , : | and \ in names if preceded by a backslash

(this would mimic Quinlan)

Use lex to do the lexical analysis of the input file. This will be critical if the syntax becomes more complicated.

Ideally impute_unknown_values would handle both nominal and real values in a single pass. It should accept an array of operators allowing each attribute to handle unknowns in a different way. Obvious operators would be: unique_value, mode, mean...


Field Summary
static LogOptions logOptions
          LogOptions object containing information for logging purposes.
 
Constructor Summary
InstanceList(InstanceList source)
          Constructor.
InstanceList(InstanceList source, boolean preserveCounters)
          Copy constructor.
InstanceList(InstanceList trainList, java.lang.String testName)
          Build an instance list which is designed to be a test list for some other training set.
InstanceList(Schema catSchema)
          Constructor.
InstanceList(Schema catSchema, FileSchema names, java.lang.String testName)
          Constructor.
InstanceList(Schema catSchema, java.lang.String file, java.lang.String namesExtension, java.lang.String testExtension)
          Constructor.
InstanceList(java.lang.String file)
          Constructor.
InstanceList(java.lang.String file, java.lang.String namesExtension, java.lang.String dataExtension)
          Constructor.
 
Method Summary
 java.util.ListIterator add_instance(Instance instance)
          Adds the specified Instance to this InstanceList.
 AttrInfo attr_info(int attrNum)
          Returns the information about a specific attribute stored in this InstanceList.
 java.lang.Object clone()
          Returns a clone of this InstanceList object.
 java.lang.Object clone(boolean preserveCounters)
          Returns a clone of this InstanceList object.
 BagCounters counters()
          Creates and fills bagCounters.
 Instance[] create_inst_list_index()
          Returns a reference to an array of references to the instances in the list.
 void display_names(java.io.Writer stream, boolean protectChars, java.lang.String header)
          Displays the names file associated with the InstanceList.
 void display(boolean normalizeReal)
          Displays the Instances stored in this InstanceList.
 void drop_counters()
          Deletes the counters stored for Instances in this InstanceList.
 void ensure_counters()
          Fills bagCounters by adding all instances into it.
 int[] get_distribution_order()
          Returns the tiebreaking distribution order stored in the CatDist object for this InstanceList.
static int get_max_attr_vals()
          Returns the maximum number of attributes that can be used for Instances in this InstanceList.
static int get_max_label_vals()
          Returns the maximum number of labels that can be used to categorize Instances in this InstanceList.
 FileSchema get_original_schema()
          Returns the FileSchema loaded into this InstanceList.
 Schema get_schema()
          Returns the Schema for this InstanceList.
 double get_weight(Instance instance)
          Returns the weight for the specified Instance.
 boolean has_counters()
          Checks whether this InstanceList currently has a BagCounters object.
 InstanceList independent_sample(int size)
          Returns a reference to an InstanceList with "size" instances randomly sampled (without replacement) from this InstanceList.
 InstanceList independent_sample(int size, InstanceList restOfInstList)
          Returns a reference to an InstanceList with 'size' instances and another reference to an InstanceList with the rest of the instances.
 InstanceList independent_sample(int size, InstanceList restOfInstList, java.util.Random mrandom)
          Returns a reference to an InstanceList with 'size' instances and another reference to an InstanceList with the rest of the instances.
 InstanceList independent_sample(int size, InstanceList restOfInstList, java.util.Random mrandom, Instance[] index)
          Returns a reference to an InstanceList with 'size' instances and another reference to an InstanceList with the rest of the instances.
 InstanceList independent_sample(int size, java.util.Random mrandom)
          Returns a reference to an InstanceList with "size" instances randomly sampled (without replacement) from this InstanceList.
 InstanceList independent_sample(int size, java.util.Random mrandom, Instance[] index)
          Returns a reference to an InstanceList with "size" instances randomly sampled (without replacement) from this InstanceList.
 void init_max_vals()
          Sets the maximum values for attributes and labels in this InstanceList according to the MLJ options stored in the MLJ-options file.
 java.util.LinkedList instance_list()
          Returns the list of Instances stored in this InstanceList.
 boolean is_weighted()
          Checks if Instances stored in this InstanceList are weighted.
 AttrInfo label_info()
          Returns the label information contained in this InstanceList's schema.
 java.util.ListIterator listIterator()
          Returns a ListIterator over the Instances stored in this InstanceList.
 int majority_category(int[] tieBreakingOrder)
          Returns the Category corresponding to the label that occurs most frequently in the InstanceList.
 boolean no_instances()
          Checks if this InstanceList contains Instances.
 boolean no_weight()
          Checks if the total weight of this InstanceList is approximately 0.
 NominalAttrInfo nominal_label_info()
          Returns the nominal label information contained in this InstanceList's schema.
 void normalize_weights()
          Normalize all weights by the number of instances in the list.
 void normalize_weights(double normFactor)
          Normalize all weights by the number of instances in the list, times an optional normalization factor.
 void normalize_weights(double normFactor, boolean allowZeros)
          Normalize all weights by the number of instances in the list, times an optional normalization factor.
 int num_attr()
          Returns the number of attributes in the InstanceList.
 int num_categories()
          Returns the number of categories that the instances in the List can have.
 int num_instances()
          Returns the number of instances in the InstanceList.
 void OK()
          Checks integrity constraints.
 void OK(int level)
          Checks integrity constraints.
 java.lang.String out(boolean normalizeReal)
          Returns a String representation of this InstanceList object.
 void project_in_place(boolean[] projMask)
          This function is very similar to project(), except that the list is projected "in place"--attributes are removed directly from the list and the schema is updated.
 InstanceList project(boolean[] attrMask)
          This function takes an attribute mask which is an array of booleans indicating whether the corresponding attribute should be included in the projection.
 void read_data(java.lang.String file, boolean isTest)
          Reads the data from the supplied file.
 Instance reader_add_instance(AttrValue[] vals, AttrValue labelVal, double weight, boolean allowUnknownLabels)
          Adds a new instance to the list, using the structures maintained by InstanceReader.
 void remove_all_instances()
          Removes all Instance objects stored in this InstanceList object.
 Instance remove_front()
          Removes the first Instance from the list and returns it.
 void remove_inst_with_unknown_attr()
          Removes all instances that have unknown attributes from the data set.
 void remove_instance(java.util.ListIterator pix, Instance instance)
          Removes the specified Instance from the ListIterator of Instances supplied.
 InstanceList sample_with_replacement(int size, InstanceList restOfInstList, java.util.Random mrandom)
          Sample_with_replacement takes an independent sample of the instance list, with replacement.
static void set_max_attr_vals(int maxVals)
          Sets the maximum number of attributes for Instances in this InstanceList.
static void set_max_label_vals(int maxVals)
          Sets the maximum number of labels that Instances in this InstanceList can be categorized as.
 void set_schema(Schema schemaRC)
          Clones the supplied Schema and sets this InstanceList object to use it.
 InstanceList shuffle()
          Returns a reference to an InstanceList that has the same contents as this InstanceList, with a random ordering of the instances.
 InstanceList shuffle(java.util.Random mrandom)
          Returns a reference to an InstanceList that has the same contents as this InstanceList, with a random ordering of the instances.
 InstanceList shuffle(java.util.Random mrandom, Instance[] index)
          Returns a reference to an InstanceList that has the same contents as this InstanceList, with a random ordering of the instances.
 InstanceList shuffle(java.util.Random mrandom, Instance[] index, boolean keepFileSchema)
          Returns a reference to an InstanceList that has the same contents as this InstanceList, with a random ordering of the instances.
 InstanceList[] split_by_label()
          Split the InstanceList according to the labels.
 InstanceList split_prefix(int numInSplit)
          Removes the first "numInSplit" instances from this list and returns them as a new list.
 InstanceList split_prefix(int numInSplit, boolean keepFileSchema)
          Removes the first "numInSplit" instances from this list and returns them as a new list.
 double total_weight()
          Returns the sum of the weights of all Instances in the InstanceList.
 double total_weight(boolean recalculate)
          Returns the sum of the weights of all Instances in the InstanceList.
 RealAndLabelColumn[] transpose(boolean[] mask)
          Splits the InstanceList into several RealAndLabelColumn structures for the parallel discretization.
 void unite(InstanceList instList)
          Appends the instances from the given list to this list.
 void update_for_overflows(boolean[] projMask)
          Updates the list by removing specified attributes.
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

logOptions

public static LogOptions logOptions
LogOptions object containing information for logging purposes.
Constructor Detail

InstanceList

public InstanceList(java.lang.String file)
Constructor.
Parameters:
file - The root name of the file to be loaded into the InstanceList.

InstanceList

public InstanceList(java.lang.String file,
                    java.lang.String namesExtension,
                    java.lang.String dataExtension)
Constructor. InstanceList(String, String, String) takes complexity of InstanceList.read_names() + complexity of InstanceList.read_data().
Parameters:
file - The root name of the file to be loaded into the InstanceList.
namesExtension - The file extension for the schema file.
dataExtension - The file extension for the data file.

InstanceList

public InstanceList(Schema catSchema,
                    java.lang.String file,
                    java.lang.String namesExtension,
                    java.lang.String testExtension)
Constructor.
Parameters:
catSchema - The schema of categories for these data sets.
file - The root name of the file to be loaded into the InstanceList.
namesExtension - The file extension for the schema file.
testExtension - The file extension for the test file.

InstanceList

public InstanceList(Schema catSchema)
Constructor.
Parameters:
catSchema - The schema of categories for these data sets.

InstanceList

public InstanceList(Schema catSchema,
                    FileSchema names,
                    java.lang.String testName)
Constructor.
Parameters:
catSchema - The schema of categories for these data sets.
names - The schema of attributes for these data sets.
testName - The file name for the test file.

InstanceList

public InstanceList(InstanceList source)
Constructor.
Parameters:
source - The InstanceList that is being copied.

InstanceList

public InstanceList(InstanceList source,
                    boolean preserveCounters)
Copy constructor.
Parameters:
source - The InstanceList object to be copied.
preserveCounters - TRUE if counters of values should be copied, FALSE otherwise.

InstanceList

public InstanceList(InstanceList trainList,
                    java.lang.String testName)
Build an instance list which is designed to be a test list for some other training set. The training set must have been built with a FileSchema which will now be used to interpret the test data.
Parameters:
trainList - The training InstanceList that will be used to identify Schema for test data set.
testName - The name of the file containing the test data set.
Method Detail

has_counters

public boolean has_counters()
Checks whether this InstanceList currently has a BagCounters object.
Returns:
False if BagCounters is set to null, True otherwise.

counters

public BagCounters counters()
Creates and fills bagCounters.
Returns:
The BagCounters object created.

ensure_counters

public void ensure_counters()
Fills bagCounters by adding all instances into it.
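
The relationship between has_counters(), counters(), ensure_counters() and drop_counters() can be pictured as a lazily built cache. The sketch below is only an illustration under assumptions: LazyCountersSketch is a hypothetical class, a HashMap of label counts stands in for BagCounters, and keeping the counts up to date on each add is assumed rather than taken from this page.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a list with optional counters; not MLJ code.
class LazyCountersSketch {
    private final List<String> labels = new ArrayList<>();
    private Map<String, Integer> counters = null; // stand-in for BagCounters

    void add(String label) {
        labels.add(label);
        if (counters != null) {
            counters.merge(label, 1, Integer::sum); // assumption: kept in sync
        }
    }

    boolean hasCounters() {
        return counters != null;
    }

    // Builds the counters from all stored labels if they do not exist yet.
    void ensureCounters() {
        if (counters != null) {
            return;
        }
        counters = new HashMap<>();
        for (String label : labels) {
            counters.merge(label, 1, Integer::sum);
        }
    }

    Map<String, Integer> counters() {
        ensureCounters();
        return counters;
    }

    void dropCounters() {
        counters = null;
    }

    public static void main(String[] args) {
        LazyCountersSketch list = new LazyCountersSketch();
        list.add("yes");
        list.add("no");
        list.add("yes");
        System.out.println(list.hasCounters());
        System.out.println(list.counters());
    }
}
```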

read_data

public void read_data(java.lang.String file,
                      boolean isTest)
Reads the data from the supplied file. InstanceList.read_data() takes time proportional to the number of instances * the complexity of read_data_line() + complexity of free_instances().
Parameters:
file - The name of the file containing the data set.
isTest - Indicator of whether this is a test data set. True indicates this is a test data set, False otherwise.

remove_inst_with_unknown_attr

public void remove_inst_with_unknown_attr()
Removes all instances that have unknown attributes from the data set.

remove_instance

public void remove_instance(java.util.ListIterator pix,
                            Instance instance)
Removes the specified Instance from the ListIterator of Instances supplied.
Parameters:
pix - The ListIterator containing the Instance.
instance - The Instance to be removed.

remove_all_instances

public void remove_all_instances()
Removes all Instance objects stored in this InstanceList object.

num_instances

public int num_instances()
Returns the number of instances in the InstanceList. InstanceList.num_instances() takes time proportional to the number of instances in the List.
Returns:
An integer value of the number of Instances contained in this list.

num_categories

public int num_categories()
Returns the number of categories that the instances in the List can have. Only works if the Label is of a nominal attribute.
Returns:
An integer value of the number of categories.

nominal_label_info

public NominalAttrInfo nominal_label_info()
Returns the nominal label information contained in this InstanceList's schema.
Returns:
The information on the nominal labels contained in the schema.

label_info

public AttrInfo label_info()
Returns the label information contained in this InstanceList's schema.
Returns:
The information on the labels contained in the schema.

no_instances

public boolean no_instances()
Checks if this InstanceList contains Instances.
Returns:
Returns True if there are no Instances in this InstanceList, False otherwise.

instance_list

public java.util.LinkedList instance_list()
Returns the list of Instances stored in this InstanceList.
Returns:
A LinkedList containing the Instances stored in this InstanceList.

project_in_place

public void project_in_place(boolean[] projMask)
                      throws java.lang.CloneNotSupportedException
This function is very similar to project(), except that the list is projected "in place"--attributes are removed directly from the list and the schema is updated.
Parameters:
projMask - An array of boolean values indicating which attributes shall be used in this InstanceList object. Values of projMask correspond by order to the attributes. TRUE indicates that the attribute will be used; FALSE indicates that it will not be used.
Throws:
java.lang.CloneNotSupportedException - if the cloning process in Schema encounters an exception.

set_schema

public void set_schema(Schema schemaRC)
                throws java.lang.CloneNotSupportedException
Clones the supplied Schema and sets this InstanceList object to use it.
Parameters:
schemaRC - The Schema object to be cloned into this InstanceList object.
Throws:
java.lang.CloneNotSupportedException - if the cloning process of the Schema object encounters an error.

attr_info

public AttrInfo attr_info(int attrNum)
Returns the information about a specific attribute stored in this InstanceList.
Parameters:
attrNum - The number of the attribute about which information is requested.
Returns:
An AttrInfo containing the information about the attribute.

num_attr

public int num_attr()
Returns the number of attributes in the InstanceList.
Returns:
An integer representing the number of attributes used for each instance in this InstanceList.

get_max_attr_vals

public static int get_max_attr_vals()
Returns the maximum number of attributes that can be used for Instances in this InstanceList.
Returns:
The maximum number of attributes.

get_max_label_vals

public static int get_max_label_vals()
Returns the maximum number of labels that can be used to categorize Instances in this InstanceList.
Returns:
The maximum number of labels.

set_max_attr_vals

public static void set_max_attr_vals(int maxVals)
Sets the maximum number of attributes for Instances in this InstanceList.
Parameters:
maxVals - The maximum number of attributes allowed for Instances in this InstanceList.

set_max_label_vals

public static void set_max_label_vals(int maxVals)
Sets the maximum number of labels that Instances in this InstanceList can be categorized as.
Parameters:
maxVals - The maximum number of labels that instances may be categorized as.

get_schema

public Schema get_schema()
Returns the Schema for this InstanceList.
Returns:
The Schema of Instances in this InstanceList.

get_original_schema

public FileSchema get_original_schema()
Returns the FileSchema loaded into this InstanceList.
Returns:
The FileSchema for this InstanceList.

init_max_vals

public void init_max_vals()
Sets the maximum values for attributes and labels in this InstanceList according to the MLJ options stored in the MLJ-options file. These options are MAX_ATTR_VALS and MAX_LABEL_VALS.

display

public void display(boolean normalizeReal)
Displays the Instances stored in this InstanceList. InstanceList.display() takes time proportional to the number of instances * the number of attributes per instance.
Parameters:
normalizeReal - TRUE if the Instances should be normalized according to the min/max stored for real attributes. If min equals max, values are normalized to .5.
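
As a rough illustration of the normalizeReal behaviour, the sketch below applies ordinary min-max scaling with the documented special case for min equal to max. The scaling formula itself is an assumption (this page only states the .5 case), and NormalizeRealSketch is a hypothetical class, not part of MLJ.

```java
// Hypothetical illustration of min/max normalization; not MLJ code.
class NormalizeRealSketch {
    // Maps value into [0, 1] using the stored min and max.
    // When min equals max the result is 0.5, as documented above.
    static double normalize(double value, double min, double max) {
        if (min == max) {
            return 0.5;
        }
        return (value - min) / (max - min); // assumed min-max formula
    }

    public static void main(String[] args) {
        System.out.println(normalize(5.0, 0.0, 10.0));
        System.out.println(normalize(3.0, 3.0, 3.0));
    }
}
```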

is_weighted

public boolean is_weighted()
Checks if Instances stored in this InstanceList are weighted.
Returns:
TRUE if the Instances are weighted, FALSE otherwise.

reader_add_instance

public Instance reader_add_instance(AttrValue[] vals,
                                    AttrValue labelVal,
                                    double weight,
                                    boolean allowUnknownLabels)
Adds a new instance to the list, using the structures maintained by InstanceReader. Properly updates both schemas so that automatic instance removal will work.
Parameters:
vals - The values of the instance to be added.
labelVal - The label value of the instance to be added.
weight - The weight of the instance to be added.
allowUnknownLabels - TRUE if unknown label values are allowed for the instance to be added.
Returns:
A new Instance object containing the supplied information.

add_instance

public java.util.ListIterator add_instance(Instance instance)
Adds the specified Instance to this InstanceList.
Parameters:
instance - The Instance to be added.
Returns:
A ListIterator of all Instances in this InstanceList.

update_for_overflows

public void update_for_overflows(boolean[] projMask)
Updates the list by removing specified attributes. This is similar to the project() call, except that it is designed to be used WHILE READING. The size of the projMask may be larger than the number of attributes in the schema. This is to allow InstanceReader to maintain a single copy of the projMask even as the schema shrinks.
Parameters:
projMask - A boolean array with the same number of values as there are attributes. Each boolean element corresponds to an attribute, in the order they were input. True values represent attributes that are used.

get_distribution_order

public int[] get_distribution_order()
Returns the tiebreaking distribution order stored in the CatDist object for this InstanceList.
Returns:
The tiebreaking order.

total_weight

public double total_weight()
Returns the sum of the weights of all Instances in the InstanceList. This value is cached for faster access.
Returns:
The sum of weights for all Instances stored in this InstanceList.

total_weight

public double total_weight(boolean recalculate)
Returns the sum of the weights of all Instances in the InstanceList. This value is cached for faster access, but can be recalculated to avoid the numerical instabilities involved in weight updates.
Parameters:
recalculate - TRUE if the sum should be recalculated, FALSE if the cached value should be used.
Returns:
The sum of weights for all Instances stored in this InstanceList.
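
The caching behaviour described for total_weight can be sketched as follows. This is a minimal stand-alone illustration, not the MLJ implementation: CachedWeightSketch, add and setWeight are hypothetical names, and the incremental update in setWeight is an assumed source of the rounding drift that recalculation removes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of a cached weight sum; not MLJ code.
class CachedWeightSketch {
    private final List<Double> weights = new ArrayList<>();
    private double cachedTotal = 0.0;

    void add(double weight) {
        weights.add(weight);
        cachedTotal += weight;
    }

    // Incremental update: cheap, but rounding errors can accumulate.
    void setWeight(int index, double weight) {
        cachedTotal += weight - weights.get(index);
        weights.set(index, weight);
    }

    // Returns the cached sum, or recomputes it from scratch on request.
    double totalWeight(boolean recalculate) {
        if (recalculate) {
            double sum = 0.0;
            for (double w : weights) {
                sum += w;
            }
            cachedTotal = sum;
        }
        return cachedTotal;
    }

    public static void main(String[] args) {
        CachedWeightSketch list = new CachedWeightSketch();
        list.add(1.0);
        list.add(2.0);
        list.setWeight(0, 4.0);
        System.out.println(list.totalWeight(false));
        System.out.println(list.totalWeight(true));
    }
}
```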

get_weight

public double get_weight(Instance instance)
Returns the weight for the specified Instance.
Parameters:
instance - The Instance whose weight is requested.
Returns:
The weight for the Instance supplied.

drop_counters

public void drop_counters()
Deletes the counters stored for Instances in this InstanceList.

normalize_weights

public void normalize_weights()
Normalize all weights by the number of instances in the list. After this operation, totalWeight should equal the number of instances. The normalization factor is 1 and zeros are allowed for Instance weights.

normalize_weights

public void normalize_weights(double normFactor)
Normalize all weights by the number of instances in the list, times an optional normalization factor. After this operation, totalWeight should equal the number of instances * the normalization factor. Zeros are allowed for Instance weights.
Parameters:
normFactor - The normalization factor.

normalize_weights

public void normalize_weights(double normFactor,
                              boolean allowZeros)
Normalize all weights by the number of instances in the list, times an optional normalization factor. After this operation, totalWeight should equal the number of instances * the normalization factor.
Parameters:
normFactor - The normalization factor.
allowZeros - TRUE if zeros are allowed for Instance weights. If FALSE, any Instance weight that is approximately equal to 0 is automatically reset to a lower bound.
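
The arithmetic behind the three normalize_weights variants can be sketched on a plain array of weights. This is an illustration under assumptions, not MLJ code: NormalizeWeightsSketch is a hypothetical class, and the value of ASSUMED_LOWER_BOUND is invented because this page does not state the bound MLJ uses.

```java
// Hypothetical illustration of weight normalization; not MLJ code.
class NormalizeWeightsSketch {
    // Assumed value; the lower bound used by MLJ is not given on this page.
    static final double ASSUMED_LOWER_BOUND = 1e-9;

    // Scales weights so that their sum equals weights.length * normFactor.
    static double[] normalize(double[] weights, double normFactor, boolean allowZeros) {
        double total = 0.0;
        for (double w : weights) {
            total += w;
        }
        if (total <= 0.0) {
            throw new IllegalArgumentException("total weight must be positive");
        }
        double scale = weights.length * normFactor / total;
        double[] result = new double[weights.length];
        for (int i = 0; i < weights.length; i++) {
            double w = weights[i] * scale;
            if (!allowZeros && Math.abs(w) < ASSUMED_LOWER_BOUND) {
                w = ASSUMED_LOWER_BOUND; // reset near-zero weights
            }
            result[i] = w;
        }
        return result;
    }

    public static void main(String[] args) {
        double[] out = normalize(new double[] {1.0, 1.0, 2.0}, 1.0, true);
        System.out.println(java.util.Arrays.toString(out));
    }
}
```

Note that in this sketch, resetting near-zero weights to the lower bound makes the final sum very slightly larger than the number of instances times normFactor.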

transpose

public RealAndLabelColumn[] transpose(boolean[] mask)
Splits the InstanceList into several RealAndLabelColumn structures for the parallel discretization.
Parameters:
mask - Boolean array of the same length as the number of attributes. TRUE values indicate that attribute should have a RealAndLabelColumn object created for it, FALSE otherwise.
Returns:
An array of RealAndLabelColumns generated from the attribute values for the Instances stored in this InstanceList.

no_weight

public boolean no_weight()
Checks if the total weight of this InstanceList is approximately 0.
Returns:
TRUE if the total weight is approximately equal to 0, FALSE otherwise.

majority_category

public int majority_category(int[] tieBreakingOrder)
Returns the Category corresponding to the label that occurs most frequently in the InstanceList. In case of a tie, we prefer the given tieBreaker if it is one of those tied. TieBreaker can be UNKNOWN_CATEGORY_VAL if you prefer the earlier category to the tied ones. The method used differs depending on whether or not we have counters on this List. It is considerably faster if the counters are present. This method is only meaningful for labels with AttrInfo derived from NominalAttrInfo. This method will cause fatal_error otherwise. In the case of a tie, returns the Category corresponding to the label which occurs first in the NominalAttrInfo. InstanceList.majority_category() takes time proportional to the number of different categories + the number of instances.
Parameters:
tieBreakingOrder - Array indicating the order in which ties are broken. The array should be the same length as the number of attributes and each element corresponds to an attribute. Lower number elements represent attributes that are more favorable in a tie than higher number elements.
Returns:
The category that occurs the most in this InstanceList or UNKNOWN_CATEGORY_VAL if there are no instances.
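
A counting approach with tie-breaking might look like the sketch below. It is not the MLJ implementation: MajorityCategorySketch is a hypothetical class, ASSUMED_UNKNOWN is an invented stand-in for UNKNOWN_CATEGORY_VAL, and the sketch assumes that tieBreakingOrder is indexed by category and that a lower entry wins a tie. Equal entries fall back to the earlier category.

```java
// Hypothetical illustration of majority voting with tie-breaking; not MLJ code.
class MajorityCategorySketch {
    // Invented stand-in for UNKNOWN_CATEGORY_VAL.
    static final int ASSUMED_UNKNOWN = -1;

    // labels[i] is the category index of instance i, in [0, numCategories).
    // Runs in time proportional to numCategories + labels.length.
    static int majority(int[] labels, int numCategories, int[] tieBreakingOrder) {
        if (labels.length == 0) {
            return ASSUMED_UNKNOWN;
        }
        int[] counts = new int[numCategories];
        for (int label : labels) {
            counts[label]++;
        }
        int best = 0;
        for (int c = 1; c < numCategories; c++) {
            boolean more = counts[c] > counts[best];
            boolean tieWon = counts[c] == counts[best]
                    && tieBreakingOrder[c] < tieBreakingOrder[best];
            if (more || tieWon) {
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[] labels = {0, 1, 1, 2, 2};
        System.out.println(majority(labels, 3, new int[] {0, 2, 1}));
    }
}
```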

project

public InstanceList project(boolean[] attrMask)
This function takes an attribute mask which is an array of booleans indicating whether the corresponding attribute should be included in the projection. InstanceList.project() takes O(num attributes * (num instances + num attributes)) time.
Parameters:
attrMask - A boolean array with the same number of values as there are attributes. Each boolean element corresponds to an attribute in the order they were input. True values represent attributes that are used.
Returns:
An InstanceList with a new Schema that includes only the attributes with a mask value of true. May return null if an exception occurs.
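
The effect of an attribute mask can be shown on plain rows of values. The sketch below is only an illustration: ProjectSketch is a hypothetical class, each String[] row stands in for one Instance, and schema handling is left out entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of projection by boolean mask; not MLJ code.
class ProjectSketch {
    // Each row is one instance; column j holds the value of attribute j.
    static List<String[]> project(List<String[]> rows, boolean[] attrMask) {
        int kept = 0;
        for (boolean keep : attrMask) {
            if (keep) {
                kept++;
            }
        }
        List<String[]> result = new ArrayList<>();
        for (String[] row : rows) {
            String[] projected = new String[kept];
            int k = 0;
            for (int j = 0; j < attrMask.length; j++) {
                if (attrMask[j]) {
                    projected[k++] = row[j]; // keep attributes in input order
                }
            }
            result.add(projected);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"sunny", "85", "high"});
        List<String[]> out = project(rows, new boolean[] {true, false, true});
        System.out.println(Arrays.toString(out.get(0)));
    }
}
```

The input rows are left untouched, which mirrors the difference between project() and project_in_place() described on this page.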

out

public java.lang.String out(boolean normalizeReal)
Returns a String representation of this InstanceList object.
Parameters:
normalizeReal - TRUE if real values in an Instance object should be normalized.
Returns:
A String representation of this InstanceList object.

clone

public java.lang.Object clone(boolean preserveCounters)
Returns a clone of this InstanceList object.
Parameters:
preserveCounters - TRUE if counters of values should be copied, FALSE otherwise.
Returns:
A new object with a copy of the data stored in the supplied InstanceList.

clone

public java.lang.Object clone()
Returns a clone of this InstanceList object. Does not preserve counters.
Overrides:
clone in class java.lang.Object
Returns:
A new object with a copy of the data stored in the supplied InstanceList.

OK

public void OK()
Checks integrity constraints. At level 0, verifies that all instances have the same schema. Because the schema has AttrInfos that are updated, all instances must share the EXACT representation, not just a logically equivalent one. Specifically, if the schema is updated, all instances must see the exact same min/max for RealAttrInfos. The level of checking is automatically set to 0.

OK

public void OK(int level)
Checks integrity constraints. At level 0, verifies that all instances have the same schema. Because the schema has AttrInfos that are updated, all instances must share the EXACT representation, not just a logically equivalent one. Specifically, if the schema is updated, all instances must see the exact same min/max for RealAttrInfos.
Parameters:
level - Level of checking done.

display_names

public void display_names(java.io.Writer stream,
                          boolean protectChars,
                          java.lang.String header)
Displays the names file associated with the InstanceList.
Parameters:
stream - Writer object to which the names file will be displayed.
protectChars - TRUE if protected characters are used, FALSE otherwise.
header - A String to use for the header to the display.

split_by_label

public InstanceList[] split_by_label()
Split the InstanceList according to the labels. Each InstanceList corresponds to all instances having one label value. Works only for nominal labels. If a label value never appears in the training set, the corresponding InstanceList will be empty.
Returns:
An array of InstanceList where each InstanceList contains Instances with a particular label value.
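
Splitting by label amounts to bucketing instances by their label value. The following sketch illustrates this with instance indices instead of Instance objects. SplitByLabelSketch is a hypothetical class, and labels are assumed to be encoded as integers in the range 0 to numLabelValues - 1.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of splitting by label; not MLJ code.
class SplitByLabelSketch {
    // Returns one bucket per label value; bucket v holds the indices of
    // all instances whose label is v. Unseen label values give empty buckets.
    static List<List<Integer>> splitByLabel(int[] labels, int numLabelValues) {
        List<List<Integer>> buckets = new ArrayList<>();
        for (int v = 0; v < numLabelValues; v++) {
            buckets.add(new ArrayList<>());
        }
        for (int i = 0; i < labels.length; i++) {
            buckets.get(labels[i]).add(i);
        }
        return buckets;
    }

    public static void main(String[] args) {
        System.out.println(splitByLabel(new int[] {0, 2, 0}, 3));
    }
}
```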

sample_with_replacement

public InstanceList sample_with_replacement(int size,
                                            InstanceList restOfInstList,
                                            java.util.Random mrandom)
Sample_with_replacement takes an independent sample of the instance list, with replacement. The parameter size is the number of samples to take, which is generally equal to num_instances() for bootstrap. size must be greater than 0, and can be greater than num_instances(). If restOfInstList is non-null, the unused instances are inserted into it. If mrandom is non-null, it is used as the random number generator, otherwise a new one is created, used, and destroyed. The caller gains ownership of the returned list, and should already own restOfInstList, if non-null.
Parameters:
size - The size of the list of samples requested.
restOfInstList - An InstanceList object containing the Instances not sampled.
mrandom - Random number generator for randomly sampling Instances.
Returns:
An InstanceList object containing randomly sampled Instances.
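
The sampling scheme described above can be sketched with standard collections. This is an illustration under assumptions, not the MLJ implementation: BootstrapSketch is a hypothetical class, a generic List stands in for InstanceList, and "unused" is taken to mean instances that were never drawn.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical illustration of sampling with replacement; not MLJ code.
class BootstrapSketch {
    static <T> List<T> sampleWithReplacement(List<T> source, int size,
                                             List<T> rest, Random rng) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be greater than 0");
        }
        // Use the supplied generator if there is one, else a fresh one.
        Random random = (rng != null) ? rng : new Random();
        boolean[] used = new boolean[source.size()];
        List<T> sample = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            int pick = random.nextInt(source.size());
            used[pick] = true;
            sample.add(source.get(pick));
        }
        // Instances that were never drawn go into rest, if requested.
        if (rest != null) {
            for (int i = 0; i < used.length; i++) {
                if (!used[i]) {
                    rest.add(source.get(i));
                }
            }
        }
        return sample;
    }

    public static void main(String[] args) {
        List<Integer> source = Arrays.asList(1, 2, 3, 4, 5);
        List<Integer> rest = new ArrayList<>();
        List<Integer> sample = sampleWithReplacement(source, 5, rest, new Random(1));
        System.out.println(sample.size() + " sampled, " + rest.size() + " unused");
    }
}
```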

shuffle

public InstanceList shuffle(java.util.Random mrandom,
                            Instance[] index,
                            boolean keepFileSchema)
Returns a reference to an InstanceList that has the same contents as this InstanceList, with a random ordering of the instances. The Random parameter allows for duplication of results.
Parameters:
mrandom - Random number generator used for shuffling.
index - Array of Instances being shuffled.
keepFileSchema - TRUE if the FileSchema for this InstanceList object should be copied to the InstanceList with the shuffled Instances.
Returns:
An InstanceList object with a shuffled order of Instances.
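
The point about duplicating results can be illustrated with the standard library: two generators created with the same seed produce the same ordering. ShuffleSketch is a hypothetical class, and the use of Collections.shuffle is an assumption made for the sketch, not a statement about how MLJ shuffles.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration of a reproducible shuffle; not MLJ code.
class ShuffleSketch {
    // Returns a shuffled copy; the source list keeps its original order.
    static <T> List<T> shuffled(List<T> source, Random rng) {
        List<T> copy = new ArrayList<>(source);
        Collections.shuffle(copy, rng);
        return copy;
    }

    public static void main(String[] args) {
        List<Integer> source = Arrays.asList(1, 2, 3, 4, 5, 6);
        List<Integer> first = shuffled(source, new Random(7));
        List<Integer> second = shuffled(source, new Random(7));
        System.out.println(first.equals(second));
    }
}
```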

shuffle

public InstanceList shuffle(java.util.Random mrandom,
                            Instance[] index)
Returns a reference to an InstanceList that has the same contents as this InstanceList, with a random ordering of the instances. The Random parameter allows for duplication of results. The FileSchema is not copied to the new InstanceList object.
Parameters:
mrandom - Random number generator used for shuffling.
index - Array of Instances being shuffled.
Returns:
An InstanceList object with a shuffled order of Instances.

shuffle

public InstanceList shuffle(java.util.Random mrandom)
Returns a reference to an InstanceList that has the same contents as this InstanceList, with a random ordering of the instances. The Random parameter allows for duplication of results. The FileSchema is not copied to the new InstanceList.
Parameters:
mrandom - Random number generator used for shuffling.
Returns:
An InstanceList object with a shuffled order of Instances.

shuffle

public InstanceList shuffle()
Returns a reference to an InstanceList that has the same contents as this InstanceList, with a random ordering of the instances. The FileSchema is not copied to the new InstanceList.
Returns:
An InstanceList object with a shuffled order of Instances.

split_prefix

public InstanceList split_prefix(int numInSplit,
                                 boolean keepFileSchema)
Removes the first "numInSplit" instances from this list and returns them in a new list. If keepFileSchema is TRUE, the FileSchema will be copied into the new list.
Parameters:
numInSplit - The number of Instances to be split from this InstanceList object.
keepFileSchema - TRUE if the FileSchema should be copied to the new InstanceList, FALSE otherwise.
Returns:
An InstanceList with the first "numInSplit" Instances.
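
The effect on both lists can be sketched with plain collections. SplitPrefixSketch and splitPrefix are hypothetical names, and the bounds check is an assumption rather than documented behaviour of split_prefix.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; a mutable List stands in for an InstanceList.
final class SplitPrefixSketch {
    /** Removes the first n items from `list` and returns them as a new list. */
    static <T> List<T> splitPrefix(List<T> list, int n) {
        // Assumption: an out-of-range n is rejected rather than clamped.
        if (n < 0 || n > list.size()) {
            throw new IllegalArgumentException("n out of range: " + n);
        }
        List<T> head = list.subList(0, n);   // view onto the first n items
        List<T> prefix = new ArrayList<>(head);
        head.clear();                        // clearing the view removes them from `list`
        return prefix;
    }

    public static void main(String[] args) {
        List<Integer> all = new ArrayList<>(Arrays.asList(1, 2, 3, 4, 5));
        List<Integer> train = splitPrefix(all, 2);
        System.out.println(train + " " + all); // prints "[1, 2] [3, 4, 5]"
    }
}
```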

split_prefix

public InstanceList split_prefix(int numInSplit)
Removes the first "numInSplit" instances from this list and returns them in a new list. The FileSchema is not copied to the new InstanceList.
Parameters:
numInSplit - The number of Instances to be split from this InstanceList object.
Returns:
An InstanceList with the first "numInSplit" Instances.

remove_front

public Instance remove_front()
Returns the first Instance in the list. The instance is removed from the list.
Returns:
The Instance removed from this InstanceList object.

independent_sample

public InstanceList independent_sample(int size,
                                       java.util.Random mrandom,
                                       Instance[] index)
Returns a reference to an InstanceList with "size" instances randomly sampled (without replacement) from this InstanceList. The mrandom parameter allows for duplication of results.
Parameters:
size - The number of Instances requested in the sample.
mrandom - Random number generator for randomly selecting Instances.
index - Array of Instances to be sampled from.
Returns:
An InstanceList object containing the randomly sampled Instances.

independent_sample

public InstanceList independent_sample(int size,
                                       java.util.Random mrandom)
Returns a reference to an InstanceList with "size" instances randomly sampled (without replacement) from this InstanceList. The mrandom parameter allows for duplication of results.
Parameters:
size - The number of Instances requested in the sample.
mrandom - Random number generator for randomly selecting Instances.
Returns:
An InstanceList object containing the randomly sampled Instances.

independent_sample

public InstanceList independent_sample(int size)
Returns a reference to an InstanceList with "size" instances randomly sampled (without replacement) from this InstanceList.
Parameters:
size - The number of Instances requested in the sample.
Returns:
An InstanceList object containing the randomly sampled Instances.

independent_sample

public InstanceList independent_sample(int size,
                                       InstanceList restOfInstList,
                                       java.util.Random mrandom,
                                       Instance[] index)
Returns a reference to an InstanceList with "size" instances randomly sampled (without replacement) from this InstanceList. The rest of the instances are placed in restOfInstList.
Parameters:
size - The number of Instances requested in the sample.
restOfInstList - The InstanceList containing Instances not contained in the sample.
mrandom - Random number generator for randomly selecting Instances.
index - Array of Instances to be sampled from.
Returns:
An InstanceList object containing the randomly sampled Instances.
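
Sampling without replacement while also collecting the remainder amounts to a random partition of the index array. The sketch below uses a partial Fisher-Yates shuffle, which is one common way to do this and is not necessarily what independent_sample does internally. IndependentSampleSketch and sample are hypothetical names, and working on a clone of the index is an assumption made so that the caller's array stays reusable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical helper; a generic array stands in for the Instance[] index.
final class IndependentSampleSketch {
    /**
     * Picks `size` distinct items from `index`. The items not picked are
     * appended to `rest` when `rest` is non-null.
     */
    static <T> List<T> sample(T[] index, int size, List<T> rest, Random rng) {
        if (size < 0 || size > index.length) {
            throw new IllegalArgumentException("size out of range: " + size);
        }
        T[] work = index.clone(); // assumption: leave the caller's index untouched
        // Partial Fisher-Yates: after step i, work[0..i] is a uniform random pick.
        for (int i = 0; i < size; i++) {
            int j = i + rng.nextInt(work.length - i);
            T tmp = work[i];
            work[i] = work[j];
            work[j] = tmp;
        }
        List<T> all = Arrays.asList(work);
        if (rest != null) rest.addAll(all.subList(size, work.length));
        return new ArrayList<>(all.subList(0, size));
    }

    public static void main(String[] args) {
        Integer[] index = {1, 2, 3, 4, 5, 6};
        List<Integer> rest = new ArrayList<>();
        List<Integer> picked = sample(index, 4, rest, new Random(3L));
        System.out.println("picked = " + picked + ", rest = " + rest);
    }
}
```

Every item lands in exactly one of the two lists, so the sample and the rest together always reproduce the original contents.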

independent_sample

public InstanceList independent_sample(int size,
                                       InstanceList restOfInstList,
                                       java.util.Random mrandom)
Returns a reference to an InstanceList with "size" instances randomly sampled (without replacement) from this InstanceList. The rest of the instances are placed in restOfInstList.
Parameters:
size - The number of Instances requested in the sample.
restOfInstList - The InstanceList containing Instances not contained in the sample.
mrandom - Random number generator for randomly selecting Instances.
Returns:
An InstanceList object containing the randomly sampled Instances.

independent_sample

public InstanceList independent_sample(int size,
                                       InstanceList restOfInstList)
Returns a reference to an InstanceList with "size" instances randomly sampled (without replacement) from this InstanceList. The rest of the instances are placed in restOfInstList.
Parameters:
size - The number of Instances requested in the sample.
restOfInstList - The InstanceList containing Instances not contained in the sample.
Returns:
An InstanceList object containing the randomly sampled Instances.

create_inst_list_index

public Instance[] create_inst_list_index()
Returns a reference to an array of references to the instances in the list. This is used for independent_sample() and shuffle().
Returns:
An array of references to all Instance objects in this object.

unite

public void unite(InstanceList instList)
Appends the instances from the given list to this list. Gets ownership of and deletes the given list.
Parameters:
instList - The supplied InstanceList object to be appended to this object.

listIterator

public java.util.ListIterator listIterator()
Returns a ListIterator over the Instance objects stored in this InstanceList.
Returns:
A ListIterator containing the Instance objects stored in this InstanceList.
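
For readers unfamiliar with java.util.ListIterator, the sketch below shows the traversal it supports, using a plain list of strings in place of an InstanceList. ListIteratorSketch and forwardThenBackward are hypothetical names. Note that the signature above returns a raw ListIterator, so elements obtained from it come back as Object and would need a cast to Instance.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.ListIterator;

// Hypothetical helper demonstrating bidirectional traversal.
final class ListIteratorSketch {
    /** Walks the list front to back, then back to front, recording each visit. */
    static List<String> forwardThenBackward(List<String> items) {
        List<String> visited = new ArrayList<>();
        ListIterator<String> it = items.listIterator();
        while (it.hasNext()) {
            visited.add(it.next());
        }
        // The cursor now sits past the last element, so previous() walks back.
        while (it.hasPrevious()) {
            visited.add(it.previous());
        }
        return visited;
    }

    public static void main(String[] args) {
        System.out.println(forwardThenBackward(Arrays.asList("a", "b", "c")));
        // prints "[a, b, c, c, b, a]"
    }
}
```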