shared
Class FileSchema

java.lang.Object
  |
  +--shared.FileSchema

public class FileSchema
extends java.lang.Object

This class represents an MLC++ names file. The FileSchema's main task is to interpret the values in a .data file. Currently, a FileSchema maintains a raw list of attribute infos or COLUMNS, information about which columns should represent label or weight values in the final schema, and an optional loss matrix.

FileSchemas may be created from a names file, or from a preexisting array of attribute infos, which may be built programmatically. The label column, weight column, and loss matrix may all be set programmatically.

At any time, a standard MLC++ Schema may be created from the FileSchema through the create_schema() function.

A FileSchema is displayed in the same format that is used to read FileSchemas from names files.


Field Summary
static byte adefault
          LossKeyword value.
static byte distance
          LossKeyword value.
static int MAX_INPUT_STRING_SIZE
          Maximum size for a String value.
static byte nodefault
          LossKeyword value.
static byte nomatrix
          LossKeyword value.
static byte sectionCharacter
          Byte value indicating a character is an alphanumeric character.
static byte sectionDelimiter
          Byte value indicating a character is a section delimiter.
static byte sectionEscape
          Byte value indicating an end-of-file character has been reached.
 
Constructor Summary
FileSchema(FileSchema other)
          Copy constructor.
FileSchema(java.lang.String namesFile)
          Constructs a FileSchema from the given names file.
 
Method Summary
 void apply_loss_spec(Schema s)
          Apply the loss specification stored in this FileSchema to the given schema.
 Schema create_schema()
          Create an MLJ style schema from all the information stored in this class.
 void display()
          Display this FileSchema.
 int find_attribute(java.lang.String[] name, boolean fatalOnNotFound)
          Find an attribute in the file schema by name.
 boolean get_ignore_weight_column()
          Returns TRUE if the weight column is to be ignored, FALSE otherwise.
 int get_label_column()
          Returns the column number of the column containing labels.
 int get_weight_column()
          Returns the column number of the column containing weight values.
 int num_attr()
          Returns the number of attributes in this FileSchema.
 java.lang.String read_word_on_same_line(java.io.BufferedReader stream, boolean qMark, boolean periodAllowed)
          Reads a single word from the supplied BufferedReader without crossing lines.
 java.lang.String read_word(java.io.BufferedReader stream, boolean qMark, boolean[] sameLine)
          Reads a single word from the supplied BufferedReader.
 void set_attr_info(int i, AttrInfo a)
          Set an attribute info.
 boolean skip_white_comments_same_line(java.io.BufferedReader stream)
          Skips white space and comments.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

nomatrix

public static final byte nomatrix
LossKeyword value.

nodefault

public static final byte nodefault
LossKeyword value.

adefault

public static final byte adefault
LossKeyword value.

distance

public static final byte distance
LossKeyword value.

sectionDelimiter

public static final byte sectionDelimiter
Byte value indicating a character is a section delimiter.

sectionEscape

public static final byte sectionEscape
Byte value indicating an end-of-file character has been reached.

sectionCharacter

public static final byte sectionCharacter
Byte value indicating a character is an alphanumeric character.

MAX_INPUT_STRING_SIZE

public static final int MAX_INPUT_STRING_SIZE
Maximum size for a String value.
Constructor Detail

FileSchema

public FileSchema(java.lang.String namesFile)
Constructs a FileSchema from the given names file.
Parameters:
namesFile - Name of the names file containing the schema to be used.

FileSchema

public FileSchema(FileSchema other)
Copy constructor.
Parameters:
other - The FileSchema to be copied.
Method Detail

get_ignore_weight_column

public boolean get_ignore_weight_column()
Returns TRUE if the weight column is to be ignored, FALSE otherwise.
Returns:
TRUE if weight column is to be ignored, FALSE otherwise.

get_weight_column

public int get_weight_column()
Returns the column number of the column containing weight values.
Returns:
A column number.

get_label_column

public int get_label_column()
Returns the column number of the column containing labels.
Returns:
A column number.

apply_loss_spec

public void apply_loss_spec(Schema s)
Apply the loss specification stored in this FileSchema to the given schema. The InstanceList corresponding to the schema should be fully read when this function is called, to make sure that any non-fixed nominals in the schema have all their values showing. Any InstanceList calling this function on its schema MUST call set_schema with the new schema afterwards to ensure that all instances still have the same schema.
Parameters:
s - The schema to which the loss specification is to be applied.

num_attr

public int num_attr()
Returns the number of attributes in this FileSchema.
Returns:
The number of attributes.

set_attr_info

public void set_attr_info(int i,
                          AttrInfo a)
Set an attribute info. Makes a copy of the attribute info which is passed in.
Parameters:
i - Number of the attribute.
a - Attribute information.
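Because the attribute info is copied, later changes to the caller's object should not affect the stored one. The following is a minimal sketch of that copy-on-set behavior. CopyOnSetSketch, AttrInfoSketch, setAttrInfo, and nameAt are hypothetical names introduced for illustration only and are not part of the MLJ API.

```java
// Hypothetical stand-in for illustration; not the MLJ FileSchema class.
class CopyOnSetSketch {
    // Hypothetical, simplified attribute info with a copy constructor.
    static class AttrInfoSketch {
        String name;

        AttrInfoSketch(String name) {
            this.name = name;
        }

        AttrInfoSketch(AttrInfoSketch other) {
            this.name = other.name;
        }
    }

    private final AttrInfoSketch[] attrInfos;

    CopyOnSetSketch(int numAttr) {
        attrInfos = new AttrInfoSketch[numAttr];
    }

    // Stores a copy of the argument, so the caller keeps ownership of `a`.
    void setAttrInfo(int i, AttrInfoSketch a) {
        attrInfos[i] = new AttrInfoSketch(a);
    }

    String nameAt(int i) {
        return attrInfos[i].name;
    }
}
```

In this sketch, renaming the caller's AttrInfoSketch after calling setAttrInfo leaves the value returned by nameAt unchanged.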

skip_white_comments_same_line

public boolean skip_white_comments_same_line(java.io.BufferedReader stream)
Skips white space and comments.
Parameters:
stream - Reader allowing access to the names file.
Returns:
TRUE if the current line contains no comments, FALSE otherwise.

read_word

public java.lang.String read_word(java.io.BufferedReader stream,
                                  boolean qMark,
                                  boolean[] sameLine)
Reads a single word from the supplied BufferedReader.
Parameters:
stream - The BufferedReader to be read from.
qMark - TRUE if question marks are an acceptable name, FALSE otherwise.
sameLine - Set to TRUE if the line has not changed in the process of reading this word, FALSE otherwise.
Returns:
The word read.

read_word_on_same_line

public java.lang.String read_word_on_same_line(java.io.BufferedReader stream,
                                               boolean qMark,
                                               boolean periodAllowed)
Reads a single word from the supplied BufferedReader without crossing lines.
Parameters:
stream - The BufferedReader to be read from.
qMark - TRUE if question marks are an acceptable name, FALSE otherwise.
periodAllowed - TRUE if periods are allowed as words, FALSE otherwise. Automatically set to FALSE in this function.
Returns:
The word read.
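One way to read a word without crossing a line boundary is to peek at each character with mark and reset, which BufferedReader supports. The sketch below illustrates that idea only. SameLineWordSketch and readWordOnSameLine are hypothetical names, the qMark and periodAllowed handling is omitted, and the actual MLJ implementation may differ.

```java
import java.io.BufferedReader;
import java.io.IOException;

// Hypothetical stand-in for illustration; not the MLJ FileSchema class.
class SameLineWordSketch {
    // Reads the next blank-delimited word on the current line.
    // Returns "" if the line (or the input) ends before a word starts.
    // The line terminator itself is left unread.
    static String readWordOnSameLine(BufferedReader in) throws IOException {
        StringBuilder word = new StringBuilder();
        while (true) {
            in.mark(1);               // remember the position before peeking
            int c = in.read();
            if (c == -1) {
                break;                // end of input
            }
            if (c == '\n' || c == '\r') {
                in.reset();           // leave the line break unread
                break;
            }
            if (c == ' ' || c == '\t') {
                if (word.length() > 0) {
                    break;            // a blank after the word ends it
                }
                continue;             // skip leading blanks
            }
            word.append((char) c);
        }
        return word.toString();
    }
}
```

With the input "alpha beta" followed by a newline and "gamma", successive calls in this sketch return "alpha", then "beta", and then the empty string, because the reader stays in front of the line break.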

find_attribute

public int find_attribute(java.lang.String[] name,
                          boolean fatalOnNotFound)
Find an attribute in the file schema by name. If the attribute is not found, aborts when fatalOnNotFound is set and returns -1 otherwise. Assumes the schema has no duplicate attributes.
Parameters:
name - Name of the attribute.
fatalOnNotFound - TRUE if an error message should be displayed if there is no attribute matching that name, FALSE otherwise.
Returns:
The integer value corresponding to the attribute with the specified name or -1 if an attribute with a matching name is not found.
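The lookup contract described above (index on success, -1 or abort on failure) can be sketched as a linear search. FindAttributeSketch and findAttribute are hypothetical names. The sketch takes a plain String name rather than the String[] in the documented signature, and it throws an IllegalStateException as a stand-in for aborting. None of this is the MLJ implementation.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for illustration; not the MLJ FileSchema class.
class FindAttributeSketch {
    // Returns the index of the first attribute whose name equals `name`.
    // If none matches: throws when fatalOnNotFound is true, else returns -1.
    static int findAttribute(List<String> attrNames, String name, boolean fatalOnNotFound) {
        for (int i = 0; i < attrNames.size(); i++) {
            if (attrNames.get(i).equals(name)) {
                return i;
            }
        }
        if (fatalOnNotFound) {
            // Assumption: an exception stands in for the documented abort.
            throw new IllegalStateException("No attribute named '" + name + "'");
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("age", "income", "class");
        System.out.println(findAttribute(names, "income", false)); // prints 1
        System.out.println(findAttribute(names, "height", false)); // prints -1
    }
}
```

Because the schema is assumed to contain no duplicate attribute names, returning the first match is sufficient.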

create_schema

public Schema create_schema()
Create an MLJ style schema from all the information stored in this class. This schema is used to create lists, use InstanceReaders, etc.
Returns:
Schema object containing information generated from this FileSchema object.

display

public void display()
Display this FileSchema. The output uses the .names file format, so it can be used for file conversion.