Introduction:
-------------

BNJ stands for Bayesian Network tools in Java. It is an open-source software development toolkit for research in probabilistic learning and inference using Bayesian networks. BNJ is developed by the probabilistic reasoning group of the KDD Lab in the Computing and Information Sciences (CIS) Department at Kansas State University. The release of BNJ is governed by the GNU General Public License.

A Bayesian network, or Bayesian belief network (BBN), is a concise representation of a joint probability distribution defined on a finite set of random variables. It is a directed acyclic graph (DAG) in which nodes represent random variables and arcs represent probabilistic dependencies among the variables. A conditional probability distribution is associated with each node and describes the dependency between the node and its parents. Bayesian networks are most often used in expert systems that reason under uncertainty.

There are two main research problems in probabilistic reasoning using Bayesian networks: learning and inference. Learning means automatically constructing the network from data with a learning algorithm such as K2. Inference involves computing the posterior marginal probabilities of some query nodes Q given observed evidence E, P(Q|E), or computing the most probable explanation (MPE) given the values of the observed evidence nodes. Both Bayesian network learning and inference have been proven to be NP-hard in general.

BNJ aims to provide researchers and developers with a useful toolkit for Bayesian network representation, learning, and inference. Several popular Bayesian network learning and inference algorithms have been implemented and included in BNJ.
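
To make the notion of a posterior P(Q|E) concrete, here is a minimal sketch that does not use any BNJ class. It takes the two-node fragment VisitAsia -> Tuberculosis of the Asia sample network and computes P(VisitAsia=Visit | Tuberculosis=Present) by enumerating the joint distribution. The class and method names are hypothetical, and the reading of the Tuberculosis table (0.05 given Visit, 0.01 given No_Visit) is an assumption about how the values in asia.xml are ordered.

```java
class PosteriorSketch {
    // Prior P(VisitAsia = Visit), taken from the Asia sample network.
    static final double P_VISIT = 0.01;
    // Assumed reading of the Tuberculosis table:
    // P(Tuberculosis = Present | Visit) and P(Tuberculosis = Present | No_Visit).
    static final double P_TB_GIVEN_VISIT = 0.05;
    static final double P_TB_GIVEN_NO_VISIT = 0.01;

    // Posterior P(VisitAsia = Visit | Tuberculosis = Present) by enumeration.
    static double posteriorVisitGivenTb() {
        // Joint probabilities factor as P(parent) * P(child | parent).
        double jointVisit = P_VISIT * P_TB_GIVEN_VISIT;
        double jointNoVisit = (1.0 - P_VISIT) * P_TB_GIVEN_NO_VISIT;
        // Normalize by the probability of the evidence.
        return jointVisit / (jointVisit + jointNoVisit);
    }

    public static void main(String[] args) {
        System.out.println("P(Visit | Tuberculosis=Present) = " + posteriorVisitGivenTb());
    }
}
```

Under these assumptions the posterior is 0.0005 / 0.0104, roughly 0.048. Enumeration like this grows exponentially with the number of nodes, which is why larger networks need dedicated exact or sampling-based inference algorithms.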
The first release consists of core classes for representing the main data structures, a graphical Bayesian network editor for loading and manipulating networks, the K2 learning algorithm, an exact inference algorithm (the clique-tree propagation algorithm of Lauritzen and Spiegelhalter, 1988), several stochastic sampling algorithms, and other utilities, including a network format converter, a data generator that simulates the network, and a simple DAG layout method.

Contents:
---------

The current version of BNJ, 1.0 alpha (04 May 2002 build), contains the following modules for working with Bayesian networks with discrete chance nodes:

- Bayesian network core classes that model the structure of a Bayesian network
- XML Bayesian Network Interchange Format (XMLBIF) converter that converts networks of other formats to and from XMLBIF (1.0 beta)
- Graphical editor for editing the network and converting between formats
- K2 algorithm for learning the structure of a Bayesian network from data
- Lauritzen-Spiegelhalter (LS) algorithm for exact inference on a Bayesian network
- Importance sampling algorithms, evaluated using exact root mean squared error, including logic sampling, likelihood weighting, self-importance sampling, and adaptive importance sampling
- GASLEAK, a research genetic algorithm (GA)-based wrapper that searches for inferential loss-minimizing orderings of variables in Bayesian network structure learning

Sample Networks:
----------------

Two sample networks are used in this manual to demonstrate the usage of BNJ modules: Asia and ALARM. Asia is an eight-node binary network. ALARM has 37 nodes and 46 edges. The default file format is XMLBIF. The content of asia.xml follows.
===============================asia.xml============================================
]>
bayesiannetwork
VisitAsia Visit No_Visit position = (166, 79)
Tuberculosis Present Absent position = (166, 175)
Smoking Smoker NonSmoker position = (520, 79)
Cancer Present Absent position = (418, 175)
TbOrCa True False position = (310, 271)
XRay Abnormal Normal position = (178, 361)
Bronchitis Present Absent position = (658, 175)
Dyspnea Present Absent position = (454, 361)
VisitAsia 0.01 0.99
Tuberculosis VisitAsia 0.05 0.01 0.95 0.99
Smoking 0.5 0.5
Cancer Smoking 0.1 0.01 0.9 0.99
TbOrCa Tuberculosis Cancer 1.0 1.0 1.0 0.0 0.0 0.0 0.0 1.0
XRay TbOrCa 0.98 0.05 0.02 0.95
Bronchitis Smoking 0.6 0.3 0.4 0.7
Dyspnea TbOrCa Bronchitis 0.9 0.7 0.8 0.1 0.1 0.3 0.2 0.9
===============================asia.xml============================================

Modules:
--------

*********
*  BBN  *
*********

Description: The package bbn consists of core classes for representing the data structure of a Bayesian network. BBN.java and Node.java are used to represent a Bayesian network. NodeManager.java helps handle evidence for inference. The package also includes some useful utilities, such as DataGenerator.

Examples:

(example 1) Load the network from an XMLBIF file and print out the information of each node.

The following example shows how to load a Bayesian network from a file and how to access its node information. To run this example, create the file BBNExample.java as follows and make sure the network file asia.xml is in the same directory.

==============================BBNExample.java=====================================
import bbn.*;
import java.util.Vector;

public class BBNExample {
    public static void main(String[] args) {
        // Load the network structure and probability tables from the XMLBIF file.
        System.out.println("load the network from file ... ");
        BBN network = new BBN();
        network.loadFromXML("asia.xml");

        System.out.println("printing out information of each node... ");
        for (int i = 0; i < network.size(); i++) {
            System.out.println("i=" + i);
            network.getNodeAt(i).print();
            network.getNodeAt(i).printStateNames();
            // Print the name of each parent, if the node has any.
            if (network.getNodeAt(i).getParents().size() > 0) {
                for (int j = 0; j < network.getNodeAt(i).getParents().size(); j++)
                    System.out.println("parents --> " + network.getNodeAt(i).getParentName(j));
            }
            System.out.println("probabilities in CPT --> " + network.getNodeAt(i).getProbabilities());
            System.out.println("--------");
        }
    }
}
=============================Result of BBNExample========================================
D:\bnj>javac BBNExample.java

D:\bnj>java BBNExample
load the network from file ...
printing out information of each node...
i=0
For node VisitAsia: Visit No_Visit
probabilities in CPT --> [0.01, 0.99]
--------
i=1
For node Tuberculosis: Present Absent
parents --> VisitAsia
probabilities in CPT --> [0.05, 0.01, 0.95, 0.99]
--------
i=2
For node Smoking: Smoker NonSmoker
probabilities in CPT --> [0.5, 0.5]
--------
i=3
For node Cancer: Present Absent
parents --> Smoking
probabilities in CPT --> [0.1, 0.01, 0.9, 0.99]
--------
i=4
For node TbOrCa: True False
parents --> Tuberculosis
parents --> Cancer
probabilities in CPT --> [1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0]
--------
i=5
For node XRay: Abnormal Normal
parents --> TbOrCa
probabilities in CPT --> [0.98, 0.05, 0.02, 0.95]
--------
i=6
For node Bronchitis: Present Absent
parents --> Smoking
probabilities in CPT --> [0.6, 0.3, 0.4, 0.7]
--------
i=7
For node Dyspnea: Present Absent
parents --> TbOrCa
parents --> Bronchitis
probabilities in CPT --> [0.9, 0.7, 0.8, 0.1, 0.1, 0.3, 0.2, 0.9]
--------
================================================================

******************
* DataGenerator  *
******************

Description: DataGenerator simulates the network and generates a data file from it. The output data file can be used as the training dataset for any learning algorithm. The simulation is done by forward sampling, i.e., first sampling the root nodes, then their children, and so on, so that every node is sampled after its parents. DataGenerator is included in package bbn. DataGenerator2.java is an optimized version of DataGenerator.java. The output data file is stored in XML format. DataGenerator also generates an evidence file when the number of samples is set to zero.

Examples:

(example 2) Simulate the network and generate a data file from the network.

The following example shows how to simulate the network and generate a data file with 20 samples, named asia20.xml, from the network.
=============================Example of using DataGenerator========================================
D:\bnj>java bbn/DataGenerator2
* * * DataGenerator2 * * *
Usage: java DataGenerator2 input.xml output.xml number_of_samples (0 will generate an evdience file)

D:\bnj>java bbn/DataGenerator2 asia.xml asia20.xml 20
* * * DataGenerator * * *
Usage: java DataGenerator input.xml output.xml number_of_samples (0 will generate an evdience file)
Success!

D:\bnj>
=============================Result of DataGenerator: asia20.xml=======================================
]>
InternalNetwork
VisitAsia Visit No_Visit position = (166, 79)
Smoking Smoker NonSmoker position = (520, 79)
Tuberculosis Present Absent position = (166, 175)
Cancer Present Absent position = (418, 175)
TbOrCa True False position = (310, 271)
XRay Abnormal Normal position = (178, 361)
Bronchitis Present Absent position = (658, 175)
Dyspnea Present Absent position = (454, 361)
20
No_Visit Smoker Absent Absent False Abnormal Absent Absent
No_Visit Smoker Absent Present True Abnormal Absent Absent
No_Visit NonSmoker Absent Absent False Normal Absent Absent
No_Visit Smoker Absent Absent False Normal Present Present
No_Visit Smoker Absent Absent False Normal Present Present
No_Visit NonSmoker Absent Absent False Normal Absent Absent
No_Visit Smoker Absent Absent False Normal Present Absent
Visit Smoker Absent Absent False Normal Absent Absent
No_Visit NonSmoker Absent Absent False Normal Absent Absent
No_Visit NonSmoker Absent Absent False Normal Present Present
No_Visit Smoker Absent Absent False Normal Present Present
No_Visit NonSmoker Absent Absent False Normal Absent Present
No_Visit Smoker Absent Absent False Normal Absent Absent
No_Visit Smoker Absent Absent False Normal Present Present
No_Visit NonSmoker Absent Absent False Normal Present Present
No_Visit NonSmoker Absent Absent False Normal Absent Absent
No_Visit Smoker Absent Absent False Normal Present Present
No_Visit NonSmoker Absent Absent False Abnormal Present Present
No_Visit Smoker Absent Absent False Abnormal Present Present
No_Visit NonSmoker Absent Absent False Normal Absent Absent
=============================End of asia20.xml========================================

The following example shows how to generate an evidence file using DataGenerator. The evidence file is useful for inference.

(example 3) Use DataGenerator to generate an evidence file.

=============================Example of using DataGenerator========================================
D:\bnj>java bbn/DataGenerator2 asia.xml asia_evidence.xml 0
* * * DataGenerator2 * * *
Usage: java DataGenerator2 input.xml output.xml number_of_samples (0 will generate an evdience file)
Success!

D:\bnj>
=============================Result evidence file: asia_evidence.xml===================
0 0 0 0 0 0 0 0
VisitAsia|Smoking|Tuberculosis|Cancer|TbOrCa|XRay|Bronchitis|Dyspnea
Visit NonSmoker Absent Present False Normal Present Present
=============================end of evidence file=======================================

You can modify the evidence bits on the first line to mark evidence nodes (0 means query, 1 means evidence).

*********
*  K2   *
*********

Description: K2 is a score-based greedy search algorithm for learning Bayesian networks from data. It was published in:

Cooper, G. and Herskovits, E. (1992). A Bayesian method for the induction of probabilistic networks from data. Machine Learning 9, 309-347.

K2 uses a Bayesian score, P(Bs, D), to rank different structures, and it uses a greedy search to maximize P(Bs, D), where Bs is a candidate network structure and D is the training database. The main classes in package k2 are K2Manager.java and K2.java. Class K2Manager loads the input training data. Class K2 implements the learning procedure.

Inputs to K2:
1. a set of nodes
2. an ordering on the nodes
3. an upper bound u on the number of parents a node may have
4. a database D containing m cases of training data

Output of K2: for each node, K2 returns its most probable parents given the training data D.
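
The inputs and output listed above can be illustrated with a small, self-contained sketch of the greedy search. This is not the code in K2.java; the class K2Sketch and its methods are hypothetical, and the data is a made-up three-variable table in which variable 1 always copies variable 0 and variable 2 is unrelated. The score is the logarithm of the per-node Cooper-Herskovits term, the product over parent configurations j of (r-1)! / (N_ij + r - 1)! times the product over states k of N_ijk!.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class K2Sketch {
    // log(n!) by direct summation; adequate for the small counts used here.
    static double logFactorial(int n) {
        double sum = 0.0;
        for (int i = 2; i <= n; i++) sum += Math.log(i);
        return sum;
    }

    // Log score of one node for a candidate parent set.
    // data[c][v] is the state index of variable v in case c; arity[v] is its state count.
    static double logScore(int[][] data, int[] arity, int node, List<Integer> parents) {
        // counts maps each observed parent configuration to the state counts N_ijk.
        Map<String, int[]> counts = new HashMap<>();
        for (int[] row : data) {
            StringBuilder key = new StringBuilder();
            for (int p : parents) key.append(row[p]).append(',');
            counts.computeIfAbsent(key.toString(), k -> new int[arity[node]])[row[node]]++;
        }
        int r = arity[node];
        double score = 0.0;
        // Unobserved parent configurations contribute log(1) = 0 and are skipped.
        for (int[] nijk : counts.values()) {
            int nij = 0;
            for (int n : nijk) {
                nij += n;
                score += logFactorial(n);
            }
            score += logFactorial(r - 1) - logFactorial(nij + r - 1);
        }
        return score;
    }

    // Greedy search: for each node, repeatedly add the predecessor (in the given
    // ordering) that most improves the score, up to maxParents parents.
    static List<List<Integer>> learn(int[][] data, int[] arity, int[] order, int maxParents) {
        List<List<Integer>> result = new ArrayList<>();
        for (int i = 0; i < arity.length; i++) result.add(new ArrayList<>());
        for (int pos = 0; pos < order.length; pos++) {
            int node = order[pos];
            List<Integer> parents = result.get(node);
            double best = logScore(data, arity, node, parents);
            boolean improved = true;
            while (improved && parents.size() < maxParents) {
                improved = false;
                int bestCandidate = -1;
                for (int q = 0; q < pos; q++) {
                    int candidate = order[q];
                    if (parents.contains(candidate)) continue;
                    parents.add(candidate);                 // try the candidate
                    double s = logScore(data, arity, node, parents);
                    parents.remove(parents.size() - 1);     // undo the trial
                    if (s > best) {
                        best = s;
                        bestCandidate = candidate;
                    }
                }
                if (bestCandidate >= 0) {
                    parents.add(bestCandidate);
                    improved = true;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up data: column 1 copies column 0, column 2 is independent of both.
        int[][] data = {
            {0, 0, 0}, {0, 0, 1}, {0, 0, 0}, {0, 0, 1},
            {1, 1, 0}, {1, 1, 1}, {1, 1, 0}, {1, 1, 1}
        };
        int[] arity = {2, 2, 2};
        int[] order = {0, 1, 2};
        System.out.println(learn(data, arity, order, 5)); // prints [[], [0], []]
    }
}
```

In this toy run only variable 1 receives a parent (variable 0). The ordering matters: a node can only choose parents among the nodes that precede it, which is why the ordering is a required input.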
In our implementation, the set of nodes and the database D are stored in the input training data file in XML format. The upper bound u is fixed at 5 in K2.java. The required ordering can be given on the command line. If no ordering is given, the ordering 0 - n is assumed, following the node order in the input data file. The output of K2 is the learned network in XMLBIF format. You can open it in the network editor to view its structure.

Examples:

(example 4) Use K2 to learn a Bayesian network from training data.

The following example shows how to use K2 to learn a network from training data. To run K2, we first need to prepare the input training data. In this example, we first use DataGenerator to generate the training data, and then run K2 on this training dataset to recover (learn) the network from the data.

=============================Example of K2========================================
D:\bnj>java bbn/DataGenerator2 asia.xml asia1500.xml 1500
* * * DataGenerator2 * * *
Usage: java DataGenerator2 input.xml output.xml number_of_samples (0 will generate an evdience file)
Success!

D:\bnj>java k2/K2 asia1500.xml learned.xml
Usage: java k2.K2 input.xml output.xml [order] (if no order is given, 0 - n is assumed).
[loaded network and training samples]
[Network learned.]
[Finished.]

D:\bnj>
==========================Result of K2, the learned network: learned.xml=============
]>
InternalNetwork
VisitAsia Visit No_Visit position = (166, 79)
Smoking Smoker NonSmoker position = (520, 79)
Tuberculosis Present Absent position = (166, 175)
Cancer Present Absent position = (418, 175)
TbOrCa True False position = (310, 271)
XRay Abnormal Normal position = (178, 361)
Bronchitis Present Absent position = (658, 175)
Dyspnea Present Absent position = (454, 361)
VisitAsia 0.012 0.988
Smoking 0.5026666666666667 0.49733333333333335
Tuberculosis VisitAsia 0.1111111111111111 0.008097165991902834 0.8888888888888888 0.9919028340080972
Cancer Smoking 0.11538461538461539 0.00804289544235925 0.8846153846153846 0.9919571045576407
TbOrCa Cancer Tuberculosis 0.0 1.0 1.0 0.0 0.0 0.0 0.0 1.0
XRay TbOrCa 0.9532710280373832 0.04737975592246949 0.04672897196261682 0.9526202440775305
Bronchitis Smoking VisitAsia 0.4 0.581989247311828 0.625 0.3089430894308943 0.6 0.41801075268817206 0.375 0.6910569105691057
Dyspnea Bronchitis TbOrCa 0.875 0.7805280528052805 0.6744186046511628 0.09911054637865312 0.125 0.21947194719471946 0.32558139534883723 0.9008894536213469
==========================end of learned.xml====================================================================

********************
*  Network Editor  *
********************

The network editor is a graphical Bayesian network editor. It can open Bayesian networks in nine file formats (.bif, .xml, .net, .xbn, .dsc, .dsl, .dnet, .ENT, .ideal). After viewing and editing the network, users can save it in three file formats: .xml, .bif, and .net.

BIF            .bif   --- Bayesian networks Interchange Format (http://www-2.cs.cmu.edu/~fgcozman/Research/InterchangeFormat/Old/xmlbif02.html)
XMLBIF         .xml   --- XML Belief Network Interchange File Format (http://www-2.cs.cmu.edu/afs/cs/user/fgcozman/www/Research/InterchangeFormat/)
Microsoft XBN  .xbn   --- XBN File Format (http://www.research.microsoft.com/research/dtg/bnformat/default.htm)
Microsoft      .dsc   --- MSBN Page (http://www.research.microsoft.com/adapt/MSBNx/)
Hugin format   .net   --- the net language (http://developer.hugin.com/documentation/net/Net_Language.article)
Genie format   .dsl   --- Genie home (http://www.kddresearch.org/Groups/Probabilistic-Reasoning/www2.sis.pitt.edu/~genie)
Ergo format    .ENT   --- Ergo by Noetic Systems (http://www.noeticsystems.com/ergo.shtml)
Netica format  .dnet  --- Netica by Norsys (http://www.norsys.com/)
IDEAL format   .ideal --- IDEAL format by Rockwell (http://www.rpal.rockwell.com/ideal.html)

The main I/O functions are implemented in FileIO.java, namely the FileIO.load() and FileIO.Save() methods. The following example shows how to start up the network editor.
Examples:

(example 5) Start the network editor.

==========================EditorExample.java====================================================================
import bbn.*;

public class EditorExample {
    public static void main(String[] args) {
        Debug.println("xbneditor...");
        // Creating the XBN object brings up the editor in this example.
        XBN editor = new XBN();
    }
}
==========================end of EditorExample.java====================================================================

*******************
*    Converter    *
*******************

The network converter converts Bayesian network files from nine formats (.bif, .xml, .net, .xbn, .dsc, .dsl, .dnet, .ENT, .ideal) to three formats (.xml, .bif, .net). The previous section described converting network formats with the network editor. This section gives an example of converting network formats from the command line.

Usage: java xbneditor/BBNConvertor [sourcefilename] [targetfilename]

The following example shows how to convert asia.xml to asia.net.

==========================BBNConvertor Example====================================================================
D:\bnj>java xbneditor/BBNConvertor asia.xml asia.net
informat = xml
outformat = net
load XMLBIF .xml file ...
load variables ...
stateList ******* [Visit, No_Visit]
stateList ******* [Present, Absent]
stateList ******* [Smoker, NonSmoker]
stateList ******* [Present, Absent]
stateList ******* [True, False]
stateList ******* [Abnormal, Normal]
stateList ******* [Present, Absent]
stateList ******* [Present, Absent]
load ProbabilityDistributions ...
blockName = "VisitAsia"
parentList = []
probabilitiesList = 0.01 0.99
count = 2
blockName = "Tuberculosis"
parentList = [VisitAsia]
probabilitiesList = 0.05 0.01 0.95 0.99
count = 4
blockName = "Smoking"
parentList = []
probabilitiesList = 0.5 0.5
count = 2
blockName = "Cancer"
parentList = [Smoking]
probabilitiesList = 0.1 0.01 0.9 0.99
count = 4
blockName = "TbOrCa"
parentList = [Tuberculosis, Cancer]
probabilitiesList = 1.0 1.0 1.0 0.0 0.0 0.0 0.0 1.0
count = 8
blockName = "XRay"
parentList = [TbOrCa]
probabilitiesList = 0.98 0.05 0.02 0.95
count = 4
blockName = "Bronchitis"
parentList = [Smoking]
probabilitiesList = 0.6 0.3 0.4 0.7
count = 4
blockName = "Dyspnea"
parentList = [TbOrCa, Bronchitis]
probabilitiesList = 0.9 0.7 0.8 0.1 0.1 0.3 0.2 0.9
count = 8
outputfiletype =net
convert to Hugin .net format!

D:\bnj>
==========================end of BBNConvertor Example====================================================================
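
After a conversion it can be useful to sanity-check the probability lists. The sketch below is a hypothetical helper, not part of BNJ. It assumes that each probabilitiesList stores all parent configurations for the first state, then all parent configurations for the second state, and so on. That layout is inferred from the Asia values printed above (for Tuberculosis, 0.05 + 0.95 and 0.01 + 0.99 each sum to 1), not from the BNJ source.

```java
class CptCheckSketch {
    // Returns true if, for every parent configuration, the probabilities over
    // the node's states sum to 1.
    // Assumed layout: flat[state * parentConfigs + config].
    static boolean columnsSumToOne(double[] flat, int states) {
        if (states <= 0 || flat.length % states != 0) return false;
        int configs = flat.length / states;
        for (int c = 0; c < configs; c++) {
            double sum = 0.0;
            for (int s = 0; s < states; s++) sum += flat[s * configs + c];
            // Allow for floating-point rounding.
            if (Math.abs(sum - 1.0) > 1e-9) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Values copied from the converter output for Tuberculosis and Dyspnea.
        double[] tuberculosis = {0.05, 0.01, 0.95, 0.99};
        double[] dyspnea = {0.9, 0.7, 0.8, 0.1, 0.1, 0.3, 0.2, 0.9};
        System.out.println("Tuberculosis ok: " + columnsSumToOne(tuberculosis, 2));
        System.out.println("Dyspnea ok: " + columnsSumToOne(dyspnea, 2));
    }
}
```

Both tables pass the check under the assumed layout. A list written in the other order, for example 0.05 0.95 0.01 0.99, would fail it, so the check also helps catch ordering mistakes when tables are edited by hand.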