Introduction:
-------------
BNJ stands for Bayesian Network tools in Java. It is an open-source
software development toolkit for research in probabilistic learning and
inference using Bayesian networks. BNJ is developed by the Probabilistic
Reasoning Group of the KDD Lab in the Computing and Information Sciences
(CIS) Department at Kansas State University. BNJ is released under the
GNU General Public License (GPL).
A Bayesian Network, or Bayesian Belief Network (BBN), is a concise
representation of a joint probability distribution defined on a finite set
of random variables. It is a directed acyclic graph (DAG) in which nodes
represent random variables and arcs represent probabilistic dependencies
among the variables. A conditional probability distribution is associated
with each node and describes the dependency between the node and its
parents. The networks are most often used in expert systems that reason
under uncertainty.
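Because each node's CPT is conditioned only on its parents, the joint
distribution factorizes as P(x1, ..., xn) = P(x1 | parents(x1)) * ... *
P(xn | parents(xn)). As a small illustration, the sketch below multiplies
one CPT entry per node of the Asia network (listed later in this manual)
for a single complete assignment. The class name is hypothetical, and the
factors are read from the Asia CPTs printed by BBNExample under the
assumption that, for example, 0.99 in the Tuberculosis table is
P(Tuberculosis=Absent | VisitAsia=No_Visit).

```java
/** Joint probability of one full Asia assignment via the factorization (illustrative, not BNJ API). */
public class AsiaJoint {
    // One factor P(node = value | parents = values) per node, for the assignment
    // VisitAsia=No_Visit, Smoking=NonSmoker, Tuberculosis=Absent, Cancer=Absent,
    // TbOrCa=False, XRay=Normal, Bronchitis=Absent, Dyspnea=Absent.
    static final double[] FACTORS = {
        0.99, // P(VisitAsia=No_Visit)
        0.5,  // P(Smoking=NonSmoker)
        0.99, // P(Tuberculosis=Absent | VisitAsia=No_Visit)
        0.99, // P(Cancer=Absent | Smoking=NonSmoker)
        1.0,  // P(TbOrCa=False | Tuberculosis=Absent, Cancer=Absent)
        0.95, // P(XRay=Normal | TbOrCa=False)
        0.7,  // P(Bronchitis=Absent | Smoking=NonSmoker)
        0.9   // P(Dyspnea=Absent | TbOrCa=False, Bronchitis=Absent)
    };

    /** Multiplies the per-node conditional probabilities. */
    static double joint(double[] factors) {
        double p = 1.0;
        for (double f : factors) {
            p *= f;
        }
        return p;
    }

    public static void main(String[] args) {
        System.out.printf("P(assignment) = %.6f%n", joint(FACTORS));
    }
}
```

Running it prints P(assignment) = 0.290362. This is where the compactness
comes from: a full joint table over eight binary nodes needs 2^8 - 1 = 255
free numbers, whereas the eight Asia CPTs need only 18 (one per parent
configuration of each binary node).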
There are two main research problems in probabilistic reasoning using
Bayesian networks: learning and inference. Learning a Bayesian network
means constructing the network automatically from data with a learning
algorithm such as K2. Bayesian network inference involves computing the
posterior marginal probabilities of some query nodes given the observed
evidence, P(Q|E), and computing the most probable explanation (MPE), i.e.
the most likely joint assignment to the unobserved nodes, given the values
of the observed evidence nodes. Both Bayesian network learning and
inference have been proven to be NP-hard in general.
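To make P(Q|E) concrete, consider the fragment VisitAsia -> Tuberculosis
of the Asia network, whose CPTs are listed later in this manual. Observing
Tuberculosis=Present gives, by Bayes' rule, P(v | Present) =
P(v) P(Present | v) / sum over v' of P(v') P(Present | v'). With nothing
else observed, the remaining Asia nodes sum out, so this is also the
posterior in the full network. The sketch below simply enumerates the
states of VisitAsia; the class and method names are illustrative.

```java
/** Posterior P(VisitAsia | Tuberculosis = Present) by enumeration (illustrative sketch). */
public class TwoNodePosterior {
    // Prior over VisitAsia: index 0 = Visit, 1 = No_Visit.
    static final double[] PRIOR_VISIT = {0.01, 0.99};
    // P(Tuberculosis = Present | VisitAsia = v) for v = Visit, No_Visit.
    static final double[] P_TB_PRESENT_GIVEN = {0.05, 0.01};

    /** Returns the normalized posterior over VisitAsia given Tuberculosis = Present. */
    static double[] posterior() {
        double[] post = new double[PRIOR_VISIT.length];
        double evidenceProb = 0.0; // P(E) = sum over v of P(v) P(E | v)
        for (int v = 0; v < post.length; v++) {
            post[v] = PRIOR_VISIT[v] * P_TB_PRESENT_GIVEN[v];
            evidenceProb += post[v];
        }
        for (int v = 0; v < post.length; v++) {
            post[v] /= evidenceProb;
        }
        return post;
    }

    public static void main(String[] args) {
        double[] post = posterior();
        System.out.printf("P(Visit | Tb=Present) = %.4f%n", post[0]);
        System.out.printf("P(No_Visit | Tb=Present) = %.4f%n", post[1]);
    }
}
```

It prints 0.0481 and 0.9519: observing tuberculosis raises the
probability of a visit to Asia from 1% to about 4.8%.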
BNJ aims to provide researchers and developers with a useful toolkit for
Bayesian network representation, learning, and inference. Several popular
Bayesian network learning and inference algorithms have been implemented
and included in BNJ. The first release consists of core classes for
representing the main data structures, a graphical Bayesian network editor
for loading and manipulating networks, the K2 learning algorithm, an exact
inference algorithm (the clique-tree propagation algorithm of Lauritzen
and Spiegelhalter, 1988), several stochastic sampling algorithms, and
other utilities including a network format converter, a data generator
that simulates the network, and a simple DAG layout method.
Contents:
---------
The current version of BNJ, 1.0 alpha (04 May 2002 build), contains the
following modules for working with Bayesian networks with discrete chance
nodes:
- Bayesian network core classes that model the structure of a Bayesian
network
- XML Bayesian Network Interchange Format (XMLBIF) converter that converts
networks of other formats to and from XMLBIF (1.0 beta)
- Graphical editor for editing networks and interchanging formats
- K2 algorithm for learning the structure of a Bayesian network from data
- Lauritzen-Spiegelhalter (LS) algorithm for exact inference on a Bayesian
network
- Importance sampling algorithms, evaluated by root mean squared error
against exact results, including logic sampling, likelihood weighting,
self-importance sampling, and adaptive importance sampling
- GASLEAK, a research genetic algorithm (GA)-based wrapper that searches
for inferential loss-minimizing orderings of variables in Bayesian network
structure learning
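As a hedged illustration of how a sampling algorithm can be scored by
root mean squared error (RMSE) against exact results, the sketch below
runs likelihood weighting on the fragment VisitAsia -> Tuberculosis of
the Asia network with evidence Tuberculosis=Present, then compares the
estimated posterior over VisitAsia with the exact one from Bayes' rule.
It is not BNJ's implementation; the class name, seed, and sample count
are arbitrary choices.

```java
import java.util.Random;

/** Likelihood weighting on VisitAsia -> Tuberculosis, scored by RMSE (illustrative sketch). */
public class LikelihoodWeightingDemo {
    static final double[] PRIOR_VISIT = {0.01, 0.99};        // Visit, No_Visit
    static final double[] P_TB_PRESENT_GIVEN = {0.05, 0.01}; // P(Tb=Present | v)

    /** Estimates P(VisitAsia | Tb=Present) from n weighted samples. */
    static double[] estimate(int n, long seed) {
        Random rng = new Random(seed);
        double[] weightSums = new double[2];
        for (int s = 0; s < n; s++) {
            // Sample the non-evidence node from its prior (it has no parents)...
            int v = rng.nextDouble() < PRIOR_VISIT[0] ? 0 : 1;
            // ...and weight the sample by the likelihood of the fixed evidence.
            weightSums[v] += P_TB_PRESENT_GIVEN[v];
        }
        double total = weightSums[0] + weightSums[1];
        return new double[] {weightSums[0] / total, weightSums[1] / total};
    }

    /** Exact posterior by Bayes' rule, used as the reference. */
    static double[] exact() {
        double[] post = new double[2];
        double z = 0.0;
        for (int v = 0; v < 2; v++) {
            post[v] = PRIOR_VISIT[v] * P_TB_PRESENT_GIVEN[v];
            z += post[v];
        }
        for (int v = 0; v < 2; v++) {
            post[v] /= z;
        }
        return post;
    }

    /** Root mean squared error between an estimate and the exact distribution. */
    static double rmse(double[] approx, double[] exact) {
        double sum = 0.0;
        for (int i = 0; i < exact.length; i++) {
            double d = approx[i] - exact[i];
            sum += d * d;
        }
        return Math.sqrt(sum / exact.length);
    }

    public static void main(String[] args) {
        double[] approx = estimate(100_000, 42L);
        double[] ref = exact();
        System.out.printf("estimate = %.4f, exact = %.4f, RMSE = %.5f%n",
                approx[0], ref[0], rmse(approx, ref));
    }
}
```

Likelihood weighting never samples the evidence node; it fixes it and
carries its likelihood as a weight, which is what distinguishes it from
plain logic sampling with rejection.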
Sample Networks:
----------------
Two sample networks are used in this manual to demonstrate the usage of
BNJ modules: Asia and ALARM. Asia is an eight-node binary network. ALARM
has 37 nodes and 46 edges. The default file format is XMLBIF. The content
of asia.xml follows.
===============================asia.xml============================================
bayesiannetwork
VisitAsia
Visit
No_Visit
position = (166, 79)
Tuberculosis
Present
Absent
position = (166, 175)
Smoking
Smoker
NonSmoker
position = (520, 79)
Cancer
Present
Absent
position = (418, 175)
TbOrCa
True
False
position = (310, 271)
XRay
Abnormal
Normal
position = (178, 361)
Bronchitis
Present
Absent
position = (658, 175)
Dyspnea
Present
Absent
position = (454, 361)
VisitAsia
Tuberculosis
VisitAsia
Smoking
Cancer
Smoking
TbOrCa
Tuberculosis
Cancer
1.0 1.0 1.0 0.0 0.0 0.0 0.0 1.0
XRay
TbOrCa
Bronchitis
Smoking
Dyspnea
TbOrCa
Bronchitis
0.9 0.7 0.8 0.1 0.1 0.3 0.2 0.9
===============================asia.xml============================================
Modules:
--------
*********
* BBN *
*********
Description:
The package bbn consists of core classes for representing the data
structure of a Bayesian network. BBN.java and Node.java represent a
Bayesian network and its nodes. NodeManager.java helps handle evidence for
inference. The package also includes some useful utilities, such as
DataGenerator.
Examples:
(example 1) Load a network from an XMLBIF file and print out the
information of each node.
The following example shows how to load a Bayesian network from a file and
how to access its node information. To run this example, create the file
BBNExample.java as follows and make sure the network file asia.xml is in
the same directory.
==============================BBNExample.java=====================================
import bbn.*;
import java.util.Vector;
public class BBNExample {
    public static void main(String[] args){
        System.out.println("load the network from file ... ");
        // Parse the XMLBIF file into an in-memory BBN.
        BBN network = new BBN();
        network.loadFromXML("asia.xml");
        System.out.println("printing out information of each node... ");
        for (int i = 0; i < network.size(); i++){
            System.out.println("i="+i);
            // Print the node header and its state names.
            network.getNodeAt(i).print();
            network.getNodeAt(i).printStateNames();
            // List the node's parents, if it has any.
            if(network.getNodeAt(i).getParents().size()>0){
                for(int j=0;j<network.getNodeAt(i).getParents().size();j++){
                    System.out.println("parents --> "+network.getNodeAt(i).getParentName(j));
                }
            }
            // Print the node's conditional probability table as a flat list.
            System.out.println("probabilities in CPT --> "+network.getNodeAt(i).getProbabilities());
            System.out.println("--------");
        }
    }
}
=============================Result of BBNExample========================================
D:\bnj>javac BBNExample.java
D:\bnj>java BBNExample
load the network from file ...
printing out information of each node...
i=0
For node VisitAsia:
Visit
No_Visit
probabilities in CPT --> [0.01, 0.99]
--------
i=1
For node Tuberculosis:
Present
Absent
parents --> VisitAsia
probabilities in CPT --> [0.05, 0.01, 0.95, 0.99]
--------
i=2
For node Smoking:
Smoker
NonSmoker
probabilities in CPT --> [0.5, 0.5]
--------
i=3
For node Cancer:
Present
Absent
parents --> Smoking
probabilities in CPT --> [0.1, 0.01, 0.9, 0.99]
--------
i=4
For node TbOrCa:
True
False
parents --> Tuberculosis
parents --> Cancer
probabilities in CPT --> [1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0]
--------
i=5
For node XRay:
Abnormal
Normal
parents --> TbOrCa
probabilities in CPT --> [0.98, 0.05, 0.02, 0.95]
--------
i=6
For node Bronchitis:
Present
Absent
parents --> Smoking
probabilities in CPT --> [0.6, 0.3, 0.4, 0.7]
--------
i=7
For node Dyspnea:
Present
Absent
parents --> TbOrCa
parents --> Bronchitis
probabilities in CPT --> [0.9, 0.7, 0.8, 0.1, 0.1, 0.3, 0.2, 0.9]
--------
================================================================
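Each CPT above is printed as a flat list. Judging from the Asia values
(for Tuberculosis, 0.05 and 0.01 are P(Present | Visit) and
P(Present | No_Visit)), the list appears to hold all parent configurations
for the node's first state, then all of them for its second state, with
the first listed parent varying slowest. The helper below indexes such a
list under that assumption; it is a sketch, not a BNJ method, and its name
and parameters are hypothetical.

```java
/** Looks up P(node = state | parents = config) in a flat CPT list (assumed layout, not BNJ API). */
public class CptIndex {
    /**
     * @param cpt          flat probabilities as printed by BBNExample
     * @param state        index of the node's state
     * @param parentStates index of each parent's state, in parent order
     * @param parentArity  number of states of each parent, in parent order
     */
    static double lookup(double[] cpt, int state, int[] parentStates, int[] parentArity) {
        int configs = 1;
        int config = 0;
        // Mixed-radix index: first parent varies slowest, last parent fastest.
        for (int k = 0; k < parentArity.length; k++) {
            config = config * parentArity[k] + parentStates[k];
            configs *= parentArity[k];
        }
        // Assumed layout: all parent configurations for state 0, then for state 1, ...
        return cpt[state * configs + config];
    }

    public static void main(String[] args) {
        double[] dyspnea = {0.9, 0.7, 0.8, 0.1, 0.1, 0.3, 0.2, 0.9};
        int[] arity = {2, 2}; // parents: TbOrCa, Bronchitis
        // P(Dyspnea=Present | TbOrCa=False, Bronchitis=Present)
        System.out.println(lookup(dyspnea, 0, new int[] {1, 0}, arity));
    }
}
```

The call in main prints 0.8, matching the usual Asia value for dyspnea
with bronchitis but neither tuberculosis nor cancer.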
******************
* DataGenerator *
******************
Description:
DataGenerator can be used to simulate the network and generate a data file
from it. The output data file can be used as the training dataset for any
learning algorithm. The simulation is done by forward sampling, i.e., first
simulating the root nodes, then their children, and so on, so that every
node is sampled after its parents. DataGenerator is included in package
bbn; DataGenerator2.java is an optimized version of DataGenerator.java.
The output data file is stored in XML format.
If the number of samples is set to zero, DataGenerator generates an
evidence file instead.
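Forward (logic) sampling visits the nodes in a topological order, so every
parent is sampled before its children, and draws each node's state from
the CPT row selected by the already-sampled parent values. The sketch
below does this for the Asia fragment Smoking -> {Cancer, Bronchitis}; it
illustrates the technique only and is not DataGenerator's code. The class
name and seeds are arbitrary.

```java
import java.util.Random;

/** Forward sampling of Smoking -> {Cancer, Bronchitis} (illustrative, not DataGenerator's code). */
public class ForwardSamplingDemo {
    // State 0 = Smoker / Present, state 1 = NonSmoker / Absent.
    static final double P_SMOKER = 0.5;
    static final double[] P_CANCER_GIVEN_SMOKING = {0.1, 0.01};    // P(Cancer=Present | Smoking)
    static final double[] P_BRONCHITIS_GIVEN_SMOKING = {0.6, 0.3}; // P(Bronchitis=Present | Smoking)

    /** Draws state 0 with probability p, otherwise state 1. */
    static int draw(Random rng, double p) {
        return rng.nextDouble() < p ? 0 : 1;
    }

    /** Returns n samples; each row is {Smoking, Cancer, Bronchitis} as state indices. */
    static int[][] sample(int n, long seed) {
        Random rng = new Random(seed);
        int[][] rows = new int[n][];
        for (int i = 0; i < n; i++) {
            int smoking = draw(rng, P_SMOKER);                                // root first
            int cancer = draw(rng, P_CANCER_GIVEN_SMOKING[smoking]);          // then its children,
            int bronchitis = draw(rng, P_BRONCHITIS_GIVEN_SMOKING[smoking]);  // using the parent's value
            rows[i] = new int[] {smoking, cancer, bronchitis};
        }
        return rows;
    }

    public static void main(String[] args) {
        String[] smokingNames = {"Smoker", "NonSmoker"};
        String[] presence = {"Present", "Absent"};
        for (int[] r : sample(5, 7L)) {
            System.out.println(smokingNames[r[0]] + " " + presence[r[1]] + " " + presence[r[2]]);
        }
    }
}
```

Over many samples the Bronchitis=Present frequency approaches
0.5 * 0.6 + 0.5 * 0.3 = 0.45, the marginal implied by the CPTs.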
Examples:
(example 2) Simulate the network and generate a data file from the
network.
The following example shows how to simulate the network and generate a
data file with 20 samples, named asia20.xml, from the network.
=============================Example of using DataGenerator========================================
D:\bnj>java bbn/DataGenerator2
* * * DataGenerator2 * * *
Usage: java DataGenerator2 input.xml output.xml number_of_samples (0 will generate an evdience file)
D:\bnj>java bbn/DataGenerator2 asia.xml asia20.xml 20
* * * DataGenerator * * *
Usage: java DataGenerator input.xml output.xml number_of_samples (0 will generate an evdience file)
Success!
D:\bnj>
=============================Result of DataGenerator2: asia20.xml=======================================
InternalNetwork
VisitAsia
Visit
No_Visit
position = (166, 79)
Smoking
Smoker
NonSmoker
position = (520, 79)
Tuberculosis
Present
Absent
position = (166, 175)
Cancer
Present
Absent
position = (418, 175)
TbOrCa
True
False
position = (310, 271)
XRay
Abnormal
Normal
position = (178, 361)
Bronchitis
Present
Absent
position = (658, 175)
Dyspnea
Present
Absent
position = (454, 361)
20
No_Visit Smoker Absent Absent False Abnormal Absent Absent
No_Visit Smoker Absent Present True Abnormal Absent Absent
No_Visit NonSmoker Absent Absent False Normal Absent Absent
No_Visit Smoker Absent Absent False Normal Present Present
No_Visit Smoker Absent Absent False Normal Present Present
No_Visit NonSmoker Absent Absent False Normal Absent Absent
No_Visit Smoker Absent Absent False Normal Present Absent
Visit Smoker Absent Absent False Normal Absent Absent
No_Visit NonSmoker Absent Absent False Normal Absent Absent
No_Visit NonSmoker Absent Absent False Normal Present Present
No_Visit Smoker Absent Absent False Normal Present Present
No_Visit NonSmoker Absent Absent False Normal Absent Present
No_Visit Smoker Absent Absent False Normal Absent Absent
No_Visit Smoker Absent Absent False Normal Present Present
No_Visit NonSmoker Absent Absent False Normal Present Present
No_Visit NonSmoker Absent Absent False Normal Absent Absent
No_Visit Smoker Absent Absent False Normal Present Present
No_Visit NonSmoker Absent Absent False Abnormal Present Present
No_Visit Smoker Absent Absent False Abnormal Present Present
No_Visit NonSmoker Absent Absent False Normal Absent Absent
=============================End of asia20.xml========================================
(example 3) Use DataGenerator to generate an evidence file.
The following example shows how to generate an evidence file, which is
used as input for inference, with DataGenerator.
=============================Example of using DataGenerator========================================
D:\bnj>java bbn/DataGenerator2 asia.xml asia_evidence.xml 0
* * * DataGenerator2 * * *
Usage: java DataGenerator2 input.xml output.xml number_of_samples (0 will generate an evdience file)
Success!
D:\bnj>
=============================Result evidence file: asia_evidence.xml===================
0 0 0 0 0 0 0 0
VisitAsia|Smoking|Tuberculosis|Cancer|TbOrCa|XRay|Bronchitis|Dyspnea
Visit NonSmoker Absent Present False Normal Present Present
=============================end of evidence file=======================================
You can modify the evidence bits on the first line to select the evidence
nodes (0 means query node, 1 means evidence node). The bits follow the
node order given on the second line.
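The evidence file above is plain text: a line of 0/1 flags, a
'|'-separated list of node names, and a line of state names. The
hypothetical helper below pairs them up and returns the evidence
assignments. It assumes exactly the three-line layout shown in this one
example and that the third line supplies the observed value of each
flagged node; both are assumptions, not a documented BNJ format.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Splits an evidence file's lines into evidence assignments (assumed format, hypothetical helper). */
public class EvidenceMask {
    /** Returns node -> observed state for every node whose flag is 1. */
    static Map<String, String> evidence(List<String> lines) {
        String[] flags = lines.get(0).trim().split("\\s+");
        String[] names = lines.get(1).trim().split("\\|");
        String[] values = lines.get(2).trim().split("\\s+");
        if (flags.length != names.length || names.length != values.length) {
            throw new IllegalArgumentException("flag, name and value counts differ");
        }
        Map<String, String> observed = new LinkedHashMap<>(); // keeps file order
        for (int i = 0; i < names.length; i++) {
            if (flags[i].equals("1")) { // 1 = evidence, 0 = query
                observed.put(names[i], values[i]);
            }
        }
        return observed;
    }

    public static void main(String[] args) {
        List<String> file = List.of(
            "1 0 0 0 0 1 0 1",
            "VisitAsia|Smoking|Tuberculosis|Cancer|TbOrCa|XRay|Bronchitis|Dyspnea",
            "Visit NonSmoker Absent Present False Normal Present Present");
        System.out.println(evidence(file)); // {VisitAsia=Visit, XRay=Normal, Dyspnea=Present}
    }
}
```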
*********
* K2 *
*********
Description:
K2 is a score-based greedy search algorithm for learning Bayesian networks
from data. It was published in Cooper, G. and Herskovits, E. (1992), A
Bayesian method for the induction of probabilistic networks from data,
Machine Learning 9, 309-347. K2 uses a Bayesian score, P(Bs, D), to rank
candidate structures Bs given the data D, and it uses a greedy search
algorithm to maximize P(Bs, D).
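For reference, the score derived by Cooper and Herskovits (1992), under
assumptions that include discrete variables, independent cases, no
missing values, and uniform priors on the CPT parameters, can be written
as below. Here n is the number of nodes, r_i the number of states of node
x_i, q_i the number of configurations of its parents, and N_ijk the number
of cases in D in which x_i takes its k-th state while its parents take
their j-th configuration. With a uniform prior P(B_S) over structures and
a fixed node ordering, each node's factor can be maximized separately,
which is what the greedy search exploits.

```latex
P(B_S, D) = P(B_S) \prod_{i=1}^{n} \prod_{j=1}^{q_i}
  \frac{(r_i - 1)!}{(N_{ij} + r_i - 1)!} \prod_{k=1}^{r_i} N_{ijk}!,
\qquad N_{ij} = \sum_{k=1}^{r_i} N_{ijk}
```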
The main classes in package k2 are K2Manager.java and K2.java. Class
K2Manager loads the input training data; class K2 implements the learning
procedure.
Inputs to K2 include:
1. a set of nodes,
2. an ordering on the nodes,
3. an upper bound u on the number of parents a node may have, and
4. a database D containing m cases of training data.
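Output of K2: for each node, the set of parents that the greedy search
found to maximize the score given the training data D. Because the search
is greedy, this set is not guaranteed to be the most probable one.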
In our implementation, the set of nodes and the database D are stored in
the input training data file in XML format. The upper bound u is fixed at
5 in K2.java. The node ordering can be given on the command line; if it
is omitted, the ordering 0 - n, i.e. the order in which the nodes appear
in the input data file, is assumed.
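The greedy procedure can be sketched as follows: for each node in the
given order, start with no parents and repeatedly add the single earlier
node that most increases the node's Cooper-Herskovits score, stopping
when no addition helps or the parent count reaches u. Working with the
logarithm of the score avoids overflowing the factorials. This is a
compact illustration over integer-coded data, not the code in K2.java;
the class name, data layout, and toy data set are assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Minimal K2 sketch: greedy parent search under a node ordering (illustrative, not BNJ's K2.java). */
public class K2Sketch {
    /** log(n!) as a sum of logs; adequate for the small counts used here. */
    static double logFactorial(int n) {
        double s = 0.0;
        for (int k = 2; k <= n; k++) s += Math.log(k);
        return s;
    }

    /** Log Cooper-Herskovits score of node i with the given parent set. */
    static double logScore(int[][] data, int[] arity, int i, List<Integer> parents) {
        // counts.get(parentConfigKey)[k] = N_ijk; unseen configurations contribute log(1) = 0.
        Map<String, int[]> counts = new HashMap<>();
        for (int[] row : data) {
            StringBuilder key = new StringBuilder();
            for (int p : parents) key.append(row[p]).append(',');
            counts.computeIfAbsent(key.toString(), c -> new int[arity[i]])[row[i]]++;
        }
        int r = arity[i];
        double score = 0.0;
        for (int[] nijk : counts.values()) {
            int nij = 0;
            for (int n : nijk) { nij += n; score += logFactorial(n); }
            score += logFactorial(r - 1) - logFactorial(nij + r - 1);
        }
        return score;
    }

    /** Returns the parent list chosen for each node, given an ordering and bound u. */
    static List<List<Integer>> learn(int[][] data, int[] arity, int[] order, int u) {
        List<List<Integer>> result = new ArrayList<>();
        for (int x = 0; x < arity.length; x++) result.add(new ArrayList<>());
        for (int pos = 0; pos < order.length; pos++) {
            int i = order[pos];
            List<Integer> parents = result.get(i);
            double best = logScore(data, arity, i, parents);
            boolean improved = true;
            while (improved && parents.size() < u) {
                improved = false;
                int bestCandidate = -1;
                // Only nodes earlier in the ordering may become parents (keeps the graph acyclic).
                for (int q = 0; q < pos; q++) {
                    int z = order[q];
                    if (parents.contains(z)) continue;
                    parents.add(z);
                    double s = logScore(data, arity, i, parents);
                    parents.remove(parents.size() - 1); // undo the trial addition
                    if (s > best) { best = s; bestCandidate = z; }
                }
                if (bestCandidate >= 0) { parents.add(bestCandidate); improved = true; }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy data: node 1 copies node 0 except in the last row; node 2 is unrelated.
        int[][] data = {
            {0, 0, 0}, {0, 0, 1}, {1, 1, 0}, {1, 1, 1},
            {0, 0, 1}, {1, 1, 0}, {0, 0, 0}, {1, 0, 1},
        };
        int[] arity = {2, 2, 2};
        System.out.println(learn(data, arity, new int[] {0, 1, 2}, 5));
    }
}
```

On the toy data it prints [[], [0], []]: node 1 gains node 0 as its
parent, and node 2 gains none because no candidate raises its score.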
The output of K2 is the learned network in XMLBIF format. You can open it
in the network editor to view its structure.
Examples:
(example 4) Use K2 to learn a Bayesian network from training data.
The following example shows how to use K2 to learn a network from training
data. To run K2, we first need to prepare the input training data. In this
example, we first use DataGenerator to generate 1500 training samples from
asia.xml, then run K2 on this training dataset to recover (learn) the
network from data.
=============================Example of K2========================================
D:\bnj>java bbn/DataGenerator2 asia.xml asia1500.xml 1500
* * * DataGenerator2 * * *
Usage: java DataGenerator2 input.xml output.xml number_of_samples (0 will generate an evdience file)
Success!
D:\bnj>java k2/K2 asia1500.xml learned.xml
Usage: java k2.K2 input.xml output.xml [order]
(if no order is given, 0 - n is assumed).
[loaded network and training samples]
[Network learned.]
[Finished.]
D:\bnj>
==========================Result of K2, the learned network: learned.xml=============
InternalNetwork
VisitAsia
Visit
No_Visit
position = (166, 79)
Smoking
Smoker
NonSmoker
position = (520, 79)
Tuberculosis
Present
Absent
position = (166, 175)
Cancer
Present
Absent
position = (418, 175)
TbOrCa
True
False
position = (310, 271)
XRay
Abnormal
Normal
position = (178, 361)
Bronchitis
Present
Absent
position = (658, 175)
Dyspnea
Present
Absent
position = (454, 361)
VisitAsia
Smoking
0.5026666666666667 0.49733333333333335
Tuberculosis
VisitAsia
0.1111111111111111 0.008097165991902834 0.8888888888888888 0.9919028340080972
Cancer
Smoking
0.11538461538461539 0.00804289544235925 0.8846153846153846 0.9919571045576407
TbOrCa
Cancer
Tuberculosis
0.0 1.0 1.0 0.0 0.0 0.0 0.0 1.0
XRay
TbOrCa
0.9532710280373832 0.04737975592246949 0.04672897196261682 0.9526202440775305
Bronchitis
Smoking
VisitAsia
0.4 0.581989247311828 0.625 0.3089430894308943 0.6 0.41801075268817206 0.375 0.6910569105691057
Dyspnea
Bronchitis
TbOrCa
0.875 0.7805280528052805 0.6744186046511628 0.09911054637865312 0.125 0.21947194719471946 0.32558139534883723 0.9008894536213469
==========================end of learned.xml====================================================================
********************
* Network Editor *
********************
The network editor is a graphical Bayesian network editor. Users can use
it to open Bayesian networks in any of 9 file formats (.bif, .xml, .net,
.xbn, .dsc, .dsl, .dnet, .ENT, .ideal). After viewing and editing a
network, users can save it in 3 different file formats: .xml, .bif, and
.net.
BIF .bif --- Bayesian networks Interchange Format
(http://www-2.cs.cmu.edu/~fgcozman/Research/InterchangeFormat/Old/xmlbif02.html)
XMLBIF .xml --- XML Belief Network Interchange File Format
(http://www-2.cs.cmu.edu/afs/cs/user/fgcozman/www/Research/InterchangeFormat/)
Microsoft XBN .xbn --- XBN File Format
(http://www.research.microsoft.com/research/dtg/bnformat/default.htm)
Microsoft .dsc --- MSBN Page
(http://www.research.microsoft.com/adapt/MSBNx/)
Hugin format .net --- the net language
(http://developer.hugin.com/documentation/net/Net_Language.article)
Genie format .dsl --- Genie home
(http://www2.sis.pitt.edu/~genie)
Ergo format .ENT --- Ergo by Noetic Systems
(http://www.noeticsystems.com/ergo.shtml)
Netica format .dnet --- Netica by Norsys (http://www.norsys.com/)
IDEAL format .ideal --- IDEAL format by Rockwell
(http://www.rpal.rockwell.com/ideal.html)
The main I/O functions are implemented in FileIO.java, namely the
FileIO.load() and FileIO.Save() methods.
Examples:
(example 5) Start up the network editor.
The following example shows how to start up the network editor.
==========================EditorExample.java====================================================================
import bbn.*;
public class EditorExample {
    public static void main(String[] args) {
        Debug.println("xbneditor...");
        // Creating an XBN instance starts up the graphical network editor.
        XBN editor = new XBN();
    }
}
==========================end of EditorExample.java====================================================================
*******************
* Converter *
*******************
The network converter can convert Bayesian network files from 9 different
formats (.bif, .xml, .net, .xbn, .dsc, .dsl, .dnet, .ENT, .ideal) to 3
formats (.xml, .bif, .net). The previous section shows how to convert
network formats using the network editor; this section gives examples of
converting network formats on the command line.
Usage: java xbneditor/BBNConvertor [sourcefilename] [targetfilename]
The following example shows how to convert asia.xml to asia.net.
==========================BBNConvertor Example====================================================================
D:\bnj>java xbneditor/BBNConvertor asia.xml asia.net
informat = xml
outformat = net
load XMLBIF .xml file ...
load variables ...
stateList ******* [Visit, No_Visit]
stateList ******* [Present, Absent]
stateList ******* [Smoker, NonSmoker]
stateList ******* [Present, Absent]
stateList ******* [True, False]
stateList ******* [Abnormal, Normal]
stateList ******* [Present, Absent]
stateList ******* [Present, Absent]
load ProbabilityDistributions ...
blockName = "VisitAsia"
parentList = []
probabilitiesList = 0.01 0.99
count = 2
blockName = "Tuberculosis"
parentList = [VisitAsia]
probabilitiesList = 0.05 0.01 0.95 0.99
count = 4
blockName = "Smoking"
parentList = []
probabilitiesList = 0.5 0.5
count = 2
blockName = "Cancer"
parentList = [Smoking]
probabilitiesList = 0.1 0.01 0.9 0.99
count = 4
blockName = "TbOrCa"
parentList = [Tuberculosis, Cancer]
probabilitiesList = 1.0 1.0 1.0 0.0 0.0 0.0 0.0 1.0
count = 8
blockName = "XRay"
parentList = [TbOrCa]
probabilitiesList = 0.98 0.05 0.02 0.95
count = 4
blockName = "Bronchitis"
parentList = [Smoking]
probabilitiesList = 0.6 0.3 0.4 0.7
count = 4
blockName = "Dyspnea"
parentList = [TbOrCa, Bronchitis]
probabilitiesList = 0.9 0.7 0.8 0.1 0.1 0.3 0.2 0.9
count = 8
outputfiletype =net
convert to Hugin .net format!
D:\bnj>
==========================end of BBNConvertor Example====================================================================
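In the log above the converter reports informat = xml and
outformat = net, which suggests it derives both formats from the file
extensions. The hypothetical helper below mimics such a check: it maps an
extension to one of the 9 readable formats and rejects targets other than
.xml, .bif and .net. It does not call BNJ; its names and error handling
are assumptions made for illustration.

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical extension check mirroring the converter's supported formats (not BNJ code). */
public class FormatCheck {
    // Extensions are compared in lower case, so ".ENT" is stored as "ent".
    static final Set<String> READABLE =
        Set.of("bif", "xml", "net", "xbn", "dsc", "dsl", "dnet", "ent", "ideal");
    static final Set<String> WRITABLE = Set.of("xml", "bif", "net");

    /** Lower-cased extension after the last dot, e.g. "asia.xml" -> "xml". */
    static String extension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            throw new IllegalArgumentException("no extension: " + fileName);
        }
        return fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /** Returns {informat, outformat} or throws if the pair is unsupported. */
    static String[] formats(String source, String target) {
        String in = extension(source);
        String out = extension(target);
        if (!READABLE.contains(in)) throw new IllegalArgumentException("cannot read ." + in);
        if (!WRITABLE.contains(out)) throw new IllegalArgumentException("cannot write ." + out);
        return new String[] {in, out};
    }

    public static void main(String[] args) {
        String[] f = formats("asia.xml", "asia.net");
        System.out.println("informat = " + f[0]);
        System.out.println("outformat = " + f[1]);
    }
}
```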