
I recently started studying Bayesian networks and am now implementing an exact inference algorithm: enumeration. I am aware of the complexity and inefficiency of this method, but I want to fully understand it.

As far as I know there are three basic 'steps':

  1. Apply Bayes' rule to the query and evidence variables.
  2. Sum out the hidden variables.
  3. Calculate the joint distribution.

I've tried a recursive approach, as suggested in most documentation on the topic. Here's the algorithm I'm using (in Python):

'''
bayesnet is an instance of a class that holds a Bayes net
queryVars is a list of tuples (varname, boolean) describing the state of a variable, e.g. ('B', True)
evidenceVars is a list of tuples (varname, boolean)
'''
def enumerate_method(bayesnet, queryVars, evidenceVars=[]):
    # Evidence present: apply Bayes' rule, P(Q | E) = P(Q, E) / P(E)
    if evidenceVars:
        return enumerate_method(bayesnet, queryVars + evidenceVars) / enumerate_method(bayesnet, evidenceVars)
    # Hidden variables left: sum out the first one, recursion takes care of the rest
    hidden = bayesnet.hiddenVariables(queryVars)
    if hidden:
        hiddenvar = hidden[0]
        return enumerate_method(bayesnet, queryVars + [(hiddenvar, True)]) + enumerate_method(bayesnet, queryVars + [(hiddenvar, False)])

    # Calculate the joint distribution of queryVars? (this is the part I'm unsure about)
    return bayesnet.joint(queryVars)

I am still unsure how to implement the joint distribution calculation. My BayesNet class stores, for each variable, its name, a list of its parents, and a list of tuples giving its probability for each combination of parent values. The Bayes network is fully described.
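For concreteness, here is a rough sketch of the kind of structure I mean (the attribute names and CPT layout below are only illustrative, not my exact code; the joint method is exactly the part I am asking about, so it is not shown):

class BayesNet:
    def __init__(self):
        # varname -> (list of parent names, CPT)
        # CPT: list of tuples (parent_assignment, P(var = True | parent_assignment)),
        # e.g. for 'A' with parent 'B': [((('B', True),), 0.9), ((('B', False),), 0.3)]
        self.nodes = {}

    def hiddenVariables(self, assigned):
        # Variables of the network that are not mentioned in `assigned`
        named = set(name for name, _ in assigned)
        return [v for v in self.nodes if v not in named]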

I guess I have to use the chain rule, but I have trouble with that, as it brings in new expressions conditioned on variables that I do not have as evidence.

Am I moving in the right direction? If so, could you provide a suggestion on how to calculate the joint distribution?


1 Answer


You already have the result of the chain rule: it is basically your network's topology. Now you need to compute the joint. I would start from the "root nodes", i.e. nodes with no parents, and proceed forward (I'm sure you can go the other way too, it's just more painful):

  • Each of the "root nodes" already has its "own" joint. I.e. if your network were just a single root node, you would be done: just output its distribution (its CPT).
  • Now suppose your network is two "root nodes". What's the joint? Suppose you have two boolean variables A, B with values a, ~a, b, ~b and probabilities P(a), P(~a), P(b), P(~b). The joint is:

    (a, b,   P = P(a) * P(b))
    (a, ~b,  P = P(a) * P(~b))
    (~a, b,  P = P(~a) * P(b))
    (~a, ~b, P = P(~a) * P(~b))

  • This looks like a Cartesian product: you take the Cartesian product over the variables' values, and each entry's probability is the product of the corresponding terms. This is the simple version of a "factor product", where a factor is some unnormalized discrete distribution.

  • In general, the product of two factors will involve variables that are common to both factors. Take the Cartesian product and throw out all tuples that disagree on the common variables (this is the natural join in databases), then compute the probability as before.

  • Now you have the factor product. Just use it on your graph to compute the joint (there is a sketch below). Be careful with paths that separate and then join back up in the graph.
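If it helps, here is a minimal Python sketch of the factor product and of multiplying one factor per node to get the full joint. The Factor class, the table layout and the example CPT numbers are my own assumptions, not your BayesNet API, so adapt the names to your code:

class Factor:
    # An unnormalized discrete distribution over a set of boolean variables.
    # `table` maps assignments (frozensets of (varname, value) pairs) to probabilities.
    def __init__(self, variables, table):
        self.variables = set(variables)
        self.table = table

def factor_product(f1, f2):
    # Cartesian product of the rows of f1 and f2, dropping rows that disagree
    # on the shared variables (the database join) and multiplying probabilities.
    table = {}
    for assign1, p1 in f1.table.items():
        for assign2, p2 in f2.table.items():
            merged = dict(assign1)
            if any(merged.get(var, val) != val for var, val in assign2):
                continue  # inconsistent on a common variable
            merged.update(dict(assign2))
            table[frozenset(merged.items())] = p1 * p2
    return Factor(f1.variables | f2.variables, table)

def joint_factor(node_factors):
    # Chain rule: the full joint is the product of one factor per node,
    # where node X's factor encodes P(X | parents(X)).
    result = node_factors[0]
    for f in node_factors[1:]:
        result = factor_product(result, f)
    return result

# Hypothetical two-node network A -> B with
# P(a) = 0.2, P(b | a) = 0.9, P(b | ~a) = 0.4:
fA = Factor({'A'}, {frozenset({('A', True)}): 0.2,
                    frozenset({('A', False)}): 0.8})
fB = Factor({'A', 'B'}, {frozenset({('A', True),  ('B', True)}):  0.9,
                         frozenset({('A', True),  ('B', False)}): 0.1,
                         frozenset({('A', False), ('B', True)}):  0.4,
                         frozenset({('A', False), ('B', False)}): 0.6})
joint = joint_factor([fA, fB])
# e.g. joint.table[frozenset({('A', True), ('B', True)})] is 0.2 * 0.9 = 0.18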