This is a summary of Chapter 16.2 in "Deep Learning" by Ian Goodfellow, Yoshua Bengio, and Aaron Courville.
Undirected models are also known as Markov random fields (MRFs) or Markov networks. They use graphs whose edges are undirected, as shown below. An edge in an undirected model has no arrow and is not associated with a conditional probability distribution.
One of the most important terms in an undirected graphical model is the clique. A clique 𝒞 is a subset of nodes that are all connected to each other by an edge of the graph. A factor 𝜙(𝒞), which is also called a clique potential, measures the affinity of the variables in that clique for being in each of their possible joint states. The factors are constrained to be nonnegative. Together they define an unnormalised probability distribution:
$$\tilde{p}(\mathbf{x}) = \prod_{\mathcal{C} \in \mathcal{G}} \phi(\mathcal{C})$$
The unnormalised probability distribution is efficient to work with so long as all the cliques are small. Unlike in a Bayesian network, there is little structure to the definition of the cliques, so nothing guarantees that multiplying them together will yield a valid probability distribution. For the graph above, the normalised probability distribution 𝒫(a,b,c,d,e) can be written as the product of its clique potentials divided by a normalising constant.
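As a rough illustration of multiplying clique potentials, the following Python sketch uses a hypothetical chain a - b - c over binary variables (this is not the graph from the book). The tables `phi_ab` and `phi_bc` and all of their values are made up for the example.

```python
from itertools import product

# Hypothetical undirected chain a - b - c over binary variables.
# Each clique (here, each edge) gets a table of nonnegative potentials.
# The numbers are illustrative assumptions, not taken from the book.
phi_ab = {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}
phi_bc = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0}


def unnormalized_p(a, b, c):
    """Product of all clique potentials for one joint state (a, b, c)."""
    return phi_ab[(a, b)] * phi_bc[(b, c)]


# Sum the product over all 2**3 joint states.
total = sum(unnormalized_p(a, b, c) for a, b, c in product((0, 1), repeat=3))
print(total)
```

With these made-up tables the values sum to 24 rather than 1, which is why the product of potentials alone is not yet a valid probability distribution.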
The Partition Function
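To obtain a valid probability distribution, we must use the corresponding normalised probability distribution, obtained by dividing the unnormalised distribution p̃(x) by a constant:
$$p(\mathbf{x}) = \frac{1}{Z} \tilde{p}(\mathbf{x}),$$
where Z is the value that results in the probability distribution summing or integrating to 1:
$$Z = \int \tilde{p}(\mathbf{x}) \, d\mathbf{x}.$$
This normalising constant is known as the partition function. For discrete variables the integral becomes a sum over all joint states.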
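For a small discrete model, the partition function can be computed by brute-force enumeration. The sketch below assumes a hypothetical model of two binary variables with a single potential that favours agreement; the helper names `partition_function` and `normalize` are illustrative, not from the book.

```python
from itertools import product


def partition_function(p_tilde, states):
    """Sum the unnormalised probability over every joint state."""
    return sum(p_tilde(x) for x in states)


def normalize(p_tilde, states):
    """Return a dict mapping each state to its normalised probability."""
    z = partition_function(p_tilde, states)
    return {x: p_tilde(x) / z for x in states}


def p_tilde(x):
    # Hypothetical potential: weight 4 when the two variables agree, 1 otherwise.
    return 4.0 if x[0] == x[1] else 1.0


states = list(product((0, 1), repeat=2))
z = partition_function(p_tilde, states)
p = normalize(p_tilde, states)
print(z, p)
```

Here Z = 4 + 1 + 1 + 4 = 10, so the two agreeing states each receive probability 0.4 and the two disagreeing states 0.1. Enumeration like this is only feasible for tiny models, since the number of joint states grows exponentially with the number of variables.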
One consideration when designing undirected models is that it is possible to specify the factors in such a way that Z does not exist. For example, suppose a single scalar variable x ∈ ℝ has the single clique potential 𝜙(x) = x^2. Then Z = ∫ x^2 dx diverges, so no probability distribution corresponds to this choice of 𝜙(x).
One key difference between directed and undirected modelling is that directed models are defined directly in terms of probability distributions from the start, while undirected models are defined more loosely by 𝜙 functions that are then converted into probability distributions. The domain of each variable therefore matters: sometimes we can choose it so that the partition function Z does not diverge.
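The divergence for 𝜙(x) = x^2, and the effect of the domain, can be checked numerically. The sketch below uses a simple midpoint rule; the helper `integrate`, the truncation radii, and the bounded domain [0, 1] are assumptions chosen for illustration.

```python
def integrate(f, lo, hi, n=10_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


def phi(x):
    """The clique potential from the example: phi(x) = x**2."""
    return x * x


# Truncate the real line to [-r, r]. The exact value is 2 * r**3 / 3,
# so the would-be partition function grows without bound as r increases.
z_truncated = [integrate(phi, -r, r) for r in (1.0, 10.0, 100.0)]

# On the bounded domain [0, 1] the same potential gives a finite Z (exactly 1/3).
z_unit = integrate(phi, 0.0, 1.0)

print(z_truncated, z_unit)
```

The truncated integrals keep growing as the interval widens, whereas restricting x to [0, 1] yields a finite Z, so the same 𝜙 defines a valid distribution on one domain and none on the other.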
Energy-Based Models
Because many results about undirected models depend on the assumption that the unnormalised probability distribution is strictly greater than 0 for every x, a convenient way to enforce this condition is to use an energy-based model (EBM), where
$$\tilde{p}(\mathbf{x}) = \exp(-E(\mathbf{x})),$$
and E(x) is known as the energy function.
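A minimal sketch of an energy-based model over three binary variables follows. The energy function here (the number of neighbouring pairs that disagree) is a hypothetical choice made for the example; any real-valued energy would give strictly positive unnormalised probabilities in the same way.

```python
import math
from itertools import product


def energy(x):
    """Hypothetical energy: count of neighbouring pairs in x that disagree."""
    return sum(1 for left, right in zip(x, x[1:]) if left != right)


def p_tilde(x):
    """Unnormalised probability exp(-E(x)); positive for any finite energy."""
    return math.exp(-energy(x))


states = list(product((0, 1), repeat=3))
z = sum(p_tilde(x) for x in states)          # partition function by enumeration
p = {x: p_tilde(x) / z for x in states}      # normalised distribution

print(p[(0, 0, 0)], p[(0, 1, 0)])
```

Low-energy states such as (0, 0, 0) end up more probable than high-energy states such as (0, 1, 0), and no state is assigned zero probability.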
Any distribution of the form given by an EBM is an example of a Boltzmann distribution. Today, the term Boltzmann machine most often refers to models with latent variables, while Boltzmann machines without latent variables are more often called Markov random fields or log-linear models.