This is a summary of Chapter 16.2 in "Deep Learning" by Ian Goodfellow, Yoshua Bengio, Aaron Courvilley.
In a graphical model, the edges gives us information about how variables interact. They could be either directly or indirectly interact. It is also beneficial to know which subsets of variable are conditionally independent from each other.
Separation and D-separation
Conditional independence is called separation in undirected models. If a set of variable 𝔸 is separated from another set of variables 𝔹 given 𝕊, and 𝕊 is observable, the two variables can be considered as separable. If the connecting variables 𝕊 are unobserved, they are the "active" nodes which cannot separate variables.
Here 𝕊 is active and unobserved in (a). 𝔸 and 𝔹 are not separable. The shaded 𝕊 is inactive and observed in (b), making 𝔸 and 𝔹 separable.
The similar concept apply to directed models, it is called d-separation where the "d" stands for dependence. There is a special case when 𝕊 is observed but still active.
When variable 𝔸 and 𝔹 are both parents of 𝕊. This is called a V-structure or the collider case. This V-structure causes 𝔸 and 𝔹 to be related by the explaining away effect. In this case, the path is still active when 𝕊 is observed. If 𝕊 has a descendant ℂ which is observed and inactive, the paths between 𝔸 and 𝔹 are still active. The only way to block a path through a V-structure is to observe none of the descendants ℂ of the shared child 𝕊. 𝔸 and 𝔹 are d-separated given the empty set. 𝔸 and 𝔹 are not d-separated given 𝕊.
There is also context-specific independences that they are dependent on the value of some variables in the network. For example, consider a model of three binary variables: 𝔸, 𝔹 and ℂ. Suppose that when 𝔸 is 0, 𝔹 and ℂ are independent, but when a is 1, 𝔹 is deterministically equal to ℂ. Encoding the behaviour when 𝔸 = 1 requires an edge connecting 𝔹 and c. The graph then fails to indicate that 𝔹 and ℂ are independent when 𝔸 =0.
Converting between Undirected and Directed graphs
We typically refer to Restricted Boltzmannn Machine as undirected graphs and sparse coding as directed graphs. The choice of graph depends on the computational task , variables styles, or the one which gives the most independences, the fewest edges.
Advantage table
directed | undirected |
---|---|
efficiently draw samples from the model | useful for deriving approximate inference procedures |
encode some independences | |
substructure of immorality | capture all conditional independences |
Every probability distribution can be represented by either a directed model or an undirected model. In the worst case, one can always represent any distribution by using a complete graph. For a directed model, the complete graph is any directed acrylic graph where all variables have path to each other. For an undirected model, the complete graph is simply a graph containing a single clique encompassing all the variables.
Directed models are able to use one substructure called immorality. It occurs when two random variables 𝔸 and 𝔹 are both parents of a third random variable ℂ, and there are no edge directly connecting 𝔸 and 𝔹. The undirected model converted from a directed model is known as a moralised graph.
The first row of the above figure is directed graph and the second row is the converted undirected graph. For every pair of variables 𝔸 and 𝔹, we add an undirected edge connecting 𝔸 and 𝔹 in the new moralised graph 𝒰 if there is a directed edge connecting 𝔸 and 𝔹, or if 𝔸 and 𝔹 are both parents of a third variable. A directed graph cannot capture all the conditional independences implied by an undirected graph if it contains a loop of greater than three, unless that loop also contains a chord.
A loop is a sequence of variables connected by undirected edges, with the last variable in the sequence connected back to the first variable in the sequence. A chord is a connection between any two nonconsecutive variables in the sequence defining a loop.
Before converting undirected graph to directed graph, we must add the chords to loop larger than 4 edges. Adding these chords discards some of the independence information that was encoded in the undirected graph. The graph formed by adding chords to undirected graph is known as a chordal or a triangulated, as opposite of moralised graph. Then we assign direction to the edges in the chordal graph with an order from the early node to the later node.
Comments