Markov Networks: Undirected Graphical Models
Kushal Vala · Mar 10
This article introduces Markov Networks, which fall under the family of Undirected Graphical Models (UGMs).
This article is a follow-up to the one on Bayesian Networks, which are a type of Directed Graphical Model.
The key motivation behind these networks is to parameterize the Joint Probability Distribution based on local independencies between random variables.
Shortcomings of Bayesian Networks
Generally, a Bayesian Network requires us to pre-define a directionality to assert the influence of one random variable on another.
But there are cases where the interactions between nodes (or random variables) are symmetric in nature, and we would like a model that can represent this symmetry without directional influence.
In some use-cases, a Bayesian Network may fail to provide a perfect map of the distribution, i.e. a graph that captures all of its independencies.
Figure-1: Interaction Graph
In this figure, there is no direct interaction between A and C, or between B and D: A is conditionally independent of C given B and D, and likewise B is conditionally independent of D given A and C. No Bayesian Network, which dictates a directional influence, can capture exactly these independencies; instead, we want a network whose edges express the strength of interaction between nodes.
To overcome this shortcoming, we turn to a different family of Probabilistic Graphical Models, i.e. Undirected Graphs.
Figure 1 represents a Markov Model, which removes the directional influence of edges.
Markov Networks
In Bayesian Networks, we used conditional probabilities as factors between connected nodes.
Figure-2: Conditional Parameterization of a Bayesian Graph
Using conditional probabilities in Undirected Graphical Models is problematic because there is no direction and hence no natural conditioning.
In the case of Markov Models, we want to capture the affinity between connected random variables.
The affinity can be any non-negative real number, unlike in the case of Bayesian Networks, where factors were probabilities between 0 and 1.
So for Figure 1, we can have factors that quantify each pairwise interaction between random variables.
Figure-3: Factors for the Markov Network in Figure-1
In the above Markov Network, we have assumed that each random variable takes a binary value: 0 or 1.
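Such pairwise factors can be written as simple lookup tables. A minimal sketch for the cycle of Figure 1, with hypothetical affinity values (the numbers are made up for illustration; any non-negative values would do):

```python
# Pairwise factors for the cycle A-B-C-D of Figure 1 over binary variables.
# Keys are assignments (x, y); values are affinities, not probabilities.
phi_AB = {(0, 0): 30.0, (0, 1): 5.0, (1, 0): 1.0, (1, 1): 10.0}
phi_BC = {(0, 0): 100.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 100.0}
phi_CD = {(0, 0): 1.0, (0, 1): 100.0, (1, 0): 100.0, (1, 1): 1.0}
phi_DA = {(0, 0): 100.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 100.0}

# Unlike a conditional probability table, a factor's entries need not sum to 1.
print(sum(phi_AB.values()))  # 46.0
```

Note how phi_AB prefers the assignment (0, 0): larger affinities simply mean stronger agreement between the connected variables.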
How do we learn these numbers? (The same question arises for Bayesian Networks!) The numbers representing affinity are derived from past interactions between random variables, i.e. from data.
Ring any bells? Yes, that's where Deep Learning comes into the picture.
Data is used to derive the parameter weight.
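As a toy illustration of data driving the weights (not the actual learning algorithm — real Markov network learning maximizes likelihood iteratively because of the partition function), one could take raw co-occurrence counts from observations as pairwise affinities:

```python
from collections import Counter

# Hypothetical dataset of joint observations of two binary variables (A, B).
data = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 1), (1, 1)]

# Crude affinity estimate: use raw co-occurrence counts as factor values.
counts = Counter(data)
phi_AB = {assign: counts.get(assign, 0)
          for assign in [(0, 0), (0, 1), (1, 0), (1, 1)]}
print(phi_AB)  # {(0, 0): 2, (0, 1): 1, (1, 0): 0, (1, 1): 3}
```

Assignments seen together often get a large affinity; assignments never seen get a small one. Proper parameter estimation refines this idea with gradient-based likelihood maximization.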
In the previous article, it was mentioned that these methods are a pathway to Restricted Boltzmann Machines, which form a class of Neural Networks.
Ultimately, we are interested in a probabilistic interpretation of this method.
Probability Theory is the working engine of most Deep Learning and Machine Learning algorithms.
In Markov Models, we will write the joint probability distribution as the product of all factors.
The method used is called Factor Product.
Figure-4: Factor Product of Markov Model with A, B, and C.
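A minimal sketch of the factor product for a model over A, B, and C as in Figure 4, with made-up affinities for two pairwise factors phi1(A, B) and phi2(B, C):

```python
import itertools

# Hypothetical pairwise factors over binary variables.
phi1 = {(0, 0): 10.0, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 10.0}  # phi1(A, B)
phi2 = {(0, 0): 5.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 5.0}    # phi2(B, C)

# Factor product: multiply the factors, matching them on the shared
# variable B; the result is an unnormalized measure over (A, B, C).
product = {(a, b, c): phi1[(a, b)] * phi2[(b, c)]
           for a, b, c in itertools.product([0, 1], repeat=3)}
print(product[(0, 0, 0)])  # 50.0
```

The resulting table scores every joint assignment, but its entries do not sum to 1 — hence the need for normalization, discussed next.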
The product of factors is, in general, an unnormalized measure. To turn it into a probability distribution, we divide it by a normalizing constant called the partition function.
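For small networks the partition function can be computed by brute force, summing the factor product over every assignment. A sketch for the four-node cycle of Figure 1, with hypothetical affinities:

```python
import itertools

# Hypothetical pairwise factors over the cycle A-B-C-D (binary variables).
factors = {
    ("A", "B"): {(0, 0): 30.0, (0, 1): 5.0, (1, 0): 1.0, (1, 1): 10.0},
    ("B", "C"): {(0, 0): 100.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 100.0},
    ("C", "D"): {(0, 0): 1.0, (0, 1): 100.0, (1, 0): 100.0, (1, 1): 1.0},
    ("D", "A"): {(0, 0): 100.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 100.0},
}

def unnormalized(a, b, c, d):
    # Factor product: multiply every pairwise factor at this assignment.
    vals = {"A": a, "B": b, "C": c, "D": d}
    score = 1.0
    for (x, y), phi in factors.items():
        score *= phi[(vals[x], vals[y])]
    return score

# Partition function Z: sum of the factor product over all assignments.
Z = sum(unnormalized(*assign)
        for assign in itertools.product([0, 1], repeat=4))

def joint(a, b, c, d):
    # Dividing by Z turns the measure into a probability distribution.
    return unnormalized(a, b, c, d) / Z

total = sum(joint(*assign) for assign in itertools.product([0, 1], repeat=4))
print(round(total, 6))  # 1.0
```

Brute-force summation is exponential in the number of variables, which is precisely why computing Z is the hard part of inference in Markov networks.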
Figure-5: Joint Probability Distribution of a Markov Network
If we extend the model in Figure 1, we get the following Markov Network with factors:
Figure-6: Markov Network with 6 random variables and their factors.
In this Markov Model graph, we see binary (pairwise) interactions that have been factorized. But what if we group the variables by clique?
It will substantially reduce complexity.
A clique is a subset of the vertices of an undirected graph such that every two distinct vertices in the clique are adjacent to each other.
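For a tiny graph, cliques can be enumerated by brute force directly from this definition. A sketch for the four-node cycle of Figure 1:

```python
import itertools

# Undirected adjacency for the cycle A-B-C-D of Figure 1.
edges = {frozenset(e) for e in [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]}
nodes = ["A", "B", "C", "D"]

def is_clique(subset):
    # Every pair of distinct vertices in the subset must be adjacent.
    return all(frozenset(pair) in edges
               for pair in itertools.combinations(subset, 2))

# Enumerate all cliques of size >= 2 (fine for tiny graphs; real libraries
# use smarter algorithms such as Bron-Kerbosch).
cliques = [set(s) for r in range(2, len(nodes) + 1)
           for s in itertools.combinations(nodes, r) if is_clique(s)]
print(cliques)  # the four edges; the 4-cycle has no larger clique
```

For the 4-cycle the maximal cliques are just the edges, so pairwise factors already are clique factors; in denser graphs, grouping by larger cliques cuts down the number of factors.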
We will use this concept to advance the parameterization of Markov Networks.
Figure-7: Parameterization using Cliques
The factorization that we get is much simpler in terms of factors and less computationally expensive.
In conclusion, a distribution P factorizes over a Markov Network H if P can be expressed as follows, where each D represents a complete subgraph (clique) in H.
Figure-8: Gibbs Distribution of a Markov Network
This distribution is called a Gibbs Distribution.
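Written out, assuming m factors φ_i, each defined over a complete subgraph D_i of H, the Gibbs distribution takes the form:

```latex
P(X_1, \dots, X_n) = \frac{1}{Z} \prod_{i=1}^{m} \phi_i(D_i),
\qquad
Z = \sum_{X_1, \dots, X_n} \prod_{i=1}^{m} \phi_i(D_i)
```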
Local Independencies in Markov Networks
Let U be the set of all random variables in our joint distribution.
Let X, Y, Z be some distinct subsets of U.
A distribution P over these random variables implies that X is independent of Y given Z only if we can write the distribution as:
Figure-9: The factorized form of X independent of Y given Z
We started this article with the example of the cyclic dependency graph over A, B, C, D, which can be factorized into:
Figure-10: Modified Factors for the Markov Model in Fig. 1
So, analogous to Bayesian independencies, we can state a formal semantics for Markov Models: X is independent of its non-neighbors given its neighbors.
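In symbols, writing Nb(X) for the neighbors of X in the graph and U for the set of all variables, this local Markov property reads:

```latex
X \;\perp\; \bigl( \mathcal{U} \setminus \{X\} \setminus \mathrm{Nb}(X) \bigr) \;\bigm|\; \mathrm{Nb}(X)
```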
Mathematically, this is represented as:
Figure-11: Formal Semantics of a Markov Network
Concluding Note
Markov and Bayesian Network models parameterize the Joint Distribution as directed or undirected graphs with local probability models associated with them.
That is our main goal and the prerequisite for Restricted Boltzmann Machines.
The learning and inference of these networks is a subject of its own, beyond the scope of these articles.
Kushal Vala, Junior Data Scientist at Datametica Solutions Pvt Ltd
References:
Daphne Koller, Nir Friedman, Probabilistic Graphical Models: Principles and Techniques.
Dr. Mitesh Khapra, Deep Learning-2, IIT Madras.