Attribute selection determines the splitting criterion: it tells us which attribute to test at node N by finding the "best" way to separate, or partition, the tuples in D into individual classes.

Attribute Selection Measures

Information Gain

The attribute with the highest information gain is chosen as the splitting attribute for node N. This attribute minimizes the information needed to classify the tuples in the resulting partitions.

The expected information needed to classify a tuple in D (the entropy of D) is given by

$$Info(D) = -\sum_{i=1}^{m} p_i \log_2(p_i)$$

(Source: Artificial Intelligence Stack Exchange)

where m is the number of classes and p_i is the nonzero probability that an arbitrary tuple in D belongs to class C_i, estimated by |C_{i,D}| / |D|. A log function to the base 2 is used because the information is encoded in bits.

Ideally, we would like this partitioning to produce an exact classification of the tuples. That is, we would like each partition to be pure. However, it is quite likely that the partitions will be impure (e.g., a partition may contain tuples from different classes rather than from a single class). How much more information would we still need, after the partitioning, to arrive at an exact classification?

$$Info_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times Info(D_j)$$

Info_A(D) is the expected information required to classify a tuple from D based on the partitioning by attribute A, where D_1, ..., D_v are the partitions produced by splitting on A. The smaller the expected information (still) required, the greater the purity of the partitions.

Finally, the gain is

$$Gain(A) = Info(D) - Info_A(D)$$

Gain(A) tells us how much would be gained by branching on A. The attribute A with the highest information gain, Gain(A), is chosen as the splitting attribute at node N.

Let's work through an example.

[Table: training tuples in D. Source: Chegg]

Let class C1 correspond to yes and class C2 correspond to no. There are nine tuples of class yes and five tuples of class no. A (root) node N is created for the tuples in D. Next, we need to compute the expected information requirement for each attribute.
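
The three quantities above (the entropy of D, the expected information after a split, and the gain) can be sketched in a few lines of Python. This is a minimal illustration, not a full decision-tree implementation. The function names are made up for this sketch, and the three-way split passed to `information_gain` is an assumed example chosen only to be consistent with the nine yes / five no totals; it is not read from the table above.

```python
from math import log2

def entropy(class_counts):
    """Info(D): expected bits needed to classify a tuple, given per-class counts."""
    total = sum(class_counts)
    # Classes with zero count contribute nothing (p_i must be nonzero).
    return -sum((c / total) * log2(c / total) for c in class_counts if c > 0)

def expected_info(partitions):
    """Info_A(D): size-weighted average entropy of the partitions made by attribute A."""
    total = sum(sum(p) for p in partitions)
    return sum((sum(p) / total) * entropy(p) for p in partitions)

def information_gain(class_counts, partitions):
    """Gain(A) = Info(D) - Info_A(D)."""
    return entropy(class_counts) - expected_info(partitions)

# Class counts from the example: nine "yes" tuples and five "no" tuples.
info_d = entropy([9, 5])

# ASSUMPTION: a hypothetical attribute that splits D into three partitions,
# each given as [yes_count, no_count]. These counts are illustrative only.
hypothetical_split = [[2, 3], [4, 0], [3, 2]]
gain = information_gain([9, 5], hypothetical_split)

print(f"Info(D) = {info_d:.3f} bits")  # Info(D) = 0.940 bits
print(f"Gain(A) = {gain:.3f} bits")    # Gain(A) = 0.247 bits
```

To choose the splitting attribute at node N, one would compute `information_gain` for every candidate attribute and keep the one with the largest value.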