Explainable AI vs Explaining AI: Part 2

Statistical Intuitive vs. Symbolic Reasoning Systems

ahmad haj mosa, Jan 7

In the early 1900s, the horse Clever Hans showed a remarkable ability to answer arithmetic questions.

Hans tapped numbers or letters with his hoof in order to answer questions.

This behavior drew the world’s attention to him as the first intelligent animal.

However, after some experimentation, it appeared that Hans was biased, reading subconscious cues from his human observers and tapping the appropriate answer.

How did Hans make his decisions?

Clever Hans used statistical bias to complete his tasks: this approach seeks relationships in the data set without understanding the underlying symbolic reasoning model.

In the previous part, the two systems of thinking and their relation to the level of explainability, generalization and speed of learning were presented.

This part presents a deeper comparison between System 1 and System 2, which I will refer to as the Statistical Intuitive System (the “Clever Hans” system) and the Symbolic Reasoning System, respectively.

Statistical Intuitive System

Intuition is related to automatic decision-making and reflexes.

Intuition is the ability to make decisions without complete information; it makes us capable of making decisions in a certain domain without having a deep logical understanding of it.

An intuitive system uses historical evidence to establish the strength of its arguments.

Intuitive systems care more about the correlation between two historical events, A (raining) and B (carrying an umbrella), than about reasoning about the relationship between them.

So in order to prove that if it rains then people carry umbrellas, the intuitive system uses the historical statistical evidence of frequently observing both events together.

A logical system, by contrast, will try to prove it with some common knowledge and logical background, for example: rain consists of water, water makes people wet, people do not like to get wet, and umbrellas block the rain.

The logical system knows that carrying an umbrella does not imply that it is raining, so rain causes carrying an umbrella and not vice versa.

So the intuitive system is the lazy part of our brain (the Clever Hans trick), not necessarily the wise part, but it can be correct most of the time, especially if the historical evidence is strong.
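A minimal sketch of the statistical view, using a made-up log of observations (the counts below are assumptions chosen for illustration, not data from this article). Both conditional frequencies come out equally high, so co-occurrence counts alone cannot tell which event causes the other.

```python
# Hypothetical observation log of (raining, carrying_umbrella) pairs.
# The counts are made up purely for illustration.
observations = (
    [(True, True)] * 40       # rain, umbrella
    + [(True, False)] * 5     # rain, no umbrella
    + [(False, True)] * 5     # no rain, umbrella (e.g. used against the sun)
    + [(False, False)] * 50   # no rain, no umbrella
)


def conditional_frequency(pairs, given_index, target_index):
    """Estimate P(target | given) by counting co-occurrences."""
    given = [p for p in pairs if p[given_index]]
    return sum(1 for p in given if p[target_index]) / len(given)


p_umbrella_given_rain = conditional_frequency(observations, 0, 1)
p_rain_given_umbrella = conditional_frequency(observations, 1, 0)

print(f"P(umbrella | rain) = {p_umbrella_given_rain:.2f}")
print(f"P(rain | umbrella) = {p_rain_given_umbrella:.2f}")
```

The direction of the rule (rain causes umbrellas) has to come from background knowledge; it is not recoverable from these two numbers.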

Classical statistical machine learning algorithms use intuition to make decisions.

They all build a mathematical model of past experience and use it to make automatic decisions.

Deep learning added some reasoning by building high-level features, but it still makes the final decision (the decision layer) using statistical bias.

Most deep learning based statistical intuitive systems have the following features:

- Narrow (used in a specific domain such as digit recognition)
- Learns from historical data or experience (supervised, unsupervised or reinforcement learning)
- Overcomes the curse of dimensionality: builds a high-level representation of a low-level input x
- Relies on the i.i.d. assumption: assumes that the test distribution is the same as the training distribution
- Because of the i.i.d. assumption, this system needs a lot of training data. The training distribution must be as general as possible, and this system is weak in reasoning outside the training distribution
- The mapping from input to output does not contain a search process, i.e. searching for an optimum decision in a space of many possible decisions or searching for logical arguments to achieve its decision

In this article, I will use deep learning (DL) as the main representative of the intuitive system: it takes a low-level input (voice, image or text), reduces the dimensionality, and maps the inputs to a high-level representation space where it can easily build a decision function between the different targets (classes, events); see Figure 1.

Figure 1: http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/

There are many current examples of what DL can do, such as voice recognition, self-driving cars and machine translation, but let’s discuss what is challenging for deep learning.

When it comes to symbolic computation, deep networks tend to fail to generalize.

Take for example the scalar identity function, which is one of the simplest symbolic computations.

In one experiment, an autoencoder was trained to take a scalar input (e.g., the number 4), encode it in a latent space through its hidden layers (distributed representations), and then predict the input value again (the number 4) as a linear regression of the last hidden layer.

Several autoencoders with different activation functions were trained on inputs in the range between -5 and 5.

Generalization was then tested on inputs outside that range (> 5 or < -5).

The results show that all networks failed to generalize (Figure 2).
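The effect can be illustrated with a much simpler stand-in than the experiment described above. The sketch below assumes NumPy is available and uses a hypothetical network with one hidden layer of fixed random tanh units, where only the linear readout is fitted by least squares. It is not a reproduction of the experiment, only a small illustration of why bounded hidden units cannot follow the identity far outside the training range.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical one-hidden-layer network with fixed random tanh units.
# Only the linear readout is fitted (least squares), which is a shortcut
# compared with training the whole network by gradient descent.
n_hidden = 32
w_hidden = rng.normal(size=(1, n_hidden))
b_hidden = rng.normal(size=n_hidden)


def hidden(x):
    """Map a column of scalars to the hidden-layer activations."""
    return np.tanh(x @ w_hidden + b_hidden)


def with_bias(h):
    """Append a constant column so the readout can learn an offset."""
    return np.hstack([h, np.ones((h.shape[0], 1))])


# Train on inputs in [-5, 5]; the target is the input itself (identity).
x_train = np.linspace(-5, 5, 200).reshape(-1, 1)
readout, *_ = np.linalg.lstsq(with_bias(hidden(x_train)), x_train, rcond=None)


def predict(x):
    return with_bias(hidden(x)) @ readout


def mean_abs_error(x):
    return float(np.mean(np.abs(predict(x) - x)))


x_inside = np.linspace(-5, 5, 101).reshape(-1, 1)
x_outside = np.concatenate([np.linspace(-20, -10, 50),
                            np.linspace(10, 20, 50)]).reshape(-1, 1)

err_inside = mean_abs_error(x_inside)
err_outside = mean_abs_error(x_outside)
print("mean abs error inside  [-5, 5]:", err_inside)
print("mean abs error outside [-5, 5]:", err_outside)
```

Far from the training range the tanh units saturate, so the output can no longer track the input.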

Figure 2: MLPs learn the identity function only for the range of values they are trained on. The mean error ramps up severely both below and above the range of numbers seen during training.

Another example of a symbolic operation is the use of the comparison operators (<, >, ≤, ≥, =).

In the paper Learning explanatory rules from noisy data, DeepMind scientists experimented with training a CNN model that compares two MNIST images and gives a true value when the recognized number in the right image is greater than the number in the left image (Figure 3).

Figure 3: A DNN system that compares two MNIST images and gives a true value when the recognized number in the right image is greater than the number in the left image.

If we train a CNN model to do this task, we need to create a dataset that contains several examples of every pair of digits; only then will it be able to generalize.

For example, the following figure shows the set of all pairs of digits between 0 and 9 that satisfy the < relation (Figure 4).

Figure 4: Application of the ‘<’ (less-than) relation to every possible pair of digits between 0 and 9

So, if the training data set contains a set of different images for every pair of digits in Figure 4, then the DL system will be able to generalize correctly.

However, it will generalize visually rather than symbolically.

In other words, whenever this system receives two images, it will map them to the most similar training pair (again, the “Clever Hans” trick), and if that training pair is part of the list in Figure 4, then it will predict 1, or True.

How to test if the system generalizes the symbolic relation?

Suppose the training set of pairs is instead like that depicted in Figure 5; then the system will most likely fail to predict the relations 7<9, 2<8, and so forth (Figure 6).

It will be able to recognize the digits 7, 9, 2 and 8 because it has seen many images of them (e.g. in 8<9).

So it generalizes visually but not symbolically.

Figure 5: Application of the “less-than” relationship with some pairs of digits between 0 and 9 removed

Figure 6: The difference between visual and symbolic generalization

Most symbolic operations, such as simple arithmetic, propositional logic, first-order logic and spatial relations (e.g. object 1 is above, under, before or after object 2), are challenging for the intuitive system, especially when it comes to generalization.
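The held-out-pairs test described above can be sketched with digits in place of images. The lookup table below is a deliberately crude stand-in for a CNN that maps inputs to their nearest training pair; it is not an actual network, and the two held-out pairs are simply the ones mentioned in the text.

```python
from itertools import product

# All ordered digit pairs (left, right) with left < right, as in Figure 4.
all_true_pairs = {(a, b) for a, b in product(range(10), repeat=2) if a < b}

# Held-out pairs, in the spirit of Figure 5.
held_out = {(7, 9), (2, 8)}
training_pairs = all_true_pairs - held_out


def memorizing_model(left, right):
    """Stand-in for 'visual' generalization: answers True only for
    pairs that were seen during training."""
    return (left, right) in training_pairs


def symbolic_model(left, right):
    """Applies the relation itself, so unseen pairs pose no problem."""
    return left < right


for pair in sorted(held_out):
    print(pair, "memorizing:", memorizing_model(*pair),
          "symbolic:", symbolic_model(*pair))
```

Both models agree on every training pair, so the difference only shows up on the pairs that were removed.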

Symbolic Reasoning System

Reasoning can be defined as the algebraic manipulation of historical knowledge in order to answer a new question.

This manipulation can include a search in an algebraic space of different solutions.

The reasoning system has the following features:

1. It requires a knowledge base (a relational, non-relational or graph database). See the family tree in Figure 7 for an example.

2. It requires a collection of symbolic facts, rules and relationships, like the one shown in Figure 8.

Figure 7: A knowledge graph of a family tree

Figure 8: Symbolic clauses about the family tree

3. It requires an inference engine that takes a question or query and generates an answer by using the set of rules and the knowledge base.

For example, if I ask “who is the maternal great-uncle of Freya?”, the inference engine will search for the solution in the space of clauses in Figure 8 and apply deduction rules such as substitution.

The first selection will be the last clause (in blue in the figure).

The first predicate of this rule is maternalgrandmother(Freya,?).

By checking the third clause, we see that “maternalgrandmother” is the conjunction of the predicates mother(X,Z) and mother(Z,Y), which basically says “if Y is the mother of Z and Z is the mother of X, then Y is the maternal grandmother of X.”

So the engine will first find the maternal grandmother of Freya using the third clause, which is Charlotte, then the mother of Charlotte, which is Lindsey, and finally the son of Lindsey, which is Fergus, who is the maternal great-uncle of Freya (Figure 9).
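The chain of substitutions can be sketched in a few lines. The facts below are reconstructed from the walkthrough in the text; the name of Freya's mother is not given there, so "FreyasMother" is a placeholder, and the great-uncle rule is an assumption that simply follows the three steps described (grandmother, her mother, her son).

```python
# Facts: mother_of maps a child to its mother, sons_of maps a parent to sons.
mother_of = {
    "Freya": "FreyasMother",       # placeholder name (assumption)
    "FreyasMother": "Charlotte",
    "Charlotte": "Lindsey",
}
sons_of = {"Lindsey": ["Fergus"]}


def maternal_grandmother(x):
    """maternalgrandmother(X, Y) :- mother(X, Z), mother(Z, Y)."""
    z = mother_of.get(x)
    return mother_of.get(z) if z else None


def maternal_great_uncles(x):
    """Sons of the mother of X's maternal grandmother (assumed rule)."""
    grandmother = maternal_grandmother(x)
    great_grandmother = mother_of.get(grandmother) if grandmother else None
    return sons_of.get(great_grandmother, [])


print(maternal_grandmother("Freya"))    # Charlotte
print(maternal_great_uncles("Freya"))   # ['Fergus']
```

A real inference engine would search over the clauses to find this chain; here the order of the steps is hard-coded.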

Figure 9: Reasoning about the family tree

As we see in the previous example, symbolic AI involves a search process.

In this regard, researchers have proposed different search algorithms, such as goal tree search (also called AND-OR tree search) and Monte Carlo tree search.

Let us take another example to see how we can use tree search in inductive logic programming, a field of symbolic AI that focuses on learning logical clauses from data; this is also considered to be a part of machine learning.

Suppose we have the truth table in Figure 10, and we want to find the correct logical clause to predict Y given the three inputs A, B and C.

Figure 10: A truth table

The first step is to write all the minterms (the rows where Y is true) as follows:

$$Y = (\lnot A \land \lnot B \land C) \lor (A \land \lnot B \land C) \lor (\lnot A \land B \land C) \lor (A \land B \land \lnot C) \lor (A \land B \land C) \quad (1)$$

The target expression that the symbolic simplification system is supposed to find is:

$$Y = C \lor (A \land B) \quad (2)$$

To solve this problem, the automatic symbolic simplification AI will need a set of simplification (problem reduction) rules.

Then it will start a search tree, where at each node of the tree it will select one or more terms and apply the simplification rule that fits best.

In Figure 11, we see an example of how a tree search combined with background knowledge (simplification laws) can be used to find the simplest boolean expression of the truth table in Figure 10.
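One such reduction rule can be sketched by hand. The code below repeatedly applies the rule (P and x) or (P and not x) = P to pairs of terms, which is the merging step of the Quine-McCluskey method. It is a swapped-in illustration rather than the tree search of Figure 11, and it only computes the prime implicants; choosing a minimal cover is a separate step that happens not to be needed for this table.

```python
from itertools import combinations


def merge(t1, t2):
    """Apply the rule  (P and x) or (P and not x) = P  to two terms.
    Terms are strings over '0', '1', '-' ('-' means the variable is absent).
    Returns the merged term, or None if the rule does not apply."""
    diff = [i for i, (u, v) in enumerate(zip(t1, t2)) if u != v]
    if len(diff) == 1 and "-" not in (t1[diff[0]], t2[diff[0]]):
        i = diff[0]
        return t1[:i] + "-" + t1[i + 1:]
    return None


def prime_implicants(minterms):
    """Repeatedly merge terms until no rule application is possible."""
    terms, primes = set(minterms), set()
    while terms:
        merged, used = set(), set()
        for t1, t2 in combinations(sorted(terms), 2):
            m = merge(t1, t2)
            if m is not None:
                merged.add(m)
                used.update((t1, t2))
        primes |= terms - used   # terms that could not be reduced further
        terms = merged
    return primes


# Rows of the truth table where Y is true, written as bits for (A, B, C).
minterms = ["001", "101", "011", "110", "111"]
print(sorted(prime_implicants(minterms)))   # ['--1', '11-']
```

Reading the result back, '--1' is C and '11-' is A and B.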

The Python implementation of this problem is as follows:

    from sympy import symbols          # Python symbolic package
    from sympy.logic import SOPform
    import time                        # used to estimate the processing time

    a, b, c = symbols('a b c')         # create the three symbols in Fig 10
    # the terms/rows in Fig 10 where y = 1
    minterms = [[0, 0, 1], [1, 0, 1], [0, 1, 1], [1, 1, 0], [1, 1, 1]]

    tic = time.perf_counter()          # time.clock() was removed in Python 3.8
    expr = SOPform([a, b, c], minterms)
    toc = time.perf_counter()

    print('expression :', expr)
    print('processing time:', toc - tic)

The result is:

    expression : c | (a & b)
    processing time: 0.0005342439999989068

As we see, it is quite easy to implement and to find the correct boolean expression from the training data (the truth table).

So far, we have seen two examples of what symbolic AI can do. But what is challenging in symbolic systems?

Limitations of Symbolic Systems

The example above is a simple example of inductive logic programming. The technique we used to find the optimum solution has the following limitations:

1. Computationally expensive: Suppose we have a truth table with 10 variables.

The target expression and the Python code are the following:

    from sympy import symbols          # Python symbolic package
    from sympy.logic import SOPform
    import itertools
    import time
    import pandas as pd

    sm = symbols('x0:10')
    # create the truth table out of 10 boolean variables
    n = len(sm)
    truth_table = list(itertools.product([0, 1], repeat=n))
    # create a DataFrame of the truth table
    truth_table_df = pd.DataFrame(truth_table, columns=[str(s) for s in sm])
    # write a target logical expression that sympy should find
    y = ((truth_table_df['x0'] & ~truth_table_df['x1'] & truth_table_df['x2'])
         | (truth_table_df['x3'] & ~truth_table_df['x4'] & truth_table_df['x5'])
         | (truth_table_df['x6'] & truth_table_df['x7']
            & ~truth_table_df['x8'] & ~truth_table_df['x9']))
    # find the minterms, where y is true
    minterms = truth_table_df[y == 1].values.tolist()
    # run the simplification code
    tic = time.perf_counter()          # starting time
    expr = SOPform(sm, minterms)       # find the expression
    toc = time.perf_counter()          # end time
    print('expression :', expr)
    print('processing time:', toc - tic)

The result is:

    expression : (x0 & x2 & ~x1) | (x3 & x5 & ~x4) | (x6 & x7 & ~x8 & ~x9)
    processing time: 2.3627283299997544

If we compare the processing time of this example with the one with only three variables, we see that it can increase exponentially with the number of variables and the complexity of the solution.

2. Sensitive to noise: Suppose another two variables are added to the problem in the previous step.

These two variables take arbitrary values that do not affect Y, and the target expression is the same:

    from sympy import symbols          # Python symbolic package
    from sympy.logic import SOPform
    import itertools
    import time
    import pandas as pd

    sm = symbols('x0:12')
    # create the truth table out of 12 boolean variables
    n = len(sm)
    truth_table = list(itertools.product([0, 1], repeat=n))
    # create a DataFrame of the truth table
    truth_table_df = pd.DataFrame(truth_table, columns=[str(s) for s in sm])
    # write a target logical expression that sympy should find
    y = ((truth_table_df['x0'] & ~truth_table_df['x1'] & truth_table_df['x2'])
         | (truth_table_df['x3'] & ~truth_table_df['x4'] & truth_table_df['x5'])
         | (truth_table_df['x6'] & truth_table_df['x7']
            & ~truth_table_df['x8'] & ~truth_table_df['x9']))
    # find the minterms, where y is true
    minterms = truth_table_df[y == 1].values.tolist()
    # run the simplification code
    tic = time.perf_counter()          # starting time
    expr = SOPform(sm, minterms)       # find the expression
    toc = time.perf_counter()          # end time
    print('expression :', expr)
    print('processing time:', toc - tic)

The result is:

    expression : (x0 & x2 & ~x1) | (x3 & x5 & ~x4) | (x6 & x7 & ~x8 & ~x9)
    processing time: 207.635452686

As we see, the solution is correct, but the processing time is about a hundred times longer even though we added only two variables.
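Part of the slowdown can be read off the size of the input alone. The sketch below counts, with the standard library only, how many truth-table rows and minterms the simplifier has to process for 10 and for 12 variables; the helper names are hypothetical, and the target is the expression used above.

```python
from itertools import product


def target(bits):
    """The target expression from the examples above, over x0..x9.
    Any further variables in `bits` are ignored (they are pure noise)."""
    x = bits
    return ((x[0] and not x[1] and x[2])
            or (x[3] and not x[4] and x[5])
            or (x[6] and x[7] and not x[8] and not x[9]))


def count_minterms(n_vars):
    """Count the truth-table rows for which the target is true."""
    return sum(1 for bits in product([0, 1], repeat=n_vars) if target(bits))


for n in (10, 12):
    print(n, "variables:", 2 ** n, "rows,", count_minterms(n), "minterms")
```

Two irrelevant variables multiply both the rows (1024 to 4096) and the minterms (289 to 1156) by four, and the simplifier has to discover on its own that the extra variables do not matter.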

3. Sensitive to mislabels: Suppose we reverse some of the Y values:

    from sympy import symbols          # Python symbolic package
    from sympy.logic import SOPform
    import itertools
    import time
    import pandas as pd

    sm = symbols('x0:10')
    # create the truth table out of 10 boolean variables
    n = len(sm)
    truth_table = list(itertools.product([0, 1], repeat=n))
    # create a DataFrame of the truth table
    truth_table_df = pd.DataFrame(truth_table, columns=[str(s) for s in sm])
    # write a target logical expression that sympy should find
    y = ((truth_table_df['x0'] & ~truth_table_df['x1'] & truth_table_df['x2'])
         | (truth_table_df['x3'] & ~truth_table_df['x4'] & truth_table_df['x5'])
         | (truth_table_df['x6'] & truth_table_df['x7']
            & ~truth_table_df['x8'] & ~truth_table_df['x9']))
    # reverse 2 random rows (mislabeled)
    mislabels = abs(1 - y.sample(n=2))
    y.iloc[mislabels.index] = mislabels
    # find the minterms, where y is true
    minterms = truth_table_df[y == 1].values.tolist()
    # run the simplification code
    tic = time.perf_counter()          # starting time
    expr = SOPform(sm, minterms)       # find the expression
    toc = time.perf_counter()          # end time
    print('expression :', expr)
    print('processing time:', toc - tic)

Then the result is:

    expression : (x0 & x2 & ~x1) | (x3 & x5 & ~x4) | (x6 & x7 & ~x8 & ~x9) | (x1 & x2 & x4 & x5 & x7 & x8 & ~x0 & ~x3 & ~x6 & ~x9) | (x0 & x2 & x9 & ~x3 & ~x4 & ~x5 & ~x6 & ~x7 & ~x8)
    processing time: 2.4316794240000945

As we see, only two mislabeled rows created an incorrect expression.
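The same effect is easy to see on the three-variable table, where the result can be checked by hand. This is a small sketch using SymPy's SOPform as in the code above; the choice of the mislabeled row (0, 0, 0) is an assumption made for illustration.

```python
from sympy import symbols
from sympy.logic import SOPform

a, b, c = symbols('a b c')
variables = [a, b, c]

# Clean rows where y = 1 (the three-variable example above).
clean_minterms = [[0, 0, 1], [1, 0, 1], [0, 1, 1], [1, 1, 0], [1, 1, 1]]
# Hypothetical mislabel: the row (0, 0, 0) is wrongly marked as true.
noisy_minterms = clean_minterms + [[0, 0, 0]]

clean_expr = SOPform(variables, clean_minterms)
noisy_expr = SOPform(variables, noisy_minterms)

print('clean :', clean_expr)
print('noisy :', noisy_expr)

# The noisy expression now claims y is true for a = b = c = 0.
all_false = {a: False, b: False, c: False}
clean_at_zero = bool(clean_expr.subs(all_false))
noisy_at_zero = bool(noisy_expr.subs(all_false))
print('clean at (0, 0, 0):', clean_at_zero)
print('noisy at (0, 0, 0):', noisy_at_zero)
```

The simplifier treats every row as exact, so a single wrong label is faithfully encoded as an extra term instead of being averaged out.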

What if noise is combined with mislabels? Then the result is even worse:

    expression : (x3 & x5 & ~x4) | (x0 & x11 & x2 & ~x1) | (x0 & x2 & x5 & ~x1) | (x0 & x2 & x8 & ~x1) | (x0 & x2 & x9 & ~x1) | (x0 & x2 & ~x1 & ~x10) | (x0 & x2 & ~x1 & ~x3) | (x0 & x2 & ~x1 & ~x4) | (x0 & x2 & ~x1 & ~x6) | (x6 & x7 & ~x8 & ~x9) | (x1 & x11 & x3 & x4 & x8 & ~x0 & ~x10 & ~x2 & ~x5 & ~x6 & ~x7 & ~x9)
    processing time: 181.49175330699995

4. Sensitive to ambiguity: The logical term X0 = 1 means X0 is equal to 1 and only 1.

What if we are uncertain about X0 = 1? Uncertainty is challenging in symbolic systems.

Consider the two MNIST image-comparison examples (see above).

Given a small number of training examples, inductive first-order logic programming can easily learn and generalize the operation <; the learned logical program is as follows:

Figure 12: Logical clause for the < operation

Basically, X is a number and Y is a number; Y is greater than X if Y is a successor of X, or if there is a number Z such that Z is a successor of X and Y is greater than Z.
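A recursive less-than clause of this kind can be sketched directly. The exact clause of Figure 12 is not reproduced in the text, so the formulation below (a base case on the successor relation plus one recursive case) is an assumption: it is one standard way of defining < from succ, written in Python instead of a logic language.

```python
# Background knowledge: succ(X, Y) holds when Y = X + 1, for digits 0..9.
SUCC = {(x, x + 1) for x in range(9)}


def less_than(x, y):
    """less_than(X, Y) :- succ(X, Y).
    less_than(X, Y) :- succ(X, Z), less_than(Z, Y)."""
    if (x, y) in SUCC:
        return True
    return any(less_than(z, y) for (x0, z) in SUCC if x0 == x)


print(less_than(7, 9), less_than(2, 8), less_than(9, 7))   # True True False
```

Because the clause is defined on the successor relation and not on examples of pairs, it answers 7<9 and 2<8 correctly without ever having seen them.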

But what if the inputs X and Y for this system are images and not numbers, and we first want to recognize the digits in X and Y and then apply the clause in Figure 12? How can that be done with symbolic AI?

Suppose the digit image dimensions are 5 x 5, so we have 25 Boolean inputs.

Then the binary images of the two digits 0 and 1 are as follows:

[Image: 5 x 5 binary images of the digits 0 and 1]

and the expressions for 0 and 1 are:

[Image: Boolean expressions for the digits 0 and 1]

Other digits from 2 to 9 can be represented in the same way.

By using these expressions we can first recognize the digit in the image and then apply the clause in Figure 12.

But what if the training set consists of non-binary images and the digits are ambiguous, as in Figure 13? Then the symbolic solution becomes much more complex to find and to generalize, while a statistical machine learning approach would do the job much better.
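The contrast can be sketched on 5 x 5 bitmaps. The two templates below are made up for illustration (they are not the images from this article), and the pixel-agreement score is only a crude stand-in for a statistical classifier.

```python
# Hypothetical 5 x 5 binary templates for the digits 0 and 1.
TEMPLATES = {
    "0": ["01110",
          "10001",
          "10001",
          "10001",
          "01110"],
    "1": ["00100",
          "01100",
          "00100",
          "00100",
          "01110"],
}


def exact_match(image):
    """Crisp symbolic recognition: every one of the 25 inputs must agree."""
    for digit, template in TEMPLATES.items():
        if image == template:
            return digit
    return None


def best_match(image):
    """Statistical-style recognition: pick the template sharing most pixels."""
    def score(template):
        return sum(p == q
                   for row_i, row_t in zip(image, template)
                   for p, q in zip(row_i, row_t))
    return max(TEMPLATES, key=lambda d: score(TEMPLATES[d]))


# A "0" with one flipped pixel in the top-left corner.
noisy_zero = ["11110", "10001", "10001", "10001", "01110"]
print("exact:", exact_match(noisy_zero))   # None
print("best :", best_match(noisy_zero))    # 0
```

A single flipped pixel is enough to make the crisp rule fail, while the similarity score degrades gracefully.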

Figure 13: Ambiguous image of digit 0

Summary

I introduced a brief comparison between statistical and symbolic machine learning, which I referred to as intuitive and symbolic systems, respectively.

The pros and cons of each system have been presented, and can be summarized in the following table:

[Table: pros and cons of the intuitive and symbolic systems]

Recently, there has been a lot of research to overcome the limitations of both the intuitive and symbolic systems.

Most of the proposed techniques combine features from both systems to achieve the goal.

In the next parts, we will cover these techniques that aim to close the gap between symbolic and statistical AI.

To conclude this part, consider the following argument: why is 4 smaller than 5? You would easily explain the reason using your symbolic system (for example, because 5 = 4+1, or 5 is a successor of 4).

Now if I ask you why you see the following digit as a 4, would you be able to answer? No, because this is the role of our intuitive system.

Stay tuned!