Dealing with Cognitive Biases: A Data Scientist's Perspective

Praveen Bysani · Jun 23

Bias is an important principle that impacts our lives on a daily basis.
We all have our own biases formed by our life experiences, education, culture and social interactions.
The word bias carries a negative connotation of prejudice, but there is also a whole different category of biases, under the umbrella term "cognitive bias", that helped the human race survive pre-historic times.
Cognitive biases are the result of our evolutionary process.
They help us survive by making fast decisions in critical, time-sensitive situations and by working around the limited information-processing capabilities of the human brain.
Simply, they are the reason behind our gut-feeling.
These biases can often cause irrational behaviour that goes against facts and lead to systematic errors in judgement.
Researchers in the fields of social psychology and behavioural economics have discovered more than 200 different biases in the last seven decades.
They have studied how these biases impact areas such as finance, management and clinical judgement.
As the prevalence of Data science models in real world applications such as credit scoring, fraud detection, personal recommendations continues to increase, it is critical that Data scientists have a more thorough understanding of Cognitive biases.
By being aware of these systematic errors in human judgement, hopefully we can take necessary measures and be more conscious in our approach to building accurate models.
A data science project typically involves five major steps:

1. Understanding the business problem
2. Data collection and cleaning
3. Exploratory data analysis
4. Model building and validation
5. Insights and communication of results

There hasn't been much focus on the necessity to cover the blind spots in a data science project that arise due to cognitive biases.
Below, I provide a list of relevant biases that can impact several stages of the data science lifecycle.
Data Collection

Once the data scientist has a good understanding of the business problem, they need to prepare a dataset to build the model.
At this stage we need to be careful with logical errors introduced by survivorship bias, availability bias etc.
Survivorship bias arises from the human tendency to focus on data points that passed some selection criterion, while overlooking those that did not.
For example, if we set out to build a model that predicts the characteristics of a successful entrepreneur, the available data will very likely focus on those who made their fortune in business, not on those who failed in their entrepreneurial journey.
We all know how it's going to end if we build the model with only positive labels.
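A quick sanity check on the label distribution catches this before any modelling starts. A minimal sketch, using a hypothetical "successful founders" dataset where only the survivors were collected:

```python
from collections import Counter

# Hypothetical dataset scraped from "successful founder" profiles:
# every record carries the positive label; the failures never made it in.
labels = ["success"] * 500  # no "failure" examples survived collection

counts = Counter(labels)
print(counts)  # Counter({'success': 500})

# A "model" that always predicts the majority class scores perfectly on
# this data, yet has learned nothing that separates success from failure.
majority_class = counts.most_common(1)[0][0]
accuracy = sum(1 for y in labels if y == majority_class) / len(labels)
print(accuracy)  # 1.0
```

The perfect score is the warning sign: when one class is missing entirely, any evaluation on the collected data is meaningless.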
Availability bias is a cognitive shortcut that results in over-reliance on the events or data that we can immediately think of.
Often times, this bias may result in neglecting new sources of data that can potentially improve the model performance.
For example, if the task at hand is to build a credit risk model, our thought process directs us towards collecting financial and demographic information, because it is readily available and easy to collect.
We do not really focus on personality traits that can be inferred from a user's web browsing patterns.
Since financial stability doesn't always imply a willingness to repay loans in the real world, our model will fail to capture this signal.
Exploratory Data Analysis (EDA)

Data exploration is a critical part of delivering successful models.
Careful EDA provides an intuitive understanding of the data and helps enormously in engineering new features.
Data scientists need to pay attention to fallacies such as clustering illusion and anchoring at this stage.
The clustering illusion refers to the human tendency to find patterns in random events where none exist.
Although randomness is prevalent and familiar to us, research in human psychology says we are poor at recognising it.
Since the goal of data exploration is to find patterns in the data, we are especially susceptible to making mistakes due to this illusion.
Running proper statistical tests, and being skeptical enough to do additional checks, helps overcome this illusion.
Anchoring is a bias that occurs due to the behaviour of attributing too much importance to the information that is discovered or provided first.
We tend to ‘anchor’ all of our future decisions based on the first information.
This is a principle that's often exploited by e-commerce and product companies, which show an exorbitantly high price for the first item so that subsequent items seem 'cheap'. During EDA, a very weak linear correlation from the first independent variable might influence our interpretation of the correlations of subsequent variables with the dependent variable.
Having a pre-defined set of guidelines and thresholds to guide the significance of analysis will be helpful in overcoming this bias.
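One way to make this concrete is to fix the significance threshold before computing anything, then apply it uniformly to every variable so the first number we see cannot anchor the rest. A minimal sketch (the features, threshold value, and data are all made up for illustration):

```python
import math
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

random.seed(1)
target = [random.gauss(0, 1) for _ in range(100)]
features = {
    "signal": [t + random.gauss(0, 1) for t in target],   # genuinely related
    "noise_a": [random.gauss(0, 1) for _ in range(100)],  # unrelated
    "noise_b": [random.gauss(0, 1) for _ in range(100)],  # unrelated
}

# Decide the cutoff *before* looking at any numbers, and apply it to every
# feature identically, instead of anchoring on the first correlation seen.
THRESHOLD = 0.3  # arbitrary, but fixed in advance
kept = [name for name, xs in features.items()
        if abs(pearson(xs, target)) >= THRESHOLD]
print(kept)
```

Because the rule is declared up front and order-independent, the verdict on `noise_a` cannot be dragged up or down by whatever we happened to compute first.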
Model Building

At this stage, the data scientist is tasked with selecting and building a machine learning model that provides the most efficient and robust solution.
It is possible to make mistakes during this phase due to biases such as the hot hand fallacy or the bandwagon effect.
The hot hand fallacy is a phenomenon in which a person who experienced a successful outcome recently is deemed to have a greater chance of success in future attempts.
This is most commonly observed in sports, where a team that recorded consecutive wins is considered to be ‘hot’/ ‘on streak’ and hence expected to have higher success rate.
Similarly, we data scientists often feel that it's fine to reuse the model that gave us the best scores on the previous problem, without examining other suitable models.
The bandwagon effect is a cognitive bias that explains the impulse to choose a certain option or follow a particular behaviour because other people are doing it.
This leads to a dangerous cycle: as more people follow a trend, it becomes even more likely that others hop on the bandwagon.
We observe this quite often in analytics, where practitioners chase buzzwords such as deep learning or reinforcement learning without understanding the constraints and associated costs.
By experimenting with a collection of suitable machine learning algorithms and proper cross validation cycles, we can overcome the biases during the model selection phase.
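This habit can be made mechanical: score every candidate with the same cross-validation loop and let the held-out error decide, rather than defaulting to last project's winner. A minimal pure-Python sketch with two toy "models" (the data and candidates are made up for illustration):

```python
import random
import statistics

def kfold_mse(fit, xs, ys, k=5):
    """Average held-out mean squared error of a model over k folds."""
    idx = list(range(len(xs)))
    random.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        train = [i for i in idx if i not in fold]
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(statistics.mean((predict(xs[i]) - ys[i]) ** 2 for i in fold))
    return statistics.mean(scores)

def fit_mean(xs, ys):
    # Baseline: always predict the training mean.
    m = statistics.mean(ys)
    return lambda x: m

def fit_line(xs, ys):
    # Simple least-squares line.
    mx, my = statistics.mean(xs), statistics.mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x

random.seed(3)
xs = [random.uniform(0, 10) for _ in range(100)]
ys = [2 * x + random.gauss(0, 1) for x in xs]  # linear signal plus noise

candidates = {"mean baseline": fit_mean, "linear fit": fit_line}
scores = {name: kfold_mse(fit, xs, ys) for name, fit in candidates.items()}
best = min(scores, key=scores.get)
print(best)
```

Because every candidate, including last time's favourite, faces the same held-out folds, the choice rests on evidence from this problem rather than on a remembered streak.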
Model Interpretation

After we successfully build a model, it's equally important to interpret the model output and come up with a coherent story along with actionable insights.
This critical stage can be compromised due to confirmation bias.
Confirmation bias is the most popular and prevalent cognitive bias of all.
We only hear what we want to hear, we only see what we want to see.
Our strong belief system forces us to ignore any information that doesn’t conform with our preconceived notions.
Any new information that challenges existing beliefs should be taken with an open mind.
Easier said than done, I agree.
The cognitive biases mentioned above are by no means an exhaustive list; there are many other principles and fallacies that drive human thought processes.
I hope this article provided you with some additional insight to be more skeptical and exercise caution in your future projects.
If you are interested in exploring more about cognitive biases, check out this wonderful article written by Buster Benson.