Data Ethics: Keeping Your Ethics in Check as a Data Scientist

This also means that any conclusions you draw about certain groups of people, or about how the world works, depend on whether ethical data collection practices were used. For example, you might come across a model in which "race" is a heavily weighted predictor variable. There are two issues with this.

Firstly, the model happens to classify all people of a certain race as high credit risk applicants for a home loan at a bank. However, a closer look at the actual data shows that the majority of cases come from one racial group, and that all of these cases live in the same part of the city. How different might the results be with a more diverse random sample of cases across all locations? What if many people from this racial group live in other locations and have a good credit history, but simply didn't make it into the dataset? Also, in classification tasks, if the classes in the dataset are extremely imbalanced, the model will tend to predict the majority class correctly most of the time but will struggle to predict the under-represented classes.

Secondly, why did the bank decide to place such a heavy weight on the predictor variable "race"? Are the results different when race is not heavily weighted? Was this decision driven by a personal viewpoint, or was there an objective reason for weighting race so heavily? The reason behind the decision could be purely subjective, in which case it skews the results and makes any conclusions drawn from them unreliable.

Bad Data Ethics

Studies that draw conclusions about crime rates among certain ethnic or socioeconomic groups are another example where data ethics are a concern.
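
The point about class imbalance made earlier can be shown with a small sketch. The labels below are made up purely for illustration (95 "low_risk" cases and 5 "high_risk" cases), and the helper names `majority_baseline`, `accuracy`, and `recall` are hypothetical, not taken from any particular library. The "model" here simply predicts the most common class every time, which is roughly what a classifier trained on extremely imbalanced data can drift toward.

```python
from collections import Counter


def majority_baseline(labels):
    """Return the most common label (a 'model' that always predicts it)."""
    return Counter(labels).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of all cases where the prediction matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive):
    """Fraction of actual `positive` cases that were predicted as `positive`."""
    preds_for_actual = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not preds_for_actual:
        return 0.0
    return sum(p == positive for p in preds_for_actual) / len(preds_for_actual)


# Made-up, illustrative labels: an extreme 95/5 class imbalance.
y_true = ["low_risk"] * 95 + ["high_risk"] * 5

# Always predict whichever class dominates the data.
predicted_label = majority_baseline(y_true)
y_pred = [predicted_label] * len(y_true)

overall_accuracy = accuracy(y_true, y_pred)
majority_recall = recall(y_true, y_pred, "low_risk")
minority_recall = recall(y_true, y_pred, "high_risk")

print(f"accuracy:        {overall_accuracy:.2f}")
print(f"majority recall: {majority_recall:.2f}")
print(f"minority recall: {minority_recall:.2f}")
```

In this toy setup the overall accuracy looks high even though not a single under-represented case is identified, which is why accuracy alone can be misleading on imbalanced data and why per-class metrics are worth checking.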
