This is what we try to assess when looking at false positives and false negatives.
A false positive in this case is a high-risk score (a “positive” result for high dzud risk) when little to no mortality occurred.
The opposite situation is a false negative, where the risk score is low but mortality is high.
Both are undesirable but for different reasons.
False Positives

A false positive is also known as “crying wolf”. As in the fable, repeated mobilizations for disasters that never arrive can erode trust in the model over the long term.
In the short term, a false positive can waste resources that could otherwise be used in areas that need them more.
Over the long term, the broader effort to mitigate dzuds could be undermined, with herders, governments, and NGOs withdrawing from the program.
The idea of false dzud alarms was discussed in 2017 by Julian Dierkes, a professor at the University of British Columbia and a well-known observer of Mongolia.
This critical look at false positives shows that not only the data but also perceptions of the situation can undermine the system.
False Negatives

A false negative can have dire consequences.
In this situation, high mortality would occur with the model indicating a low risk.
This would mean that fodder and other supplies might not be available on short notice in aimags falsely designated as low risk.
This would have more extreme short term consequences not only for livestock but also for trust in the model.
If a model produces even a few false negatives, it could very likely be thrown out and labeled untrustworthy.
Confusion Matrix

In general, when optimizing a machine learning model, it is good to ask whether false positives or false negatives are worse.
For this model, it is conceivable that several false positives could be tolerated, although with an erosion of trust and possibly less mobilization for each subsequent false positive.
However, it would be disastrous to have even a few false negatives in the real world.
Livestock could die by the hundreds of thousands and the area could be devastated economically.
For this reason, I believe that false negatives are far less desirable for this model and should be minimized as much as possible.
The authors of the study made available the predicted risk scores and the livestock mortality numbers for the years in question.
With this data, we can create a confusion matrix that will allow us to visualize our false positives and negatives.
As the authors did not give bins for the index values, we will simplify things a bit and bin them ourselves.
The bins are as follows:

Index | Mortality %
0–1   | [0%, 3%]
1–2   | (3%, 6%]
2–3   | (6%, 17%]
3+    | (17%+)

Here is the corresponding confusion matrix.
TP = True Positive, TN = True Negative, FP = False Positive, FN = False Negative.
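The binning and cross-tabulation above can be sketched with pandas. The data below is a small hypothetical stand-in for the study's released risk scores and mortality numbers:

```python
import pandas as pd

# Hypothetical predicted index values and observed mortality fractions;
# the real inputs come from the study's released data.
df = pd.DataFrame({
    "index": [0.5, 1.2, 2.8, 3.4, 0.9, 2.1],
    "mortality": [0.02, 0.10, 0.04, 0.25, 0.05, 0.08],
})

# Bin both columns using the thresholds in the table above.
index_bins = [0, 1, 2, 3, float("inf")]
mort_bins = [0, 0.03, 0.06, 0.17, float("inf")]
labels = ["0-1", "1-2", "2-3", "3+"]

df["pred_bin"] = pd.cut(df["index"], bins=index_bins,
                        labels=labels, include_lowest=True)
df["true_bin"] = pd.cut(df["mortality"], bins=mort_bins,
                        labels=labels, include_lowest=True)

# A cross-tabulation of predicted vs. observed bins is the confusion matrix.
cm = pd.crosstab(df["pred_bin"], df["true_bin"])
print(cm)
```

Cells on the diagonal are the true positives and negatives; cells above the diagonal (prediction higher than outcome) are false positives, and cells below it are false negatives.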
Confusion matrix derived from MVDI risk scores

Viewing this confusion matrix, we see a relatively small number of true positives and negatives (about 15% of the total).
If this were a classification task this would be quite undesirable.
However, the model is set up as a regression task, so this isn’t necessarily bad.
Of more interest are the FPs and FNs. The false positives are the large majority here. As we established that we want to minimize FNs due to their destructive potential, this is a positive sign. However, FPs make up nearly 74% of the values. Given this, users should understand that this model will consistently overstate risk.
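Under the binning scheme above, a "false positive" means the predicted bin is higher than the observed one, and a "false negative" means it is lower. A minimal sketch of computing the FP and FN shares, using made-up ordinal bin labels rather than the study's actual data:

```python
# Hypothetical ordinal bin labels (0 = lowest risk bin, 3 = highest);
# these values are illustrative, not from the study's released data.
pred = [3, 2, 2, 1, 3, 0, 2, 1]    # model's predicted risk bin
actual = [1, 0, 2, 1, 2, 0, 0, 0]  # bin implied by observed mortality

fp = sum(p > a for p, a in zip(pred, actual))     # risk overstated
fn = sum(p < a for p, a in zip(pred, actual))     # risk understated
match = sum(p == a for p, a in zip(pred, actual)) # TP or TN

print(f"FP share: {fp / len(pred):.1%}, FN share: {fn / len(pred):.1%}")
```

With the real data, the same counting logic yields the roughly 74% FP share discussed above.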
Conclusion

One way to solve the issue with FPs and FNs is to restate the task as a classification one.
Currently, as a regression, the model's output plotted against real values is shown in Figure 5.
While it's simple to say that higher values mean higher risk, what if one aimag has "1.5" and the other has "2"? Should more resources be provided to the higher-risk aimag even though we know this model has a high false positive ratio?

Resulting index values from MDVI model

By changing the problem to a classification one, we can allow the user of the model to make more intuitive decisions from its output.
Instead of numeric values, the output of the model could be “low”, “medium”, and “high”.
This may in itself be too simple, but if the classes are based on actual results from real data, then they will be more interpretable in the real world.
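Such a classification wrapper could look like the following sketch. The cutoffs (1.0 and 2.5) are my own illustrative assumptions, not values from the study:

```python
def risk_class(index_value: float) -> str:
    """Map a numeric risk index to a coarse class.

    The cutoffs (1.0 and 2.5) are illustrative only; in practice they
    should be derived from observed mortality in historical data.
    """
    if index_value < 1.0:
        return "low"
    if index_value < 2.5:
        return "medium"
    return "high"

print([risk_class(v) for v in (0.4, 1.5, 3.2)])
```

A user handed "high" instead of "2.7" no longer has to guess whether a half-point difference between two aimags is meaningful.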
It is an important step for any machine learning or algorithmic project to take time to understand how its results can impact stakeholders and the system as a whole.
The authors state that this model was not designed as a dzud forecast.
However, once it is released into the real world, NGOs and governments can interpret the results however they wish.
As such, there is a very real responsibility to understand how these values can be interpreted.
As usual, you can find the data, code, and images for this article on Github here.
Liked this article? Drop a comment, or send me an email at robertritz@outlook.
If you want to hear more about algorithmic bias check out the excellent DataFramed episode below.