We’ll dive into this in a moment, but first, let’s talk about different types of uncertainty. There are different types of uncertainty and of modeling them, and each is useful for a different purpose.

Model uncertainty, AKA epistemic uncertainty: let’s say you have a single data point and you want to know which linear model best explains your data.

[Figure. On the right: given more data, the uncertainty is reduced.]

Epistemic uncertainty accounts for uncertainty in the model’s parameters: we are not sure which model weights describe the data best, but given more data our uncertainty decreases. This type of uncertainty is important in high-risk applications and when dealing with small, sparse data.

As an example, let’s say you want to build a model that gets a picture of an animal and predicts whether that animal will try to eat you. Show the model a picture of a zombie and it should be highly uncertain. This uncertainty is the result of the model, and given enough pictures of zombies it will decrease.

Data uncertainty, or aleatoric uncertainty, captures the noise inherent in the observation: if the labels are noisy, the uncertainty increases.

There are various ways to model each type of uncertainty. One experiment: replace the advertiser feature with an OOV (out-of-vocabulary) embedding and see how the model’s uncertainty changes. The model was able to learn that given an informative advertiser it should reduce its uncertainty. We can repeat this for different features and look for ones that result in low uncertainty when replaced with OOV embeddings. We would expect the model to have higher uncertainty for advertisers of the first type. Again, we expect the model to become more certain, and if it doesn’t, debug we will!

Another cool example is the title feature: unique titles with rare words should incur high model uncertainty. This is the result of the model not having seen many examples from that area of the space of all possible titles. Then we’ll retrain the model using one of the titles, and see if the uncertainty is reduced for the entire group.
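The claim that epistemic uncertainty shrinks as data accumulates can be sketched with a tiny 1-D Bayesian linear regression. This is a minimal illustration, not the post’s actual setup: the precisions alpha and beta and the data-generating choices below are assumptions.

```python
# Minimal sketch, assuming 1-D Bayesian linear regression (single weight,
# no bias, known observation noise). alpha and beta are illustrative.
import random

random.seed(1)
alpha, beta = 1.0, 25.0  # prior precision on the weight, noise precision

def epistemic_var(xs, x_star=1.0):
    # Posterior weight variance is 1 / (alpha + beta * sum(x_i^2)); the
    # epistemic part of the predictive variance at x* is x*^2 times that.
    # (The aleatoric part, 1/beta, does NOT shrink with more data.)
    return x_star ** 2 / (alpha + beta * sum(x * x for x in xs))

few = [random.uniform(-1.0, 1.0) for _ in range(2)]
many = few + [random.uniform(-1.0, 1.0) for _ in range(98)]

var_few = epistemic_var(few)
var_many = epistemic_var(many)
# var_many < var_few: more data means less epistemic uncertainty.
```

Note the split in `epistemic_var`’s comment: only the parameter-uncertainty term shrinks with more data, which is exactly the epistemic/aleatoric distinction above.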
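The OOV probe described above can also be sketched in code. Everything here is a hedged stand-in for the real system: the toy CTR data, the bootstrap ensemble used as an uncertainty estimate, and the names `fit_member` and `predict_with_uncertainty` are all assumptions for illustration.

```python
# Hypothetical sketch of the OOV-swap probe: disagreement across a small
# bootstrap ensemble stands in for model (epistemic) uncertainty.
import random
import statistics

random.seed(0)

# Toy training log of (advertiser, click); "acme" is an informative
# advertiser with many examples.
train = [("acme", 1), ("acme", 0), ("acme", 1)] * 30

def fit_member(data):
    # One ensemble member: per-advertiser CTR fitted on a bootstrap resample.
    sample = [random.choice(data) for _ in data]
    clicks = {}
    for adv, click in sample:
        clicks.setdefault(adv, []).append(click)
    return {adv: sum(c) / len(c) for adv, c in clicks.items()}

ensemble = [fit_member(train) for _ in range(50)]

def predict_with_uncertainty(advertiser):
    # Members that never saw the advertiser (e.g. the OOV token) fall back
    # to a noisy prior guess, so they disagree strongly.
    preds = [m.get(advertiser, random.random()) for m in ensemble]
    return statistics.mean(preds), statistics.stdev(preds)

_, informative_std = predict_with_uncertainty("acme")  # seen 90 times
_, oov_std = predict_with_uncertainty("<OOV>")         # never seen
# oov_std comes out much larger than informative_std, mirroring the probe:
# an informative advertiser reduces uncertainty, an OOV one raises it.
```

Repeating this per feature, as the post suggests, just means looping the OOV swap over each feature column and ranking features by the resulting uncertainty change.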