Printing np.corrcoef for both datasets shows that the randomly generated data has a correlation close to zero, while the correlated data shows a strong correlation. As you can see, the resulting correlation of the random data is not exactly 0.
This varies based on the sample set generated as well as the sample size.
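A minimal sketch of how such data can be generated and checked (the 0.8 correlation, sample size, and variable names here are illustrative assumptions, not the notebook's exact values):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Uncorrelated: two independent standard normal samples.
random_data = rng.standard_normal((2, n))

# Correlated: draw from a bivariate normal with correlation 0.8.
cov = [[1.0, 0.8], [0.8, 1.0]]
corr_data = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

print(np.corrcoef(random_data)[0, 1])  # close to 0, but not exactly 0
print(np.corrcoef(corr_data)[0, 1])    # close to 0.8
```

With a different seed or a smaller sample, the off-diagonal value for the random data drifts further from zero, which is exactly the sampling effect described above.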
A joint plot of Height vs Weight. Weight and Height have different means but the same standard deviation.
Loop through the data set and find the points within k = 1 STD of the mean.

```python
k = 1  # number of standard deviations; the standardized data has STD = 1
height, weight = corr_data  # corr_data assumed to have shape (2, n_samples)
normal_height = (height > height.mean() - k) & (height < height.mean() + k)
normal_weight = (weight > weight.mean() - k) & (weight < weight.mean() + k)
normal_height_weight = normal_height & normal_weight
```

normal_height is a boolean vector indicating the values within k standard deviations of the mean height, and normal_weight does the same for weight. normal_height_weight is the intersection of both conditions.
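A self-contained sketch of this masking on freshly generated data (the 0.5 correlation is an assumed value) shows that the joint "average" share is smaller than either marginal share:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
# Standardized height/weight stand-ins with an assumed correlation of 0.5.
cov = [[1.0, 0.5], [0.5, 1.0]]
height, weight = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

k = 1  # within one standard deviation of the mean (mean = 0, STD = 1 here)
normal_height = (height > -k) & (height < k)
normal_weight = (weight > -k) & (weight < k)
normal_height_weight = normal_height & normal_weight

print(normal_height.mean())         # ~0.68, the one-dimensional share
print(normal_height_weight.mean())  # smaller than either marginal share
```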
This provides a nice segue into our second topic, the Gaussian Correlation Inequality.
Before switching topics, remember that we are trying to show how increasing the number of dimensions reduces the number of people who are close to the average.
[Image: Lucy Reading-Ikkanda/Quanta Magazine]
The image looks confusing at first glance, and the name is a mouthful.
Let’s try to break it down.
Let’s start with the basic rule of multiplication: P(A ∩ B) = P(A) * P(B), given that the events are independent.
But in our case, A and B are not independent; they are correlated, as given by the correlation matrix, so P(A ∩ B) = P(A) * P(B|A).
The Gaussian correlation inequality states that P(A ∩ B) >= P(A) * P(B). Replacing A and B with S1 and S2 in our scenario, we know that P(S1) ≈ P(S2) ≈ 0.68, because for a Gaussian random variable, 68% of the data falls within ±1 STD of the mean. Plotting P(S1 ∩ S2) and P(S1) * P(S2) against various correlation values clearly shows that when there is minimal correlation they are nearly identical, but when there is a strong correlation, P(S1 ∩ S2) steadily increases while P(S1) * P(S2) remains fairly constant.
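The plot described above can be approximated numerically. This sketch estimates both quantities by simulation (the sample size and the set of correlation values are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
k = 1

results = []
for rho in [0.0, 0.3, 0.6, 0.9]:
    cov = [[1.0, rho], [rho, 1.0]]
    x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    s1 = np.abs(x) < k  # within ±1 STD on dimension 1
    s2 = np.abs(y) < k  # within ±1 STD on dimension 2
    joint = (s1 & s2).mean()          # estimate of P(S1 ∩ S2)
    product = s1.mean() * s2.mean()   # estimate of P(S1) * P(S2)
    results.append((rho, joint, product))
    print(f"rho={rho}: P(S1 n S2)={joint:.3f}  P(S1)*P(S2)={product:.3f}")
```

At rho = 0 the two estimates agree (up to sampling noise); as rho grows, the joint probability pulls away from the product while the product barely moves.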
Now, circling back to the average user fallacy: we know that P(S1 ∩ S2) < P(S1), which means the probability of a user falling within ±k STD of the average decreases as you increase the number of dimensions, in spite of the correlation among dimensions.
Let’s simulate correlated data in n dimensions and see the effects of the curse of dimensionality on the Average User fallacy.
We will randomize the correlation factor between dimensions.
```python
# One random value per unique off-diagonal pair; (ndim**2 - ndim) // 2
# is the number of such pairs.
corr_values = np.random.random((ndim ** 2 - ndim) // 2)
corr_matrix = np.identity(ndim)
idx = 0
for row in range(ndim):
    for col in range(row + 1, ndim):
        corr_matrix[row][col] = corr_values[idx]
        corr_matrix[col][row] = corr_values[idx]
        idx += 1
```

From the above figure, it is clear that irrespective of correlation, as the dimension increases, the probability of occurring close to the mean value across all dimensions decreases rapidly and approaches zero.
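A runnable sketch of the full simulation follows. Note that randomly filled off-diagonals do not always yield a valid (positive semi-definite) correlation matrix, so this sketch builds one from a random factor instead; the notebook may construct its matrix differently:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
k = 1

probs = {}
for ndim in [1, 2, 4, 8, 16]:
    # Valid random correlation matrix: normalize a random Wishart-style
    # covariance A @ A.T so its diagonal becomes 1.
    A = rng.standard_normal((ndim, ndim))
    cov = A @ A.T
    d = np.sqrt(np.diag(cov))
    corr_matrix = cov / np.outer(d, d)

    samples = rng.multivariate_normal(np.zeros(ndim), corr_matrix, size=n)
    # Fraction of samples within ±k STD of the mean on EVERY dimension.
    probs[ndim] = (np.abs(samples) < k).all(axis=1).mean()
    print(f"ndim={ndim}: P(within +-{k} STD on all dims) = {probs[ndim]:.3f}")
```

For ndim = 1 the fraction sits near 0.68; as dimensions are added, the fraction of samples that are "average" on every dimension collapses toward zero, correlations notwithstanding.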
You can find the entire code in this jupyter notebook.
One size never fits all: For the tolerant readers who have stayed this far, let’s look at a real-world example of designing for the average user going horribly wrong.
99 pi – During World War II, the US Air Force expanded rapidly, accompanied by a decline in performance and a rash of deaths, even during training.
The high death rate in the Air Force was a mystery for many years, but after blaming the pilots and their training programs, the military finally realized that the cockpit itself was to blame, that it didn’t fit most pilots.
At first, they assumed the cockpit was just too small and that the average man had grown since the 1920s, so in 1950 they asked researchers at Wright Air Force Base in Ohio to calculate the new average.
One of these researchers was a young Harvard graduate named Gilbert S. Daniels.
In his research measuring thousands of airmen on a set of ten critical physical dimensions, Daniels realized that none of the pilots he measured was average on all ten dimensions.
Not a single one.
When he looked at just three dimensions, less than five percent were average.
Daniels realized that by designing something for an average pilot, it was literally designed to fit nobody.
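Daniels's numbers line up with a simple independence back-of-envelope (the 30% "average band" used here is an assumed threshold, not necessarily Daniels's exact definition):

```python
# If "average" means falling in the middle 30% band on a dimension,
# and the dimensions were independent, the share of people average on
# all dimensions shrinks geometrically:
p_band = 0.3
print(f"average on 3 dimensions:  {p_band ** 3:.3f}")   # under 3 percent
print(f"average on 10 dimensions: {p_band ** 10:.1e}")  # essentially nobody
```

Real body dimensions are correlated, which raises these figures somewhat, but as shown earlier the joint probability still falls rapidly with each added dimension.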
Revisiting the thought experiment from the start: as a lead designer who now knows the Average User Fallacy, how should you go about optimizing your design? In an ideal world, not constrained by resources, the solution would be to custom fit every person with their own set of measurements.
But the real world is never that simple.
You are always bound by constraints and need to strike a balance.
On one hand, you have witnessed the horrors of designing for the average; on the other, custom fitting everyone is unattainable.
Indian Mom, the best optimizer: The average user fallacy is not ubiquitous.
For example, if you are designing a chair, a car seat, or a modern cockpit, you are not restricted to fixed dimensions.
You can use levers and other mechanisms to let a multitude of body dimensions fit comfortably.
These dynamic systems let users adjust them seamlessly through a continuous design space.
But certain designs are inherently subject to the average user fallacy.
These systems have a rigidity within them that renders them incapable of dynamic adjustment.
One perfect example is the dress.
Once bought, the size remains fixed.
So dress manufacturers, motivated to maximize profit, try to reach as many people as possible while reducing the variety of sizes they have to produce.
This leads to the system of making clothes in distinct, discrete intervals marked S, M, L, XL, and so on, thus tackling the average user fallacy problem.
Let’s consider the worst case, where the dress starts to shrink and you start to gain weight, testing the integrity of your dress.
This seems like an impossible problem to solve, but not for an Indian Mom.
How she solves this conundrum is quite ingenious.
She single-handedly decides that you will get one size above your current size, so as to support her adage: “You never buy for your current self, but for the one that you will grow into.”
She then proudly strolls past you, knowing perfectly well that she got you what you wanted while optimizing the longevity of the product, thereby decreasing the long-term cost.
If this isn’t a perfect optimization, I don’t know what is.