Challenging Convention is a Key Skill of a Data Scientist

But it was valid up until a few months ago, so bear with me.

Which Should Be Closer?

My first thought was Venus as well.

But when you think about it, at any point in time the closest planet depends on where each planet is in its orbit.

There will be times when Mars, Mercury or Venus is closer to the Earth than either of the others.

But if we are asking which one spends the most time closest to the Earth, then we are thinking more statistically and want an average over time.

My first thought experiment was to consider these orbits.

Orbital Distance Thought Experiment (Credit: Author)

Looking at the figure above, think of the circles each planet will draw out relative to the Earth (I didn’t include eccentricity as that’s more maths than my head can hold internally).

We draw circles centred on the Earth that touch the closest and furthest points of Mercury’s orbit.

These mark out an area where Venus will always be closer (blue), an area where Mercury will always be closer (green), and a grey area where either one could be closer but which will probably average out the same.

This gave me the intuition that, since the green area was larger than the blue, Venus may not be closer than Mercury most of the time.

But to prove something as a Data Scientist (or regular scientist) you need facts.

So I turned to simulation.

Basic Simulations

The way I worked it out myself was to do a “back of the envelope” calculation.

I took the semi-major axis for each planet (I ignored eccentricity) and its orbital period in days.

I programmed these in Python (in a Jupyter Notebook) as a simple solar system, lined the planets up and set them running for 100 years.

Simulation of planet orbits assuming no eccentricity.

I then calculated the distances from earth at each time point and calculated the median distance.

I got:

Mars: 1.826 AU
Venus: 1.231 AU
Mercury: 1.073 AU

(AU stands for Astronomical Unit. Because the distances involved are so huge, astronomers use AUs, where one AU is the average distance of the Earth from the Sun.)
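For anyone who wants to try this at home, here is a minimal sketch of that kind of simulation. It is not the original notebook: the orbital radii and periods are rounded textbook values I am assuming, the orbits are perfect circles in one plane, all planets start lined up on day 0, and the names `PLANETS`, `position` and `distances_from_earth` are made up for illustration.

```python
import math
from statistics import median

# Semi-major axis (AU) and orbital period (days).
# Rounded textbook values (an assumption), not the notebook's exact inputs.
PLANETS = {
    "Mercury": (0.387, 87.97),
    "Venus": (0.723, 224.70),
    "Earth": (1.000, 365.26),
    "Mars": (1.524, 686.98),
}

def position(name, day):
    """(x, y) in AU on a circular orbit; every planet sits on the x-axis at day 0."""
    radius, period = PLANETS[name]
    angle = 2 * math.pi * day / period
    return radius * math.cos(angle), radius * math.sin(angle)

def distances_from_earth(name, n_days):
    """Earth-to-planet distance in AU for each day in range(n_days)."""
    out = []
    for day in range(n_days):
        ex, ey = position("Earth", day)
        px, py = position(name, day)
        out.append(math.hypot(px - ex, py - ey))
    return out

N_DAYS = 36525  # roughly 100 years in daily steps

median_distance = {
    name: median(distances_from_earth(name, N_DAYS))
    for name in ("Mercury", "Venus", "Mars")
}

for name, value in median_distance.items():
    print(f"{name}: {value:.3f} AU")
```

A handy sanity check under these assumptions: a planet on a circular orbit spends equal time at every angle relative to the Earth, so the median separation should land near the quarter-turn distance, sqrt(1 + r^2) AU for a planet orbiting at r AU.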

The reason I chose the median is that the mean can be pulled about by outliers, whereas the median is the value with half your data below it and half above.

It works well here: if Mercury’s 50% point sits below those of the other two planets, it must spend more time closer to the Earth than the other two.
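A more direct way to test the "spends more time closer" idea is to count, day by day, which planet is actually nearest. The sketch below does that under the same simplifications (circular, coplanar orbits, planets aligned on day 0). The orbital values are rounded textbook figures I am assuming, and `nearest_planet` and `share_closest` are illustrative names, not something from the original notebook.

```python
import math
from collections import Counter

# Semi-major axis (AU) and orbital period (days); rounded textbook values (assumed).
ORBITS = {
    "Mercury": (0.387, 87.97),
    "Venus": (0.723, 224.70),
    "Mars": (1.524, 686.98),
}
EARTH = (1.000, 365.26)

def xy(orbit, day):
    """(x, y) in AU on a circular orbit that starts on the x-axis at day 0."""
    radius, period = orbit
    angle = 2 * math.pi * day / period
    return radius * math.cos(angle), radius * math.sin(angle)

def nearest_planet(day):
    """Name of the planet with the smallest distance to the Earth on this day."""
    ex, ey = xy(EARTH, day)

    def dist(name):
        px, py = xy(ORBITS[name], day)
        return math.hypot(px - ex, py - ey)

    return min(ORBITS, key=dist)

N_DAYS = 36525  # roughly 100 years in daily steps
counts = Counter(nearest_planet(day) for day in range(N_DAYS))

# Fraction of simulated days on which each planet is the nearest one.
share_closest = {name: counts[name] / N_DAYS for name in ORBITS}

for name, share in share_closest.items():
    print(f"{name}: nearest on {share:.1%} of days")
```

This counts days rather than comparing medians, so it answers the question head-on instead of inferring it from a summary statistic.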

I also plotted an Empirical Cumulative Distribution Function and marked the 50% line, which matches the values above.

Empirical Cumulative Distribution Functions for distances between Earth and Mercury, Venus and Mars.
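If you have not met an empirical cumulative distribution function before, it is just the sorted data paired with "what fraction of samples is at or below this value". Here is a small sketch without the plotting, using Earth to Mercury distances from an assumed circular-orbit model as example input. The helper names `ecdf` and `value_at` and the rounded orbital values are my own assumptions.

```python
import math

def ecdf(samples):
    """Return the sorted values and the fraction of samples <= each value."""
    xs = sorted(samples)
    n = len(xs)
    return xs, [(i + 1) / n for i in range(n)]

def value_at(xs, fractions, level):
    """Smallest sample value whose cumulative fraction reaches `level`."""
    for x, f in zip(xs, fractions):
        if f >= level:
            return x
    return xs[-1]

# Example input: daily Earth-Mercury distances on circular, coplanar orbits.
# Radii in AU and periods in days are rounded textbook values (assumed).
R_EARTH, P_EARTH = 1.000, 365.26
R_MERC, P_MERC = 0.387, 87.97

distances = []
for day in range(36525):  # roughly 100 years
    a_e = 2 * math.pi * day / P_EARTH
    a_m = 2 * math.pi * day / P_MERC
    dx = R_MERC * math.cos(a_m) - R_EARTH * math.cos(a_e)
    dy = R_MERC * math.sin(a_m) - R_EARTH * math.sin(a_e)
    distances.append(math.hypot(dx, dy))

xs, fractions = ecdf(distances)
half_point = value_at(xs, fractions, 0.5)  # where the 50% line crosses the curve
print(f"50% of days are at or below {half_point:.3f} AU")
```

Plotting `xs` against `fractions` for each planet gives curves like the figure above, and reading across at 0.5 gives the median.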

This ordering of distances is roughly in line with the answer from Wolfram Alpha (remember my model doesn’t include eccentric orbits; someone did something more sophisticated here).

This seems quite counterintuitive, but if you think of the orbits, Mercury is close to the Sun so it cannot move far away from us (at most its orbital radius plus the Earth’s), whilst the others have a much larger range.
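The range argument is easy to put into numbers. A rough sketch, again assuming circular coplanar orbits and rounded textbook radii (the names here are illustrative): the closest approach is the difference of the two orbital radii and the furthest separation is their sum.

```python
# Orbital radii in AU; rounded textbook values (assumed).
EARTH_ORBIT_AU = 1.000
ORBIT_RADIUS_AU = {"Mercury": 0.387, "Venus": 0.723, "Mars": 1.524}

def distance_range(radius):
    """(closest, furthest) Earth-planet distance for circular coplanar orbits."""
    return abs(EARTH_ORBIT_AU - radius), EARTH_ORBIT_AU + radius

ranges = {name: distance_range(r) for name, r in ORBIT_RADIUS_AU.items()}

for name, (closest, furthest) in ranges.items():
    print(f"{name}: {closest:.3f} to {furthest:.3f} AU")
```

Venus gets nearer than Mercury ever does, but it also swings much further away, and that far side of the orbit is what drags its typical distance up.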

If you’d like a nice graphic of these distances then Popular Mechanics has one here.

My takeaway? Don’t always take a quoted fact as truth. Do your own tests to confirm you believe it; you might be surprised how fake data propagates and becomes fact without ever being challenged or at least sanity checked.

Often this happens simply because something is quoted often enough that people assume it must be correct (the Illusory Truth Effect).

A little bit of research can often indicate whether something is realistic or not (be more Descartes, basically).

That is great Data Science.
