The Tao of Data Science

Demonic determinism in generative machine learning

Determinism, generative machine learning, and whether or not free will in humans (or machines) is possible

Robert Osazuwa Ness
May 28

[Image: Laplace provided an interesting insight into generative machine learning]

Laplace’s Demon

Pierre-Simon Laplace supposed that everything is composed of atoms and that Newtonian physics governs the motions of atoms.
As a thought experiment, Laplace imagined a kind of hyper-clairvoyant demon with two types of knowledge: (1) the demon knows the initial conditions, that is, the positions and velocities of all the particles in the universe; (2) the demon knows all the (Newtonian) laws of physics.
Laplace argues that given this knowledge, the demon could correctly predict anything at all about all the physical bodies within the universe, including you.
It does this by starting with the initial conditions and projecting forward using the deterministic laws of physics.
Building demonic determinism into machine learning models

Generative machine learning models uncertainty with probability.
Recall the apple that fell upon Newton’s head and inspired his conception of physics.
Suppose we want to know the weight of that apple.
The following is a simple generative machine learning model of the process of weighing that apple, which we could use to estimate the weight:

apple_weight ~ Normal(μ, σ)
measurement ~ Normal(apple_weight, ε)

Here “Normal” means a normal (Gaussian) probability distribution.
If this is new to you, just think of it as a specific flavor of random generator.
If you are not familiar with the “~” notation, it means that the object on the left-hand side “comes from” the probability distribution on the right-hand side.
To generate from this model, you would assign a value to the object on the left-hand side, by simulating from the distribution on the right-hand side.
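As a rough illustration of what "generating from the model" looks like, here is a minimal Python sketch using only the standard library. The values for μ, σ, and ε (in grams) are made up for the example, and `generate` is a hypothetical helper name, not something from a particular library.

```python
import random

# Hypothetical parameter values in grams, chosen only for illustration.
MU = 150.0      # average weight across all apples
SIGMA = 20.0    # variation in weight across all apples
EPSILON = 2.0   # technical noise of the scale


def generate(rng):
    """Simulate one (apple_weight, measurement) pair from the model."""
    # apple_weight ~ Normal(mu, sigma)
    apple_weight = rng.gauss(MU, SIGMA)
    # measurement ~ Normal(apple_weight, epsilon)
    measurement = rng.gauss(apple_weight, EPSILON)
    return apple_weight, measurement


rng = random.Random(0)  # seeded so the run is repeatable
for _ in range(3):
    weight, reading = generate(rng)
    print(f"true weight: {weight:.1f} g, scale reading: {reading:.1f} g")
```

Each call reads the model top to bottom: first draw the apple's weight, then draw a reading centered on that weight.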
To estimate the apple’s weight, you would take an actual measurement, and use this model to make inferences about the apple’s weight, perhaps using something like Bayes’ rule.
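For this particular model, Bayes' rule happens to have a closed-form answer: a normal prior combined with a normal measurement model gives a normal posterior. The sketch below assumes σ and ε are standard deviations and uses made-up numbers; `posterior` is a hypothetical helper name.

```python
import math


def posterior(measurement, mu, sigma, epsilon):
    """Posterior mean and standard deviation of apple_weight given one measurement.

    Assumes apple_weight ~ Normal(mu, sigma) and
    measurement ~ Normal(apple_weight, epsilon), with sigma and epsilon
    interpreted as standard deviations.
    """
    prior_precision = 1.0 / sigma**2
    noise_precision = 1.0 / epsilon**2
    total_precision = prior_precision + noise_precision
    # Precision-weighted average of the prior mean and the observed reading.
    mean = (prior_precision * mu + noise_precision * measurement) / total_precision
    sd = math.sqrt(1.0 / total_precision)
    return mean, sd


# Hypothetical reading of 172 g with a prior of 150 g +/- 20 g and 2 g of scale noise.
mean, sd = posterior(measurement=172.0, mu=150.0, sigma=20.0, epsilon=2.0)
print(f"estimated apple weight: {mean:.1f} g (+/- {sd:.2f} g)")
```

Because the scale in this example is far more precise than the prior, the estimate lands close to the reading and only slightly toward the prior mean.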
Reading the model as English, we measure the weight of the apple with a scale.
Since no scale is perfectly calibrated, we distinguish between the reported measurement and the true apple_weight.
We model this source of uncertainty (usually called technical noise) by giving the measurement a normal distribution centered on the true apple_weight, with the noise parameter ε as its standard deviation.
Moreover, we are also uncertain about the apple_weight itself.
So we model that uncertainty with a prior — a probability distribution based on some average weight and weight variation across all apples.
In this case, the prior is a normal distribution with mean μ and standard deviation σ.
This model grates against Laplace’s deterministic view of the universe.
If measurement deviates from apple_weight, it is because of some deterministic process stemming from physical characteristics of the scale itself.
Even if we don’t know what those characteristics are, we should still be able to see this deterministic mechanism show up in the model.
Let’s change the model, using a simple mathematical transformation.
# initial conditions
Z1 ~ Normal(0, 1)
Z2 ~ Normal(0, 1)
# determinism
apple_weight = σ*Z1 + μ
measurement = ε*Z2 + apple_weight

If you are not familiar with math, all this transformation did was make use of the mathematical fact that any normal (Gaussian) random variable is just a linear transformation of a standardized normal random variable (Normal(0, 1)).
In other words, this model is mathematically equivalent to the first model.
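One way to convince yourself of the equivalence without doing the algebra is to simulate from both formulations and compare summary statistics. The sketch below is a sanity check rather than a proof, and the parameter values are hypothetical. Under either model the measurement should have mean μ and standard deviation sqrt(σ² + ε²).

```python
import math
import random
import statistics

# Hypothetical parameter values in grams.
MU, SIGMA, EPSILON = 150.0, 20.0, 2.0


def first_model(rng):
    """measurement drawn directly from nested normal distributions."""
    apple_weight = rng.gauss(MU, SIGMA)
    return rng.gauss(apple_weight, EPSILON)


def second_model(rng):
    """measurement computed deterministically from standard normal inputs."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    apple_weight = SIGMA * z1 + MU
    return EPSILON * z2 + apple_weight


rng = random.Random(0)
a = [first_model(rng) for _ in range(50_000)]
b = [second_model(rng) for _ in range(50_000)]

print("theory:      ", MU, math.sqrt(SIGMA**2 + EPSILON**2))
print("first model: ", statistics.mean(a), statistics.stdev(a))
print("second model:", statistics.mean(b), statistics.stdev(b))
```

The two simulated rows should agree with each other and with the theoretical row up to sampling error.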
However, it is philosophically quite different.
Here, apple_weight and measurement are deterministic functions of some initial conditions, represented by Z1 and Z2.
Uncertainty, which we represent with the prior Normal(0, 1), is articulated only in the context of the initial conditions.
Thus we have determinism while still acknowledging we aren’t as clairvoyant as the demon.
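To make the split between "laws" and "initial conditions" concrete, here is a small sketch in which the laws are an ordinary function with no randomness inside it. The function name `world` and the parameter values are assumptions made for illustration.

```python
import random

# Hypothetical parameter values in grams.
MU, SIGMA, EPSILON = 150.0, 20.0, 2.0


def world(z1, z2):
    """Deterministic laws: the same initial conditions always give the same outcome."""
    apple_weight = SIGMA * z1 + MU
    measurement = EPSILON * z2 + apple_weight
    return apple_weight, measurement


# The demon knows the initial conditions, so nothing is left to chance.
print(world(0.5, -1.0))  # prints (160.0, 158.0)

# We do not know them, so we express our uncertainty with Normal(0, 1) priors.
rng = random.Random(0)
z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
print(world(z1, z2))
```

All of the randomness lives in how we pick `z1` and `z2`; once they are fixed, `world` has no freedom left.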
Indeed, this modeling approach (called structural causal models) allows building more subtle deterministic physics directly in the model.
For example, perhaps rather than use a linear transform, we can describe in detail the physical mechanism in the scale that causes the measurement to be slightly off of the true value.
Let’s represent this with a function g(·).

# initial conditions
Z1 ~ Normal(0, 1)
Z2 ~ Normal(0, 1)
# determinism
apple_weight = σ*Z1 + μ
measurement = g(apple_weight, Z2)

Similarly, if you are an engineer working at Fitbit, you might know how uncertainty due to variation in human anatomy combines with the mechanical and digital components in the device to cause the reported number of steps to deviate from the actual number of steps.
Generalizing, g here can combine uncertain initial conditions with some deterministic natural relation, such as Ohm’s Law (V = IR) or mass-energy equivalence (E = mc²).
We can even extend it to abstract “physics” such as the law of supply and demand in economics, or the rules of a game such as Go.
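As one hedged example of what such a g might look like, suppose the scale is a spring scale governed by Hooke's law (F = kx), and that the unknown initial condition Z2 is how far the actual spring stiffness deviates from the stiffness the dial assumes. This mechanism, the constants, and the choice of Hooke's law (rather than the relations named above) are all assumptions invented for illustration, and weight is treated as a force in arbitrary units.

```python
import random

MU, SIGMA = 150.0, 20.0      # hypothetical prior on apple weight
K_NOMINAL = 50.0             # hypothetical spring constant the dial was calibrated for
STIFFNESS_VARIATION = 0.01   # hypothetical 1% spread in the actual spring constant


def g(apple_weight, z2):
    """Hypothetical spring-scale mechanism built on Hooke's law (F = k * x)."""
    # The actual stiffness differs from nominal by an unknown amount set by z2.
    k_actual = K_NOMINAL * (1.0 + STIFFNESS_VARIATION * z2)
    # The spring stretches according to its actual stiffness.
    displacement = apple_weight / k_actual
    # The dial converts stretch back to weight using the nominal stiffness.
    return K_NOMINAL * displacement


# initial conditions
rng = random.Random(0)
z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)

# determinism
apple_weight = SIGMA * z1 + MU
measurement = g(apple_weight, z2)
print(f"true weight: {apple_weight:.2f}, reading: {measurement:.2f}")
```

Unlike the linear version, the error here scales with the weight itself: a spring that is stiffer than assumed under-reports heavy apples by more than light ones.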
What about free will?

Many philosophers believe determinism is not compatible with free will.
If the initial conditions of the universe determine the motions of your body, how can it be up to you whether you do a push-up or write a Medium article or mouth the words “I love you”? They conclude that either determinism is false, or free will is an illusion.
But what about quantum mechanics?

Many think that modern physics tells us that the fundamental laws of quantum mechanics are not deterministic, but probabilistic.
Some philosophers think quantum mechanical laws not only solve the problem of free will but even provide the basis of consciousness.
The philosophical implications of quantum mechanics on machine learning and artificial intelligence deserve a separate post.
However, in terms of choice of a predictive modeling approach, I believe Occam’s razor applies — it is best for your model to assume the universe is deterministic unless you are explicitly modeling some phenomenon where quantum-level variation matters.
You certainly don’t need to understand quantum gravity to model the weight of an apple.
Related reading

Gödel’s incompleteness theorems and the implications to building strong AI (The Tao of Data Science)
Probabilistic causal models (Stanford Encyclopedia of Philosophy)
Causal determinism (Stanford Encyclopedia of Philosophy)
Younger thinkers now argue that free will is real (Mind Matters)
Yes, determinists, there is free will. You make choices even if your atoms don’t.