# Optimizing Individual-Level Models for Group-Level Predictions

(Caveat: Fourier duality suffers from a plurality of conventions due to the identification of the group of real numbers with its dual group being “unnatural”, in the Category Theory sense of that word.

We will follow the convention that the Fourier transform of a function f(t) isand the reverse transform of a function f(t) isWith this convention, the reverse transform of the characteristic function of X is the pdf of X, if one exists.

)3.

Finally,this is Levy’s Continuity Theorem.

The standard proof of the CLT follows exactly this approach.

By normalizing first, you can reduce CLT to:One proves this by showing thatpoint-wise.

The crucial step is thatand therefore:, where the last step is because the X_i’s are all identically distributed.

To simplify notation let X:=X_1.

Showing thatpointwise is relatively easy, though the details are irrelevant to the rest of the discussion.

The idea behind Edgeworth expansions, which is the mathematical tool that Hall uses as motivation (albeit just stops short of using directly), is similar:To be more specific, what one does is write a Maclaurin expansion of Log of the characteristic function of X, where Log is the principal branch of log, asFor those of you who like naming things, these κ_j’s are usually called the “cumulants” of X.

Generally speaking, cumulants don’t have an intuitive interpretation, but the first three do: κ_1 = E(X), κ_2= V(X), and κ_3 = E((X — E(X))³).

In particular, for us: κ_1 = 0 and κ_2 = 1.

Thus the chosen expansion of the characteristic function of X is then:Plug that intoand getThis last equality is simply using the Maclaurin expansion of eˣ; each r_j is a polynomial with real coefficients of degree 3j, and can be computed if need be.

It now just remains to apply a reverse Fourier transform to get an expansion of the probability density of S_n.

This is relatively easy — there’s a trick:(Why? The degree 0 case is standard, and the monomial case follows by repeatedly deriving the degree 0 case; the general case follows the monomial case trivially.

) The trick continues: deriving the probability density function of the standard normal distribution j times is actually easy to compute (by commuting the derivative and the integral), and for any j it’s always some polynomial times the pdf of the standard normal.

(The polynomials arising in this way are called Hermite polynomials, but it is not crucial to determine their properties for our purposes.

)  The bottom line of all of this is that:So we just got an estimate of the error of the Central Limit Theorem!  Weeeellll….

It turns out Edgeworth expansions almost never converge… So the above is more of the motivational bedtime-story we tell about Edgeworth expansion.

But as it turns out, under extremely mild conditions, Edgeworth expansions are “asymptotic expansions”, which is to say that for every k:uniformly in x so long as E(|X|^(k+2))<∞ and X satisfies a condition called “Cramer’s Condition”, which is a weaker condition than having a continuous pdf.

Proving this requires a bit of heavy lifting, and I refer to Peter Hall’s book “The Bootstrap and Edgeworth Expansion”², Section 2.

4 and Chapter 5, for details.

From Edgeworth expansions to an approximation of the medianIn Hall’s paper the equivalent of this entire section is the phrase “it follows that”, with no explanation given.

As you will see, the argument requires some nuance.

But the details do work out…! Let:Sinceit follows that:Let’s plug that into the Edgeworth expansion above:Note that since pdf of the standard normal is exponential, the numerator ofis bounded, and therefore this fraction converges to 0.

This implies thatand therefore:(We ultimately want to show that m_n converges, but this is a first step that is required for the remainder of the proof.

)  Let’s now use the Maclaurin expansion:Its radius of convergence is infinity.

It follows that:It can be shown that the p_1(x) term in the Edgeworth expansion is equal -κ_3(x²-1)/6, hence:In particular:Therefore we have that:Which, after multiplying by the square root of n implies that:We’re ready for the kill now.

Using the Maclaurin expansion:We can see that:Which in particular is:Therefore as n approaches infinity:and we’re done!Final note on a Hall-like result for non-i.

i.

dSomewhat understandably, there are no asymptotic results about the median of a sum of independent but not identically distributed random variables.

However, surprisingly, there does exist a variant of Edgeworth expansions that is available for non-i.

i.

d random variables that is ideal under some rather complex definition of being ideal.

(See “Edgeworth Expansions of Distribution Functions of Independent Random Variables” by Bai Zhidong and Zhao Lincheng³.

) It is conceivable that one can follow the recipe above to go from the Edgeworth expansion to an estimate of the median.

Originally posted in LumiataIf this post got your interest and you would like to challenge yourself with similar problems, we are hiring!.Please check our open positions.

GitHub: https://github.

com/lumiata/tech_blogVisit Lumiata at www.

lumiata.

com and follow on Twitter via @lumiata.

Find us on LinkedIn: www.

com/company/lumiataCitations:1.

Hall, P.

(1980) On the limiting behaviour of the mode and median of a sum of independent random variables.

Ann.

Probability 8 419–430.

2.

Hall, P.

(1992) The Bootstrap and Edgeworth Expansion.

Springer-Verlag,New York.

3.

Bai, Z.

and Zhao, L.

(1984) Edgeworth Expansions of Distribution Functions of Independent Random Variables.

Scientia Sinica 29 1–22.

.