All smooth f-divergences are locally the same

For discrete distributions $p$ and $q$, the $f$-divergence is defined as

$$D_f(p \,\|\, q) = \sum_i q_i \, f\!\left(\frac{p_i}{q_i}\right),$$

where $f\colon (0, \infty) \to \mathbb{R}$ is a convex function satisfying the condition $f(1) = 0$. If $p$ is a variation of $q$, then
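
As a numerical sanity check of this definition, here is a minimal Python sketch; the helper name `f_divergence` and the example distributions are illustrative choices of mine, not from the text. With $f(x) = x \log x$ it reproduces the KL divergence.

```python
import numpy as np

def f_divergence(p, q, f):
    """D_f(p || q) = sum_i q_i * f(p_i / q_i) for discrete distributions.

    Assumes q_i > 0 for all i (illustrative sketch, not a robust implementation).
    """
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(q * f(p / q)))

# f(x) = x * log(x) is convex with f(1) = 0 and generates the KL divergence.
p = np.array([0.2, 0.5, 0.3])
q = np.array([0.25, 0.25, 0.5])
print(f_divergence(p, q, lambda x: x * np.log(x)))   # D_KL(p || q)
```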

$$D_f(q + dq \,\|\, q) = \sum_i q_i \, f\!\left(\frac{q_i + dq_i}{q_i}\right) = \sum_i q_i \, f\!\left(1 + \frac{dq_i}{q_i}\right).$$

Provided $f$ is twice differentiable, we can expand it in a Taylor series

$$f\!\left(1 + \frac{dq_i}{q_i}\right) \approx f(1) + f'(1)\,\frac{dq_i}{q_i} + \frac{1}{2} f''(1) \left(\frac{dq_i}{q_i}\right)^2$$

and thus approximate the $f$-divergence by a quadratic form. The constant term vanishes because $f(1) = 0$, and the linear term vanishes because $\sum_i dq_i = 0$ (both $q$ and $q + dq$ are normalized), leaving

$$D_f(q + dq \,\|\, q) \approx \frac{1}{2} f''(1) \sum_i dq_i \, \frac{1}{q_i} \, dq_i.$$

Comparing this with the Fisher metric, we see that it is the same quadratic form scaled by the constant factor $f''(1)$.
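
A quick numerical check of this approximation, again with the illustrative choice $f(x) = x \log x$ (so $f''(1) = 1$); the perturbation `dq` is an arbitrary small vector that sums to zero:

```python
import numpy as np

q  = np.array([0.25, 0.25, 0.5])
dq = 1e-3 * np.array([1.0, -2.0, 1.0])   # small perturbation, sums to zero
p  = q + dq

# Exact f-divergence for f(x) = x*log(x) (KL divergence), where f''(1) = 1.
exact  = np.sum(q * (p / q) * np.log(p / q))
# Quadratic approximation: (1/2) f''(1) * sum_i dq_i^2 / q_i.
approx = 0.5 * np.sum(dq**2 / q)

print(exact, approx)   # agree up to higher-order terms in dq
```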

Note that not all f-divergences are locally the same, only the smooth ones. For example, the total variation distance corresponds to

$$f_{\mathrm{TV}}(x) = \tfrac{1}{2} |x - 1|,$$

which is not differentiable at $x = 1$ and hence not locally quadratic there.
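
To see this failure numerically, the following sketch (with an arbitrary perturbation direction of my choosing) shows that the total variation distance shrinks linearly, not quadratically, as the perturbation is scaled down:

```python
import numpy as np

def tv_distance(p, q):
    """Total variation: the f-divergence with f(x) = |x - 1| / 2."""
    return 0.5 * np.sum(np.abs(p - q))

q  = np.array([0.25, 0.25, 0.5])
dq = np.array([1.0, -2.0, 1.0])

# Halving the perturbation halves the distance: linear, not quadratic, scaling.
for eps in [1e-2, 5e-3, 2.5e-3]:
    print(eps, tv_distance(q + eps * dq, q))
```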

Special case: α-divergence

The $f$-divergence with $f$ having the form

$$f_\alpha(x) = \frac{(x^\alpha - 1) - \alpha (x - 1)}{\alpha(\alpha - 1)}, \qquad \alpha \in \mathbb{R},$$

is known as the α-divergence. Noting that $f_\alpha$ has the properties $f''_\alpha(x) = x^{\alpha - 2}$ and $f''_\alpha(1) = 1$, we obtain the approximation of the α-divergence

$$D_\alpha(q + dq \,\|\, q) \approx \frac{1}{2} \sum_i dq_i \, \frac{1}{q_i} \, dq_i,$$

which directly generalizes the result of Kullback for the KL divergence and its reverse, corresponding to $\alpha = 1$ and $\alpha = 0$ respectively.
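
The sketch below evaluates $D_\alpha(q + dq \,\|\, q)$ for a few arbitrarily chosen values of α (avoiding the limiting cases $\alpha = 0, 1$) and compares them with the common quadratic form; `f_alpha` is my name for the generating function above.

```python
import numpy as np

def f_alpha(x, alpha):
    """Generating function of the alpha-divergence (alpha not in {0, 1})."""
    return ((x**alpha - 1) - alpha * (x - 1)) / (alpha * (alpha - 1))

q  = np.array([0.25, 0.25, 0.5])
dq = 1e-3 * np.array([1.0, -2.0, 1.0])

quadratic = 0.5 * np.sum(dq**2 / q)
for alpha in [-1.0, 0.5, 2.0, 3.0]:
    d_alpha = np.sum(q * f_alpha((q + dq) / q, alpha))
    print(alpha, d_alpha, quadratic)   # all close to the same quadratic form
```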

Local approximation is exact for Pearson χ² divergence

Pearson χ² divergence is the α-divergence with $\alpha = 2$, corresponding to the generating function

$$f_2(x) = \tfrac{1}{2}(x - 1)^2.$$

We established the following quadratic approximation of the $f$-divergence,

$$D_f(q + dq \,\|\, q) \approx \mathbb{E}_q\!\left[\frac{f''(1)}{2} \left(\frac{dq}{q}\right)^2\right],$$

which is valid for small $dq$. However, if we allow large deviations $dq = p - q$, then we obtain the Pearson χ² divergence (scaled by $f''(1)$),

$$\mathbb{E}_q\!\left[\frac{f''(1)}{2} \left(\frac{p - q}{q}\right)^2\right] = f''(1)\, D_2(p \,\|\, q).$$

Thus, Pearson χ² divergence is the linear extension of the Fisher metric from a local neighborhood to the whole space. Consequently, the local quadratic approximation is exact for Pearson χ² divergence, since $f''_2(1) = 1$.
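
A numerical check with an arbitrary pair $(p, q)$ of my choosing that are far apart: the "local" quadratic form and the Pearson χ² divergence coincide exactly, not just approximately.

```python
import numpy as np

p = np.array([0.7, 0.2, 0.1])    # a large deviation from q, not just q + dq
q = np.array([0.25, 0.25, 0.5])

# Pearson chi^2 divergence via its generating function f_2(x) = (x - 1)^2 / 2.
d2_exact = np.sum(q * 0.5 * (p / q - 1)**2)
# The "local" quadratic form evaluated at dq = p - q.
quadratic = 0.5 * np.sum((p - q)**2 / q)

print(d2_exact, quadratic)   # identical up to floating-point rounding
```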