ParaMonte Fortran 2.0.0
Parallel Monte Carlo and Machine Learning Library
pm_distanceHellinger Module Reference

This module contains classes and procedures for computing the Hellinger statistical distance between two probability distributions.

Data Types

interface  getDisHellSq
 Generate and return the square of the Hellinger distance of two univariate (discrete or continuous) distributions.
 

Variables

character(*, SK), parameter MODULE_NAME = "@pm_distanceHellinger"
 

Detailed Description

This module contains classes and procedures for computing the Hellinger statistical distance between two probability distributions.

The Hellinger distance (which is also closely related to the Bhattacharyya distance) is used to quantify the similarity between two probability distributions.
The Hellinger distance is defined in terms of the Hellinger integral, which was introduced by Ernst Hellinger in 1909.
It is also sometimes called the Jeffreys distance.

Definition using Measure Theory

Let \(P\) and \(Q\) denote two probability measures on a measure space \(\mathcal{X}\) that are absolutely continuous with respect to an auxiliary measure \(\lambda\).
The square of the Hellinger distance between \(P\) and \(Q\) is defined as the quantity,

\begin{equation} H^{2}(P, Q) = {\frac{1}{2}} \int_{\mathcal{X}} \left( {\sqrt{p(x)}} - {\sqrt {q(x)}} \right)^{2} \lambda(dx) ~, \end{equation}

where \(P(dx) = p(x)\lambda(dx)\) and \(Q(dx) = q(x)\lambda(dx)\), that is, \(p\) and \(q\) are the Radon–Nikodym derivatives of \(P\) and \(Q\), respectively, with respect to \(\lambda\).
This definition does not depend on \(\lambda\), that is, the Hellinger distance between \(P\) and \(Q\) does not change if \(\lambda\) is replaced with a different probability measure with respect to which both \(P\) and \(Q\) are absolutely continuous.
For compactness, the above formula is often written as,

\begin{equation} H^{2}(P,Q) = {\frac{1}{2}}\int_{\mathcal {X}}\left({\sqrt {P(dx)}}-{\sqrt {Q(dx)}}\right)^{2} ~. \end{equation}

Definition using Probability Theory

To define the Hellinger distance in terms of elementary probability theory, let \(\lambda\) be the Lebesgue measure, so that \(f = \frac{dP}{d\lambda}\) and \(g = \frac{dQ}{d\lambda}\) are simply probability density functions.
The squared Hellinger distance can then be expressed as a standard calculus integral,

\begin{equation} H^{2}(f,g) = {\frac {1}{2}}\int \left({\sqrt {f(x)}}-{\sqrt {g(x)}}\right)^{2}\,dx=1-\int {\sqrt {f(x)g(x)}}\,dx ~, \end{equation}

where the second form can be obtained by expanding the square and using the fact that the integral of a probability density over its domain equals \(1\).
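
The equality of the two forms can be checked numerically. The following is a minimal sketch in plain Python that does not use the ParaMonte API; the helper names (`normal_pdf`, `integrate`), the choice of two normal densities, and the integration limits are assumptions made for illustration only.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma^2)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def integrate(func, lower, upper, nstep=20_000):
    """Composite midpoint rule on [lower, upper] (illustrative helper)."""
    width = (upper - lower) / nstep
    return width * sum(func(lower + (i + 0.5) * width) for i in range(nstep))

def f(x):
    return normal_pdf(x, 0.0, 1.0)  # assumed example density f

def g(x):
    return normal_pdf(x, 1.0, 2.0)  # assumed example density g

# First form: half the integral of the squared difference of the square roots.
hell_sq_diff = 0.5 * integrate(
    lambda x: (math.sqrt(f(x)) - math.sqrt(g(x))) ** 2, -40.0, 40.0
)

# Second form: one minus the integral of sqrt(f * g).
hell_sq_prod = 1.0 - integrate(lambda x: math.sqrt(f(x) * g(x)), -40.0, 40.0)

print(hell_sq_diff, hell_sq_prod)
```

The two printed values should agree up to the quadrature error, because the limits are wide enough for both densities to integrate to one numerically.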

The Hellinger distance \(H(P, Q)\) satisfies the property (derivable from the Cauchy–Schwarz inequality),

\begin{equation} 0\leq H(P,Q)\leq 1 ~. \end{equation}
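
One way to see this bound, sketched here for probability densities \(f\) and \(g\) as in the definition above, is to apply the Cauchy–Schwarz inequality to the functions \(\sqrt{f}\) and \(\sqrt{g}\),

\begin{equation} 0 \leq \int {\sqrt {f(x)g(x)}}\,dx \leq {\sqrt {\int f(x)\,dx}}\;{\sqrt {\int g(x)\,dx}} = 1 ~. \end{equation}

Substituting this into \(H^{2}(f,g) = 1-\int {\sqrt {f(x)g(x)}}\,dx\) gives \(0\leq H^{2}(f,g)\leq 1\), and taking the square root yields the stated range.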

Definition for Discrete distributions

For two discrete probability distributions \(P = (p_{1}, \ldots , p_{k})\) and \(Q = (q_{1},\ldots ,q_{k})\), their Hellinger distance is defined as,

\begin{equation} H(P,Q) = {\frac {1}{\sqrt {2}}}\;{\sqrt {\sum _{i=1}^{k}({\sqrt {p_{i}}}-{\sqrt {q_{i}}})^{2}}} ~, \end{equation}

which is directly related to the Euclidean norm of the difference of the square root vectors,

\begin{equation} H(P,Q) = {\frac {1}{\sqrt {2}}}\;{\bigl \|}{\sqrt {P}}-{\sqrt {Q}}{\bigr \|}_{2} ~. \end{equation}

It follows that,

\begin{equation} 1 - H^{2}(P, Q) = \sum_{i=1}^{k}{\sqrt{p_{i} q_{i}}} ~. \end{equation}
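
The discrete formulas can be illustrated with a short sketch in plain Python. It does not use the ParaMonte API; the function name `hellinger_sq_discrete` and the example vectors are hypothetical and chosen only for illustration.

```python
import math

def hellinger_sq_discrete(p, q):
    """Squared Hellinger distance of two discrete distributions.

    p and q are sequences of probabilities over the same k outcomes.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    return 0.5 * sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))

# Assumed example distributions over four outcomes.
p = [0.1, 0.2, 0.3, 0.4]
q = [0.25, 0.25, 0.25, 0.25]

hell_sq = hellinger_sq_discrete(p, q)
hell = math.sqrt(hell_sq)

# The identity 1 - H^2 = sum_i sqrt(p_i * q_i) from the last equation above.
bhat_coef = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))

print(hell, 1.0 - hell_sq, bhat_coef)
```

The last two printed values should coincide up to rounding, which is the identity stated above.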

Properties of the Hellinger Distance

  1. The Hellinger distance forms a bounded metric on the space of probability distributions over a given probability space.
  2. The maximum distance \(1\) is achieved when \(P\) assigns probability zero to every set to which \(Q\) assigns a positive probability, and vice versa.
  3. Sometimes the factor \(\frac{1}{2}\) in front of the integral is omitted, in which case the Hellinger distance ranges from zero to the square root of two.
  4. The Hellinger distance can be expressed in terms of the Bhattacharyya coefficient \(BC(P,Q)\) as,

    \begin{equation} H(P,Q) = {\sqrt{1 - BC(P,Q)}} ~. \end{equation}

  5. Hellinger distances are used in the theory of sequential and asymptotic statistics.
  6. The squared Hellinger distance between two normal distributions \(P\sim{\mathcal{N}}(\mu_{1}, \sigma_{1}^{2})\) and \(Q\sim{\mathcal{N}}(\mu_{2},\sigma_{2}^{2})\) is,

    \begin{equation} H^{2}(P,Q) = 1 - {\sqrt {\frac {2\sigma_{1}\sigma_{2}}{\sigma_{1}^{2}+\sigma _{2}^{2}}}}\,e^{-{\frac {1}{4}}{\frac {(\mu _{1}-\mu _{2})^{2}}{\sigma _{1}^{2}+\sigma _{2}^{2}}}} ~. \end{equation}

  7. The squared Hellinger distance between two multivariate normal distributions \(P\sim {\mathcal {N}}(\mu _{1},\Sigma _{1})\) and \(Q\sim {\mathcal {N}}(\mu _{2},\Sigma _{2})\) is,

    \begin{equation} H^{2}(P,Q)=1-{\frac {\det(\Sigma _{1})^{1/4}\det(\Sigma _{2})^{1/4}}{\det \left({\frac {\Sigma _{1}+\Sigma _{2}}{2}}\right)^{1/2}}}\exp \left\{-{\frac {1}{8}}(\mu _{1}-\mu _{2})^{T}\left({\frac {\Sigma _{1}+\Sigma _{2}}{2}}\right)^{-1}(\mu _{1}-\mu _{2})\right\} ~. \end{equation}

  8. The squared Hellinger distance between two exponential distributions \(P\sim \mathrm {Exp} (\alpha )\) and \(Q\sim \mathrm {Exp} (\beta )\) with rate parameters \(\alpha\) and \(\beta\) is,

    \begin{equation} H^{2}(P,Q) = 1 - {\frac {2{\sqrt {\alpha \beta }}}{\alpha +\beta }} ~. \end{equation}

  9. The squared Hellinger distance between two Weibull distributions \(P\sim \mathrm {W} (k, \alpha)\) and \(Q\sim \mathrm {W} (k, \beta)\), where \(k\) is a common shape parameter and \(\alpha\) and \(\beta\) are the scale parameters respectively, is,

    \begin{equation} H^{2}(P, Q) = 1 - {\frac {2(\alpha \beta )^{k/2}}{\alpha ^{k}+\beta ^{k}}} ~. \end{equation}

  10. The squared Hellinger distance between two Poisson distributions with rate parameters \(\alpha\) and \(\beta\), so that \(P\sim \mathrm{Poisson}(\alpha)\) and \(Q\sim \mathrm {Poisson} (\beta)\), is,

    \begin{equation} H^{2}(P, Q) = 1 - e^{-{\frac {1}{2}}({\sqrt {\alpha }}-{\sqrt {\beta }})^{2}} ~. \end{equation}

  11. The squared Hellinger distance between two beta distributions \(P\sim {\text{Beta}}(a_{1},b_{1})\) and \(Q\sim {\text{Beta}}(a_{2},b_{2})\) is,

    \begin{equation} H^{2}(P,Q) = 1 - {\frac {B\left({\frac {a_{1}+a_{2}}{2}},{\frac {b_{1}+b_{2}}{2}}\right)}{\sqrt {B(a_{1},b_{1})B(a_{2},b_{2})}}} ~, \end{equation}

    where \(B\) represents the beta function.
  12. The squared Hellinger distance between two gamma distributions \(P\sim {\text{Gamma}}(a_{1}, b_{1})\) and \(Q\sim {\text{Gamma}}(a_{2},b_{2})\) with shape parameters \(a_{1}, a_{2}\) and rate parameters \(b_{1}, b_{2}\) is,

    \begin{equation} H^{2}(P, Q) = 1 - \Gamma\left({\scriptstyle{\frac {a_{1}+a_{2}}{2}}}\right)\left({\frac {b_{1}+b_{2}}{2}}\right)^{-(a_{1}+a_{2})/2}{\sqrt {\frac {b_{1}^{a_{1}}b_{2}^{a_{2}}}{\Gamma (a_{1})\Gamma (a_{2})}}} ~, \end{equation}

    where \(\Gamma\) is the gamma function.
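
Several of the closed-form expressions listed above translate directly into code. The sketch below, in plain Python and independent of the ParaMonte API, implements the normal, exponential, Weibull, and Poisson cases and cross-checks the Poisson formula against a truncated version of the discrete sum. All function names and the truncation length `nterm` are hypothetical choices for illustration.

```python
import math

def hell_sq_normal(mu1, sigma1, mu2, sigma2):
    """Squared Hellinger distance between N(mu1, sigma1^2) and N(mu2, sigma2^2)."""
    var_sum = sigma1 ** 2 + sigma2 ** 2
    scale = math.sqrt(2.0 * sigma1 * sigma2 / var_sum)
    return 1.0 - scale * math.exp(-0.25 * (mu1 - mu2) ** 2 / var_sum)

def hell_sq_exponential(alpha, beta):
    """Squared Hellinger distance between Exp(alpha) and Exp(beta), rates alpha and beta."""
    return 1.0 - 2.0 * math.sqrt(alpha * beta) / (alpha + beta)

def hell_sq_weibull(k, alpha, beta):
    """Squared Hellinger distance between Weibull laws with shape k and scales alpha, beta."""
    return 1.0 - 2.0 * (alpha * beta) ** (k / 2.0) / (alpha ** k + beta ** k)

def hell_sq_poisson(alpha, beta):
    """Squared Hellinger distance between Poisson(alpha) and Poisson(beta)."""
    return 1.0 - math.exp(-0.5 * (math.sqrt(alpha) - math.sqrt(beta)) ** 2)

def hell_sq_poisson_by_sum(alpha, beta, nterm=200):
    """Evaluate 1 - sum_i sqrt(p_i * q_i) directly, truncating the sum after nterm terms."""
    bhat_coef = 0.0
    for i in range(nterm):
        # Log-probabilities avoid overflow in the factorial for large i.
        log_p = -alpha + i * math.log(alpha) - math.lgamma(i + 1.0)
        log_q = -beta + i * math.log(beta) - math.lgamma(i + 1.0)
        bhat_coef += math.exp(0.5 * (log_p + log_q))
    return 1.0 - bhat_coef

poisson_closed = hell_sq_poisson(3.0, 5.0)
poisson_summed = hell_sq_poisson_by_sum(3.0, 5.0)
print(poisson_closed, poisson_summed)
```

With a shape parameter of one, the Weibull distribution reduces to an exponential distribution whose rate is the reciprocal of the scale, so `hell_sq_weibull(1.0, a, b)` and `hell_sq_exponential(1.0 / a, 1.0 / b)` should agree up to rounding.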

Connection with Total Variation Distance (TVD)

The Hellinger distance \(H(P, Q)\) and the total variation distance (or statistical distance) \(\delta(P,Q)\) are related as follows,

\begin{equation} H^{2}(P, Q)\leq \delta(P, Q)\leq {\sqrt{2}}H(P, Q) ~. \end{equation}

These inequalities follow immediately from the inequalities between the 1-norm and the 2-norm.
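
The two bounds can be checked on a small discrete example. This is a minimal sketch in plain Python, independent of the ParaMonte API, and it assumes the usual definition of the total variation distance as half the 1-norm of the difference of the probability vectors. The function names and the example vectors are hypothetical.

```python
import math

def total_variation(p, q):
    """Total variation distance, assumed here to be half the 1-norm of p - q."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

def hellinger(p, q):
    """Hellinger distance of two discrete distributions over the same outcomes."""
    sq_sum = sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
    return math.sqrt(0.5 * sq_sum)

# Assumed example distributions over four outcomes.
p = [0.5, 0.3, 0.2, 0.0]
q = [0.1, 0.2, 0.3, 0.4]

hell = hellinger(p, q)
tvd = total_variation(p, q)

# H^2 <= delta <= sqrt(2) * H
bounds_hold = hell ** 2 <= tvd <= math.sqrt(2.0) * hell
print(hell ** 2, tvd, math.sqrt(2.0) * hell, bounds_hold)
```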

See also
pm_distanceBhat
pm_distanceEuclid
pm_distanceHellinger
pm_distanceKolm
pm_distanceMahal
Test:
test_pm_distanceHellinger


Final Remarks


If you believe this algorithm or its documentation can be improved, we appreciate your contribution and help to edit this page's documentation and source file on GitHub.
This software is distributed under the MIT license with additional terms outlined below.

  1. If you use any parts or concepts from this library to any extent, please acknowledge the usage by citing the relevant publications of the ParaMonte library.
  2. If you regenerate any parts/ideas from this library in a programming environment other than those currently supported by this ParaMonte library (i.e., other than C, C++, Fortran, MATLAB, Python, R), please also ask the end users to cite this original ParaMonte library.

This software is available to the public under a highly permissive license.
Help us justify its continued development and maintenance by acknowledging its benefit to society, distributing it, and contributing to it.

Author:
Amir Shahmoradi, March 22, 2012, 2:21 PM, National Institute for Fusion Studies, The University of Texas at Austin

Variable Documentation

◆ MODULE_NAME

character(*, SK), parameter pm_distanceHellinger::MODULE_NAME = "@pm_distanceHellinger"

Definition at line 162 of file pm_distanceHellinger.F90.