paramonte.vis.kde2d

Module Contents

Functions

kde2d(x, y, n=256, limits=None)

Return the 2D density map from discrete observations via two-dimensional diffusion-based kernel density estimation.

paramonte.vis.kde2d.kde2d(x, y, n=256, limits=None)

Return the 2D density map from discrete observations via two-dimensional diffusion-based kernel density estimation.

First, the input data are binned on the grid. The function then determines the optimal bandwidth according to the diffusion-based method and smooths the binned data over the grid using a Gaussian kernel whose standard deviation corresponds to that bandwidth.
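The bin-then-smooth pipeline described above can be sketched roughly as follows. This is an illustration only and not the code used by kde2d: the helper names binned_gaussian_density and smoothing_matrix are hypothetical, the bandwidth is passed in as a fixed assumed value instead of being estimated by the diffusion-based method, and the smoothing is done with plain matrix products.

```python
import numpy as np


def smoothing_matrix(n, step, sigma):
    """Row-normalized Gaussian weights between all pairs of n grid points."""
    idx = np.arange(n)
    dist = (idx[:, None] - idx[None, :]) * step
    weights = np.exp(-0.5 * (dist / sigma) ** 2)
    return weights / weights.sum(axis=1, keepdims=True)


def binned_gaussian_density(x, y, n=64, bandwidth=(0.3, 0.3)):
    """Hypothetical helper: bin (x, y), then smooth with a Gaussian kernel."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("x and y must have the same length")

    # Step 1: bin the observations on an n-by-n grid spanning the data.
    counts, xedges, yedges = np.histogram2d(x, y, bins=n)
    dx = xedges[1] - xedges[0]
    dy = yedges[1] - yedges[0]

    # Step 2 (assumption): a fixed, user-supplied bandwidth stands in for
    # the diffusion-based bandwidth selection.
    kx = smoothing_matrix(n, dx, bandwidth[0])
    ky = smoothing_matrix(n, dy, bandwidth[1])

    # Step 3: smooth along x (axis 0), then along y (axis 1).
    smoothed = kx @ counts @ ky.T

    # Normalize so that the density integrates to 1 over the grid.
    density = smoothed / (smoothed.sum() * dx * dy)
    grid = (xedges[:-1], yedges[:-1])
    return density, grid


rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 2000)
y = rng.normal(0.0, 1.0, 2000)
density, grid = binned_gaussian_density(x, y, n=64, bandwidth=(0.3, 0.3))
```

In this sketch the bandwidth controls how strongly neighboring bins are blended; kde2d selects that value automatically per axis.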

This module is based on the kernel density estimation method of

Z. I. Botev, J. F. Grotowski, D. P. Kroese: Kernel density estimation via diffusion. Annals of Statistics 38 (2010), no. 5, 2916–2957. doi:10.1214/10-AOS799

and

John Hennig. doi:10.5281/zenodo.3830437

Parameters

x

A list or array of numbers holding the first coordinate component of the discrete observations of a two-dimensional random variable. The observations are binned on a grid of n*n points, where n must be a power of 2 or will be coerced to the next one. If x and y are not the same length, a ValueError is raised.

y

A list or array of numbers holding the second coordinate component of the discrete observations of a two-dimensional random variable. The observations are binned on a grid of n*n points, where n must be a power of 2 or will be coerced to the next one. If x and y are not the same length, a ValueError is raised.

n (optional)

The number of grid points along each axis. It should be a power of 2; otherwise, it is coerced to the next power of 2. The default is 256.
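As a small illustration of what coercion to the next power of 2 means, the sketch below rounds a requested grid size up. The function name next_power_of_two is hypothetical and the bit-length approach is an assumption chosen for brevity; kde2d may compute the value differently.

```python
def next_power_of_two(n: int) -> int:
    """Return the smallest power of 2 that is greater than or equal to n."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # (n - 1).bit_length() is the exponent of the smallest power of 2
    # that is at or above n.
    return 1 << (n - 1).bit_length()


print(next_power_of_two(256))  # 256 (already a power of 2)
print(next_power_of_two(200))  # 256
print(next_power_of_two(257))  # 512
```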

limits (optional)

Data limits specified as a tuple of tuples denoting ((xmin, xmax), (ymin, ymax)). Any value that is None is inferred from the data. Each tuple, or even both of them, may also be replaced by a single value denoting the upper bound of a range centered at zero. The default is None.
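The accepted forms of limits can be illustrated with a short sketch that expands each of them into explicit ((xmin, xmax), (ymin, ymax)) bounds. The name resolve_limits is hypothetical, and filling a missing bound with the plain minimum or maximum of the data is an assumption made for this example; how kde2d infers missing bounds is not specified here.

```python
def resolve_limits(limits, x, y):
    """Hypothetical helper: expand `limits` to ((xmin, xmax), (ymin, ymax))."""

    def resolve_axis(axis_limits, data):
        if axis_limits is None:
            lower, upper = None, None
        elif isinstance(axis_limits, tuple):
            lower, upper = axis_limits
        else:
            # A single value is the upper bound of a range centered at zero.
            lower, upper = -axis_limits, axis_limits
        # Assumption: a missing bound falls back to the data extreme.
        if lower is None:
            lower = min(data)
        if upper is None:
            upper = max(data)
        return (lower, upper)

    if limits is None:
        x_limits = y_limits = None
    elif isinstance(limits, tuple):
        x_limits, y_limits = limits
    else:
        # A single value replaces both tuples.
        x_limits = y_limits = limits
    return (resolve_axis(x_limits, x), resolve_axis(y_limits, y))


x = [1.0, 2.0, 3.0]
y = [4.0, 5.0]
print(resolve_limits(None, x, y))               # ((1.0, 3.0), (4.0, 5.0))
print(resolve_limits(5, x, y))                  # ((-5, 5), (-5, 5))
print(resolve_limits((2, (None, 10)), x, y))    # ((-2, 2), (4.0, 10))
```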

Returns

A tuple whose elements are the following:

density

The density map of the data.

grid

The grid at which the density is computed.

bandwidth

The optimal bandwidth values (one per axis) determined by the algorithm. If the algorithm does not converge, a ValueError is raised.
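A call might look like the sketch below, which uses only the signature and return order documented on this page. The sample data, the chosen n, and the chosen limits are arbitrary assumptions, and the import is guarded so that the snippet still runs where paramonte is not installed.

```python
import random

# Arbitrary sample data: 1000 paired observations (assumed for illustration).
random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(1000)]
y = [random.gauss(0.0, 2.0) for _ in range(1000)]

try:
    from paramonte.vis.kde2d import kde2d
except ImportError:
    kde2d = None  # paramonte is not available; the call below is skipped

if kde2d is not None:
    try:
        # The result unpacks in the documented order.
        density, grid, bandwidth = kde2d(x, y, n=128, limits=((-5, 5), (-10, 10)))
        print("bandwidth per axis:", bandwidth)
    except ValueError as error:
        # Raised for mismatched input lengths or when the bandwidth
        # estimation does not converge.
        print("kde2d failed:", error)
```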