ParaMonte Fortran 2.0.0
Parallel Monte Carlo and Machine Learning Library
pm_sampleNorm Module Reference

This module contains classes and procedures for normalizing univariate or multivariate samples by arbitrary amounts along specific directions.

Data Types

interface  getNormed
 Generate a sample of shape (nsam), or (ndim, nsam) or (nsam, ndim) that is normalized by the specified input shift and scale along the specified axis dim.
 
interface  setNormed
 Return a sample of shape (nsam), or (ndim, nsam) or (nsam, ndim) that is normalized by the specified input shift and scale along the specified axis dim.
 
type  zscore_type
 This is the derived type whose instances signify shifting a sample by the negative of the sample mean and scaling the result by the inverse of the sample standard deviation or an equivalent measure.
 

Variables

character(*, SK), parameter MODULE_NAME = "@pm_sampleNorm"
 
type(zscore_type), parameter zscore = zscore_type()
 

Detailed Description

This module contains classes and procedures for normalizing univariate or multivariate samples by arbitrary amounts along specific directions.

Normalization can have a wide variety of meanings in science.
In this module, it refers to the creation of a shifted and scaled version of a sample, so that corresponding normalized values from different datasets can be compared in a way that eliminates the effects of certain gross influences.

The procedures of this module facilitate the computation of the following popular sample normalizations, among others:

Standard score

The standard score (z-score) is the number of standard deviations by which the value of a raw score (i.e., an observed value or data point) is above or below the mean value of what is being observed or measured.
Raw scores above the mean have positive standard scores, while those below the mean have negative standard scores.

If the population mean and population standard deviation are known, a raw score x is converted into a standard score by,

\begin{equation} z = {x - \mu \over \sigma} ~, \end{equation}

where:

  1. \(\mu\) is the mean of the population,
  2. \(\sigma\) is the standard deviation of the population.

When the population mean and the population standard deviation are unknown, the standard score may be estimated by using the sample mean and sample standard deviation as estimates of the population values.
In these cases, the z-score is given by,

\begin{equation} z = {x - {\hat\mu} \over \hat\sigma} ~, \end{equation}

where:

  1. \(\hat\mu\) is the mean of the sample,
  2. \(\hat\sigma\) is the standard deviation of the sample.
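The sample-based z-score above can be sketched in plain Fortran as follows. This is a minimal illustration that uses only standard intrinsics and does not call any procedure of this module; the program name, the kind parameter RK, the data values, and the choice of the unbiased (n - 1) variance estimate are all assumptions made for the example.

```fortran
program zscore_sketch
    implicit none
    integer, parameter :: RK = kind(1.0d0)  ! hypothetical working precision
    ! Made-up data, chosen only for illustration.
    real(RK), parameter :: x(5) = [2._RK, 4._RK, 4._RK, 5._RK, 10._RK]
    real(RK) :: mu, sigma, z(size(x))
    integer :: n

    n = size(x)
    mu = sum(x) / real(n, RK)  ! sample mean
    ! Assumption: unbiased sample standard deviation (divide by n - 1).
    sigma = sqrt(sum((x - mu)**2) / real(n - 1, RK))

    ! Equivalent to shifting by -mu and then scaling by 1 / sigma.
    z = (x - mu) / sigma

    print "(a, *(f9.4))", "z       = ", z
    ! By construction, z has zero mean and unit (n - 1) standard deviation.
    print "(a, f9.4)", "mean(z) = ", sum(z) / real(n, RK)
    print "(a, f9.4)", "std(z)  = ", sqrt(sum(z**2) / real(n - 1, RK))
end program zscore_sketch
```

In terms of the shift and scale used throughout this module, the sketch corresponds to a shift of \(-\hat\mu\) followed by a scale of \(1 / \hat\sigma\), which matches the description of zscore_type.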

Rescaling (min-max normalization)

Also known as min-max scaling or min-max normalization, this method rescales the range of features to \([0, 1]\) or \([-1, 1]\).
The choice of target range depends on the nature of the data.
The general formula for rescaling to \([0, 1]\) is:

\begin{equation} \tilde x = \frac {x - {\text{min}}(x)}{{\text{max}}(x)-{\text{min}}(x)} ~, \end{equation}

where \(x\) is an original value and \(\tilde x\) is the normalized value.
For example, suppose that we have the students' weight data, and the students' weights span [160 pounds, 200 pounds].
To rescale this data, we first subtract \(160\) from each student's weight and then divide the result by \(40\) (the difference between the maximum and minimum weights).
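
As a worked instance of this rescaling, consider a hypothetical student weighing \(180\) pounds (a value chosen here for illustration only):

\begin{equation} \tilde x = \frac{180 - 160}{200 - 160} = \frac{20}{40} = 0.5 ~. \end{equation}

Likewise, the minimum weight of \(160\) pounds maps to \(0\) and the maximum weight of \(200\) pounds maps to \(1\).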

To rescale the data to an arbitrary target range \([a, b]\), the formula becomes:

\begin{equation} \tilde x = a + {\frac {(x-{\text{min}}(x))(b-a)}{{\text{max}}(x)-{\text{min}}(x)}} ~, \end{equation}

where \(a\) and \(b\) are the minimum and maximum of the target range.
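
Both rescaling formulas can be sketched in plain Fortran as follows. This is a minimal illustration with standard intrinsics only, not a call into this module; the endpoints 160 and 200 come from the example above, while the intermediate weights, the target range \([-1, 1]\), and all names are assumptions.

```fortran
program minmax_sketch
    implicit none
    integer, parameter :: RK = kind(1.0d0)  ! hypothetical working precision
    ! Endpoints follow the weight example; intermediate values are made up.
    real(RK), parameter :: weight(5) = [160._RK, 170._RK, 180._RK, 190._RK, 200._RK]
    real(RK), parameter :: a = -1._RK, b = 1._RK  ! assumed target range [a, b]
    real(RK) :: lo, hi, scaled01(size(weight)), scaledAB(size(weight))

    lo = minval(weight)
    hi = maxval(weight)
    ! A constant sample has zero range and cannot be rescaled this way.
    if (hi == lo) error stop "sample range is zero"

    scaled01 = (weight - lo) / (hi - lo)                ! target range [0, 1]
    scaledAB = a + (weight - lo) * (b - a) / (hi - lo)  ! target range [a, b]

    print "(a, *(f7.2))", "[0, 1]: ", scaled01  ! values 0, 0.25, 0.5, 0.75, 1
    print "(a, *(f7.2))", "[a, b]: ", scaledAB  ! values -1, -0.5, 0, 0.5, 1
end program minmax_sketch
```

Rescaling to \([0, 1]\) amounts to a shift of \(-{\text{min}}(x)\) followed by a scale of \(1 / ({\text{max}}(x) - {\text{min}}(x))\); the \([a, b]\) case additionally multiplies by \(b - a\) and adds \(a\).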

Mean normalization

\begin{equation} \tilde x = {\frac {x-{\bar {x}}}{{\text{max}}(x)-{\text{min}}(x)}} ~, \end{equation}

where \(x\) is an original value, \(\tilde x\) is the normalized value, and \({\bar{x}} = {\text{average}}(x)\) is the mean of that feature vector.
Another form of mean normalization divides by the standard deviation instead of the range; this form is also called standardization.
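
Mean normalization by the sample range can be sketched in plain Fortran as follows. Again, this is a minimal illustration with standard intrinsics rather than a call into this module; the data values and all names are assumptions.

```fortran
program meannorm_sketch
    implicit none
    integer, parameter :: RK = kind(1.0d0)  ! hypothetical working precision
    ! Made-up data, chosen so that the mean (3) and the range (5) are simple.
    real(RK), parameter :: x(4) = [1._RK, 2._RK, 3._RK, 6._RK]
    real(RK) :: xbar, span, xnorm(size(x))

    xbar = sum(x) / real(size(x), RK)  ! sample mean
    span = maxval(x) - minval(x)       ! sample range
    if (span == 0._RK) error stop "sample range is zero"

    ! Shift by -xbar, then scale by 1 / span.
    xnorm = (x - xbar) / span

    print "(a, *(f7.2))", "xnorm = ", xnorm  ! values -0.4, -0.2, 0, 0.6
end program meannorm_sketch
```

The normalized values sum to zero and span a range of exactly one, which distinguishes this form from the z-score, where the spread is measured by the standard deviation.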

Developer Remark:
While it is tempting to add generic interfaces for automatic standard normalization of the sample (in the absence of arbitrary shift and scale arguments), such interfaces were not added to this module for the following reasons:
  1. Why should standard normalization be the default behavior?
  2. Even though standard normalization is popular, its implementation as the default normalization in the generic interfaces of this module requires inclusion of sample weight and variance correction arguments, thus significantly complicating the interfaces of this module with little gain.
See also
pm_sampling
pm_sampleACT
pm_sampleCCF
pm_sampleCor
pm_sampleCov
pm_sampleConv
pm_sampleECDF
pm_sampleMean
pm_sampleNorm
pm_sampleQuan
pm_sampleScale
pm_sampleShift
pm_sampleWeight
pm_sampleAffinity
pm_sampleVar
Normalization
Test:
test_pm_sampleNorm


Final Remarks


This software is distributed under the MIT license with additional terms outlined below.

  1. If you use any parts or concepts from this library to any extent, please acknowledge the usage by citing the relevant publications of the ParaMonte library.
  2. If you regenerate any parts/ideas from this library in a programming environment other than those currently supported by this ParaMonte library (i.e., other than C, C++, Fortran, MATLAB, Python, R), please also ask the end users to cite this original ParaMonte library.

This software is available to the public under a highly permissive license.
Help us justify its continued development and maintenance by acknowledging its benefit to society, distributing it, and contributing to it.

Author:
Fatemeh Bagheri, Thursday 12:45 AM, August 20, 2021, Dallas, TX

Variable Documentation

◆ MODULE_NAME

character(*, SK), parameter pm_sampleNorm::MODULE_NAME = "@pm_sampleNorm"

Definition at line 132 of file pm_sampleNorm.F90.

◆ zscore

type(zscore_type), parameter pm_sampleNorm::zscore = zscore_type()

Definition at line 154 of file pm_sampleNorm.F90.