There is a difference between normalizing and standardizing data. Both are common steps in preprocessing feature variables for a machine learning model.

Standardizing centers the data by subtracting the mean and scales it by dividing by the standard deviation. The result has a mean of 0 and unit variance. This is typically what is used to preprocess feature vectors in machine learning.
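As a rough illustration of the arithmetic, here is a minimal sketch using only the Python standard library. The function name `standardize` and the sample data are made up for this example, and it assumes the population standard deviation (dividing by n), which is also what scikit-learn's scaler uses.

```python
from statistics import mean, pstdev

def standardize(values):
    """Return z-scores: (x - mean) / standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (divides by n)
    if sigma == 0:
        raise ValueError("cannot standardize a constant feature")
    return [(x - mu) / sigma for x in values]

heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]  # made-up data
z = standardize(heights_cm)

print(z)          # values centered on 0
print(mean(z))    # approximately 0
print(pstdev(z))  # approximately 1
```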

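Normalizing rescales the data to lie between 0 and 1, typically by subtracting the minimum and dividing by the range. Note that this is sensitive to outliers: a single extreme value sets the range and compresses the remaining values into a narrow band.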
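To see how one extreme value affects the result, here is a small standard-library sketch. The helper `min_max` and the numbers are hypothetical, chosen only to make the effect visible.

```python
def min_max(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot rescale a constant feature")
    return [(x - lo) / (hi - lo) for x in values]

incomes = [20.0, 30.0, 40.0, 50.0]   # made-up data
with_outlier = incomes + [1000.0]    # same data plus one extreme value

print(min_max(incomes))       # spread evenly across the 0 to 1 range
print(min_max(with_outlier))  # original four values squeezed close to 0
```

In the second call the original four values all land below 0.05, because the outlier alone defines the top of the range.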

Scikit-learn in Python has some great preprocessing modules for standardization and normalization.

Standardize to 0 mean with unit variance

sklearn.preprocessing.scale(X, axis=0, with_mean=True, with_std=True, copy=True)
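A short usage sketch of this call, on a made-up feature matrix (rows as samples, columns as features), might look like this:

```python
import numpy as np
from sklearn.preprocessing import scale

# Made-up feature matrix: rows are samples, columns are features.
X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])

# axis=0 (the default) standardizes each column independently.
X_std = scale(X)

print(X_std.mean(axis=0))  # approximately 0 for every column
print(X_std.std(axis=0))   # approximately 1 for every column
```

When the same scaling has to be reapplied to new data, for example a test set, fitting `sklearn.preprocessing.StandardScaler` on the training data is a common alternative, since it stores the mean and standard deviation for reuse.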

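Normalize to (0,1) range. Min-max scaling of each feature is done with minmax_scale:

sklearn.preprocessing.minmax_scale(X, feature_range=(0, 1), axis=0, copy=True)

Note that sklearn.preprocessing.normalize does something different: it rescales each sample (row) to unit norm rather than mapping features onto a fixed range.

sklearn.preprocessing.normalize(X, norm='l2', axis=1, copy=True, return_norm=False)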
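The two scikit-learn functions with "normalizing" behaviour are easy to mix up, so here is a small sketch contrasting them on made-up data. It assumes the default arguments: `minmax_scale` works per column, `normalize` works per row with the L2 norm.

```python
import numpy as np
from sklearn.preprocessing import minmax_scale, normalize

# Made-up feature matrix: rows are samples, columns are features.
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [4.0, 40.0]])

X_minmax = minmax_scale(X)  # each column mapped linearly onto [0, 1]
X_unit = normalize(X)       # each row divided by its own L2 norm

print(X_minmax)                        # column minima are 0, maxima are 1
print(np.linalg.norm(X_unit, axis=1))  # every row now has length 1
```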

## Published by Claire

Data Scientist and Statistician

Why standardise? To reduce the impact of outliers?
