
Why is feature scaling performed in certain algorithms of ML and DL?

What is Feature Scaling?

In simple words, feature scaling is a way of transforming your data into a common range of values. There are two common ways of performing feature scaling.

1) Standardizing
2) Normalizing

Standardizing


Standardizing is done by taking each value in a column, subtracting the mean of the column, and then dividing by the standard deviation of the column. In Python, let's say you have a column in your dataframe called `weight`. You could create a standardized weight column as:

`df["weight_standard"]= \frac{(df['weight'] - df['weight'].mean())}{df['weight'].std()}`

This will create a new "standardized" column in which each value is a comparison to the mean of the column: a standardized value can be interpreted as the number of standard deviations the original weight was from the mean. This type of feature scaling is by far the most common of all the techniques.
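Here is a tiny, self-contained sketch of that (the `weight` values below are made up just for illustration). After standardizing, the new column has mean roughly 0 and standard deviation roughly 1:

```python
import pandas as pd

# Made-up example data: a "weight" column in kilograms
df = pd.DataFrame({"weight": [55.0, 62.5, 70.0, 81.2, 95.4]})

# Standardize: subtract the column mean, divide by the column standard deviation
df["weight_standard"] = (df["weight"] - df["weight"].mean()) / df["weight"].std()

print(df)
print("mean:", round(df["weight_standard"].mean(), 4))  # ~0
print("std :", round(df["weight_standard"].std(), 4))   # ~1
```

scikit-learn's `StandardScaler` does essentially the same thing (it divides by the population standard deviation rather than the sample one) and is convenient inside pipelines.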

Normalizing

A second type of feature scaling that is very popular is known as normalizing. With normalizing, data are scaled to lie between 0 and 1. Using the same example as above, we could perform normalizing in Python in the following way:

`df["weight_normal"]=\frac{df['weight']-df['weight'].min()}{df['weight'].max()-df['weight'].min()}`

When Should I Use Feature Scaling?

In many machine learning algorithms, the result will change depending on the units of your data. This is especially true in two specific cases:

  1. When your algorithm uses a distance based metric to predict.
  2. When you incorporate regularization.

Distance Based Metrics

One common supervised learning technique that is based on the distance between points is the Support Vector Machine (or SVM). Another technique that uses distances between points to make a prediction is k-nearest neighbors (or KNN). With either of these techniques, choosing not to scale your data may lead to drastically different (and likely misleading) final predictions.

For this reason, choosing some sort of feature scaling is necessary with these distance based techniques.
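To make this concrete, here is a minimal sketch on synthetic data (scikit-learn assumed; the dataset and the 1,000,000 scale factor are made up for illustration). The same k-nearest-neighbors classifier is fit twice, once on raw features where one column dwarfs the other and once with standardization in a pipeline; the unscaled version typically scores noticeably worse because the distance calculation is dominated by the large-range feature:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic two-feature dataset; blow up the scale of the second feature
X, y = make_classification(n_samples=500, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)
X[:, 1] *= 1_000_000  # second feature now ranges over millions

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# KNN on raw features: distances are dominated by the huge second feature
knn_raw = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

# KNN with standardization first, so both features contribute to the distance
knn_scaled = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
knn_scaled.fit(X_train, y_train)

print("accuracy without scaling:", knn_raw.score(X_test, y_test))
print("accuracy with scaling:   ", knn_scaled.score(X_test, y_test))
```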

Now, previously I talked about regularization. So how does it come into play here?

Regularization

When you start introducing regularization, you will again want to scale the features of your model. The penalty on particular coefficients in regularized linear regression techniques depends largely on the scale associated with the features. When one feature is on a small range, say from 0 to 10, and another is on a large range, say from 0 to 1,000,000, applying regularization is going to unfairly punish the feature with the small range. Features with small ranges need to have larger coefficients compared to features with large ranges in order to have the same effect on the outcome of the data. (Think about how ab = ba for two numbers a and b.) Therefore, if regularization could remove one of those two features with the same net increase in error, it would rather remove the small-ranged feature with the large coefficient, since that would reduce the regularization term the most.

Again, this means you will want to scale features any time you are applying regularization.
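Here is a small synthetic sketch of that effect (scikit-learn's Lasso; the data and the alpha value are made up for illustration). Both features carry the same amount of signal, but the small-range feature needs a coefficient roughly five orders of magnitude larger, so the L1 penalty term falls almost entirely on it; after standardization the two features end up with essentially equal coefficients and are penalized on equal footing:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 2000

# Two equally informative features on wildly different ranges
small = rng.uniform(0, 10, n)           # range 0 to 10
large = rng.uniform(0, 1_000_000, n)    # range 0 to 1,000,000

# Each feature moves the target by the same amount per standard deviation,
# so the small-range feature needs a far larger coefficient to do its job
y = (3.0 * (small - small.mean()) / small.std()
     + 3.0 * (large - large.mean()) / large.std()
     + rng.normal(0, 0.5, n))
X = np.column_stack([small, large])

# Lasso on raw features: coefficients differ by ~5 orders of magnitude,
# so the L1 penalty falls almost entirely on the small-range feature
lasso_raw = Lasso(alpha=0.1).fit(X, y)
print("raw coefficients:    ", lasso_raw.coef_)
print("raw penalty per coef:", 0.1 * np.abs(lasso_raw.coef_))

# Lasso after standardization: both features are penalized on equal footing
lasso_scaled = make_pipeline(StandardScaler(), Lasso(alpha=0.1)).fit(X, y)
print("scaled coefficients: ", lasso_scaled.named_steps["lasso"].coef_)
```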

Another point worth noting is that feature scaling can also speed up the convergence of your machine learning algorithms, which is something you may have to think about as you scale up machine learning applications.

LinkedIn: Vaibhav Saini




