Machine Learning Basics: Linear Regression

Madhu Ramiah
4 min read · Jul 18, 2019


We have learnt about a lot of machine learning classification algorithms in my previous blogs. In this blog, we are going to look into regression and where we apply regression algorithms.

Classification vs Regression:

In classification algorithms, the output is a set of classes. For example, if you want to predict what the weather will be like today, then the classes will be Rainy, Cloudy, Hot, Cold and Dry. But if you want to predict the temperature today, the output would be a numeric value like 70 degrees Fahrenheit. This is a regression problem, where you don’t have specific classes as output, but a numeric value. Some other examples of regression are predicting house prices, predicting bike sales, predicting car prices, etc.

Linear Regression:

If we consider predicting TV sales based on the price of the TV, then the price is called the independent variable and TV sales is called the dependent variable. If we plot the two against each other and the data shows a linear relationship, then we can solve the problem using linear regression. The equation would be nothing but a line equation y = mx + c, but we will also write it as

y = β₀ + β₁x

where β₀ is the intercept and β₁ is the slope.
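To make this concrete, here is a minimal sketch of fitting a simple linear regression with scikit-learn; the price and sales numbers below are made up purely for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: TV price in dollars (x) vs. units sold (y)
price = np.array([[300], [400], [500], [600], [700]])  # shape (n_samples, 1)
sales = np.array([120, 105, 95, 80, 70])

model = LinearRegression()
model.fit(price, sales)

print("Intercept (beta_0):", model.intercept_)
print("Slope (beta_1):", model.coef_[0])
```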

[Figure: predicting TV sales based on the price of a TV, with a fitted regression line]

Take a look at the figure above. When we plot all the points, we see that the data follows a linear trend. But the question is where to fit the line: we could also draw a straight line parallel to the blue one, slightly above or below it. Would that be the best fit? To find the best-fit line, we need to minimize the loss incurred. For a simple linear regression problem, we calculate the overall loss as below:

RSS = Σᵢ (yᵢ − ŷᵢ)² = Σᵢ (yᵢ − β₀ − β₁xᵢ)²  (residual sum of squares)

The residual sum of squares is nothing but the sum of the squares of the differences between the predicted values and the actual values; in other words, it is the sum of the losses over all data points. The RSS for a multiple linear regression problem with p predictors would be as follows:

RSS = Σᵢ (yᵢ − β₀ − β₁xᵢ₁ − β₂xᵢ₂ − … − βₚxᵢₚ)²

For any model, we want this RSS value to be as low as possible, meaning the predicted value is very close to the actual value.
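Here is a quick sketch of computing RSS by hand with NumPy; the actual and predicted values are invented for illustration:

```python
import numpy as np

def rss(y_actual, y_predicted):
    # Residual sum of squares: sum of squared differences
    # between the actual and predicted values
    return np.sum((y_actual - y_predicted) ** 2)

# Made-up actual vs. predicted values
y_actual = np.array([120, 105, 95, 80, 70])
y_predicted = np.array([118, 108, 93, 83, 68])
print("RSS:", rss(y_actual, y_predicted))  # 4 + 9 + 4 + 9 + 4 = 30
```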

But there is a problem with RSS: its value depends on the units of the variables. Suppose you want to predict a person’s weight given their height. If you build the model with weight in kilograms and then express the same data in pounds, the value of RSS changes even though the model and its errors are identical. To avoid this, we use the R² metric for evaluation.
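Here is a small sketch of that scale problem, using made-up weights; converting kilograms to pounds multiplies every residual by about 2.2, so RSS grows by roughly 4.86× while the fit itself is unchanged:

```python
import numpy as np

y_kg = np.array([60.0, 72.0, 80.0])      # hypothetical actual weights (kg)
pred_kg = np.array([62.0, 70.0, 79.0])   # hypothetical predictions (kg)
rss_kg = np.sum((y_kg - pred_kg) ** 2)

# Express the exact same data in pounds (1 kg is about 2.20462 lb)
y_lb = y_kg * 2.20462
pred_lb = pred_kg * 2.20462
rss_lb = np.sum((y_lb - pred_lb) ** 2)

print(rss_kg)  # 9.0
print(rss_lb)  # ~43.7: same model, same errors, different RSS
```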

R² (Coefficient of Determination):

Before we consider the formula for R², let us learn another metric called the TSS (Total Sum of Squares). The TSS is computed from the worst reasonable approximation line you could draw for your model: for the figure above, that would be a straight line parallel to the x-axis passing through the mean of all the y-values. In that case, the loss is the sum of the squared differences between each actual value and the mean:

TSS = Σᵢ (yᵢ − ȳ)²

R² can be defined as R² = 1 − (RSS/TSS). Now, even if we change the units of our variables, the value of R² will not change, because RSS and TSS scale by the same factor. We want our RSS to be as low as possible, and TSS is the worst-case RSS, so we want the value of (RSS/TSS) to be as low as possible and the value of R² as close to 1 as possible for a good model. If the value of R² is close to 0, then our model is performing very poorly.
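A minimal sketch of computing R² straight from its definition, reusing the same invented values as above:

```python
import numpy as np

def r_squared(y_actual, y_predicted):
    rss = np.sum((y_actual - y_predicted) ** 2)        # the model's loss
    tss = np.sum((y_actual - np.mean(y_actual)) ** 2)  # worst-case loss
    return 1 - rss / tss

y_actual = np.array([120, 105, 95, 80, 70])
y_predicted = np.array([118, 108, 93, 83, 68])
print(r_squared(y_actual, y_predicted))  # ~0.98, close to 1: a good fit
```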

Residual Standard Error (RSE):

This is another metric with which we evaluate our model. It is derived from the RSS and, for simple linear regression, is defined as

RSE = √(RSS / (n − 2))

where (n − 2) is called the degrees of freedom and n is the number of data points in the model. Since RSE is derived from RSS, it carries the same disadvantage: its value depends on the units of the variables. It is better to use the R² metric for evaluating your model.
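And a sketch of computing RSE for simple linear regression, again with the same invented values; note the (n − 2) degrees of freedom in the denominator:

```python
import numpy as np

def rse(y_actual, y_predicted):
    # Residual standard error: sqrt(RSS / (n - 2)),
    # where n - 2 is the degrees of freedom for simple linear regression
    rss = np.sum((y_actual - y_predicted) ** 2)
    n = len(y_actual)
    return np.sqrt(rss / (n - 2))

y_actual = np.array([120, 105, 95, 80, 70])
y_predicted = np.array([118, 108, 93, 83, 68])
print("RSE:", rse(y_actual, y_predicted))  # sqrt(30 / 3) ~ 3.16
```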

Conclusion:

Linear Regression is widely used in a lot of economics applications, like predicting a country’s imports and exports, predicting sales, predicting investments, labor supply, etc. If the relationship in your data is linear, this model can be very effective for prediction.

Hope you enjoyed reading my blog. Leave your comments below or contact me via LinkedIn. Hit the clap icon if you enjoyed reading my article :)
