Saturday, April 27, 2019

Linear Regression Implementation with python


Hello all, I hope from last few posts you already have good theoretical concept about the Machine Learning Algorithms. Today, we will do a Simple Linear Regression implementation with python. It won’t take much time and I will try to explain every step with simple words.
It is called Simple Linear Regression as it considers only one feature of input data and make the prediction. For example, here we will consider a housing price data set. As it is Simple Regression, it will only consider the size of the house to predict the price of it. But Multiple Regression, to predict the price it may consider several features such as locality, Front/back facing house etc. Below is the input data which we will use for the prediction, here house_size(x) is the input ranging from 1k sqr meter to 14k sqr meter and price(y) of the house ranging from 300 to 1100 dollar.






A scattered plot of the housing data looks like this:


Now we must find a line, which fits this scattered plot known as Regression line, so that we can predict house price for any given size(x). The equation for the Regression line looks like this

-          h(x_ith)= B0 + B1*x_ith

where, h(x_ith) represents prediction for x_ith and B0,B1 are the regression coefficients. To make the prediction, we need to estimate the regression coefficients (B0, B1). For implementation we need to follow the below steps:
  • Step1: Import the libraries. NumPy is the fundamental package for scientific computing with Python. In machine learning as we need to deal with a huge amount of data we use NumPy, which is faster than normal array. Matplotlib is a plotting library in python, we will use it for visualization.

  • Step2: Take the mean of the house_size(x) and the price(y). Calculate cross-deviation and deviation by calculating Sum of Squared Errors.
  • Step3: Calculate regression coefficients or the prediction error(explained in previous block here:  )
  • Step4: Plot the scattered points on the graph with red colors. The x-axis represents the size of the house(house_size) and the y-axis represents the price. (figure above)

  • Step5: Predict the regression line with minimum error and plot it with purple color.

  • Step6: Lastly, write the main and call the main function. And the final output of the code is

Estimated coefficients:
b_0 = 295.95147839272175 
b_1 = 57.31614859742229
                And the graph should look like this-


You can download the full code(linearRegression.py) from github here: source code
Hope you enjoyed today’s post. Stay tuned for more python implementation. Do let me know your feedbacks and comments below.
I want to share a good news, my blog was featured in the top4 machine learningblogs, please look at number 19 here: https://blog.feedspot.com/machine_learning_blogs/

Next blog:Harry Potter's magical Cloak with opencv

No comments:

Post a Comment