Suppose we are given data points for, say, diabetes vs. obesity.
According to the CDC data, we have the percentage of people suffering from diabetes for each county, identified by its FIPS code.
We can then check whether obesity affects diabetes.
So obesity will be the independent variable and diabetes will be the dependent variable.
To fit a linear regression of diabetes against obesity, we first check that we have both values for the same areas:
X(Obesity)={Xp1,Xp2,Xp3……..}
Y(Diabetes)={Yp1,Yp2,Yp3,……..}
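As a minimal sketch of this matching step, assume each data set is held as a Python dictionary keyed by FIPS code. The names obesity, diabetes and paired_points are hypothetical, and the codes and percentages are made-up illustrative values, not CDC figures. Only areas present in both dictionaries are kept:

```python
# Hypothetical inputs: FIPS code -> percentage (illustrative values only).
obesity = {"01001": 30.5, "01003": 26.6, "01005": 37.3}
diabetes = {"01001": 11.2, "01005": 14.8, "01007": 10.9}


def paired_points(x_by_area, y_by_area):
    """Return (X, Y) lists restricted to areas found in both dictionaries."""
    # Set intersection of the keys gives the common areas; sorting keeps
    # the pairing order deterministic.
    common = sorted(x_by_area.keys() & y_by_area.keys())
    xs = [x_by_area[area] for area in common]
    ys = [y_by_area[area] for area in common]
    return xs, ys


X, Y = paired_points(obesity, diabetes)
print(X)  # obesity percentages for the common areas
print(Y)  # diabetes percentages for the same areas, in the same order
```

Areas that appear in only one of the two data sets are dropped, so every Xp has a matching Yp.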
Now we must find the line that best represents the above values.
The equation of the line is:
Yi=mXi+c
where m is the slope and c is the y-intercept. Let’s say we draw the line through the origin, so c=0:
Yi=mXi
Now we have to try different slopes {m1, m2, m3, ……}, or equivalently different angles at which the line is drawn from the origin in the first quadrant.
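One possible way to generate such candidate slopes is to pick evenly spaced angles between 0 and 90 degrees and take the tangent of each, since the slope of a line through the origin is the tangent of its angle with the x-axis. The helper name candidate_slopes and the even spacing are assumptions made for illustration:

```python
import math


def candidate_slopes(num_angles):
    """Return slopes for evenly spaced angles strictly between 0 and 90 degrees."""
    # Both end points are excluded: 0 degrees gives a flat line and
    # 90 degrees a vertical one, whose slope is undefined.
    step = 90.0 / (num_angles + 1)
    return [math.tan(math.radians(step * k)) for k in range(1, num_angles + 1)]


slopes = candidate_slopes(8)  # angles of 10, 20, ..., 80 degrees
print(slopes)
```

Because the tangent grows quickly near 90 degrees, evenly spaced angles give unevenly spaced slopes; spacing the slopes directly is an equally valid choice.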
After that, for each slope we calculate the mean squared difference between the points on the line we drew, {Y1, Y2, Y3, Y4, …..}, and the points we obtain from the data set, here the diabetes percentages {Yp1, Yp2, Yp3, Yp4, …..}.
This mean squared difference is called the cost function (also referred to as the least squared error function or the residual error).
The cost function for a given slope m is
J(m) = (1/(2n)) * (sum of (Ypi - Yi)^2 for i = 1 to n)
where n is the number of points and Yi = m*Xpi is the value the line gives for the i-th point.
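The cost function can be sketched in a few lines of Python. This assumes a line through the origin (c=0) as above; the function names cost and best_slope, and the sample values, are hypothetical and chosen only so the arithmetic is easy to follow:

```python
def cost(m, xs, ys):
    """Cost of the line Y = m*X: sum of squared differences divided by 2n."""
    n = len(xs)
    return sum((y - m * x) ** 2 for x, y in zip(xs, ys)) / (2 * n)


def best_slope(slopes, xs, ys):
    """Return the candidate slope with the smallest cost."""
    return min(slopes, key=lambda m: cost(m, xs, ys))


# Illustrative values only: these points lie exactly on Y = 2*X.
X = [1.0, 2.0, 3.0]
Y = [2.0, 4.0, 6.0]

print(cost(2.0, X, Y))  # the line passes through every point, so the cost is 0
print(cost(1.0, X, Y))  # differences 1, 2, 3 -> (1 + 4 + 9) / (2 * 3)
print(best_slope([0.5, 1.0, 2.0, 3.0], X, Y))
```

Trying each candidate slope and keeping the one with the lowest cost is a simple brute-force search; it only finds the best slope among the candidates supplied.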