26 Regression SLOPE and INTERCEPT

26.1 Data

Here is the data set we will work with:

year	sales
16	53
17	56
18	58
19	58

The goal here will be to predict the y-value (called $sales$) from the x-value(called $year$). We can see already that as the years increase the sales increase as well. So we can roughly guess the trend by just be looking at the data.

But we can also do more than just guessing by coming up with a formula that will make our predictions for us when we plug in the year.

In order to do that we have to come up with an equation that will allow us to plug in an x-value ($year$) and get out a y-value ($sales$). This is what we mean by prediction.

In our case we will use a straight line equation, so that is why this process is called linear regression, since linear means “using a line”.

First lets see a scatterplot of the data:

So start by copying the data into your spreadsheet in columns A and B:

	A	B	C	D	E	F
1	year	sales
2	16	53
3	17	56
4	18	58
5	19	58
6
7
8

26.2 Finding Slope and Intercept of Regression Line

Next lets compute the slope and intercept using the functions SLOPE and INTERCEPT in a spreadsheet. Make sure you enter the y cell range in first and then the x cell range when you are using SLOPE and INTERCEPT. And in fact most “regression” formulas for a spreadsheet require y-values range first and x-values range second.

Okay so here is what using the functions SLOPE and INTERCEPT look like. Slope is labeled m and intercept is labeled b:

	A	B	C	D	E	F
1	year	sales
2	16	53		m	=SLOPE(B2:B5,A2:A5)
3	17	56		b	=INTERCEPT(B2:B5,A2:A5)
4	18	58
5	19	58
6
7
8

Here are the values for after using the formulas:

	A	B	C	D	E	F
1	year	sales
2	16	53		m	1.7
3	17	56		b	26.5
4	18	58
5	19	58
6
7
8

From this we see that the regression equation is given below:

\[ sales= 1.7(year) + 26.5 \]

26.3 Making Predictions Using the Regression Equation

Next we will make some predictions using the equation above. We want to make predictions for the following values:

year	sales
20	?
21	?

So first we set up a part of the spreadsheet to hold our predictions. We call this sections “predictions”. Then we list the x-values that will make predictions for. Then along side that we fill in a formula for the regression equation. We use the entries for the slope and the intercept that we found above:

	A	B	C	D	E	F
1	year	sales	predictions
2	16	53		m	1.7
3	17	56		b	26.5
4	18	58
5	19	58
6	20		=$E$2*A6+$E$3
7	21		=$E$2*A7+$E$3
8

Here you should use absolute references for the slope and intercept since if you create your predictions by dragging you do not want the formula to update out of the cells where the m and b are.

Finally here are the predictions:

	A	B	C	D	E	F
1	year	sales	predictions
2	16	53		m	1.7
3	17	56		b	26.5
4	18	58
5	19	58
6	20		60.5
7	21		62.2
8

26.4 A Plot of the Predictions

Below we show the regression line

\[ sales= 1.7(year) + 26.5 \]

and the predictions for $year=20$ and $year=21$

The predictions were made by plugging in the x-values into the equation for the line. Here are the details of one of the calculations so you can see what the spreadsheet is doing:

For $year=20$:

\[ sales= 1.7(20) + 26.5 = 60.5 \]

So that point is at $(20,60.5)$ on the line.

Remember these predictions might not be perfect, but they match the trend that we see from the straight line we came up with.

The line we are using is called the regression line or the line of best fit or the line of least squares. It goes by a few different names, we will usually call it the regression line.

The software comes up with it by fitting a line as best it can between the points so that it is not too far away from them and follows the pattern they form. This is where the term best fit comes from as well.