
6.03 Line of best fit

Lesson

When we display bivariate data that appears to have a linear relationship, we often wish to find a line that best models the relationship so we can see the trend and make predictions. We call this the line of best fit.

 

Exploration

We want to draw a line of best fit for the following scatterplot:

Let's try drawing three lines across the data and consider which is most appropriate.

We can tell straight away that $A$ is not the right line: the data appears to have a positive linear relationship, but $A$ has a negative gradient. $B$ has the correct sign for its gradient, and it passes through three points. However, there are many more points above the line than below it, and a line of best fit should pass through the centre of the points. This means that line $C$ is the best fit for this data out of the three lines.

 

Drawing a line of best fit by eye

  • One method is to draw an oval around the points on the scatterplot, then cut the oval in half with a line.
  • The line may pass exactly through all of the points, some of the points, or none of the points.
  • It always represents the general trend of the data (increasing or decreasing).
  • The number of points above the line should be roughly the same as the number of points below the line.
  • You should generally ignore outliers (points that fall very far from the rest of the data) as they can skew the line of best fit. 
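The balance guideline above can be checked numerically once a candidate line has been chosen. The following is a minimal sketch, not part of the original lesson: the function name `count_sides`, the sample points, and the candidate line $y=0.8x+0.9$ are all made up for illustration.

```python
def count_sides(points, slope, intercept, tol=1e-9):
    """Count the points above, on, and below the line y = slope*x + intercept."""
    above = on = below = 0
    for x, y in points:
        # Vertical distance from the line to the point (positive means above).
        residual = y - (slope * x + intercept)
        if residual > tol:
            above += 1
        elif residual < -tol:
            below += 1
        else:
            on += 1
    return above, on, below


# Hypothetical data, invented purely for illustration.
sample_points = [(1, 2), (2, 2), (3, 4), (4, 3), (5, 6), (6, 5)]

# Hypothetical candidate line y = 0.8x + 0.9.
print(count_sides(sample_points, 0.8, 0.9))  # prints (3, 0, 3)
```

An even split like this suggests the line runs through the centre of the points, but it is only one of the checks listed above: the line must also follow the direction of the trend.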

Below is an example of what a good line of best fit might look like.

 

Practice questions

Question 1

The following scatter plot shows the data for two variables, $x$ and $y$.

  1. Determine which of the following graphs contains the line of best fit.

    A

    B

    C

    D

Question 2

The following scatter plot shows the data for two variables, $x$ and $y$.

A scatter plot with an $x$-axis labeled from $0$ to $10$ and a $y$-axis labeled from $0$ to $10$. Both axes are in increments of $1$ unit. Gray gridlines divide the plane into square units. Eight points are plotted on the coordinate plane: $\left(1,2\right)$, $\left(2,1\right)$, $\left(3,3\right)$, $\left(4,5\right)$, $\left(5,6\right)$, $\left(6,5\right)$, $\left(7,7\right)$, and $\left(8,7\right)$. The points are plotted as solid black dots, but their coordinates are neither labeled on the graph nor stated in the problem.
  1. Determine which of the following graphs contains the line of best fit.

    A scatter plot with an $x$-axis labeled from $0$ to $10$ and a $y$-axis labeled from $0$ to $10$. Both axes are in increments of $1$ unit. Gray gridlines divide the plane into square units. Eight points are plotted on the grid: $\left(1,2\right)$, $\left(2,1\right)$, $\left(3,3\right)$, $\left(4,5\right)$, $\left(5,6\right)$, $\left(6,5\right)$, $\left(7,7\right)$, and $\left(8,7\right)$. A green line rises diagonally across the graph, following the trend of the points, from approximately $\left(0,0.7\right)$ to approximately $\left(10,9.2\right)$ near the top-right corner. Points $\left(1,2\right)$, $\left(4,5\right)$, $\left(5,6\right)$, and $\left(7,7\right)$ are above the green line, while points $\left(2,1\right)$, $\left(3,3\right)$, $\left(6,5\right)$, and $\left(8,7\right)$ are below it. The points are plotted as solid black dots, but their coordinates are neither labeled on the graph nor stated in the problem.
    A
    A scatter plot with an $x$-axis labeled from $0$ to $10$ and a $y$-axis labeled from $0$ to $10$. Both axes are in increments of $1$ unit. Gray gridlines divide the plane into square units. Eight points are plotted on the grid: $\left(1,2\right)$, $\left(2,1\right)$, $\left(3,3\right)$, $\left(4,5\right)$, $\left(5,6\right)$, $\left(6,5\right)$, $\left(7,7\right)$, and $\left(8,7\right)$. A green line rises diagonally across the graph from approximately $\left(0,1.1\right)$ to approximately $\left(10,9.7\right)$ near the top-right corner. Point $\left(1,2\right)$ lies on the green line, points $\left(4,5\right)$ and $\left(5,6\right)$ are above it, and points $\left(2,1\right)$, $\left(3,3\right)$, $\left(6,5\right)$, $\left(7,7\right)$, and $\left(8,7\right)$ are below it. The points are plotted as solid black dots, but their coordinates are neither labeled on the graph nor stated in the problem.
    B
    A scatter plot with an $x$-axis labeled from $0$ to $10$ and a $y$-axis labeled from $0$ to $10$. Both axes are in increments of $1$ unit. Gray gridlines divide the plane into square units. Eight points are plotted on the grid: $\left(1,2\right)$, $\left(2,1\right)$, $\left(3,3\right)$, $\left(4,5\right)$, $\left(5,6\right)$, $\left(6,5\right)$, $\left(7,7\right)$, and $\left(8,7\right)$. A green line rises diagonally across the graph from approximately $\left(0,0.1\right)$ to approximately $\left(10,8.8\right)$ near the top-right corner. Point $\left(8,7\right)$ lies on the green line, points $\left(1,2\right)$, $\left(3,3\right)$, $\left(4,5\right)$, $\left(5,6\right)$, and $\left(7,7\right)$ are above it, and points $\left(2,1\right)$ and $\left(6,5\right)$ are below it. The points are plotted as solid black dots, but their coordinates are neither labeled on the graph nor stated in the problem.
    C
    A scatter plot with an $x$-axis labeled from $0$ to $10$ and a $y$-axis labeled from $0$ to $10$. Both axes are in increments of $1$ unit. Gray gridlines divide the plane into square units. Eight points are plotted on the grid: $\left(1,2\right)$, $\left(2,1\right)$, $\left(3,3\right)$, $\left(4,5\right)$, $\left(5,6\right)$, $\left(6,5\right)$, $\left(7,7\right)$, and $\left(8,7\right)$. A green line rises diagonally across the graph but does not follow the trend of the points, running from approximately $\left(0,1.5\right)$ to approximately $\left(10,8.2\right)$ near the top-right corner. Points $\left(4,5\right)$, $\left(5,6\right)$, $\left(7,7\right)$, and $\left(8,7\right)$ are above the green line, while points $\left(1,2\right)$, $\left(2,1\right)$, $\left(3,3\right)$, and $\left(6,5\right)$ are below it. The points are plotted as solid black dots, but their coordinates are neither labeled on the graph nor stated in the problem.
    D
  2. Use the line of best fit to estimate the value of $y$ when $x=4.5$.

    A scatter plot with an $x$-axis labeled from $0$ to $10$ and a $y$-axis labeled from $0$ to $10$. Both axes are in increments of $1$ unit. Gray gridlines divide the plane into square units. Eight points are plotted on the grid: $\left(1,2\right)$, $\left(2,1\right)$, $\left(3,3\right)$, $\left(4,5\right)$, $\left(5,6\right)$, $\left(6,5\right)$, $\left(7,7\right)$, and $\left(8,7\right)$. A green line with equation $y=\frac{78}{90}x+\frac{54}{90}$ rises diagonally across the graph, following the trend of the points. The equation of the line is neither shown on the graph nor stated in the problem. Points $\left(1,2\right)$, $\left(4,5\right)$, $\left(5,6\right)$, and $\left(7,7\right)$ are above the green line, while points $\left(2,1\right)$, $\left(3,3\right)$, $\left(6,5\right)$, and $\left(8,7\right)$ are below it. The points are plotted as solid black dots, but their coordinates are neither labeled on the graph nor stated in the problem.

    $4.5$

    A

    $5$

    B

    $5.5$

    C

    $6$

    D
  3. Use the line of best fit to estimate the value of $y$ when $x=9$.

    $6.5$

    A

    $7$

    B

    $8.4$

    C

    $9.5$

    D
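
As a rough check on Question 2, the estimates in parts 2 and 3 can be reproduced from the line $y=\frac{78}{90}x+\frac{54}{90}$ given in the graph description. The sketch below assumes that equation and the eight listed points; the names `predict` and `prediction_type` are illustrative and not part of the lesson. Exact fractions are used so the results are not affected by rounding.

```python
from fractions import Fraction

# Line taken from the graph description in Question 2 (assumed exact).
SLOPE = Fraction(78, 90)
INTERCEPT = Fraction(54, 90)

# The eight plotted points listed in the graph description.
POINTS = [(1, 2), (2, 1), (3, 3), (4, 5), (5, 6), (6, 5), (7, 7), (8, 7)]


def predict(x):
    """Estimate y on the line of best fit for a given x."""
    return SLOPE * Fraction(str(x)) + INTERCEPT


def prediction_type(x):
    """Interpolation if x lies within the observed x-values, else extrapolation."""
    xs = [px for px, _ in POINTS]
    return "interpolation" if min(xs) <= x <= max(xs) else "extrapolation"


for x in (4.5, 9):
    print(x, float(predict(x)), prediction_type(x))
# prints:
# 4.5 4.5 interpolation
# 9 8.4 extrapolation
```

Because $x=9$ lies beyond the largest observed $x$-value of $8$, that estimate is an extrapolation and should be treated with more caution than the interpolated value at $x=4.5$.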

Outcomes

3.4.15

fit a trend line by eye

3.4.16

interpret relationships in terms of the variables, for example, describe trend as increasing or decreasing

3.4.17

use the trend line to make predictions, both by interpolation and extrapolation
