9.04 Quadratic regression

Quadratic regression

Functions can be used to model real-world events and interpret data from those events. Data that measures or compares two characteristics of a population is known as bivariate data.

When analyzing data, we previously described the relationship between two variables as linear or nonlinear. In this lesson, we will focus on nonlinear relationships that can be modeled by a quadratic function.

Exploration

Each table shown represents a different set of data.

Table 1
x00.20.40.60.811.21.41.61.8
y13743102369
Table 2
x | 3 | 3.5 | 4 | 4.5 | 5 | 5.5 | 6 | 6.5 | 7 | 7.5 | 8
y | 63 | 68 | 77 | 90 | 104 | 100 | 112 | 120 | 114 | 127 | 127
Table 3
x | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
y | 63 | 65 | 61 | 59 | 58 | 59 | 54 | 55 | 53 | 52 | 50
Table 4
x | 1 | 2 | 2.5 | 3 | 4 | 5 | 6.3 | 6.8 | 7.2 | 7.4 | 8
y | 1.5 | 3 | 4.8 | 5 | 7.4 | 8 | 7 | 6 | 5.5 | 4 | 2

Without creating a scatterplot:

  1. Does the data in Table 1 have a linear or quadratic relationship? Explain your answer.

  2. Does the data in Table 2 have a linear or quadratic relationship? Explain your answer.

  3. Does the data in Table 3 have a linear or quadratic relationship? Explain your answer.

  4. Does the data in Table 4 have a linear or quadratic relationship? Explain your answer.


To analyze a set of data more easily and determine whether there is a quadratic relationship between the variables, we often construct a scatterplot.

Data shows a quadratic relationship if the points form a roughly symmetric, parabolic curve.

The quadratic curve of best fit, which approximately models the data, can be calculated using technology. Most calculators write the model in the standard form of a quadratic function, y = ax^2 + bx + c.
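As a sketch of what the technology is doing (the lesson itself relies only on the calculator's output), regression tools typically choose a, b, and c by least squares: they make the sum S of the squared vertical distances between the n data points (x_i, y_i) and the curve as small as possible. Setting the partial derivatives of S to zero gives three linear equations, the normal equations, whose solution is the curve of best fit:

```latex
S(a,b,c) = \sum_{i=1}^{n} \left(y_i - \left(a x_i^2 + b x_i + c\right)\right)^2

\begin{aligned}
a\sum x_i^4 + b\sum x_i^3 + c\sum x_i^2 &= \sum x_i^2 y_i \\
a\sum x_i^3 + b\sum x_i^2 + c\sum x_i   &= \sum x_i y_i \\
a\sum x_i^2 + b\sum x_i   + c\,n        &= \sum y_i
\end{aligned}
```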

The more tightly the points are clustered around the model, the stronger the relationship between the variables.

The curve of best fit can help us make predictions or draw conclusions about the data. Given an x-value, we can predict the corresponding y-value by substituting x into the equation and evaluating.
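Substituting an x-value is a one-line calculation. The sketch below assumes a hypothetical model, y = -0.5x^2 + 5x - 5, whose coefficients were chosen only so that it agrees with the graph readings in this lesson (when x=8, y≈3; when y=7, x≈4 and 6); `predict_y` is an illustrative name, not part of any calculator.

```python
# Hypothetical curve of best fit in standard form, y = ax^2 + bx + c.
# These coefficients are illustrative only: they were chosen so the curve
# matches the graph readings (8, 3), (4, 7), and (6, 7).
A, B, C = -0.5, 5.0, -5.0


def predict_y(x, a=A, b=B, c=C):
    """Predict y by substituting x into y = ax^2 + bx + c."""
    return a * x**2 + b * x + c


print(predict_y(8))  # 3.0
```

With real data, A, B, and C would be replaced by the coefficients reported by the regression tool.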

We can also use the graph of the model to approximate x- and y-values.

[Two graphs of a quadratic curve of best fit, with the x-axis and y-axis from 1 to 9]
When x=8,\,y\approx 3
When y=7,\,x\approx 4 and 6
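Reading x-values off the graph for a given y can also be done algebraically: set the model equal to that y-value and solve the resulting quadratic equation with the quadratic formula. This sketch assumes a hypothetical model, y = -0.5x^2 + 5x - 5, chosen only to agree with the graph readings; `solve_for_x` is an illustrative helper.

```python
import math

# Hypothetical model y = -0.5x^2 + 5x - 5 (illustrative coefficients chosen
# to match the graph readings; not taken from real data).
A, B, C = -0.5, 5.0, -5.0


def solve_for_x(y_target, a=A, b=B, c=C):
    """Return the sorted real x-values where ax^2 + bx + c equals y_target."""
    # Move y_target to the left side: ax^2 + bx + (c - y_target) = 0.
    c_shifted = c - y_target
    discriminant = b**2 - 4 * a * c_shifted
    if discriminant < 0:
        return []  # the parabola never reaches this y-value
    root = math.sqrt(discriminant)
    # A set removes the duplicate when the discriminant is 0 (one x-value).
    return sorted({(-b + root) / (2 * a), (-b - root) / (2 * a)})


print(solve_for_x(7))  # [4.0, 6.0]
```

A parabola can give two x-values for one y-value, which is why the graph reading above lists both 4 and 6.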

When analyzing the data, it is often helpful to interpret the x-intercepts or the vertex in context. For example, if the equation models a company's sales over time, the x-intercepts represent the times when the company made no sales, and, if the parabola opens downward, the vertex represents the time when sales were highest.
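The vertex comes from x = -b/(2a), and the sign of a tells us whether it is a maximum (a < 0) or a minimum (a > 0); the x-intercepts are the solutions of ax^2 + bx + c = 0. The sketch below uses a hypothetical sales model, y = -0.5x^2 + 5x - 5 (x is time, y is sales), with made-up coefficients; the function names are illustrative.

```python
import math

# Hypothetical sales model: x = time, y = sales (illustrative coefficients).
A, B, C = -0.5, 5.0, -5.0


def vertex(a, b, c):
    """Return the vertex (x, y) of y = ax^2 + bx + c using x = -b / (2a)."""
    x_v = -b / (2 * a)
    return x_v, a * x_v**2 + b * x_v + c


def vertex_type(a):
    """A downward-opening parabola (a < 0) has a maximum; otherwise a minimum."""
    return "maximum" if a < 0 else "minimum"


def x_intercepts(a, b, c):
    """Real solutions of ax^2 + bx + c = 0, sorted (empty if there are none)."""
    discriminant = b**2 - 4 * a * c
    if discriminant < 0:
        return []
    root = math.sqrt(discriminant)
    return sorted({(-b + root) / (2 * a), (-b - root) / (2 * a)})


print(vertex(A, B, C), vertex_type(A))  # (5.0, 7.5) maximum
print(x_intercepts(A, B, C))            # the two times when predicted sales are 0
```

In this hypothetical context, sales peak at 7.5 units when x = 5, and the model predicts no sales at x = 5 ± √15 (about 1.13 and 8.87).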

It is important to consider the context of the data when communicating results, as the model may be appropriate over only part of the domain.

Domain constraint

A limitation or restriction of the possible x-values, usually written as an equation, inequality, or in set notation
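One way to respect a domain constraint when making predictions is to check x before evaluating the model. This is a sketch under stated assumptions: the model y = -0.5x^2 + 5x - 5 is hypothetical, and the constraint 5 - √15 ≤ x ≤ 5 + √15 (the x-values where predicted sales are not negative) is an illustrative choice, written here as an inequality.

```python
import math

# Hypothetical sales model y = -0.5x^2 + 5x - 5 (illustrative coefficients).
A, B, C = -0.5, 5.0, -5.0

# Hypothetical domain constraint: sales cannot be negative, so the model is only
# used between its x-intercepts, 5 - sqrt(15) <= x <= 5 + sqrt(15).
X_MIN = 5 - math.sqrt(15)
X_MAX = 5 + math.sqrt(15)


def predict_within_domain(x):
    """Evaluate the model, refusing x-values outside the domain constraint."""
    if not X_MIN <= x <= X_MAX:
        raise ValueError(f"x = {x} is outside the domain [{X_MIN:.2f}, {X_MAX:.2f}]")
    return A * x**2 + B * x + C


print(predict_within_domain(5))  # 7.5
```

Calling `predict_within_domain(10)` raises an error instead of returning a negative, meaningless sales value.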

Examples

Example 1

For each scatterplot, determine whether the variables have a linear relationship or a quadratic relationship. If there is a relationship, describe its strength.

a
[Scatterplot: x-axis from 1 to 7, y-axis from 5 to 35]
Worked Solution
Create a strategy

A relationship between two variables exists if the points follow a similar trend. The points will roughly form a line if there is a linear relationship or a parabola if there is a quadratic relationship.

To describe the strength of the relationship, we can analyze how tightly the data points are clustered or grouped together.
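This lesson judges clustering by eye. As one possible numeric proxy (not used elsewhere in the lesson), we can compute the root-mean-square of the residuals, the typical vertical distance between the points and the model: the smaller it is, the more tightly the points cluster. The model y = x^2 and both sets of points below are hypothetical.

```python
import math


def rmse(xs, ys, model):
    """Root-mean-square of the residuals (observed y minus model y)."""
    residuals = [y - model(x) for x, y in zip(xs, ys)]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


def model(x):
    """Hypothetical quadratic model used for both data sets."""
    return x**2


xs = [0, 1, 2, 3, 4]
tight = [0.2, 0.9, 4.1, 9.2, 15.8]   # hypothetical points close to the curve
loose = [2.0, -1.5, 6.5, 6.0, 19.0]  # hypothetical points farther from it

print(round(rmse(xs, tight, model), 2), round(rmse(xs, loose, model), 2))  # 0.17 2.63
```

The tightly clustered set has the much smaller value, matching the visual description of a stronger relationship.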

Apply the idea

As the x-values increase, the y-values decrease then increase, causing the points to form a U-shaped curve. This shows there is a quadratic relationship between the variables.

Because the points are tightly clustered, the relationship is strong.

b
[Scatterplot: x-axis from 1 to 9, y-axis from 10 to 90]
Worked Solution
Apply the idea

As the x-values increase, the y-values decrease, and the points follow a roughly straight path rather than a curve. This indicates there is a linear relationship between the variables.

The points are not tightly clustered, however, so the relationship between the variables is moderate.

Reflect and check

Recall that we can describe a linear relationship as positive or negative. For this data set, the relationship is negative because one variable decreases as the other increases. This implies that the line of best fit would have a negative slope.
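The scatterplot in part (b) is available only as an image, so this sketch illustrates the slope check with the values from Table 3 in the Exploration, which also tend to decrease. It computes the least-squares line with the standard slope formula; `least_squares_line` is an illustrative name.

```python
def least_squares_line(xs, ys):
    """Slope and intercept of the least-squares line y = mx + k."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Data from Table 3 in the Exploration.
xs = list(range(11))
ys = [63, 65, 61, 59, 58, 59, 54, 55, 53, 52, 50]
slope, intercept = least_squares_line(xs, ys)
print(f"slope = {slope:.2f}")  # slope = -1.39
```

A negative slope confirms a negative relationship: as x increases, the predicted y decreases.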

c
[Scatterplot: x-axis from 2 to 18, y-axis from 5 to 50]
Worked Solution
Create a strategy

A relationship between two variables exists if the points follow a similar trend. If the y-values increase and then decrease (or decrease and then increase) over the domain, the relationship can be modeled by a quadratic function.
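The "increase and then decrease" pattern can be checked roughly by comparing consecutive y-values, assuming the points are listed in order of increasing x. The sketch below uses the y-values from Table 4 in the Exploration; `describe_direction` is an illustrative helper. On noisy data, such as Table 3, the direction flips many times, so this check only complements a scatterplot.

```python
def describe_direction(ys):
    """List the runs of 'increase'/'decrease' in y-values ordered by x.
    Equal neighbors are skipped."""
    runs = []
    for prev, curr in zip(ys, ys[1:]):
        if curr > prev:
            step = "increase"
        elif curr < prev:
            step = "decrease"
        else:
            continue
        if not runs or runs[-1] != step:
            runs.append(step)
    return runs


# y-values from Table 4 in the Exploration (already ordered by x).
table4_y = [1.5, 3, 4.8, 5, 7.4, 8, 7, 6, 5.5, 4, 2]
print(describe_direction(table4_y))  # ['increase', 'decrease']
```

A single change from "increase" to "decrease" matches an upside-down U shape; "decrease" then "increase" matches a U shape.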

Apply the idea

As the x-values increase, the y-values increase and then decrease, causing the points to form an upside-down U-shaped curve. This shows there is a quadratic relationship between the variables.

Because the points are not tightly clustered, the relationship is moderate.

Example 2

A conservationist tracks the population, y, of manatees that regularly visit a river over a number of years, x (starting at x=0). The data is displayed in the table:

x | 0 | 1 | 2 | 3 | 4 | 5 | 6
y | 65 | 61 | 58 | 60 | 66 | 74 | 90
a

Was the data most likely collected through measurement, observation, a survey or an experiment?

Worked Solution
Create a strategy

Consider whether the population was measured (with a measurement tool such as a ruler or protractor) or observed. Also consider whether anyone was surveyed or whether any variables were controlled.

Apply the idea

The population of manatees was not measured, and the conservationist did not survey anyone to collect the data. The information does not specify whether any other variables were controlled, so we can assume that an experiment was not used.

The data was most likely collected by observation.

Reflect and check

Populations of species are often tracked using tracking devices. It is possible that each manatee has a tracking device and that the conservationist collects data from those devices each year.

b

Determine if the manatee population over time has a quadratic relationship.

Worked Solution
Create a strategy

Construct a scatterplot to visually determine if a linear or quadratic model is a better fit.

Apply the idea

After plotting the data on a graph, we get the following scatterplot:

[Scatterplot of the manatee data: x (years) from 1 to 7, y (population) from 60 to 90]

There is a clear curve in the pattern of the data, so a quadratic function would better fit the data.

c

Using technology, determine an appropriate equation to model the data set. Round all values to two decimal places.

Worked Solution
Create a strategy

We can use technology to calculate the quadratic regression equation. Remember that a quadratic function is a polynomial of degree 2.

To find the equation using technology, we can follow these steps:

  1. Enter the x-values and y-values in two separate columns.

  2. Highlight the data and select Two Variable Regression Analysis.

  3. Under the Regression Model drop-down menu, choose Polynomial. The degree drop-down menu defaults to 2, which corresponds to a quadratic function.

Apply the idea
  1. Enter the x-values and y-values in two separate columns.

    A screenshot of the GeoGebra statistics tool showing how to enter a given set of data. Speak to your teacher for more details.
  2. Highlight the data and select Two Variable Regression Analysis.

    A screenshot of the GeoGebra statistics tool showing how to select the Two Variable Regression Analysis option. Speak to your teacher for more details.
  3. Under the Regression Model drop-down menu, choose Polynomial. The degree drop-down menu defaults to 2, which corresponds to a quadratic function.

    A screenshot of the GeoGebra statistics tool showing how to display the curve of best fit. Speak to your teacher for more details.

Rounding the values to two decimal places, we find the approximate curve of best fit is y=1.94x^2-7.75x+65.74.
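For readers without GeoGebra, the same quadratic regression can be sketched in plain Python by solving the least-squares normal equations. This is an illustrative substitute, not GeoGebra's own code: `fit_quadratic` and `det3` are hypothetical helpers that use Cramer's rule with `fractions.Fraction`, so the result is exact, but they expect integer data like the manatee table.

```python
from fractions import Fraction


def det3(m):
    """Determinant of a 3x3 matrix (cofactor expansion along the first row)."""
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )


def fit_quadratic(xs, ys):
    """Least-squares (a, b, c) for y = ax^2 + bx + c via the normal equations
    and Cramer's rule. Expects integer data so Fraction stays exact."""
    s = [sum(x**k for x in xs) for k in range(5)]                  # s[k] = sum of x^k
    t = [sum(x**k * y for x, y in zip(xs, ys)) for k in range(3)]  # t[k] = sum of x^k * y
    m = [[s[4], s[3], s[2]],
         [s[3], s[2], s[1]],
         [s[2], s[1], s[0]]]
    rhs = [t[2], t[1], t[0]]
    d = det3(m)
    coeffs = []
    for col in range(3):
        # Cramer's rule: replace column `col` with the right-hand side.
        m_col = [row[:col] + [rhs[i]] + row[col + 1:] for i, row in enumerate(m)]
        coeffs.append(Fraction(det3(m_col), d))
    return tuple(coeffs)


years = [0, 1, 2, 3, 4, 5, 6]
population = [65, 61, 58, 60, 66, 74, 90]
a, b, c = fit_quadratic(years, population)
print(f"y = {float(a):.2f}x^2 + ({float(b):.2f})x + {float(c):.2f}")
# y = 1.94x^2 + (-7.75)x + 65.74
```

Rounded to two decimal places, the coefficients agree with the GeoGebra result above.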

d

Using the model in part (c), determine the population 10 years after the numbers were first recorded.

Worked Solution
Create a strategy

We can find the population, y, after 10 years by substituting x=10 into the equation of the curve of best fit.

Apply the idea
y = 1.94x^2-7.75x+65.74 (State the equation)
y = 1.94\left(10\right)^2-7.75\left(10\right)+65.74 (Substitute x=10)
y = 182.24 (Evaluate)

The model predicts that after 10 years, the population will have grown to about 182 manatees.

Reflect and check

Remember that the coefficients in the equation for the curve of best fit have been rounded. Rounding values reduces the accuracy of the prediction. If we had used technology to make this prediction, we would have gotten a slightly different answer.

A screenshot of the GeoGebra statistics tool showing how to use the scatterplot to predict the value of y given a value of x. Speak to your teacher for more details.

The calculator's answer is more accurate because it keeps more decimal places in the coefficients instead of rounding them to two decimal places. However, the difference between the two predictions is small and does not change our final, rounded answer.
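The effect of rounding can be checked directly by evaluating the model at x=10 twice: once with the exact least-squares coefficients for the manatee data (a = 163/84, b = -31/4, c = 2761/42, obtained by solving the normal equations exactly) and once with the two-decimal values from part (c). The constant names below are illustrative.

```python
from fractions import Fraction

# Exact least-squares coefficients for the manatee data (x = 0, 1, ..., 6),
# found by solving the normal equations with exact arithmetic.
A_EXACT, B_EXACT, C_EXACT = Fraction(163, 84), Fraction(-31, 4), Fraction(2761, 42)
# The same coefficients rounded to two decimal places, as in part (c).
A_ROUND, B_ROUND, C_ROUND = Fraction("1.94"), Fraction("-7.75"), Fraction("65.74")


def predict(x, a, b, c):
    """Evaluate y = ax^2 + bx + c."""
    return a * x**2 + b * x + c


exact = predict(10, A_EXACT, B_EXACT, C_EXACT)
rounded = predict(10, A_ROUND, B_ROUND, C_ROUND)
print(f"{float(exact):.2f} {float(rounded):.2f}")  # 182.29 182.24
print(round(exact), round(rounded))                # 182 182
```

The two predictions differ by less than 0.05 manatees, so both round to 182.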

Example 3

Carlos is a goalie on the school soccer team. When he kicks a soccer ball dropped from his hands, he notices that the angle of trajectory for each kick is different. He also notices that there are times when the ball does not travel as far as other times. He wants to investigate this further using the data cycle.

a

Formulate a statistical question that Carlos can use for his investigation.

Worked Solution
Create a strategy

We can assume that Carlos is interested in determining the optimum angle at which he should kick a soccer ball dropped from his hands to achieve the maximum distance. There are many statistical questions we can ask, but we should focus the question around the purpose of the investigation.

Apply the idea

One possible statistical question is, "At what angle should Carlos kick the soccer ball for it to travel farthest?"

Reflect and check

Other possible questions are:

  • How does the distance the ball travels change with the angle of trajectory?

  • If Carlos kicked the ball and it traveled 130 feet, what was the ball's angle of trajectory?

  • If the ball is kicked at the optimum angle, what is the farthest distance the ball will travel?

b

Determine what variables could be used to answer the statistical question formulated in part (a).

Worked Solution
Apply the idea

The two things that Carlos would need to collect data on to answer the question are the angle of trajectory for each kick and the distance the ball travels.

Reflect and check

The angle of trajectory can impact the distance the ball travels, but the distance the ball travels cannot impact the angle of trajectory. This means the angle of trajectory is the independent variable, and the distance the ball travels is the dependent variable.

c

Carlos records 10 kicks and analyzes each one to determine its angle of trajectory and the distance the ball traveled. His results are recorded in the table:

Angle (degrees) | 24 | 30 | 33 | 37 | 43 | 48 | 51 | 56 | 60 | 64
Distance (feet) | 112 | 129 | 138 | 155 | 161 | 164 | 158 | 148 | 134 | 124

Determine if the data suggests a linear or quadratic relationship. Explain your answer.

Worked Solution
Create a strategy

We can determine if the data suggests a linear or quadratic relationship by plotting the points on a coordinate plane and determining if the data resembles a line or a parabola.

To do this using technology, we can follow these steps:

  1. Enter the x-values and y-values in two separate columns.

  2. Highlight the data and select Two Variable Regression Analysis.

Apply the idea
  1. Enter the x-values and y-values in two separate columns.

    A screenshot of the GeoGebra statistics tool showing how to enter a given set of data. Speak to your teacher for more details.
  2. Highlight the data and select Two Variable Regression Analysis.

    A screenshot of the GeoGebra statistics tool showing how to select the Two Variable Regression Analysis option. Speak to your teacher for more details.
A screenshot of the GeoGebra statistics tool showing how to create the scatterplot of a given data set. Speak to your teacher for more details.

The data has a roughly symmetric, parabolic shape. The y-values increase at first, reach a maximum, and then decrease. This means the data has a quadratic relationship.
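The visual judgment can be backed up numerically by fitting both a line and a parabola to Carlos's data and comparing their sums of squared residuals (SSE). A quadratic fit can never do worse than a linear one, so the useful signal is a large drop. This sketch is an illustrative substitute for the technology: `fit_poly` and `sse` are hypothetical helpers that solve the least-squares normal equations by Gaussian elimination.

```python
def fit_poly(xs, ys, degree):
    """Least-squares polynomial fit; returns coefficients in ascending order,
    so coeffs[k] multiplies x**k. Solves the normal equations by Gaussian
    elimination with partial pivoting."""
    n = degree + 1
    # Row i: sum over j of (sum of x^(i+j)) * coeff_j = sum of x^i * y
    m = [
        [sum(x ** (i + j) for x in xs) for j in range(n)]
        + [sum(x**i * y for x, y in zip(xs, ys))]
        for i in range(n)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][k] * coeffs[k] for k in range(r + 1, n))
        coeffs[r] = (m[r][n] - known) / m[r][r]
    return coeffs


def sse(xs, ys, coeffs):
    """Sum of squared residuals of the fitted polynomial."""
    return sum(
        (y - sum(c * x**k for k, c in enumerate(coeffs))) ** 2
        for x, y in zip(xs, ys)
    )


angles = [24, 30, 33, 37, 43, 48, 51, 56, 60, 64]
distances = [112, 129, 138, 155, 161, 164, 158, 148, 134, 124]

# Centering the angles keeps the sums small; it shifts the curve
# horizontally but does not change its fitted values or residuals.
mean_angle = sum(angles) / len(angles)
u = [x - mean_angle for x in angles]

linear_sse = sse(u, distances, fit_poly(u, distances, 1))
quadratic_sse = sse(u, distances, fit_poly(u, distances, 2))
print(f"linear SSE: {linear_sse:.1f}, quadratic SSE: {quadratic_sse:.1f}")
```

For this data, the quadratic SSE is a small fraction of the linear SSE, which agrees with the parabolic shape seen in the scatterplot.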

d

Using technology, determine an appropriate equation to model the data set.

Worked Solution
Create a strategy

We can use technology to calculate the quadratic regression equation. Remember that a quadratic function is a polynomial of degree 2.

Apply the idea
A screenshot of the GeoGebra statistics tool showing how to display the curve of best fit. Speak to your teacher for more details.

y=-0.1132x^2+10.3245x-74.5885, where x is the angle of trajectory (in degrees) and y is the distance traveled (in feet).

Reflect and check

If the instructions do not specify to round the coefficients, it is best to include all the digits given by the calculator. This increases the accuracy of the model and the predictions.

e

Draw a conclusion about the data by answering the statistical question from part (a).

Worked Solution
Create a strategy

The statistical question from part (a) is, "At what angle should Carlos kick the soccer ball for it to travel farthest?" When considering the quadratic regression model, the largest y-value represents the farthest distance traveled by the ball.

The vertex is the maximum point of the parabola, so its x-coordinate is the angle of trajectory at which Carlos should kick the ball for it to travel farthest, and its y-coordinate is that farthest distance. We can find this angle (x-value) using the formula x=-\dfrac{b}{2a}.

Apply the idea

The vertex represents the optimum angle to kick the ball to achieve the maximum distance traveled.

The equation of the curve of best fit is y=-0.1132x^2+10.3245x-74.5885, where a=-0.1132 and b=10.3245.

x = -\frac{b}{2a} (Equation for the x-value of the vertex)
x = -\frac{10.3245}{2\left(-0.1132\right)} (Substitute a=-0.1132,\,b=10.3245)
x \approx 45.6 (Simplify)

For the ball to travel farthest, Carlos would need to kick the ball at an angle of about 45.6\degree.

Reflect and check

To find the farthest distance the ball is expected to travel, we can substitute x=45.6 into the equation and solve for y.

y = -0.1132x^2+10.3245x-74.5885 (State the equation)
y = -0.1132\left(45.6\right)^2+10.3245\left(45.6\right)-74.5885 (Substitute x=45.6)
y \approx 160.8 (Simplify)

The vertex is at about \left(45.6,\,160.8\right), which means the model predicts a maximum distance of about 160.8 feet when the ball is kicked at an angle of about 45.6\degree.
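Both coordinates of the vertex can be computed in a few lines from the coefficients of the regression model in part (d). Evaluating the model at the unrounded x-value of the vertex gives the same result to one decimal place as substituting 45.6.

```python
# Coefficients of the regression model from part (d).
a, b, c = -0.1132, 10.3245, -74.5885

x_vertex = -b / (2 * a)                        # angle with the largest predicted distance
y_vertex = a * x_vertex**2 + b * x_vertex + c  # predicted distance at that angle
print(f"({x_vertex:.1f}, {y_vertex:.1f})")     # (45.6, 160.8)
```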

When looking at the raw data, we see that Carlos actually kicked the ball farther than this. One of his kicks traveled 164 feet when it was kicked at an angle of 48\degree. This implies that there are other factors that affect the distance the ball travels, such as the force Carlos uses to kick the ball.

Idea summary

Data shows a quadratic relationship if the points form a roughly symmetric, parabolic curve.

The more tightly the points are clustered around the model, the stronger the relationship between the variables.

Outcomes

A.ST.1

The student will apply the data cycle (formulate questions; collect or acquire data; organize and represent data; and analyze data and communicate results) with a focus on representing bivariate data in scatterplots and determining the curve of best fit using linear and quadratic functions.

A.ST.1a

Formulate investigative questions that require the collection or acquisition of bivariate data.

A.ST.1b

Determine what variables could be used to explain a given contextual problem or situation or answer investigative questions.

A.ST.1c

Determine an appropriate method to collect a representative sample, which could include a simple random sample, to answer an investigative question.

A.ST.1d

Given a table of ordered pairs or a scatter plot representing no more than 30 data points, use available technology to determine whether a linear or quadratic function would represent the relationship, and if so, determine the equation of the curve of best fit.

A.ST.1e

Use linear and quadratic regression methods available through technology to write a linear or quadratic function that represents the data where appropriate and describe the strengths and weaknesses of the model.

A.ST.1h

Analyze relationships between two quantitative variables revealed in a scatterplot.

A.ST.1i

Make conclusions based on the analysis of a set of bivariate data and communicate the results.
