Key Concepts
After this lesson, students will be able to:
- Find the line of best fit.
- Understand correlation coefficients.
- Interpret residual plots.
- Interpolate and extrapolate using linear models.
- Distinguish between correlation and causation.
Analyzing Line of Best Fit
Explore & Reason
The scatter plot shows the number of beachgoers each day for the first six days of July. The head lifeguard at the beach uses the data to determine the number of lifeguards to schedule based on the weather forecast.
The head lifeguard compares two linear models:
g(x) = 13x + 25
h(x) = 12x + 30
- Copy the scatter plot and graph the linear functions on the same grid.
- What is a reasonable domain for each function? Explain.
- Construct Arguments: Which model is the better predictor of the number of beachgoers based on the temperature above 80°F? Defend your model.
Solution:
- For g(x), plot the y-intercept at (0, 25) and another point, say (10, 155), and connect them with a straight line. The graph of g(x) is shown in red.
For h(x), plot the y-intercept at (0, 30) and another point, say (10, 150), and connect them with a straight line. The graph of h(x) is shown in blue.
- Since the domain represents the degrees (F) above 80°F, a reasonable domain is 0 ≤ x ≤ 20.
- h(x) seems to be the better model since it balances out points above and below the trend line.
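The points used to draw each line can be checked numerically. The following is a minimal Python sketch that assumes only the two models given above; the function names `g` and `h` simply mirror the lesson's notation.

```python
def g(x):
    """First model from the lesson: g(x) = 13x + 25."""
    return 13 * x + 25


def h(x):
    """Second model from the lesson: h(x) = 12x + 30."""
    return 12 * x + 30


# Two points are enough to draw each line: the y-intercept and the point at x = 10.
g_points = [(x, g(x)) for x in (0, 10)]
h_points = [(x, h(x)) for x in (0, 10)]

print(g_points)  # [(0, 25), (10, 155)]
print(h_points)  # [(0, 30), (10, 150)]
```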
Find the Line of Best Fit
Example 1:
What is the equation of the line of best fit for the data in the table?
Enter the data into a graphing calculator.
Steps:
- Hit “stat” and then “edit”.
- Enter the x-values in L₁ and your y-values in L₂
- Hit “stat”, go over to the “calc” option, and then hit option 4 (LinReg).
- Make sure the X-List says L₁ and the Y-List says L₂.
- Scroll down and hit “calculate.”
Once you have completed these steps, you will be presented with some values.
They will help you write the line of best fit.
y = ax + b
Round the a and b values that the calculator gives you and substitute them into the equation.
You now have your line of best fit!
Example 1
Solution:
A linear regression is a method used to calculate the line of best fit. A line of best fit is the trend line that most closely matches the data.
Step 1
Enter the data into a graphing calculator.
Step 2
Perform a linear regression. Use the linear regression function.
The values of a and b from the linear regression – the slope and the y-intercept – are displayed.
Step 3
Write the equation of the line of best fit. Substitute 13.56 for a and 17.59 for b.
y = 13.56x + 17.59
The equation for the line of best fit for the data is y = 13.56x + 17.59.
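For readers without a graphing calculator, the same kind of calculation can be sketched in Python. This is an illustrative least-squares computation, not a reproduction of the calculator's internals. The function name `line_of_best_fit` and the data are hypothetical: the lesson's table is not used, and the sample points were chosen to lie exactly on y = 2x + 1 so the result is easy to check by hand.

```python
def line_of_best_fit(xs, ys):
    """Return (a, b) for the least-squares line y = ax + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: sum of products of deviations divided by sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    # Intercept: the least-squares line passes through (mean_x, mean_y).
    b = mean_y - a * mean_x
    return a, b


# Hypothetical data (not the lesson's table), lying exactly on y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

a, b = line_of_best_fit(xs, ys)
print(f"y = {a:.2f}x + {b:.2f}")  # y = 2.00x + 1.00
```

With real data the points will not lie exactly on a line, so a and b are usually rounded, as in the example above.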
Try It
1. Use the linear regression function to find the equation of the line of best fit for the data in the table.
Solution:
Using a graphing calculator, enter the x-values on L1 and the y-values on L2.
Then use the linear regression function (LinReg).
The values of a and b from the linear regression – the slope and the y-intercept – are displayed:
The equation for the line of best fit for the data is:
y = 0.78x + 4.69
Understand Correlation Coefficients
Example 2
What does the correlation coefficient reveal about the quantities in a bivariate data set?
Solution:
Study Tip
If your calculator is not showing the correlation coefficient, r, you may need to turn “Stat Diagnostics” ON in the MODE menu. On older calculators, you can choose “Diagnostic On” from the CATALOG menu.
When you perform a linear regression using technology, you are also given the correlation coefficient.
The correlation coefficient, represented by r, is a number between -1 and 1 that indicates the direction and strength of the linear relationship between two quantitative variables in a bivariate data set, a set of data that uses two variables.
When the correlation coefficient is close to 1, there is a strong positive correlation between the two variables.
That is, as the values of x increase, so do the values of y.
When the correlation coefficient is close to 0, there is a weak correlation between the two variables.
When the correlation coefficient is close to -1, there is a strong negative correlation between the two variables.
Common Error
You may think that a correlation coefficient of -1 indicates that there is no correlation. Instead, it tells you that there is a strong negative correlation.
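The correlation coefficient can also be computed directly. The sketch below uses the standard formula for Pearson's r. The function name `correlation_coefficient` and both data sets are hypothetical, chosen so that one set rises perfectly and the other falls perfectly.

```python
import math


def correlation_coefficient(xs, ys):
    """Return Pearson's r for paired data xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: y rises with x in the first set and falls in the second.
xs = [1, 2, 3, 4, 5]
r_up = correlation_coefficient(xs, [3, 5, 7, 9, 11])
r_down = correlation_coefficient(xs, [11, 9, 7, 5, 3])

print(round(r_up, 2))    # 1.0
print(round(r_down, 2))  # -1.0
```

Both results describe a perfect linear relationship; only the sign, and therefore the direction, differs.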
Try It!
2. What does each correlation coefficient reveal about the data it describes?
a. r = 0.1
b. r = -0.6
Solution:
a. Given r = 0.1, which is very close to 0, there is a weak correlation between the two variables.
When the correlation coefficient is close to 0, there is a weak correlation between the two variables.
b. Given r = -0.6, which is closer to -1 than to 0, there is a fairly strong negative correlation between the two variables.
The closer the correlation coefficient is to -1, the stronger the negative correlation between the two variables.
Concept Residuals
A residual is the difference between the y-value of a data point and the corresponding y-value from the line of best fit, or the predicted y-value. Residual = actual y-value – predicted y-value.
A residual plot shows how well a linear model fits the data set. If the residuals are randomly distributed on either side of the x-axis and clustered close to the x-axis, then the linear model is likely a good fit.
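The residual formula translates directly into code. In this minimal sketch the function name `residuals`, the data, and the model y = 2x + 1 are all hypothetical and serve only to illustrate the subtraction.

```python
def residuals(xs, ys, a, b):
    """Residual = actual y-value - predicted y-value, for the model y = ax + b."""
    return [y - (a * x + b) for x, y in zip(xs, ys)]


# Hypothetical data and a hypothetical linear model, for illustration only.
xs = [1, 2, 3, 4]
ys = [4, 4, 8, 8]

res = residuals(xs, ys, a=2, b=1)
print(res)  # [1, -1, 1, -1]
```

Plotting each residual against its x-value gives the residual plot: positive residuals sit above the x-axis and negative residuals below it.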
Interpret Residual Plots
Example 3
Student enrollment at Blue Sky Flight School over 8 years is shown.
The owner used linear regression to determine the line of best fit.
The equation for the line of best fit is y=-35x+1208.
How well does this linear model fit the data?
Solution:
Step 1
Evaluate the equation for each x-value to find the predicted y-values.
Step 2
Calculate the differences between the actual and predicted y-values for each x-value.
Step 3
Plot the residual for each x-value.
The scatter plot with the line of best fit suggests that there is a negative correlation between years and enrollment.
The residual plot shows the residuals randomly distributed above and below the x-axis and somewhat clustered close to the x-axis.
The linear model is likely a good fit for the data.
Common Error
The appearance of a residual plot does not correspond to a positive or negative correlation.
The data shown might be misinterpreted as data with no correlation, but it has a negative correlation that is seen when the actual data points are plotted.
Try It!
3. The owner of Horizon Flight School also created a scatter plot and calculated the line of best fit for her enrollment data shown in the table. The equation of the line of best fit is y = 1.44x + 877.
Find the residuals and plot them to determine how well this linear model fits the data.
Solution:
Add two additional rows to the table: the predicted value (found using the given equation of the line of best fit) and the residual (the difference between the actual value and the predicted value).
A residual plot shows how well a linear model fits the data set.
If the residuals are randomly distributed on either side of the x-axis and clustered close to the x-axis, then the linear model is likely a good fit.
Here, the residuals are not clustered close to the x-axis.
So, the linear model is not a good fit of the data set.
Interpolate and Extrapolate Using Linear Models
Example 4
The graphic shows regional air travel data recorded by a domestic airline company. How can you use the data to estimate the number of air miles people flew in 2003? If the trend in air travel continues, what is a reasonable estimate for the number of miles that people will fly in 2030?
Solution:
Formulate
Plot the data points on a scatter plot. Using technology, perform a linear regression to determine the line of best fit for the data. For the x-values, use number of years since 1975.
Compute
Use the values of a and b (from the linear regression) to write the line of best fit. y = 47.87x+1345.04
Interpolation
Interpolation is using a model to estimate a value within the range of known values. Interpolate to estimate the miles people flew in 2003, or 28 years after 1975.
y=47.87(28) +1345.04
y = 2,685.4
Extrapolation
Extrapolation is using a model to make a prediction about a value outside the range of known values. Extrapolate to predict the miles that people will fly in 2030, 55 years after 1975.
y=47.87(55) +1345.04
y= 3,977.89
Interpret
The model predicts that people flew a total of 2,685 thousand air miles on the airline in 2003, and that people will fly a total of 3,978 thousand air miles in 2030.
The prediction for 2030 is not as reliable as the estimate for 2003 because it lies outside the range of the known data, and the trend may not continue.
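The two calculations above can be reproduced with a short Python sketch. It assumes only the model from this example, y = 47.87x + 1345.04 with x measured in years since 1975. The names `BASE_YEAR` and `predicted_air_miles` are illustrative.

```python
BASE_YEAR = 1975  # x counts the number of years since 1975, as in the example


def predicted_air_miles(year):
    """Evaluate the example's model y = 47.87x + 1345.04 (thousands of air miles)."""
    x = year - BASE_YEAR
    return 47.87 * x + 1345.04


estimate_2003 = round(predicted_air_miles(2003), 2)    # interpolation in the example
prediction_2030 = round(predicted_air_miles(2030), 2)  # extrapolation in the example

print(estimate_2003)    # 2685.4
print(prediction_2030)  # 3977.89
```

The code treats both years the same way. Whether a result counts as interpolation or extrapolation depends on whether the year falls inside the range of the recorded data.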
Try It!
4. Using the model from Example 4, estimate the number of miles people flew on the airline in 2012.
Solution:
We use the line of best fit obtained from Example 4:
y = 47.87x + 1345.04
2012 corresponds to x = 2012 – 1975 = 37 so we have:
y = 47.87(37) + 1345.04
y = 3116.23
The model predicts that people flew a total of 3116 thousand air miles on the airline in 2012.
Correlation and Causation
Example 5
A. A student found a positive correlation between the number of hours of sleep his classmates got before a test and their scores on the test. Can he conclude that he will do well on the test if he goes to bed early?
B. A lifeguard notices that as the outside temperature rises, the number of people coming to the beach increases. Can she conclude that the change in temperature results in more people going to the beach?
A.
Solution:
Causation describes a cause-and-effect relationship. A change in the one variable causes a change in the other variable.
To determine whether two variables have a causal relationship, you have to carry out an experiment that can control for other variables that might influence the relationship between the two target variables.
The student cannot conclude that he will do well if he goes to bed early.
Other variables, like the time spent studying or proficiency with the content, could affect how well he does on the test.
Common Error
Be careful not to assume that if a correlation exists between two variables, a change in one causes the change in the other. The change could be caused entirely by a third, unknown variable.
B.
Solution:
She did not carry out an experiment or control for other variables that might affect the relationship.
These include weather forecast and time of year.
She cannot conclude that the only reason that more people come to the beach is the outside temperature.
Try It!
5. The number of cars in a number of cities shows a positive correlation to the population of the respective city. Can it be inferred that an increase of cars in a city leads to an increase in the population? Defend your response.
Solution:
No.
A correlation alone does not show causation. Other factors or variables, such as migration and fertility rate, can affect the increase in population.
1. Make Sense and Persevere
Temperatures at different times of day are shown. How can you describe the relationship between temperature and time? Would a linear model be a good fit for the data? Explain.
Solution:
As the time increases from 12 AM to 12 PM, the temperature also increases.
As the time increases from 12 PM to 12 AM, the temperature decreases.
No.
During the time interval, the temperature increases, then decreases suggesting that the data points are not linear.
Check your knowledge
- The table shows the number of customers y at a store for x weeks after the store’s grand opening. The equation for the line of best fit is y= 7.77x + 38.8. Assuming the trend continues, what is a reasonable prediction of the number of visitors to the store 7 weeks after its opening?
- Two quantities of a data set have a strong positive correlation. Can the line of best fit for the data set have a correlation coefficient of 0.25? Explain.
- Generalize: Describe how the values of a and b in a linear model are related to the data being modeled.
- Use technology to perform a linear regression to determine the equation for the line of best fit for the data. Estimate the value of y when x = 19.
Check your knowledge – Answers
Solution:
1. Using the equation for the line of best fit, we substitute x = 7, corresponding to 7 weeks after the store's opening, and evaluate:
y = 7.77x + 38.8
y = 7.77(7) + 38.8
y = 93.19
We cannot have a fractional number of visitors, so we predict that there are 93 visitors 7 weeks after the store's opening.
2. No. A correlation coefficient of r = 0.25 is close to 0, which indicates a weak correlation. A data set with a strong positive correlation would have a correlation coefficient close to 1.
3. The value of a gives the slope of the linear model. This describes the rate of change of y with respect to x. The value of b gives the y-intercept of the linear model. This is the y-value when x = 0.
4. Using a graphing calculator, enter the x-values on L1 and the y-values on L2. Then use the Linear Regression function (LinReg). The values of a and b from the linear regression – the slope and the y-intercept – are displayed:
The equation for the line of best fit for the data is:
y = -5.725x + 197.2
When x = 19,
y = -5.725(19) + 197.2
y = 88.425
Exercise
- Use technology to perform a linear regression to determine the equation for the line of best fit for the data. Estimate the value of y when x = 18.
- Make a residual plot for each linear model and the data set it represents. How well does each model fit its data set? y = 0.15x + 13.4
- The data below represents the average number of student absences as the temperature increases.
Concept Summary
Linear Models, Lines of Best Fit, and Residuals
Words
A linear regression is a method for finding the line of best fit, or a linear model, for a bivariate data set.
A residual plot reveals how well the linear model fits the data set.
If the residuals are fairly symmetrical around and clustered close to the x-axis, the linear model is likely a good fit.
Algebra
Use the values of a and b from the linear regression to write the equation for the line of best fit.
The equation is y = 0.542x + 1.
Graph
Concept map