Graded Assignment
Practice: Linear Regression Lines
Use the data set on the following page to answer Questions 1 through 13. Give all results to the nearest thousandth.
The data included with this Assignment are hitting statistics for the top 25 home run hitters from each of the National and American Leagues in Major League Baseball. The statistics included are the number of base hits, the number of runs, and the number of home runs each player had in the 1999 season as of 8/17/99. (Source: www.majorleaguebaseball.com)
(In case you’re rusty on your baseball terms, here are a few definitions: A base hit is a hit that gets the batter on base. A run is scored any time a runner reaches home plate, and a home run is a hit on which the batter reaches home plate on the same play.)
Base Hits | Runs | Homeruns |
136 | 94 | 36 |
123 | 91 | 35 |
141 | 95 | 33 |
130 | 89 | 32 |
106 | 88 | 32 |
87 | 63 | 31 |
140 | 70 | 31 |
101 | 70 | 29 |
123 | 74 | 28 |
144 | 85 | 28 |
109 | 71 | 28 |
132 | 62 | 27 |
93 | 74 | 27 |
122 | 80 | 27 |
143 | 74 | 26 |
133 | 86 | 25 |
130 | 66 | 24 |
101 | 58 | 23 |
104 | 50 | 23 |
102 | 73 | 23 |
135 | 77 | 23 |
157 | 74 | 22 |
87 | 50 | 22 |
101 | 50 | 21 |
160 | 99 | 21 |
134 | 91 | 47 |
109 | 86 | 47 |
131 | 108 | 36 |
136 | 86 | 31 |
133 | 91 | 30 |
149 | 78 | 29 |
123 | 96 | 29 |
89 | 71 | 29 |
136 | 74 | 28 |
132 | 62 | 27 |
137 | 77 | 27 |
125 | 70 | 27 |
103 | 68 | 26 |
118 | 83 | 26 |
89 | 60 | 26 |
128 | 57 | 26 |
130 | 63 | 25 |
122 | 78 | 25 |
108 | 77 | 24 |
115 | 65 | 24 |
114 | 66 | 24 |
128 | 79 | 24 |
116 | 73 | 23 |
117 | 63 | 23 |
86 | 57 | 22 |
- Enter the hits for all 50 players into L1 of your TI-83. Enter the number of runs into L2. Do a linear regression (STAT CALC 8) of the number of runs (y) onto the number of hits (x). Make sure you save the function in Y1 because you’ll be plotting it for a later question. (If you’re not sure how to do this on your calculator, refer to the Study Guides for the Tutorials in this lesson or look in your calculator’s manual.)
What is the regression equation in the form ŷ = a + bx? Is the relationship between hits and runs positive or negative? (2 points)
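If you want to check your calculator work another way, the fit that STAT CALC 8 performs can be sketched in Python. This is only an illustration: the five data pairs below are hypothetical toy values, not the assignment's data set, and the function name fit_line is made up for this example.

```python
# Hypothetical toy data (NOT the assignment's data set).
x = [1, 2, 3, 4, 5]   # plays the role of list L1
y = [2, 4, 5, 4, 5]   # plays the role of list L2


def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    sxx = sum((xi - x_bar) ** 2 for xi in xs)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept
    return a, b


a, b = fit_line(x, y)
print(f"y-hat = {a:.3f} + {b:.3f}x")
```

For the toy data the sign of b tells you whether the relationship is positive or negative; the same reading applies to the slope your calculator reports.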
- Create a scatterplot, and plot y (L2) against x (L1) with the regression line (the equation should be stored in Y1 and the equals sign turned on).
Sketch the scatterplot and regression line in the box below. (The sketch does not need to be perfect.) Does the relationship appear linear? Why or why not? (1.5 points)
- What’s the predicted number of runs for the player with 149 hits? What’s the predicted number of runs for the player with only 86 hits? (2 points)
- What’s the formula for calculating a residual? Find the residuals for the players with 149 and 86 hits, respectively. (2 points)
- Calculate the residuals and store them in L3. You can do this by highlighting L3 in the data list window and pressing ENTER, then pressing 2nd [LIST], selecting RESID, and pressing ENTER twice.
Plot the residuals in L3 against the x-values in L1. Sketch the residual plot in the box below. (The sketch does not have to be perfect.) Is there any pattern in the residuals? Based on this, does the relationship between hits and runs appear to be linear or not linear? (2 points)
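As a rough cross-check on the RESID list, the same kind of list can be built by hand in Python. The sketch below again uses hypothetical toy values rather than the assignment's data, and it assumes the intercept and slope shown are the least-squares fit for those toy values.

```python
# Hypothetical toy data (NOT the assignment's data set).
x = [1, 2, 3, 4, 5]   # plays the role of list L1
y = [2, 4, 5, 4, 5]   # plays the role of list L2

# Assumed here: the least-squares intercept and slope for the toy data.
a, b = 2.2, 0.6

# Predicted y for each x, then observed minus predicted.
predicted = [a + b * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]   # plays the role of L3

for xi, ri in zip(x, residuals):
    print(f"x = {xi}: residual = {ri:+.3f}")
```

Pairing each residual with its x-value, as the loop does, gives the points you would plot in a residual plot.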
- What are two different methods you’ve used so far to check linearity? (1.5 points)
- How would you characterize the relationship between hits and runs: linear and strong; linear and moderate; linear and weak; non-linear and strong; non-linear and moderate; or non-linear and weak? (NOTE: “non-linear and strong” means that the data points form a definite curved pattern.) (1 point)
- Clear L2 and L3. (Leave the number of hits in L1.) Enter the number of home runs into L2. Regress the number of home runs (y) onto the number of hits (x), again saving the function in Y1.
Give the regression equation in the form ŷ = a + bx. How would you interpret the regression coefficient in terms of hits and home runs? (3 points)
- Plot y (L2) against x (L1) with the regression line.
Sketch the scatterplot and regression line in the box below. (The sketch does not need to be perfect.) Does the relationship appear linear? Why or why not? Does the data plot indicate that there is a linear relationship between hits and home runs? Why or why not? (2 points)
- Calculate by hand the predicted number of home runs for the player with 93 hits and the residual for that player. Does this player's data point fall above or below the regression line? (1.5 points)
- Find the residuals and store them in L3. Plot L3 against the x-values (L1).
Sketch the residual plot in the box below. (The sketch does not have to be perfect.) What does the residual plot suggest about the possible non-linearity of the relationship between hits and home runs? (1.5 points)
- Would you characterize hits and home runs as having a strong linear relationship, a weak linear relationship, a strong non-linear relationship, a weak non-linear relationship, or no relationship? Use the data plot and the residual plot to justify your answer. (3 points)
- We’ve seen how residuals are used to assess the fit of a regression line to the data, but they have another important role. How are residuals used in the definition of least-squares regression? (2 points)