19 Regression (least squares regression line, r, r^2, and Fathom)

Tom Tillison, Christine Zafran, Zach Golub

__**Summary:**__ //r// is the correlation coefficient. It tells you the direction and strength of the linear relationship between the two variables, and it is always between -1 and 1. r^2 is the number you use to judge predictions. It tells you the proportion of the variation in the dependent variable that is explained by the line of best fit. It is not the probability that a prediction is correct, but the closer r^2 is to 1, the more you can trust predictions made from the line. Residual = observed - predicted, where the observed value is the actual data and the predicted value is the number obtained from the line of best fit.

When writing the equation of the least squares regression line and the correlation coefficient, you write the line in the form y = mx + b. The graphs report r^2, so you take its square root to find r and give r the same sign as the slope m.

**Least squares linear regression** is a method for predicting the value of a dependent variable //Y// based on the value of an independent variable //X//. Regression generates what is called the "least-squares" regression line. This best fit line is the formula that produces the best prediction for //Y// given //X//, in the sense that it makes the sum of the squared residuals as small as possible.

Fathom is the application we use to graph, summarize, and collect our data in tables. Our goal in this section is to display our information in Fathom and explain how to find the least squares regression line, r, and r^2.

__**A typed list of rules, formulas, and properties:**__


**Regression Data:** Regression data consists of two quantitative variables:

1. __Explanatory Variable__ (the independent variable, always graphed on the x-axis)
2. __Response Variable__ (the dependent variable, always graphed on the y-axis)

We want to find out whether there is a relationship between the explanatory variable and the response variable. If a relationship exists, we cannot conclude that it is cause and effect.


**Correlation:** A numerical measure of the direction and strength of a //linear// relationship. The correlation coefficient, //r//, is calculated using the following equation.
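One standard way to write this equation is shown here. Which form the course used is an assumption: this version averages the products of the standardized values, where n is the number of data points, \bar{x} and \bar{y} are the means, and s_x and s_y are the sample standard deviations.

```latex
r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)
```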

//r// will always be in the range -1 ≤ //r// ≤ 1. When //r// > 0, the variables are positively associated; when //r// is close to 1, the association is strong. When //r// < 0, the variables are negatively associated; when //r// is close to -1, the association is strong. When //r// is equal to 0, there is no positive or negative linear association (and when //r// is close to 0, the linear association is very weak).


**Regression:** Finding a best fit line to describe the form of the relationship. This line is known as the least squares regression line (LSRL).

y = mx + b

y = the predicted, or estimated, value of the response variable obtained from the LSRL; m = the slope of the line; b = the y-intercept


**Characteristics of the Regression Line:**

1. The least squares regression line always passes through the point (mean of x, mean of y).
2. The slope of the LSRL is //m = r times Sy divided by Sx//.
3. r^2, the coefficient of determination, indicates the proportion of the variation in the response variable that is explained by the least squares regression line.
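These characteristics can be checked numerically. The following is a minimal Python sketch, not Fathom output: the data set and the helper name `lsrl` are made up for illustration, and sample standard deviations (dividing by n - 1) are assumed.

```python
from statistics import mean, stdev

def lsrl(xs, ys):
    """Return (m, b, r) for the least squares regression line y = m*x + b."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (n - 1)
    # r: sum of the products of standardized values, divided by n - 1
    r = sum((x - x_bar) / s_x * (y - y_bar) / s_y
            for x, y in zip(xs, ys)) / (n - 1)
    m = r * s_y / s_x        # slope = r times Sy divided by Sx
    b = y_bar - m * x_bar    # intercept chosen so the line passes through the means
    return m, b, r

# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
m, b, r = lsrl(xs, ys)
print(f"y = {m:.2f}x + {b:.2f}, r = {r:.4f}, r^2 = {r * r:.2f}")
```

Plugging the mean of x into the resulting line gives back the mean of y, which is the first characteristic in the list.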


**Residuals:** The vertical distance from a point to the regression line. If the point is above the regression line, the residual is positive. If the point is below the regression line, the residual is negative.

//Residual = observed - predicted//

**Characteristics of Residuals:**

1. The sum of the residuals is always equal to 0.
2. The mean of the residuals is always equal to 0.
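A short Python sketch can confirm both characteristics for a least squares line. The data below are hypothetical, and the slope is computed from the usual least squares formula rather than read from Fathom.

```python
from statistics import mean

# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(xs), mean(ys)
# Least squares slope and intercept
m = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
     / sum((x - x_bar) ** 2 for x in xs))
b = y_bar - m * x_bar

# Residual = observed - predicted, one residual per data point
residuals = [y - (m * x + b) for x, y in zip(xs, ys)]

print([round(e, 2) for e in residuals])
print(abs(sum(residuals)) < 1e-9, abs(mean(residuals)) < 1e-9)
```

The comparison against a tiny tolerance is there because floating-point rounding can leave the sum a hair away from exactly 0.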


**Residual Plot:** A scatter plot of the residuals, with the residuals plotted on the y-axis against the explanatory variable on the x-axis. Visualize the regression line being rotated until it is horizontal while each point's distance from the line stays the same.


If the residual plot shows an even scattering about the line y = 0 and the correlation is strong, then the regression line is a good fit to the data.

__**3 Sample problems with solutions:**__





**__Regression Problems/Solutions__**
__**1. MarriageAge100**__

**a. Write the equation of the Least Squares Regression Line and Correlation Coefficient.**

Y = 1.02x + 1.3, r = 0.91


**b. What is the predicted husband age for a 42-year-old wife?**

Y = 1.02(42) + 1.3 = 44.14 years old


**c. What is the predicted wife age for a 45-year-old husband?**

45 = 1.02x + 1.3, so x = 43.7/1.02 = 42.84 years old

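**d. What is the value of the residual when //x// = 32?**

Residual = observed - predicted, where the observed value is the actual data and the predicted value is the number obtained from the line of best fit. Predicted = 1.02(32) + 1.3 = 33.94, so Residual = 25 - 33.94 = -8.94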

**e. Compute the mean and standard deviation for both data sets.**

Wife: mean = 31.16, sd = 11.0048. Husband: mean = 33.08, sd = 12.3105


**f. Do you think that predictions made using this line are reliable? Why?**

r^2 = 0.8281. That tells me that about 83% of the variation in husband age is explained by the least squares regression line (line of best fit). Therefore the predictions I make with the line of best fit will be fairly accurate, but I would not rely on these answers too heavily.
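The arithmetic in Problem 1 can be double-checked with a few lines of Python. This is only a sketch that reuses the slope, intercept, r, and observed value quoted above; the function name `predict_husband_age` is made up for illustration.

```python
# LSRL from Problem 1: x = wife age, y = husband age
M, B, R = 1.02, 1.3, 0.91

def predict_husband_age(wife_age):
    """Predicted husband age from the line y = 1.02x + 1.3."""
    return M * wife_age + B

pred_b = predict_husband_age(42)            # part b
wife_c = (45 - B) / M                       # part c: solve 45 = 1.02x + 1.3 for x
residual_d = 25 - predict_husband_age(32)   # part d: observed value 25 at x = 32
r_squared = R ** 2                          # part f

print(round(pred_b, 2), round(wife_c, 2), round(residual_d, 2), round(r_squared, 4))
```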


__**2. Metabolic Rate vs. Weight**__

**a. Define Metabolic Rate.**

Metabolic rate is how fast you metabolize food, that is, the rate at which your body uses energy.


**b. Write the equation of the Least Squares Regression Line and the Correlation Coefficient.**

Y = 26.9x + 110, r = 0.87

**c. What is the predicted metabolic rate for a person that weighs 47 kg?**

Y = 26.9(47) + 110 = 1374.3

**d. What is the value of the residual for 42.4 kg?**

Observed: 1124. Predicted: 26.9(42.4) + 110 = 1250.56. Residual = observed - predicted = 1124 - 1250.56 = -126.56

**e. Compute the mean and standard deviation for both data sets.**

Weight: mean = 46.7421, sd = 8.28441. Metabolic rate: mean = 1369.53, sd = 257.504

**f. Do you think that the linear model is appropriate? Why?**

Yes, because r (0.867) indicates that there is a fairly strong positive correlation. In addition, r^2 tells me that about 75% of the variation in metabolic rate is explained by the least squares regression line. This is not great, but it is definitely a high number.


__**3. Fuel Used vs. Speed**__

**a. Write the equation of the Least Squares Regression Line and correlation coefficient.**

Y = -0.0147x + 11, r = -0.1732 (r is negative because the slope is negative)

**b. What is the predicted amount of fuel used if the speed is 65 km/h?**

Speed is the x variable, so substitute x = 65: Y = -0.0147(65) + 11 = 10.04

**c. What is the value of the residual for 100 km/h?**

Predicted = -0.0147(100) + 11 = 9.53. Residual = observed - predicted = 8.27 - 9.53 = -1.26

**d. Compute the mean and standard deviation for both data sets.**

Fuel: mean = 9.88467, sd = 3.81938. Speed: mean = 80, sd = 44.7214

**e. Find the equation of the Least Squares Regression Line using the correlation coefficient, the means, and the standard deviations.**

m = r times Sy divided by Sx = -0.1732(3.81938)/44.7214 = -0.0148. The line passes through the means, so b = 9.88467 - (-0.0148)(80) = 11.07. Y = -0.0148x + 11.07, which agrees with the line in part a after rounding.
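Part e can also be worked in a few lines of Python. This sketch uses the means and standard deviations from part d and assumes r is negative, to agree with the negative slope in part a; a positive r would flip the slope and give a line that no longer matches part a.

```python
# Summary statistics from Problem 3: x = speed, y = fuel used
x_bar, s_x = 80, 44.7214
y_bar, s_y = 9.88467, 3.81938
r = -0.1732  # assumed negative, matching the negative slope in part a

m = r * s_y / s_x        # slope = r times Sy divided by Sx
b = y_bar - m * x_bar    # intercept: the line passes through (x_bar, y_bar)

print(f"y = {m:.4f}x + {b:.2f}")
```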

__**3 Web pages:**__

http://www.duke.edu/~rnau/regintro.htm

http://www.jerrydallal.com/LHSP/slr.htm

http://www.sascommunity.org/wiki/PROC_REGRESSION:_A_simple_explanation_of_options_and_results