# Week 4: Correlation vs. Causation

People often confuse correlation with causation. The fact that variable A is correlated with variable B does not mean A causes B. It could be that A causes B, that B causes A, or that a third, omitted variable drives both (omitted variable bias). For example, a high college GPA (dependent variable, Y) could be correlated with a high high school GPA (independent variable, X). In a regression analysis, a person can run a variety of tests to check whether X is correlated with Y. The most basic step is to find the value of R^2, which measures the fraction of the sample variation in Y explained by X. A high R^2 means the model fits the sample well, but it still does not mean there is causation. The size of the association is given by the marginal effect of X on Y, which is the slope coefficient on X. For example, if college GPA is regressed on high school GPA in levels, a coefficient of .08 on X implies that a one-point increase in high school GPA is associated with a 0.08-point increase in college GPA. (Reading this as an 8% increase would only be right if college GPA were measured in logs.)

Both R^2 and the slope come from the OLS (ordinary least squares) estimator, which estimates the population regression function by choosing the coefficients that minimize the sum of squared residuals in the sample. OLS does not detect bias on its own. The slope can be read as a causal effect only if the error term, which holds unmeasured factors like motivation or ability, is uncorrelated with X. If those factors are correlated with X, the estimate suffers from omitted variable bias.
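The omitted variable problem can be made concrete with a small simulation. The sketch below is only an illustration on made-up data, not an analysis of real GPAs: the variable names, the sample size, and every coefficient (including a "true" effect of 0.08) are assumptions chosen for the example. An unobserved `ability` term raises both high school and college GPA, so regressing college GPA on high school GPA alone gives a slope far above the true effect, even though R^2 looks respectable.

```python
import numpy as np


def ols(X, y):
    """Fit y on X by least squares (intercept added); return coefficients and R^2."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta, r2


rng = np.random.default_rng(0)
n = 5000

# Hypothetical data-generating process (all numbers are assumptions).
TRUE_EFFECT = 0.08                      # causal effect of HS GPA on college GPA
ability = rng.normal(size=n)            # unobserved in real data
hs_gpa = 3.0 + 0.4 * ability + rng.normal(scale=0.3, size=n)
college_gpa = (2.5 + TRUE_EFFECT * hs_gpa + 0.5 * ability
               + rng.normal(scale=0.3, size=n))

# Short regression: ability is left in the error term.
beta_short, r2_short = ols(hs_gpa, college_gpa)

# Long regression: ability is included as a control.
beta_long, r2_long = ols(np.column_stack([hs_gpa, ability]), college_gpa)

print(f"slope without ability: {beta_short[1]:.3f}  (R^2 = {r2_short:.3f})")
print(f"slope with ability:    {beta_long[1]:.3f}  (R^2 = {r2_long:.3f})")
```

In this setup the short regression attributes part of ability's effect to high school GPA, while the long regression recovers a slope close to the assumed 0.08. In real data ability is not observed, which is exactly why a high R^2 or a large slope cannot settle the causal question.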

According to a research brief from act.org, first-year college GPA and high school GPA are directly associated with each other. High school GPA has a much larger direct effect on first-year college GPA than it has on degree completion within 6 years. This analysis did not take into account any unobserved variables, which could have an effect on the results.

Reference:

http://www.act.org/research/researchers/briefs/pdf/2013-8.pdf
