Am J Orthod Dentofacial Orthop 2016;149:298-9
Correlation and linear regression
In the next series of articles, I will discuss correlation and linear regression.1 Correlation indicates whether there is any association between 2 quantitative variables and the strength of that association. Linear regression is a statistical tool that allows us to investigate the relationship between a causal variable and a variable of interest: eg, the effect of the amount of pretreatment crowding (causal variable) on the number of days required to reach alignment (variable of interest).

Example

We will investigate the effect of the amount of pretreatment crowding on the number of days required to reach alignment. Days to alignment is a continuous variable expressed in days, and the irregularity index is also a continuous variable expressed in millimeters. The assumption is that the greater the initial crowding, the longer it will take to align the dentition. Table I gives summary information of the 2 variables.

The first step is to see whether the 2 variables are correlated. We can assess this using the Pearson correlation coefficient r (also termed the product moment correlation coefficient), which expresses the strength of the linear relationship between 2 variables and takes values from −1 to 1. If the correlation coefficient is 1 or −1, then the points in a scatter plot will lie exactly on a straight line, indicating a perfect linear relationship between the variables.

The correlation is positive if higher values of one variable are associated with higher values of the other variable; the points do not have to lie exactly on a straight line. The correlation is negative if the values of one variable decrease as the values of the other variable increase. Again, the points do not have to lie exactly on a straight line.
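
To make the definition concrete, the sketch below computes the Pearson correlation coefficient directly from its usual formula: the sum of the products of the deviations from the means, divided by the square root of the product of the 2 sums of squared deviations. The function name `pearson_r` and the 6 pairs of values are assumptions made for illustration only; they are not the study data summarized in Table I.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient for two paired samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has no variation")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for illustration only (NOT the study data)
irregularity_mm = [4.0, 6.5, 8.0, 10.5, 12.0, 15.0]
days_to_alignment = [70, 95, 90, 130, 125, 160]

r = pearson_r(irregularity_mm, days_to_alignment)
print(f"r = {r:.2f}")
```

With these made-up values, r is positive but below 1, because the points rise together without lying exactly on a straight line. In practice, a library routine such as `statistics.correlation` (Python 3.10 or later) or `scipy.stats.pearsonr` would likely be used instead of a hand-written function.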