by TomDoan » Wed May 05, 2010 4:23 pm
However, whether an observation counts as an outlier depends on the model. If you have Y,X pairs
Y X
0 0
1 1
2 2
50 50
looking at the Y values in isolation (in effect thinking of a "model" in which the Y's are i.i.d. N(mu,sigma^2)), the 4th observation appears to be an outlier. In the model Y=a+bX, it isn't: it lies exactly on the regression line along with the other three points. Even if you assume i.i.d. data, a value that would be seen as an outlier in Normally distributed data might be perfectly reasonable for a fat-tailed distribution like a Cauchy.
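To make the point concrete, here is a minimal sketch in Python (not RATS code) using the four pairs above. The helper names and the cutoff of 5 scale units are illustrative assumptions, not anything from the original discussion. It compares how far the 4th observation sits from the other Y values with its residual from the fitted line Y=a+bX, then compares Normal and Cauchy tail probabilities.

```python
import math
from statistics import mean, stdev

# The four (Y, X) pairs from the post.
y = [0.0, 1.0, 2.0, 50.0]
x = [0.0, 1.0, 2.0, 50.0]

def leave_one_out_z(values, i):
    """z-score of values[i] against the mean and sample sd of the other points
    (hypothetical helper for the 'Y in isolation' view)."""
    rest = values[:i] + values[i + 1:]
    return (values[i] - mean(rest)) / stdev(rest)

def ols_residuals(xs, ys):
    """Residuals from the least-squares fit ys = a + b*xs."""
    xbar, ybar = mean(xs), mean(ys)
    sxx = sum((xi - xbar) ** 2 for xi in xs)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(xs, ys))
    b = sxy / sxx
    a = ybar - b * xbar
    return [yi - (a + b * xi) for xi, yi in zip(xs, ys)]

def normal_two_sided_tail(k):
    """P(|Z| > k) for a standard Normal."""
    return math.erfc(k / math.sqrt(2.0))

def cauchy_two_sided_tail(k):
    """P(|X| > k) for a standard Cauchy."""
    return 1.0 - 2.0 * math.atan(k) / math.pi

z4 = leave_one_out_z(y, 3)          # 4th observation vs. the other three Y's
resid = ols_residuals(x, y)         # residuals under Y = a + bX
p_normal = normal_two_sided_tail(5.0)   # 5 is an arbitrary illustrative cutoff
p_cauchy = cauchy_two_sided_tail(5.0)

print("z-score of 4th Y against the others:", z4)
print("regression residuals:", resid)
print("P(|Z| > 5), Normal:", p_normal)
print("P(|X| > 5), Cauchy:", p_cauchy)
```

Against the other three Y values (mean 1, standard deviation 1), the 4th observation is 49 standard deviations out, yet its regression residual is zero. The tail probabilities show the second point: a draw 5 scale units from the center is extremely rare under the Normal but quite ordinary under the Cauchy.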