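I am a budding statistics enthusiast, as I believe in the power of statistics to help address biological and health questions. Recently, I have been digging deeply into some subfields of statistics while taking statistics classes. A homework question asked us to predict the response variable of a linear model for a patient, along with an appropriate 95% CI, given new values of the regressors. Based on my understanding of the material, I believe this question is asking for a 95% prediction interval (the 95% CI of a prediction), so the R code should be written as predict(fit, newdata = new_data, interval = "prediction", level = 0.95), where fit is the fitted lm object and new_data is a data frame holding the patient's regressor values. This is because the $100(1-\alpha)\%$ CI for a single future response is, in fact, the prediction interval. Prompted by this question, I am writing this short post as a chance to clarify some basic statistics terminology and the definitions behind it.

While CI and PI (prediction interval) may be used differently in different places and contexts, the CI calculated as $\hat{y}_0 \pm t_{\alpha/2,\,n-p}\, s \sqrt{\mathbf{x}_0^\top (\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{x}_0}$ (i.e. interval = "confidence" in R code) is the confidence interval for the mean response. Here $\mathbf{x}_0$ is the vector of new regressor values (with a leading 1 for the intercept), $\hat{y}_0 = \mathbf{x}_0^\top\hat{\boldsymbol{\beta}}$ is the fitted value at $\mathbf{x}_0$, $\mathbf{X}$ is the design matrix, $s$ is the residual standard error, and $t_{\alpha/2,\,n-p}$ is the upper $\alpha/2$ quantile of the $t$ distribution with $n-p$ degrees of freedom ($n$ observations, $p$ coefficients). If this interval were what the original question expected, the question should have been phrased as “predict the average patient response, with a CI, given the profile of new regressor values.” The PI, or CI of a prediction, calculated as $\hat{y}_0 \pm t_{\alpha/2,\,n-p}\, s \sqrt{1 + \mathbf{x}_0^\top (\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{x}_0}$ (i.e. interval = "prediction" in R code), answers what the question actually asked: “the response of a single patient with an appropriate CI (i.e. the PI here).” The PI for an individual is always wider than the mean-response CI at the same $\mathbf{x}_0$, because on top of the uncertainty in the estimated mean it must also account for the random error of that one new observation; this is the extra 1 under the square root.

In conclusion, when solving a real-life problem, it is important to think carefully about which quantity the question is really asking for, and to understand why a given interval is the right one.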
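To make the distinction concrete, here is a minimal R sketch on simulated data. The variables age, bmi and outcome, the coefficients used to simulate them, and the new patient's values are all hypothetical, made up only for illustration. The sketch fits a linear model, asks predict() for both intervals, recomputes them by hand from the two formulas above, and compares their widths.

```r
# Minimal sketch: mean-response CI vs. prediction interval in a linear model.
# All data below are simulated; the variable names are hypothetical.
set.seed(1)
n <- 50
dat <- data.frame(age = runif(n, 30, 70), bmi = runif(n, 18, 35))
dat$outcome <- 2 + 0.05 * dat$age + 0.3 * dat$bmi + rnorm(n, sd = 1.5)

fit <- lm(outcome ~ age + bmi, data = dat)
new_data <- data.frame(age = 55, bmi = 27)  # one hypothetical new patient

# Built-in intervals: each is a 1 x 3 matrix with columns fit, lwr, upr
ci_mean  <- predict(fit, newdata = new_data, interval = "confidence", level = 0.95)
pi_indiv <- predict(fit, newdata = new_data, interval = "prediction", level = 0.95)

# Recompute both intervals by hand from the formulas
x0 <- c(1, new_data$age, new_data$bmi)             # same order as coef(fit)
X <- model.matrix(fit)                             # design matrix
s <- summary(fit)$sigma                            # residual standard error
t_crit <- qt(0.975, df = fit$df.residual)          # t quantile with n - p df
lev <- drop(t(x0) %*% solve(crossprod(X)) %*% x0)  # x0' (X'X)^{-1} x0
y0_hat <- sum(coef(fit) * x0)                      # fitted value at x0

half_ci <- t_crit * s * sqrt(lev)       # half-width of the mean-response CI
half_pi <- t_crit * s * sqrt(1 + lev)   # half-width of the PI (extra 1)
manual_ci <- c(y0_hat, y0_hat - half_ci, y0_hat + half_ci)
manual_pi <- c(y0_hat, y0_hat - half_pi, y0_hat + half_pi)

# The hand computations should agree with predict() up to rounding error
print(all.equal(unname(ci_mean[1, ]), manual_ci))
print(all.equal(unname(pi_indiv[1, ]), manual_pi))

cat("95% CI for the mean response:", round(ci_mean[1, c("lwr", "upr")], 3), "\n")
cat("95% PI for a single patient: ", round(pi_indiv[1, c("lwr", "upr")], 3), "\n")
cat("Width ratio PI / CI:", round(half_pi / half_ci, 3), "\n")
```

Both intervals are centered on the same fitted value; only the half-width differs, and the prediction interval comes out wider because of the extra 1 inside the square root. With real data, fit and new_data would simply be replaced by your own fitted model and new regressor values.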