Darren Dahly PhD Statistical Epidemiology

Prediction vs Risk Factors

Seán Millar and I were recently asked to review a paper. In his review, he pointed out the difference between predicting an outcome and identifying risk factors for it. It is an important point I have clumsily tried to convey in other reviews, but he put it so nicely that I asked if I could quote him here (I’ve made a few small editorial changes, mainly to generalize what he wrote).

The authors state, “This study found that X predicts risk of Y better than Z and W in this population.” Technically, this analysis demonstrates that X is more strongly associated with Y, but not that it is more predictive. Although terms such as association/risk and prediction are used interchangeably within epidemiological research, they are quite different, and even strong associations do not necessarily indicate increased predictive ability. There are statistical methods which may be employed to evaluate discrimination, but these have not been used in this paper. I would suggest that the authors may wish to use terms such as association, risk, or relationship, rather than prediction within the manuscript. The title should also possibly be amended, i.e. “Is X or Z and W a greater risk factor for Y….etc.”

I think I may be pasting this into the occasional review from now on.

If you really want to tuck into this topic, see To Explain or to Predict? by Galit Shmueli (@gshmueli), posted on arxiv.org. It’s a clear, comprehensive overview of the topic. There is also this paper in the AJE, Limitations of the Odds Ratio in Gauging the Performance of a Diagnostic, Prognostic, or Screening Marker, which clearly demonstrates that big odds ratios don’t always lead to useful classifications (predictions).
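To make the odds ratio point concrete, here is a minimal Python sketch of the simplest case: a single binary marker (positive or negative). It assumes we know the marker's odds ratio (OR) and its false-positive fraction (FPF, i.e. 1 minus specificity). It then solves OR = [TPF/(1 − TPF)] / [FPF/(1 − FPF)] for the true-positive fraction (TPF, i.e. sensitivity) and reports the area under the ROC curve. The function names and the 5% FPF are hypothetical choices made for illustration. They are not taken from the AJE paper.

```python
"""Sketch: a large odds ratio for a binary marker can still mean poor classification.

Assumes a single binary marker and a hypothetical false-positive fraction;
the numbers are illustrative, not results from any study.
"""


def tpf_from_odds_ratio(odds_ratio: float, fpf: float) -> float:
    """True-positive fraction (sensitivity) implied by an odds ratio and a false-positive fraction.

    For a binary marker, OR = [TPF / (1 - TPF)] / [FPF / (1 - FPF)],
    so the odds of a positive marker among cases is OR * FPF / (1 - FPF).
    """
    if not (0.0 < fpf < 1.0):
        raise ValueError("fpf must be strictly between 0 and 1")
    if odds_ratio <= 0:
        raise ValueError("odds_ratio must be positive")
    odds_positive_in_cases = odds_ratio * fpf / (1.0 - fpf)
    return odds_positive_in_cases / (1.0 + odds_positive_in_cases)


def auc_binary_marker(tpf: float, fpf: float) -> float:
    """Area under the ROC curve for a binary marker: (0,0) -> (FPF,TPF) -> (1,1)."""
    return (tpf + 1.0 - fpf) / 2.0


if __name__ == "__main__":
    fpf = 0.05  # hypothetical: 5% of people without the outcome test positive
    for odds_ratio in (3.0, 9.0, 25.0):
        tpf = tpf_from_odds_ratio(odds_ratio, fpf)
        auc = auc_binary_marker(tpf, fpf)
        print(f"OR={odds_ratio:>4.0f}  FPF={fpf:.2f}  TPF={tpf:.3f}  AUC={auc:.3f}")
```

With the FPF held at 5%, an OR of 3 implies a sensitivity of only about 14% (3/22). Even an OR of 25 picks up only about 57% (25/44) of the people who have the outcome. The association is strong, but the classification is modest.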