Methods for standard meta-analysis of diagnostic test accuracy studies are well established and understood. For the more complex case in which studies report test accuracy across multiple thresholds, several approaches have recently been proposed. These are based on similar ideas but make different assumptions. In this article, we apply four such approaches to data from a recent systematic review in nephrology and compare the results. The four approaches are based on a linear mixed effects model, a Bayesian multinomial random effects model, a time-to-event model, and a nonparametric model. In the case study, the accuracy of neutrophil gelatinase-associated lipocalin for the diagnosis of acute kidney injury was assessed in different scenarios, with sensitivity and specificity estimates available for three thresholds in each primary study. All approaches led to plausible and mostly similar summary results. However, we found considerable differences in results for some scenarios, for example, differences of up to 0.13 in the area under the receiver operating characteristic curve (AUC). Across the different scenarios, the Bayesian approach tended to yield the highest AUC values and the nonparametric approach the lowest. Although we recommend using these approaches, our findings motivate a simulation study to explore the optimal choice of method in various scenarios.
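To make the AUC comparison more concrete, here is a minimal sketch of how an empirical AUC can be computed for a single study from sensitivity/specificity pairs at three thresholds, using the trapezoidal rule with the ROC curve anchored at (0, 0) and (1, 1). This is a generic illustration and not one of the four meta-analytic approaches compared in the article. The function name `empirical_auc` and the input values are hypothetical.

```python
from typing import Sequence


def empirical_auc(sens: Sequence[float], spec: Sequence[float]) -> float:
    """Trapezoidal AUC from (sensitivity, specificity) pairs at several thresholds.

    Hypothetical helper for illustration only. Each pair becomes an ROC point
    (false positive rate = 1 - specificity, true positive rate = sensitivity).
    The curve is anchored at (0, 0) and (1, 1) and the points are joined by
    straight lines.
    """
    if len(sens) != len(spec):
        raise ValueError("sens and spec must have the same length")
    # Convert to (FPR, TPR) points and sort along the FPR axis.
    points = sorted(zip((1.0 - s for s in spec), sens))
    xs = [0.0] + [fpr for fpr, _ in points] + [1.0]
    ys = [0.0] + [tpr for _, tpr in points] + [1.0]
    # Sum the trapezoids between consecutive ROC points.
    area = 0.0
    for i in range(1, len(xs)):
        area += (xs[i] - xs[i - 1]) * (ys[i] + ys[i - 1]) / 2.0
    return area


# Hypothetical values for one study reporting three thresholds.
print(empirical_auc([0.9, 0.8, 0.6], [0.5, 0.7, 0.9]))  # approximately 0.815
```

With only three observed points, the straight-line segments to (0, 0) and (1, 1) cover a large part of the curve. As a result, a summary AUC can depend noticeably on how a method models the ROC curve between and beyond the observed thresholds.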
Financed by the National Centre for Research and Development under grant No. SP/I/1/77065/10 by the strategic scientific research and experimental development program:
SYNAT - “Interdisciplinary System for Interactive Scientific and Scientific-Technical Information”.