This paper evaluates a set of candidate phenology models for simulating grapevine development under present conditions and future climate change scenarios. The models were built on a combination of working hypotheses designed to extend their predictive ability to multiple phenological events, i.e. budburst, flowering and veraison. The aspects considered in the modelling analysis were: (i) the effect of dormancy description, by comparing Chilling and Forcing (CF) models with Forcing (F) models; (ii) the degree of phenophase-dependent specificity after endodormancy break; (iii) the use of a linear or curvilinear function to model the temperature dependency of development rate; (iv) performance under climate change scenarios. We also analyzed to what extent model behavior under present and future climate was affected by factors other than model structure, focusing on the variability of the dataset used to calibrate the models. For this purpose two parallel series of models were evaluated: one calibrated on a historical phenological dataset, and the other on the same dataset extended with budburst observations obtained under controlled conditions simulating shorter winters, as expected from climate change predictions. Estimation accuracy of the models was evaluated on independent data. A further evaluation considered the consistency of the parameterization with experimental physiological data, and model behavior under future climate change scenarios, to assess to what extent future predictions are affected by model structure and complexity. Under current climatic conditions the models showed comparable accuracy, only slightly higher when growth-room data were used for calibration. This similar performance was obtained despite wide discrepancies in parameterization between the calibration types.
In CF models, which describe dormancy dynamics, chilling requirements were highly overestimated in field-calibrated models, and different temperature requirements were likewise found for post-dormancy development. The parameterization differences had a clear impact on a demonstrational scenario analysis, where models were driven by synthetic weather data derived from downscaled HadCM3 predictions for the SRES-A2 scenario. Here all the selected models predicted a forward shift of phenology. The type of calibration affected the response more than model structure or complexity, whose effect became detectable only for the most distant future projections. The effect of calibration was more evident for budburst than for flowering and veraison. Field-calibrated models underpredicted the shift with respect to their counterparts, due to a delayed onset of forcing accumulation, which postponed budburst and subsequent phases, counterbalancing the effect of higher spring temperatures. It is concluded that for present-time applications all the evaluated models have comparable accuracy. For climate change impact analysis, too, the variability of the calibration dataset is a more crucial factor than model structure or complexity. The simpler forcing models can therefore be recommended for both present conditions and scenario-based applications.
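As an illustration of the forcing (F) approach and of the linear versus curvilinear temperature response mentioned above, the following is a minimal sketch, not the models evaluated in this paper. All function names, parameter values (`t_base`, `slope`, `t_mid`, `f_crit`) and the toy temperature series are hypothetical and chosen only to make the example run; a sigmoid is assumed here as one possible curvilinear form.

```python
import math


def linear_rate(temp_c, t_base=10.0):
    """Linear forcing: degree-days above a base temperature (hypothetical t_base)."""
    return max(0.0, temp_c - t_base)


def sigmoid_rate(temp_c, slope=0.3, t_mid=15.0):
    """Curvilinear forcing: sigmoid response in (0, 1) (hypothetical slope, t_mid)."""
    return 1.0 / (1.0 + math.exp(-slope * (temp_c - t_mid)))


def predict_stage_day(daily_temps, rate_fn, f_crit, start_index=0):
    """Return the 0-based index of the first day on which accumulated forcing
    reaches f_crit, or None if the threshold is never reached."""
    total = 0.0
    for day in range(start_index, len(daily_temps)):
        total += rate_fn(daily_temps[day])
        if total >= f_crit:
            return day
    return None


def synthetic_temps(n_days=200, warming=0.0):
    """Toy daily mean temperatures rising linearly from 5 to 25 deg C,
    with an optional uniform warming offset (not real weather data)."""
    return [5.0 + 20.0 * d / (n_days - 1) + warming for d in range(n_days)]


# Compare predicted dates under the toy "present" and a uniformly warmer series.
present = synthetic_temps()
warmer = synthetic_temps(warming=2.0)

for name, rate_fn, f_crit in [("linear", linear_rate, 100.0),
                              ("sigmoid", sigmoid_rate, 20.0)]:
    day_now = predict_stage_day(present, rate_fn, f_crit)
    day_warm = predict_stage_day(warmer, rate_fn, f_crit)
    print(f"{name}: present day index {day_now}, warmer day index {day_warm}")
```

In this sketch forcing accumulation starts at a fixed day (`start_index`). A CF-type model would instead make that start depend on accumulated chilling, which is the mechanism through which an overestimated chilling requirement can delay the onset of forcing and the predicted budburst date.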