In the development cycle of a spoken dialogue system (SDS), it is important to know how users actually behave and speak, and what they expect of the system. We are developing SDSs that realize natural communication between users and systems. To collect real user data, we carried out a large-scale experiment with a smartphone prototype SDS. In this brief paper, we report the results of the experiment and present a tentative analysis of cases in which system performance and user judgment diverged. Learning how users behave and what they expect requires both an adequate experimental design and an evaluation methodology that takes users’ judgment criteria into account.