This paper describes our search for the most suitable setting and method to assess the naturalness of the output of an existing algorithm for generating multimodal referring expressions. To evaluate this algorithm, we built a setting in Second Life. We report on a pilot study that aimed to assess (1) the suitability of this setting and (2) the design of our evaluation method. Results show that subjects are able to...