Gesture-based interaction is becoming ever more available with continuous advances in acquisition technology and recognition algorithms, as well as with the increasing availability of personal (mobile) devices, ambient media displays and interactive surfaces. Vision-based technology is the preferred choice when non-intrusive, unobtrusive and comfortable interactions are sought. However, it also comes with the additional costs of difficult (unknown) scenarios to process and far-from-perfect recognition rates. The main challenge is spotting and segmenting gestures in video media. Previous research has considered various events that specify when a gesture begins and when it ends, in conjunction with location, time, motion, posture or other segmentation cues. Video events thus identify, specify and segment gestures. Moreover, when gestures are correctly detected and recognized by the system, with appropriate feedback delivered to the human, gestures themselves become events in the human-computer dialogue: the command was understood and the system reacted.
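To make the notion of begin/end events concrete, the following is a minimal, hypothetical sketch of gesture spotting from a per-frame motion-energy signal using two hysteresis thresholds. The function name, signal, and threshold values are illustrative assumptions, not a method described in this chapter.

```python
# Hypothetical sketch: spotting gesture start/end events from a
# per-frame motion-energy signal using two hysteresis thresholds.
# All names and threshold values are illustrative assumptions.

def spot_gestures(energy, t_start=0.5, t_end=0.2):
    """Return (start, end) frame indices of detected gesture segments.

    A segment begins when motion energy rises above t_start and ends
    when it falls back below t_end (hysteresis reduces jitter around
    a single threshold).
    """
    segments, start = [], None
    for i, e in enumerate(energy):
        if start is None and e > t_start:
            start = i                      # begin-gesture event
        elif start is not None and e < t_end:
            segments.append((start, i))    # end-gesture event
            start = None
    if start is not None:                  # gesture still open at end
        segments.append((start, len(energy) - 1))
    return segments

# Example: one burst of motion between frames 2 and 6
energy = [0.0, 0.1, 0.8, 0.9, 0.7, 0.4, 0.1, 0.0]
print(spot_gestures(energy))  # → [(2, 6)]
```

In a real vision pipeline the energy signal would come from frame differencing or optical flow; the hysteresis pattern above is one simple way to turn a continuous cue into discrete begin/end events.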
This chapter addresses this dual view of meaningful events: events that specify gestures, together with intelligent algorithms that detect them in video sequences; and gestures that, once recognized and interpreted by the system, become important events in the human-computer dialogue, signaling the common understanding that has been established. The chapter examines this duality of events from both the system and the human perspective, contributing to the present understanding of gestures in human-computer interaction.