Multimodal Recognition of Human Input and Behavior
One of the biggest obstacles to constructing effective sociable virtual humans is the failure of machines to recognize the desires, feelings, and intentions of the human user. Virtual humans lack the ability to fully understand and decode the communication signals that humans emit when communicating with each other. This article describes our research on overcoming this problem by developing senses for virtual humans that enable them to hear and understand human speech, localize the human user in front of the display system, recognize hand postures, and recognize the emotional state of the human user by classifying facial expressions. We report on the methods needed to perform these tasks in real time and conclude with an outlook on promising directions for future research.