Recognition of speech to enhance social communication in collaborative virtual worlds. Bachelor thesis.

Period of employment at Mercedes-Benz AG from October 2018 to 2021.

Due to confidential internal company information, no details can be published in this portfolio without a non-disclosure agreement.

A paper including the conducted pilot study has been published here.

Abstract

Within vehicle development, virtual reality collaboration is increasingly used in the automotive industry for virtual conferences around three-dimensional objects. Various studies have shown that when highly abstract avatars are used in shared virtual spaces, communication between people is hampered by the absence of non-verbal cues.

The objective of this work is to develop a scientific recommendation for a method of lip synchronization and emotion recognition via the speech channel, in order to enhance social communication and, likewise, the fidelity of the avatars through appropriate animation. For this purpose, evaluation metrics for the methods in the context of vehicle development are established. These metrics, weighted by a pairwise comparison, are then used to evaluate the methods. The methods are evaluated theoretically through a literature review and assessed practically through an implementation. Furthermore, the interaction between human and avatar with respect to emotion recognition is investigated in a pilot study.
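The weighting step described above can be sketched as follows. This is a minimal, hedged illustration of deriving criterion weights from a pairwise comparison matrix (here via the geometric-mean method); the criteria names and comparison values are invented for illustration and do not come from the thesis.

```python
import math

# Hypothetical evaluation metrics (not the thesis's actual criteria).
criteria = ["accuracy", "latency", "integration effort"]

# comparison[i][j] > 1 means criterion i is preferred over criterion j;
# the matrix is reciprocal: comparison[j][i] == 1 / comparison[i][j].
comparison = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 2.0],
    [1 / 5, 1 / 2, 1.0],
]

# Geometric-mean method: each criterion's weight is proportional to the
# geometric mean of its row, normalized so the weights sum to 1.
row_means = [math.prod(row) ** (1 / len(row)) for row in comparison]
total = sum(row_means)
weights = [m / total for m in row_means]

for name, w in zip(criteria, weights):
    print(f"{name}: {w:.3f}")
```

A weighted score for each method would then be the dot product of these weights with the method's per-criterion ratings.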

Both the emotion recognition and the lip synchronization methods are based on machine learning approaches. The implemented emotion recognition models, however, exhibit poor accuracy when applied to natural datasets. The pilot study shows that using a proximity algorithm (SALSA) for lip synchronization is sufficient for acceptance of, and affinity towards, avatars. Furthermore, the study shows that additionally representing emotion on the avatar through facial expressions has a decisive positive impact on the recognizability of emotions, which is why representing emotions together with lip synchronization is recommended for future collaborative virtual platforms.
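Driving an avatar's facial expression from a recognized emotion label can be sketched as a simple lookup from label to facial animation parameters (e.g. blendshape weights). The labels, parameter names, and weight values below are illustrative assumptions, not values from the thesis or from SALSA.

```python
# Hypothetical mapping from emotion label to blendshape weights in [0, 1].
EXPRESSIONS = {
    "neutral": {"mouth_smile": 0.0, "brow_raise": 0.0, "brow_furrow": 0.0},
    "happy":   {"mouth_smile": 0.9, "brow_raise": 0.3, "brow_furrow": 0.0},
    "angry":   {"mouth_smile": 0.0, "brow_raise": 0.0, "brow_furrow": 0.8},
}


def expression_for(label: str) -> dict:
    """Return blendshape weights for a recognized emotion label,
    falling back to a neutral expression for unknown labels."""
    return EXPRESSIONS.get(label, EXPRESSIONS["neutral"])


print(expression_for("happy"))
```

In a real pipeline these weights would be blended over time with the lip synchronization output rather than applied instantaneously.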

Based on the gathered information and a comparison of the economic and technical aspects of the different methods, it is finally recommended to choose the SALSA application together with a deep learning-based approach for emotion recognition, trained on a natural dataset yet to be created.
