Sentiment Detection from Speech Recognition Output

Ivan Tashev; Dimitra Emmanouilidou

Engineering Sciences | July 2020, Vol LVII(2)

Emotion and sentiment detection from text was among the first text analysis applications. In recent years, emotion and sentiment analysis from the human voice has made serious progress with the application of modern deep learning algorithms. Practical uses of emotion and sentiment detection include human-computer interaction (HCI), media content discovery, and applications for monitoring the quality of customer service. To increase detection accuracy, multi-modal algorithms that use both voice and text have been deployed. In both scenarios, low voice quality is a shared challenge that affects both audio processing and speech recognition, leading to a low recognition rate from automatic speech recognition (ASR) and the need to revisit and reevaluate the algorithms for emotion and sentiment detection from text. In this paper we review established and novel features for text analysis, combine them with the latest deep learning algorithms, and evaluate the proposed models for sentiment detection in monitoring customer satisfaction from support calls. The issues we address are robustness to a low ASR recognition rate, the variable length of the text queries, and highly imbalanced data sets. We use a labeled dataset of more than 100,000 utterances from real support calls and propose a new optimality criterion, which is a combination of weighted and unweighted accuracy. The proposed algorithm is shown to significantly outperform the baseline algorithms in accuracy.
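The abstract does not state how weighted and unweighted accuracy are combined, so the following is only a minimal sketch under stated assumptions. It follows the usage common in the emotion recognition literature, where weighted accuracy (WA) is the overall fraction of correctly classified utterances and unweighted accuracy (UA) is the mean of the per-class recalls. The function names, the mixing parameter `alpha`, and the convex combination itself are hypothetical illustrations, not the criterion defined in the paper.

```python
from collections import defaultdict


def weighted_accuracy(y_true, y_pred):
    """Overall fraction of correct predictions (dominated by frequent classes)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so every class counts equally."""
    total = defaultdict(int)
    hits = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / total[c] for c in total) / len(total)


def combined_criterion(y_true, y_pred, alpha=0.5):
    """Hypothetical convex combination of WA and UA (alpha is an assumption)."""
    wa = weighted_accuracy(y_true, y_pred)
    ua = unweighted_accuracy(y_true, y_pred)
    return alpha * wa + (1.0 - alpha) * ua


# Imbalanced toy data: a classifier that always predicts the majority class.
y_true = ["neutral"] * 8 + ["positive", "negative"]
y_pred = ["neutral"] * 10

wa = weighted_accuracy(y_true, y_pred)
ua = unweighted_accuracy(y_true, y_pred)
score = combined_criterion(y_true, y_pred)
print(f"WA={wa:.3f} UA={ua:.3f} combined={score:.3f}")
```

On this toy data the majority-class predictor reaches a high WA but a low UA, which illustrates why a criterion that mixes the two can be useful for highly imbalanced data sets: WA alone rewards ignoring rare classes, while UA alone ignores how often each class actually occurs.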