What is Sentiment Analysis?
Sentiment analysis, or opinion mining, is a notoriously difficult sub-field of Natural Language Processing and Data Science. At the most fundamental level, the task is to take a piece of text and automatically score it for the opinions and sentiments it contains.
- “I had the most wonderful stay” (= positive/satisfaction).
- “I’m really disappointed with the battery life of my device” (= negative/dissatisfaction).
These examples are relatively easy to deal with. However, we soon run into problematic cases.
- The phone was well packaged but I had to wait a whole week for delivery.
A human reader easily infers that the customer is dissatisfied with the delivery speed. But, taking a step back, where is it actually stated that waiting a week for delivery is bad? There are no overtly negative words.
It is also important to separate the satisfaction with the packaging from the dissatisfaction with the delivery. These are different, unrelated aspects of the customer's experience.
There is an abundance of other difficulties with automatic sentiment analysis, including, but not limited to: lexical ambiguity, domain-dependent model overfitting, lack of training data, and lack of sufficiently varied training data.
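To make the delivery example concrete, here is a minimal sketch of a naive word-counting scorer. The word lists and the `naive_score` function are hypothetical and exist only for illustration; they do not come from any real lexicon or product. The sketch handles the two easy examples but scores the delivery sentence as positive, because "well" is the only opinion word it recognises.

```python
import re

# Tiny illustrative word lists (hypothetical, not a real sentiment lexicon).
POSITIVE = {"wonderful", "well", "great", "delicious"}
NEGATIVE = {"disappointed", "appalling", "terrible", "bad"}


def naive_score(text: str) -> int:
    """Return (number of positive words) minus (number of negative words)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    positives = sum(token in POSITIVE for token in tokens)
    negatives = sum(token in NEGATIVE for token in tokens)
    return positives - negatives


easy_positive = naive_score("I had the most wonderful stay")
easy_negative = naive_score("I'm really disappointed with the battery life of my device")
# No overtly negative word appears, so the complaint about delivery is missed.
hard_case = naive_score(
    "The phone was well packaged but I had to wait a whole week for delivery."
)

print(easy_positive, easy_negative, hard_case)  # 1 -1 1
```

The third score is the problem in miniature: the dissatisfaction is implied by world knowledge (a week is a long wait), not by any single word.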
Why is Sentiment Analysis important?
Automated Sentiment Analysis is essential for properly understanding and quantifying the opinions expressed in text. With large amounts of data, making sense of feedback in any meaningful way becomes time-consuming and expensive. On an Internet-wide scale, resorting to manual categorisation is impossible.
For online data, the insight lies in how people online are talking about your brand. For proprietary data, such as customer satisfaction or employee satisfaction reviews, the key business insight is in properly gauging the satisfaction level of respondents.
How does Gavagai handle Sentiment Analysis?
The most common sentiment analysis solutions in the industry use a machine learning (or deep learning) approach. An algorithm makes generalisations from large, annotated sets of data, and these generalisations are then applied to customer texts. These models function as a ‘black box’ with no possibility of explanation or interpretation. Such an approach also does not transfer well to unseen data from other domains or industries.
Most services offer a binary classification (positive/negative) or a ternary classification (positive/negative/neutral). At Gavagai, we offer a wide spectrum of eight different sentiments: positivity, negativity, scepticism, love, hate, fear, desire and violence. This provides a more nuanced understanding of texts and comments.
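As a rough illustration of what a multi-dimensional result looks like compared with a single positive/negative label, the sketch below counts evidence for each of the eight sentiments separately. The `LEXICON` entries and the `score` function are hypothetical assumptions made for this example; they are not Gavagai's lexicon or algorithm.

```python
import re
from collections import Counter

SENTIMENTS = [
    "positivity", "negativity", "scepticism", "love",
    "hate", "fear", "desire", "violence",
]

# Hypothetical toy lexicon: each word signals one or more sentiment dimensions.
LEXICON = {
    "wonderful": ["positivity"],
    "adore": ["positivity", "love"],
    "disappointed": ["negativity"],
    "doubt": ["scepticism"],
    "despise": ["negativity", "hate"],
    "afraid": ["fear"],
    "want": ["desire"],
    "attack": ["violence"],
}


def score(text: str) -> dict:
    """Count lexicon hits per sentiment dimension for one text."""
    counts = Counter()
    for token in re.findall(r"[a-z]+", text.lower()):
        counts.update(LEXICON.get(token, []))
    # Report every dimension, including those with no evidence.
    return {sentiment: counts[sentiment] for sentiment in SENTIMENTS}


result = score(
    "I adore this phone, but I doubt the battery will last and I want a refund."
)
print(result)
```

A binary classifier would have to collapse this comment into one label, whereas the per-dimension counts keep the love, scepticism and desire signals apart.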
We rely on a heuristic-based method which is explainable, interpretable and scalable. It has also proven to work well on gold-standard benchmarks from academia. In experiments for customers, the method performs well across a range of different data types, freeing us from the classic Machine Learning problem of overfitting. (Overfitting occurs when a model learns patterns that are too specific to the data it was trained on, at the expense of generalising well to unseen data. Dealing with new data is extremely important for commercial sentiment analysis.)
A more advanced task is to identify how expressed opinions actually relate to the different entities in the text.
- The food was delicious but the service was appalling.
In this last example, it is helpful if we can attach the sentiment of ‘delicious’ to ‘food’ and the sentiment of ‘appalling’ to ‘service’. We use a topical sentiment detection algorithm to attach sentiments in the text to the topics they describe. This is sometimes called aspect-based sentiment analysis.
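The idea of attaching sentiments to topics can be sketched very simply: split the sentence into clauses and pair each topic word with the opinion words found in the same clause. This is only an illustrative simplification under stated assumptions; the `TOPICS` and `POLARITY` tables and the `topical_sentiment` function are hypothetical, and the actual topical sentiment detection algorithm is not described here.

```python
import re

# Hypothetical toy resources for illustration only.
TOPICS = {"food", "service", "packaging", "delivery"}
POLARITY = {"delicious": "positive", "appalling": "negative"}


def topical_sentiment(text: str) -> list:
    """Pair each topic with the opinion words that share its clause."""
    pairs = []
    # Treat "but", "and" and basic punctuation as clause boundaries.
    for clause in re.split(r"\bbut\b|\band\b|[,;.]", text.lower()):
        tokens = re.findall(r"[a-z]+", clause)
        topics = [token for token in tokens if token in TOPICS]
        opinions = [token for token in tokens if token in POLARITY]
        for topic in topics:
            for word in opinions:
                pairs.append((topic, word, POLARITY[word]))
    return pairs


pairs = topical_sentiment("The food was delicious but the service was appalling.")
print(pairs)
# [('food', 'delicious', 'positive'), ('service', 'appalling', 'negative')]
```

Clause splitting works for this sentence because each clause holds exactly one topic and one opinion word; real text often needs syntactic analysis to resolve which opinion belongs to which topic.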
Gavagai Explorer works with sentiment analysis in Azerbaijani, Albanian, Arabic, Bengali, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Farsi, Finnish, French, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Javanese, Korean, Latvian, Lithuanian, Malay, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tagalog, Thai, Turkish, Ukrainian, Urdu, and Vietnamese.