Why sentiment analysis engines need customization
The human side of data
There was a time, not too long ago, when the general population would have bet their life savings that flying cars and human-like robots would exist by the year 2014.
There are definitely no flying cars, and robots aren't exactly how we pictured them. However, we do have artificial intelligence that can understand what people are saying.
Creepy? Sort of, but more cool than creepy in my opinion.
When computers are used to understand human (natural) language, it's referred to as natural language processing (NLP). Most NLP engines that analyze text come equipped with sentiment analysis, a technology that tells us whether a piece of text is positive, negative or neutral.
Good NLP engines will be able to assign sentiment to a single word or phrase. "Awful", for example, is a word with negative sentiment. "Delicious" is positive and "Blue chair" is neutral.
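To make the idea concrete, here is a minimal sketch of word- and phrase-level sentiment, assuming the simplest possible approach: a lookup table (lexicon) that maps known terms to a score, with anything unknown treated as neutral. The lexicon, its scores and the function names are hypothetical illustrations, not taken from any real engine.

```python
# Hypothetical toy lexicon: these scores are illustrative assumptions,
# not values from any real NLP engine.
SENTIMENT_LEXICON = {
    "awful": -1.0,
    "delicious": 1.0,
}


def phrase_sentiment(phrase: str) -> float:
    """Return the lexicon score for a word or phrase, or 0.0 (neutral) if unknown."""
    return SENTIMENT_LEXICON.get(phrase.lower(), 0.0)


def label(score: float) -> str:
    """Map a numeric score to a polarity label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for phrase in ["Awful", "Delicious", "Blue chair"]:
    print(phrase, label(phrase_sentiment(phrase)))
```

Real engines use far larger lexicons and richer linguistic rules, but the core idea of attaching a polarity to individual terms is the same.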
Sensing the sentiment
Sentiment analysis can also tell us the polarity of an entire document. For example, a tweet that reads, "The service was awful but the food was delicious!" would be scored as neutral, because the positive and the negative cancel each other out.
Really good NLP engines will give you a sentiment score for the individual words and phrases that bear sentiment, and another score for the entire document as a whole. So in the example above, we would know that the tweet is neutral, but it contains valuable positive and negative information.
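The two levels of scoring described above can be sketched in a few lines, assuming a hypothetical toy lexicon and a deliberately naive rule in which the document score is just the sum of the individual term scores. Production engines typically weight terms, handle negation and so on; this only illustrates why the tweet comes out neutral while still exposing its positive and negative parts.

```python
import re

# Hypothetical toy lexicon (illustrative scores, not from a real engine).
LEXICON = {"awful": -1.0, "delicious": 1.0}


def score_document(text: str) -> tuple[float, list[tuple[str, float]]]:
    """Score each sentiment-bearing word, then sum them for a document score.

    Summation is an assumed, simplified aggregation rule.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    # Keep every hit (including repeats) so the document score counts each occurrence.
    hits = [(tok, LEXICON[tok]) for tok in tokens if tok in LEXICON]
    return sum(score for _, score in hits), hits


doc_score, term_scores = score_document(
    "The service was awful but the food was delicious!"
)
print(doc_score)    # 0.0 -> neutral overall
print(term_scores)  # [('awful', -1.0), ('delicious', 1.0)]
```

The document score alone says "neutral"; the per-term list is what preserves the useful positive and negative signals.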
The problem with sentiment analysis is that sometimes it's wrong. That's just a limitation we have to deal with. Even humans often can't agree on the polarity of a document: grad students will disagree about 20% of the time.
"Oh man, that was nasty!" Is this sentence positive or negative?
Surely, it must be negative. "Nasty" is a negative word, and everything else in this sentence is neutral. Final answer, negative! Drum roll…
Wrong! It's positive.
Customization is key
The person who said this used the American slang meaning of "nasty", which carries positive sentiment. There is absolutely no way to know that from the sentence alone. So, if you (a human) were just tricked while reading this article, how is a machine supposed to figure it out? Answer: tell the engine what's positive and what's negative.
High-quality NLP engines will let you customize your sentiment analysis settings. "Nasty" is negative by default. If you're processing slang where "nasty" is considered a positive term, you would open your engine's sentiment customization function and assign a positive score to the word.
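Here is a minimal sketch of what that customization amounts to, assuming a hypothetical engine whose default lexicon can be overridden by user-supplied scores. The names `DEFAULT_LEXICON`, `build_lexicon` and `score` are illustrative inventions, not the API of Lexalytics, Semantria or any other product.

```python
import re

# Hypothetical default lexicon; scores are illustrative assumptions.
DEFAULT_LEXICON = {"nasty": -1.0, "awful": -1.0, "delicious": 1.0}


def build_lexicon(overrides: dict[str, float]) -> dict[str, float]:
    """Copy the defaults and apply user-supplied overrides on top."""
    lexicon = dict(DEFAULT_LEXICON)  # copy so the defaults are never mutated
    lexicon.update({word.lower(): value for word, value in overrides.items()})
    return lexicon


def score(text: str, lexicon: dict[str, float]) -> float:
    """Sum the lexicon scores of every word in the text (unknown words count as 0)."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(w, 0.0) for w in words)


sentence = "Oh man, that was nasty!"
print(score(sentence, DEFAULT_LEXICON))                   # -1.0 -> negative
print(score(sentence, build_lexicon({"nasty": 1.0})))     # 1.0 -> positive
```

Keeping overrides separate from the defaults means the same engine can serve a slang-heavy data set and a formal one, each with its own custom scores.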
The better NLP engines out there will make this entire process a piece of cake. Without this kind of customization, the machine could very well be useless in your work. When you choose a sentiment analysis engine, make sure it allows for customization.
Otherwise, you'll be stuck with a machine that interprets everything literally, and you'll never get accurate results.
- Scott Van Boeyen is the Community Manager for Lexalytics and Semantria. He contributes to the data community by writing/blogging about text analytics and sentiment analysis and helping journalists with ideas for related content.