Improving the Effectiveness of Customer Sentiment Analysis

September 13, 2012 5:35 pm

Take a gander at this negative tweet about a key Apple product: “If I break/lose my iPhone 4s, I’m seriously going to commit suicide.”

If you say that comment doesn’t sound the least bit “negative” to you, it proves one thing—you’re not software. A customer sentiment analysis program, used by Minneapolis-based digital marketer Neil James, classified the above tweet—which would make any marketer giddy—as negative.

He acknowledges that’s an “egregious example” of a misclassification. Still, it strikes at the heart of the uneasiness over sentiment analysis: While there’s no question companies want to glean insights from the new town hall that is social media, they debate the accuracy of the tools that are available to do so.

“The  limitations of sentiment analysis lie in the fact that machine learning is still at an early stage of development and ‘emotion’ is really difficult to predict computationally,” says Steven Ramirez, CEO and president of Beyond the Arc, which manages voice of the customer and social media insight programs for financial institutions and large media companies. “Cultural factors, linguistic nuances and differing contexts make it extremely difficult to turn a string of written text into a simple pro or con sentiment.”

Seth Grimes, an analyst who runs the annual Sentiment Analysis Symposium in San Francisco, says automated sentiment analysis tools generally have a 50 to 60 percent accuracy level out of the box, as measured against how human beings would rate the same comments.

The Trouble with Subtlety
The iPhone example notwithstanding, “the problem generally isn’t that the software will rate a positive comment as negative,” Grimes says. “It’s between positive and neutral. For example, you might say you don’t like a particular flavor of ice cream. But you’re not saying you don’t like the ice cream. Some might rate that comment negative, some neutral.”

Granted, even human beings have difficulty with such subtleties. Grimes ran a Twitter poll in which he asked people to evaluate the phrase, “I bought a Honda yesterday.” Of the 22 respondents, 45 percent rated the comment as positive; 55 percent as neutral. If people have trouble agreeing, it’s understandable why machines would often miss the mark. Beyond that, sentiment analysis tools have trouble with irony, humor, and other subtleties of human speech, such as how an emoticon like 🙂 can change the intent of blistering words.
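The disagreement in that poll can be put into a single number. The Python sketch below is an illustration only: the function name `pairwise_agreement` is made up here, and the vote counts (10 positive, 12 neutral) are inferred from the reported percentages of 22 respondents rather than stated by Grimes. It computes how often two randomly chosen respondents gave the same rating.

```python
from collections import Counter
from math import comb


def pairwise_agreement(labels):
    """Share of respondent pairs that chose the same label."""
    counts = Counter(labels)
    total_pairs = comb(len(labels), 2)
    agreeing_pairs = sum(comb(n, 2) for n in counts.values())
    return agreeing_pairs / total_pairs if total_pairs else 1.0


# Assumed counts: 45 percent and 55 percent of 22 respondents, rounded.
poll = ["positive"] * 10 + ["neutral"] * 12
agreement = pairwise_agreement(poll)
print(f"Pairs of respondents who agree: {agreement:.0%}")
```

Under those assumed counts, fewer than half of all respondent pairs agree with each other, which gives a sense of how low the ceiling may be for any tool scored against human ratings.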

Steven Ramirez, CEO of Beyond the Arc: Beware of spam comments.

That’s not the only issue with accurately classifying the tsunami of social media feelings. “In some areas, we’ve seen anywhere from 20 percent to close to 100 percent of comments in social media as spam,” Ramirez says. “When you search the key word ‘auto loan’ and find all these people asking, ‘Any suggestions on where to get an auto loan?’ that isn’t a conversation that’s really happening—it’s spam.” He says human beings can usually spot spam right off; machines often get fooled. And a large amount of spam may skew the data.

Venkat Viswanathan, CEO and founder of LatentView, a data analytics company that works with Fortune 500 companies, finds the current iteration of sentiment analysis tools works most accurately when applied to comments about consumer electronics products. That’s because the products have distinct feature sets that can provide the structure for analyzing sentiment.  He says doing analysis based on one or two key differentiating features eliminates some of the confusion.

“We did a project for a company that introduced a high quality camera at a medium price,” he says. “We got negative feedback based on their use of an older operating system. That provided very actionable insights.”

However, he says even with a relatively simple consumer product like a camera or smartphone, sentiment analysis software needs to be coupled with human analysis. For example, a human may set up categories and then train the software how to classify comments based on the categories.

“Some topics and conversations are easy to classify, some are complex,” Viswanathan says. “In any case, you always need humans to provide the context. There might be comments in a discussion forum about the amount of heat that a Dell laptop battery is generating. But only a human being can make the connection that heat in the context of a laptop battery is not a good thing. That is an aspect you need to teach the computer.”
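One simple way to picture "teaching the computer" is a lookup table in which a human analyst records what a word means for a specific product. The sketch below is a hypothetical illustration, not LatentView's method: the table `DOMAIN_POLARITY`, the function `score_in_context`, and the "space heater" entry are all invented for this example.

```python
import re

# Hypothetical, human-curated rules: the same word can carry a
# different polarity depending on the product being discussed.
DOMAIN_POLARITY = {
    "laptop battery": {"heat": -1, "hot": -1, "cool": 1},
    "space heater": {"heat": 1, "hot": 1},
}


def score_in_context(text: str, product: str) -> int:
    """Sum the analyst-assigned weights of the words found in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    rules = DOMAIN_POLARITY.get(product, {})
    return sum(weight for term, weight in rules.items() if term in words)


comment = "There is a lot of heat coming off this thing."
laptop_score = score_in_context(comment, "laptop battery")  # negative
heater_score = score_in_context(comment, "space heater")    # positive
```

The same comment scores below zero for the laptop battery and above zero for the heater. The software only knows that because a person wrote the rule.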

Sentiment Analysis Limitations and Techniques to Improve Results

While they are getting better all the time, machines still face challenges when deciphering human sentiments in online statements.

Examples where sentiment analysis tools fall short:

  • Irony, humor and other subtleties of human speech, like how the emoticon 🙂 can change the tone of an otherwise negative statement.
  • Spam-loaded conversations in social media that strike people as inauthentic.
  • False negatives, where the software sees a negative word like “crap” but doesn’t realize it’s positive in the overall context, as in “Holy crap! I loved this!”
  • Cultural differences, where some people from some countries might be more or less effusive in their use of language.
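To see how misreadings like these can arise, consider a deliberately naive classifier that only counts words. This is a minimal sketch under stated assumptions: the word lists and the function name `naive_sentiment` are invented for illustration and do not come from any tool mentioned in this article.

```python
import re

# Hypothetical word lists, invented for this illustration only.
NEGATIVE_WORDS = {"crap", "suicide", "break", "lose", "hate"}
POSITIVE_WORDS = {"loved", "love", "great"}


def naive_sentiment(text: str) -> str:
    """Label text by counting positive and negative words, ignoring context."""
    words = re.findall(r"[a-z]+", text.lower())
    positives = sum(w in POSITIVE_WORDS for w in words)
    negatives = sum(w in NEGATIVE_WORDS for w in words)
    if positives > negatives:
        return "positive"
    if negatives > positives:
        return "negative"
    return "neutral"


iphone_tweet = "If I break/lose my iPhone 4s, I'm seriously going to commit suicide."
iphone_label = naive_sentiment(iphone_tweet)            # "negative"
crap_label = naive_sentiment("Holy crap! I loved this!")  # "neutral"
```

With these word lists, the enthusiastic iPhone tweet comes out negative, and “Holy crap! I loved this!” is flattened to neutral because one negative word cancels one positive word. Neither result matches what a human reader would say.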

Techniques that help improve the effectiveness of sentiment analysis:

  • Picking a limited number of concrete product features to analyze
  • Pairing sentiment analysis tools with human analysts to examine contextual references
  • Using sentiment analysis as a starting point to identify issues for follow-up action
  • Connecting sentiment analysis questions to a business problem
  • Going beyond the polarity of “positive” and “negative” to classify sentiment, and using more fine-grained categories like “angry,” “happy,” “frustrated,” and “sad”

Accuracy Tied to Relevance

Seth Grimes, sentiment analysis expert: Relevance boosts accuracy.

One issue with sentiment analysis accuracy is that there is no standard agreement on what “accuracy” means. It’s typically defined by a combination of “recall” (the share of relevant examples the software detects) and “precision” (the share of detected examples that are classified correctly). However, Grimes suggests accuracy should also take into account “relevance”: a mild complaint from a key customer may be more important than a tirade from a one-time buyer. If that doesn’t complicate things enough, here’s one more wrinkle: Grimes questions whether people make too much of a fuss about accuracy in the first place.
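For readers who want to see the arithmetic, the sketch below computes precision and recall for one label using the standard definitions: precision is the fraction of comments the software flagged that humans also flagged, and recall is the fraction of human-flagged comments the software caught. The function name `precision_recall` and the five sample ratings are made up for illustration.

```python
def precision_recall(predicted, actual, target="negative"):
    """Precision and recall for one label, scored against human ratings."""
    true_pos = sum(p == target and a == target for p, a in zip(predicted, actual))
    flagged = sum(p == target for p in predicted)   # what the software flagged
    relevant = sum(a == target for a in actual)     # what humans flagged
    precision = true_pos / flagged if flagged else 0.0
    recall = true_pos / relevant if relevant else 0.0
    return precision, recall


# Invented sample data: five comments rated by people and by software.
human = ["negative", "negative", "neutral", "positive", "negative"]
software = ["negative", "negative", "negative", "negative", "neutral"]

precision, recall = precision_recall(software, human)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this sample the software flags four comments as negative but only two of them really are, so precision is 0.50. It catches two of the three truly negative comments, so recall is about 0.67. A tool can score well on one measure and poorly on the other, which is why a single "accuracy" figure can mislead.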

“People who worry about accuracy first and foremost are looking at the wrong criteria,” he says. “You should be looking at the business problem you want to solve.” He points out that using sentiment analysis for counterterrorism would ideally call for 100 percent “recall,” with a high tolerance for low precision and false positives (such as the software thinking the remark “this product made me jump for joy” is negative). In contrast, many marketing initiatives might find 70 percent accuracy sufficient for their needs.

Remember that customer sentiment analysis is a starting point, not an end. If you find the word “problem” affixed to 10 percent of posts about your product, that suggests there is an issue, but you need to dig deeper to find the issue.

“Quantitative data gives you the symptoms of where your problems lie; qualitative data tells you what to do about it,” says Joyce O’Donnell Maroney, senior director, customer experience and services marketing for Kronos Incorporated, a maker of workplace management tools. “The new tools for sentiment analysis can be extremely powerful, but you still need to build out a meaningful hypothesis and test it.”

Joe Mullich is a freelance writer based in Los Angeles.
