Most businesses have a huge amount of text-based data, such as memos, company documents, emails, reports, media releases, customer records and communication, websites, blogs and social media posts. Until recently, it wasn’t always very useful, at least in terms of easily extracting business-critical insights. But that has all changed thanks to text analytics.
Text analytics, also known as text mining, is a process of extracting value from large quantities of unstructured text data. While the text itself is structured to make sense to a human being (e.g., a company report split into sensible sections), it is unstructured from an analytics perspective because it doesn’t fit neatly into a relational database or the rows and columns of a spreadsheet. Traditionally, the only structured part of text was the name of the document, the date it was created, and who created it.
Access to huge text data sets and improved technical capability mean that text can be analyzed to extract high-quality information above and beyond what the document actually says. For example, text can be assessed for commercially relevant patterns, such as an increase or decrease in positive feedback from customers, or new insights that could lead to product tweaks. As such, text analytics is now capable of telling us things we didn’t already know and, perhaps more importantly, had no way of knowing before. And these insights can be incredibly useful in business.
Text analytics is particularly useful for information retrieval, pattern recognition, tagging and annotation, information extraction, sentiment assessment, and predictive analytics. It could, for example, shed light on what your customers think of your product or service, or highlight the issues that your customers complain about most frequently.
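As a rough illustration of the last point, the sketch below counts which terms appear in the most customer comments. It is a minimal example, not a production method: the comments, the tiny stop-word list, and the names `COMMENTS`, `STOP_WORDS`, `tokenize`, and `top_terms` are all made up for this illustration.

```python
import re
from collections import Counter

# Hypothetical customer comments; real ones might come from emails,
# reviews, or support tickets.
COMMENTS = [
    "Delivery was late and the packaging was damaged.",
    "Great product, but delivery took far too long.",
    "The battery died after a week. Delivery was fine.",
    "Battery life is poor and support never replied.",
]

# Assumed stop-word list, kept deliberately tiny for illustration.
STOP_WORDS = {"a", "after", "and", "but", "far", "is", "never",
              "the", "too", "took", "was"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z']+", text.lower())


def top_terms(comments, n=3):
    """Return the n terms that appear in the most comments."""
    counts = Counter()
    for comment in comments:
        # Count each term once per comment so that a single long
        # complaint does not dominate the totals.
        counts.update(set(tokenize(comment)) - STOP_WORDS)
    return counts.most_common(n)


if __name__ == "__main__":
    for term, count in top_terms(COMMENTS, 2):
        print(f"{term}: mentioned in {count} comments")
```

On this toy data, the two most widespread terms point to delivery and battery as the topics worth investigating first. Commercial tools go much further, for example by grouping related words and handling negation.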
Make Sure Your Text Is Analysis-Ready
It’s not enough for the text to be in a digital format; it also needs to be “datafied.” If you saved a page from a book as a JPEG file, you would technically have a digital copy of the text, but it would be no good for running text analytics. What you need is datafied text, like the text in many e-readers, which lets you interact with it (by highlighting sections, adding notes, searching the text, etc.). So, any old paper files that you want to analyze will need to be rendered in a format that is not only digital but also datafied.
Once the text is ready for analysis, there are a number of commercially available text analytics tools that can help you. Which one you use will depend on your objective.
Text Analytics in Action
Unsure of how you would use text analytics in practice? Say, for example, you are concerned about the level of employee engagement in your company and decide to conduct an employee-engagement survey. You could read through hundreds of questionnaire responses, and that might give you some good ideas or a sense of who is happy and who is not. But it wouldn’t really give you any indication of trends or what the collective was really feeling.
Text analytics allows you to assess all that free-flowing unstructured text and establish trends or clusters of opinion across the business, within divisions, and within specific teams. In fact, I know of one organization that uses text analytics to avoid having to do employee surveys in the first place. Instead, the company simply scans and analyzes the content of emails sent by staff as well as employees’ social media posts on Facebook or Twitter. This allows the company to understand the levels of staff engagement without the time and expense of a traditional survey.
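To make the idea of team-level trends concrete, here is a minimal sketch of lexicon-based sentiment scoring averaged per team. Everything in it is an assumption for illustration: the word lists, the sample responses, and the names `POSITIVE`, `NEGATIVE`, `RESPONSES`, `score`, and `average_score_by_team`. It is not how any particular organization or commercial tool works.

```python
import re
from collections import defaultdict

# Assumed, deliberately small sentiment word lists.
POSITIVE = {"good", "great", "happy", "supportive", "enjoy"}
NEGATIVE = {"bad", "stressed", "unhappy", "overworked", "frustrated"}

# Hypothetical free-text survey responses, tagged with the team.
RESPONSES = [
    ("Sales", "I enjoy the work and my manager is supportive."),
    ("Sales", "Targets are fair and the team is great."),
    ("Support", "We are overworked and I feel stressed."),
    ("Support", "Good colleagues, but I am frustrated by the tools."),
]


def score(text):
    """Positive word count minus negative word count for one response."""
    words = re.findall(r"[a-z]+", text.lower())
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    return positives - negatives


def average_score_by_team(responses):
    """Average the per-response scores within each team."""
    scores = defaultdict(list)
    for team, text in responses:
        scores[team].append(score(text))
    return {team: sum(vals) / len(vals) for team, vals in scores.items()}


if __name__ == "__main__":
    for team, avg in average_score_by_team(RESPONSES).items():
        print(f"{team}: average sentiment {avg:+.1f}")
```

A simple word count like this misses sarcasm and negation ("not happy" would score as positive), so it is best read as a way to spot which teams deserve a closer look rather than as a precise measure of engagement.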
Text analytics is also having a big impact beyond the world of business. In healthcare, for example, companies are using text analytics to extract large amounts of information from patient medical records – information that then can be used to understand the overall health of the population and improve treatment methods. One such company, Apixio, analyzes the information found in electronic healthcare records, such as GP notes, consultant notes, radiology notes, pathology results, etc.
To analyze this information, which comes in a wide variety of formats and may even be handwritten, the company first has to turn it into something that computers can analyze. The company does this using OCR (optical character recognition) technology to create a textual representation of the information that computers can read and understand. The data then can be analyzed at an individual patient level, or it can be aggregated across the population in order to derive big-picture insights around disease prevalence, treatment patterns, etc. Apixio hopes that, by mining such practice-based clinical data for information (who has what condition, what treatments are working, etc.), we can learn a lot about the way we care for individuals and make improvements based on actual knowledge of what works and what doesn’t.
A Word of Warning
Converting older, paper-based text documents into something that can be used for analysis can be very time-consuming and expensive, so it’s best to be selective rather than attempting to analyze everything you have lying around in your archives. Also keep in mind that most data has a shelf life. Rather than converting old text into an analysis-ready format, it is often better to focus on the new text data you already have access to, such as emails and social media posts.
Bernard Marr is a bestselling author, keynote speaker, strategic performance consultant, and analytics, KPI, and big data guru. In addition, he is a member of the Data Informed Board of Advisers. He helps companies to better manage, measure, report, and analyze performance. His leading-edge work with major companies, organizations, and governments across the globe makes him an acclaimed and award-winning keynote speaker, researcher, consultant, and teacher.