Text Analytics – Mining the Abundance of Wealth Beneath the Vastness of Text

1. What exactly is Text Analytics?

Text analytics is the process of examining large collections of written material to generate new information and to transform unstructured text into structured data for further analysis. In short, it derives high-quality information from text. The overarching goal is to turn text into data for analysis by applying natural language processing (NLP) and analytical methods.

Text mining identifies facts, relationships and assertions that would otherwise remain buried in the mass of textual data. These facts are extracted and turned into structured data for analysis, visualization, integration with structured data in databases or warehouses, and further refinement using machine learning (ML) systems.

2. The Foundations – Decoding & Quantifying Text

Although text is considered unstructured, high-level human language carries an enormous amount of complexity and nuance, which makes text analytics extremely fertile ground for gleaning insights about what people are thinking and feeling.

The challenge, though, is obvious. The data is both unstructured and voluminous, and unless we find a way to quantify its various aspects, deriving information from it will always be slow, subjective and limited. This is where data science comes in: it supplies the methods by which text mining solves the problems of structure and scale. The basic approach is to turn text into numbers, so that machines can analyze large volumes of documents and discover insights through mathematical algorithms.

Let's walk through a real example: taking a dataset and converting its unstructured text into structured data, so that we can apply whichever algorithm suits our goal. I downloaded a dataset of tweets in which passengers comment on US airlines. The objective is to build a model that predicts a customer's sentiment from what they write, without a human having to interpret each comment. The data can be downloaded from data.world (Twitters About US Airline).

One of the first steps in the text mining process is to organize and structure the data so that it can be subjected to both qualitative and quantitative analysis. The foundational structure is either a Term Document Matrix (TDM) or a Document Term Matrix (DTM). In a DTM, each row is a document, each column is a term, and each cell records how often that term appears in that document; a TDM is simply its transpose, with terms as rows and documents as columns.
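To make the matrix concrete, here is a minimal sketch in plain Python (standard library only) that builds a DTM, and its transposed TDM, from three made-up tweets. The tweets, the helper names `tokenize`, `document_term_matrix` and `transpose`, and the preprocessing choices (lowercasing, dropping @mentions, keeping only alphabetic tokens) are hypothetical illustrations, not the exact steps applied to the airline dataset.

```python
import re
from collections import Counter

# Hypothetical tweets for illustration only (not rows from the airline dataset).
tweets = [
    "@united thanks for the great flight!",
    "@united lost my bag again. Not great.",
    "@delta great crew, great flight",
]

def tokenize(text):
    """Lowercase, drop @mentions (an assumed preprocessing choice), keep alphabetic tokens."""
    text = re.sub(r"@\w+", " ", text.lower())
    return re.findall(r"[a-z']+", text)

def document_term_matrix(docs):
    """Return (vocab, dtm) where dtm[i][j] is how often vocab[j] occurs in docs[i]."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    vocab = sorted(set().union(*counts))  # every distinct term, alphabetically
    dtm = [[c[term] for term in vocab] for c in counts]  # Counter yields 0 for absent terms
    return vocab, dtm

def transpose(matrix):
    """A TDM is the DTM transposed: rows are terms, columns are documents."""
    return [list(column) for column in zip(*matrix)]

vocab, dtm = document_term_matrix(tweets)
tdm = transpose(dtm)

print(vocab)
for tweet, row in zip(tweets, dtm):
    print(row, "<-", tweet)
print("TDM row for 'great':", tdm[vocab.index("great")])  # [1, 1, 2]
```

Each DTM row is one tweet and each column one vocabulary term, so the third tweet's row holds a 2 under "great". On a real corpus most cells are zero, which is why libraries usually store the matrix sparsely; scikit-learn's `CountVectorizer`, for example, returns a SciPy sparse matrix from `fit_transform`.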

