What is Text Mining?
Text mining is the process of deriving novel information from a collection of texts (also known as a corpus). By “novel information,” we mean associations, hypotheses, or trends that are not explicitly present in the text sources being analyzed.¹ Text mining helps businesses identify valuable insights in text-based content such as open-ended survey responses, CRM records, Word documents, and social media posts.
How Does Text Mining Work?
Text mining software converts words from unstructured data into numerical values. Our analysts then apply a library of pattern identification algorithms to search those values for patterns. Often, interesting findings emerge from comparing one set of documents against others. Large-scale deployments of text mining can require a significant amount of development work for specific applications. For example, we may need to build a custom ontology (i.e., a specialized terminology “dictionary” for the subject domain) or develop parsing engines based on business rules.
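As a rough illustration of turning words into numerical values and comparing documents, here is a minimal Python sketch. It represents each document as a vector of word counts and scores document pairs with cosine similarity. This is a simplified stand-in, not the software or algorithms we deploy; the function names and sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_counts(text):
    """Represent a document as a mapping from word to occurrence count."""
    return Counter(tokenize(text))


def cosine_similarity(a, b):
    """Return the cosine similarity of two word-count vectors (0.0 to 1.0)."""
    shared = set(a) & set(b)
    dot = sum(a[term] * b[term] for term in shared)
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical sample documents, for illustration only.
    docs = {
        "survey_a": "The delivery was late and the support staff were slow",
        "survey_b": "Late delivery again, and support was slow to respond",
        "survey_c": "Great price and a friendly checkout experience",
    }
    vectors = {name: term_counts(text) for name, text in docs.items()}
    for left, right in [("survey_a", "survey_b"), ("survey_a", "survey_c")]:
        score = cosine_similarity(vectors[left], vectors[right])
        print(f"{left} vs {right}: {score:.2f}")
```

Raw word counts are the simplest numerical representation. Real projects typically add steps such as stop-word removal or term weighting before looking for patterns.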
For simple one-time deployments, we often use software to help automate coding (assigning text to categories) and to produce descriptive summary reports.
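The idea of a custom ontology combined with automated coding can be sketched in a few lines of Python. The example below assigns each response to every category whose terms it mentions, then tallies the categories for a descriptive summary. The ontology, its category names, and the sample responses are all hypothetical, and real coding schemes are usually far richer than keyword matching.

```python
import re
from collections import Counter

# Hypothetical mini-ontology: category name -> terms that signal it.
ONTOLOGY = {
    "price": {"price", "cost", "expensive", "cheap"},
    "service": {"service", "support", "staff", "helpful"},
    "delivery": {"delivery", "shipping", "late", "arrived"},
}


def code_response(text, ontology=ONTOLOGY):
    """Return the sorted list of categories whose terms appear in the text."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(
        category for category, terms in ontology.items() if tokens & terms
    )


def summarize(responses, ontology=ONTOLOGY):
    """Count how many responses were coded into each category."""
    counts = Counter()
    for response in responses:
        counts.update(code_response(response, ontology))
    return counts


if __name__ == "__main__":
    # Hypothetical open-ended survey responses.
    responses = [
        "Too expensive",
        "Great support, fair price",
        "No comment",
    ]
    for category, count in summarize(responses).most_common():
        print(f"{category}: {count}")
```

Because the same rules are applied to every response, the coding stays consistent from one wave of a study to the next, which is what makes this approach attractive for tracking work.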
Advantages of Text Mining
If you have text available and are curious about the patterns and relationships hidden in it, you can benefit from text mining. Automated text analytics saves time and resources compared with having individuals trudge through the content, and the results are derived consistently, which is very useful when tracking opinions over time. Text mining is most beneficial for:
- Summarizing documents
- Extracting concepts from text
- Indexing text for use in predictive analytics
Applications of Text Mining
- Search engines
- Email spam filters
- Product suggestions at check-out
- Fraud detection
- Customer relationship management
- Social media analysis
Text Mining Techniques
We use a number of text mining techniques and apply the one that best fits your problem. We do everything from manual coding for one-time projects, to computer-assisted indexing (HyperResearch), to automated coding for ongoing tracking studies.
HyperResearch™ enables you to code and retrieve, build theories, and conduct analyses of your data. With its multimedia capabilities, HyperResearch allows you to work with text (including “rich text” and Unicode text), graphics, audio, and video sources, which makes it a versatile research analysis tool.
¹ Nisbet, Elder, and Miner, “Handbook of Statistical Analysis and Data Mining Applications,” Elsevier Inc., 2009.