Text Analytics in R – Internet of Things (IoT)

A small corpus of ten articles related to the Internet of Things (IoT) was collected for text analytics. Using R, each article was cleaned of unusual characters and converted to lower case. Numbers, punctuation, stop words and extra white space were then removed, along with additional terms that the default stop-word list did not catch. This left 4,366 terms.
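
The post's R code isn't shown. The cleaning steps above can be sketched in standard-library Python as follows. The stop-word sets are hypothetical stand-ins for R's default English list and the author's extra terms, and treating "unusual characters" as non-ASCII characters is an assumption.

```python
import re

# Hypothetical stand-ins: the post used R's default English stop words plus
# extra terms of its own, neither of which is listed in the post.
STOP_WORDS = {"the", "and", "of", "to", "a", "in", "for", "is", "on", "with", "are", "as", "by"}
EXTRA_STOP_WORDS = {"can", "will"}


def clean(text, extra_stop_words=EXTRA_STOP_WORDS):
    """Return the tokens left after the cleaning steps described in the post."""
    text = text.encode("ascii", "ignore").decode("ascii")  # drop non-ASCII ("unusual") characters
    text = text.lower()                                     # change to lower case
    text = re.sub(r"\d+", " ", text)                        # remove numbers
    text = re.sub(r"[^\w\s]", " ", text)                    # remove punctuation (replaced by a space)
    tokens = text.split()                                   # split() also discards extra white space
    stop = STOP_WORDS | extra_stop_words
    return [t for t in tokens if t not in stop]


print(clean("The IoT market will grow 25% in 2020, and data is key!"))
# ['iot', 'market', 'grow', 'data', 'key']
```

Replacing punctuation with a space, rather than deleting it, keeps a hyphenated word such as "real-time" from fusing into "realtime". Either choice is defensible, but it changes the term counts.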

The top terms in R were data (235), IoT (187), analyt (162), busi (88) and product (87). For comparison, the articles were also run through Termine, which gave similar results: analytics, data, things, predictiveanalytic, time, IoT.
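
A top-terms list like this one comes from counting term frequencies across the cleaned corpus. The following is a minimal Python sketch, assuming each article has already been reduced to a list of tokens. The toy documents are made up and are not the post's corpus.

```python
from collections import Counter


def top_terms(docs, n=5):
    """Count every term across all (already cleaned) documents; return the n most frequent."""
    counts = Counter()
    for tokens in docs:
        counts.update(tokens)
    return counts.most_common(n)


# Toy documents, not the post's corpus.
docs = [["data", "iot", "analyt"], ["data", "busi"], ["data", "iot"]]
print(top_terms(docs, n=2))  # [('data', 3), ('iot', 2)]
```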

Stemming the terms in R produced the top 50 IoT terms from the corpus shown below. Stemming and lemmatization share a goal: reducing inflected and derived forms of a word to a common base form. Stemming does this by stripping suffixes, so the base need not be a real word. For example, analytic and analytics both become ‘analyt’, which appears below. Lemmatization instead maps each word to its dictionary form. Both are powerful and easy to apply.

Text Analytics - Top Terms
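
The difference between stemming and lemmatization can be seen in a deliberately simplified Python sketch. The suffix list and lemma table are hypothetical. This is not the Porter (Snowball) stemming algorithm, only an illustration of the suffix-stripping idea next to a dictionary lookup.

```python
# Hypothetical suffix list for illustration only (NOT the Porter/Snowball rules).
SUFFIXES = ["ics", "ing", "ic", "ed", "s"]

# Hypothetical lemma table; real lemmatizers use a full dictionary and part of speech.
LEMMAS = {"analytics": "analytic", "devices": "device", "connected": "connect"}


def toy_stem(word, min_stem=3):
    """Strip the longest matching suffix, keeping at least min_stem characters."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word):
    """Map a word to its dictionary form if the table knows it; otherwise return it unchanged."""
    return LEMMAS.get(word, word)


for w in ["analytics", "analytic", "devices"]:
    print(w, toy_stem(w), toy_lemmatize(w))
```

Both "analytics" and "analytic" stem to the non-word "analyt", while the lemmatizer returns the dictionary form "analytic".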

A snippet of the terms in each article:

Text Analytics Snippet of Terms

The five topics found in the corpus, with the top ten terms for each:

Rank  Topic 1   Topic 2   Topic 3     Topic 4     Topic 5
1     analyt    iot       technolog   data        product
2     predict   connect   learn       use         predict
3     data      cost      new         devic       data
4     iot       industri  provid      base        iot
5     busi      compani   use         custom      devic
6     time      organ     digit       inform      servic
7     process   build     smart       scientist   mainten
8     creat     reduc     transform   collect     analyt
9     also      better    machin      group       ibm
10    manag     mani      lead        one         model
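
The post doesn't say which topic model produced these five columns. Latent Dirichlet Allocation (LDA) is a common choice, so the sketch below assumes it. It is a compact collapsed Gibbs sampler in standard-library Python that returns the top terms per topic. The hyperparameters (alpha, beta, iters) are illustrative defaults, not the author's settings.

```python
import random
from collections import defaultdict


def lda_top_terms(docs, k, n_top=10, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return the n_top highest-count terms per topic."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    doc_topic = [[0] * k for _ in docs]                # tokens in document d assigned to topic t
    topic_word = [defaultdict(int) for _ in range(k)]  # times word w is assigned to topic t
    topic_total = [0] * k                              # tokens assigned to topic t
    z = []                                             # current topic of every token

    def move(d, w, t, delta):
        doc_topic[d][t] += delta
        topic_word[t][w] += delta
        topic_total[t] += delta

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        z.append([rng.randrange(k) for _ in doc])
        for w, t in zip(doc, z[d]):
            move(d, w, t, +1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                move(d, w, z[d][i], -1)  # take this token out of the counts
                # Unnormalized conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][t] + alpha) * (topic_word[t][w] + beta) / (topic_total[t] + V * beta)
                    for t in range(k)
                ]
                z[d][i] = rng.choices(range(k), weights=weights)[0]
                move(d, w, z[d][i], +1)

    return [sorted(vocab, key=lambda w: -topic_word[t][w])[:n_top] for t in range(k)]
```

With k=1 every token falls into the single topic, so the output reduces to plain frequency ranking, which is a handy sanity check. A frequent term such as "data" can rank highly in several topics at once, as it does in the table above.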

Word Cloud

Using R, a word cloud was created from the words that meet a minimum frequency threshold. As you can see, all of these terms relate readily to the IoT.

Text Analytics - Word Cloud
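
The core of a word cloud is filtering by a frequency threshold and scaling each word's font size by its count. A minimal Python sketch of that step only follows (layout and drawing are omitted). The threshold and font-size range are hypothetical parameters. If the cloud was built with R's wordcloud package, that function's min.freq argument plays the role of min_freq here.

```python
from collections import Counter


def word_cloud_sizes(tokens, min_freq=2, min_size=10, max_size=48):
    """Keep words with count >= min_freq and map each count linearly to a font size."""
    counts = Counter(tokens)
    kept = {w: c for w, c in counts.items() if c >= min_freq}
    if not kept:
        return {}
    lo, hi = min(kept.values()), max(kept.values())
    span = hi - lo or 1  # avoid dividing by zero when every kept word has the same count
    return {
        w: round(min_size + (c - lo) / span * (max_size - min_size))
        for w, c in kept.items()
    }


print(word_cloud_sizes(["data"] * 5 + ["iot"] * 3 + ["busi"]))  # {'data': 48, 'iot': 10}
```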

It is interesting to see how the words are correlated. Below are the top 25 word correlations. Again, the terms are easy to relate to the IoT: ‘busi’ and ‘oper’ suggest business operations, and ‘manag’ and ‘prod’ suggest managing products. Two of the articles in the corpus were written by IBM, so it is easy to understand why ‘IBM’ correlates with the other IoT terms.

Text Analytics - Word Correlation
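
Word correlations like these are usually Pearson correlations between two terms' per-document counts. In R, tm's findAssocs reports such correlations for a chosen term. The following is a minimal Python sketch, assuming a document-term matrix given as a list of rows. The matrix and term names below are invented.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length count vectors (0.0 by convention if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def term_correlations(dtm, terms, target):
    """Correlate one term's per-document counts with every other term's, strongest first."""
    columns = {term: [row[i] for row in dtm] for i, term in enumerate(terms)}
    scores = {t: pearson(columns[target], columns[t]) for t in terms if t != target}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical 3-document matrix: rows are documents, columns follow `terms`.
terms = ["busi", "oper", "iot"]
dtm = [[2, 2, 0],
       [4, 4, 1],
       [0, 0, 3]]
print(term_correlations(dtm, terms, "busi"))
```

With only ten documents, a handful of shared occurrences can produce a high correlation, so a list like the one above is best read as suggestive.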

A simple dendrogram based on the ten articles:

Text Analytics Dendrogram
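
A dendrogram records the order in which documents, and then groups of documents, are merged. The Python sketch below runs agglomerative clustering with average linkage on Euclidean distance and returns those merge steps. The linkage and distance are assumptions, since the post doesn't state them. Each document is assumed to be a numeric vector, such as its row of term counts.

```python
from itertools import combinations
from math import dist


def average_linkage(vectors, members_a, members_b):
    """Mean Euclidean distance over all cross-cluster pairs of documents."""
    pairs = [(i, j) for i in members_a for j in members_b]
    return sum(dist(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)


def agglomerate(vectors):
    """Repeatedly merge the two closest clusters; return (a, b, height) for each merge."""
    clusters = {i: [i] for i in range(len(vectors))}  # cluster id -> document indices
    merges = []
    next_id = len(vectors)  # merged clusters get new ids after the original documents
    while len(clusters) > 1:
        a, b = min(
            combinations(sorted(clusters), 2),
            key=lambda ab: average_linkage(vectors, clusters[ab[0]], clusters[ab[1]]),
        )
        merges.append((a, b, average_linkage(vectors, clusters[a], clusters[b])))
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return merges


# Hypothetical document vectors (e.g. counts of two terms).
docs = [(0, 0), (0, 1), (10, 10), (10, 11)]
for step in agglomerate(docs):
    print(step)
```

Each merge's height is the linkage distance at which the dendrogram joins the two branches.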

Clustering the documents produces three main clusters.

Text Analytics resulting word clusters
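
The post doesn't say how the three clusters were formed; cutting the dendrogram into three groups would be one way. As an alternative illustration, here is plain k-means (Lloyd's algorithm) with k = 3 in Python. Seeding with the first k documents is a simplification; common implementations use random starts or k-means++ seeding.

```python
from math import dist


def kmeans(vectors, k=3, iters=20):
    """Plain Lloyd's k-means; the first k vectors seed the centroids (a simplification)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: each document joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(v, centroids[c])) for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical document vectors forming three obvious groups.
docs = [(0, 0), (10, 0), (0, 10), (0, 1), (10, 1), (1, 10)]
print(kmeans(docs, k=3))
```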

A simple sentiment analysis was also conducted on all ten articles. Rather than bore everyone with the details, I’ve provided the results for one of the articles so you can read it and compare.

Sentiment Analysis based on the article “Creating New Value with Digital Transformation” 

Text Analytics Sentiment Analysis
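
One simple approach to sentiment analysis is lexicon-based scoring: look each word up in a list of positive and negative words and tally the hits. The post doesn't name its lexicon, so the tiny word list below is hypothetical and only illustrates the mechanics.

```python
# Tiny hypothetical lexicon: +1 for positive words, -1 for negative ones.
LEXICON = {"value": 1, "growth": 1, "opportunity": 1, "new": 1,
           "risk": -1, "cost": -1, "challenge": -1, "failure": -1}


def sentiment(tokens, lexicon=LEXICON):
    """Tally positive and negative lexicon hits and return their net score."""
    positive = sum(1 for t in tokens if lexicon.get(t, 0) > 0)
    negative = sum(1 for t in tokens if lexicon.get(t, 0) < 0)
    score = sum(lexicon.get(t, 0) for t in tokens)
    return {"positive": positive, "negative": negative, "score": score}


print(sentiment(["new", "value", "cost", "data"]))
```

Lexicon entries are usually unstemmed words, so the lookup should run on tokens before stemming. Otherwise a word such as "value" becomes "valu" and is missed.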

The articles used in the corpus are as follows:

  • Opportunities and Challenges: Predictive Analytics for IoT by Bala Deshpande, Founding Partner, SimaFore & Chair, PAW – Manufacturing
  • Solution Brief: IBM Software Internet of Things by IBM
  • How to Get Started with IoT and Generate Quick Returns on Your Investment by Maciej Kranz
  • Predictive Analytics Are a Traveler’s Best Friend (content made possible by support from SAS)
  • Predictive Analytics as a service for IoT by ajit
  • Transformational Analytics: Internet of Things analytics by Arnab Chakraborty, Michael Svilar and Prith Banerjee
  • Are Predictive Analytics The Future Of IoT? by Tripp Braden
  • How to Start a Successful IoT Journey, Smart Cities, and the Industrial Internet Revolution by Calum McClelland
  • Innovation Today’s IoT Opportunity: B2B or B2C? by Maciej Kranz
  • Creating New Value with Digital Transformation by The Business Times
