This gives us how many times each word is used under each sentiment, for example in the anticipation group or the joy group.

    # Plot the frequent words, faceted by emotion
    library(dplyr)
    library(ggplot2)

    # Keep the 10 most frequent terms within each sentiment
    top_nrc = dtm_sentiment %>%
      count(term, sentiment, wt = count) %>%
      group_by(sentiment) %>%
      top_n(10, n) %>%
      ungroup()

    top_nrc %>%
      ggplot(aes(x = reorder(term, n), y = n, fill = sentiment)) +
      geom_col(show.legend = FALSE) +
      coord_flip() +
      facet_wrap(~ sentiment, scales = 'free') +
      labs(x = NULL, y = NULL, title = 'The Frequent Words with NRC Lexicon')

With facet_wrap(), we can draw a separate plot for each sentiment. coord_flip() swaps the x-axis and y-axis, which is very useful when the x-axis has long label names, as is often the case in NLP. You can also make a word cloud plot with these sentiments. More detailed descriptions are available here.

Step 4. Principal Component Analysis

Dimension reduction techniques are often used in natural language processing. The purpose of dimension reduction is to deal with multicollinearity and to capture the most informative features of the data. Principal component analysis is one of the most commonly used dimension reduction techniques.

Principal component analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables (entities each of which takes on various numerical values) into a set of values of linearly uncorrelated variables called principal components.

In plain English, principal components are an ‘extract’ of the data: they keep the most informative part of the data and leave out the redundant information. Something like losing unnecessary fat.

    # PCA on the document-term matrix
    pca = prcomp(x = term_dtm_2)

    # Scree plot: variance of each component, drawn as a line
    plot(pca, type = 'l')

    # The values of the new features (first 10 components)
    head(pca$x[, 1:10])

The last line prints the actual values of the first ten components. The interesting part is what these new features tell us. The order of the features isn’t random: they are sorted by ‘discrimination ability’. The first component explains
the maximum variance in the dataset. In other words, that component is the direction along which the data is spread out, and therefore differentiated, the most. More details
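
To make the ordering of the components concrete, here is a minimal sketch that computes the share of variance explained by each component. It runs on a small made-up matrix, toy_dtm, which is a hypothetical stand-in for term_dtm_2 and not the data used in this post. The names pca_toy, var_explained, cum_explained and max_offdiag are likewise only for illustration.

```r
# Toy document-term matrix: 20 "documents" x 6 "terms" of made-up counts.
# This is an assumption for illustration, not the article's term_dtm_2.
set.seed(42)
toy_dtm <- matrix(rpois(20 * 6, lambda = 3), nrow = 20,
                  dimnames = list(NULL, paste0("term", 1:6)))

# prcomp() centers each column by default and does not rescale it
pca_toy <- prcomp(toy_dtm)

# sdev holds the standard deviation of each component,
# so sdev^2 is the variance captured by that component
var_explained <- pca_toy$sdev^2 / sum(pca_toy$sdev^2)
cum_explained <- cumsum(var_explained)

# Proportion of variance per component, and the running total
print(round(var_explained, 3))
print(round(cum_explained, 3))

# The component scores are linearly uncorrelated with each other:
# the off-diagonal correlations are zero up to rounding error
score_cor <- cor(pca_toy$x)
max_offdiag <- max(abs(score_cor[upper.tri(score_cor)]))
print(max_offdiag)
```

The proportions come out in decreasing order, which is the ‘discrimination ability’ ordering described above. In practice, the cumulative share is one common way to decide how many components to keep.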