Random Forests

Introduction

Following on from the previous post about decision trees, let us move on to Random Forests. We will use the Soybean data from the ‘mlbench’ package. There are 35 features and 683 observations, covering 19 classes of soybean disease. Why care about Random Forests? Let us look at how our decision trees predict previously unseen data. First we will load the data in:

    library(mlbench)
    library(caret)
    data("Soybean")
    dim(Soybean)

Let us now split the data up into a training and test data set.
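One way the split could be done is sketched below, assuming an 80/20 stratified split with caret's createDataPartition. The seed, the proportion, and the names train_idx, train_data and test_data are illustrative choices rather than anything taken from the original post.

```r
library(mlbench)
library(caret)

data("Soybean")

# Fix the random seed so the split is reproducible (the value is arbitrary).
set.seed(42)

# Stratified sample on the outcome: roughly 80% of each class goes to training.
train_idx <- createDataPartition(Soybean$Class, p = 0.8, list = FALSE)

train_data <- Soybean[train_idx, ]   # rows used to fit the model
test_data  <- Soybean[-train_idx, ]  # held-out rows used only for evaluation

dim(train_data)
dim(test_data)
```

Stratifying on Class keeps the rarer classes represented in both sets, which a plain random sample of rows would not guarantee.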

Who is the angriest?

Overall sentiments - magnitude

    library(ggplot2)
    library(dplyr)

    # sentimentData is assumed to be loaded already
    overallData <- subset(sentimentData, select = c('file', 'Date', 'magnitude', 'score'))

    p <- ggplot(overallData, aes(x = Date, y = magnitude, colour = file)) +
      geom_line() +
      ggtitle('Overall show sentiment magnitude') +
      xlab('Date') +
      ylab('Magnitude') +
      labs(color = "Shock Jock") +
      theme_bw()
    p
    ggsave('1.png', p)

Overall sentiments - score

    p <- ggplot(overallData, aes(x = Date, y = score, colour = file)) +
      geom_line() +
      ggtitle('Overall show sentiment score') +
      xlab('Date') +
      ylab('Score') +
      labs(color = "Shock Jock") +
      theme_bw()
    p
    ggsave('2.png', p)

Segment Analysis - By Day - 1st August

    # Keep only the segments broadcast on 1 August 2018
    dateData <- filter(sentimentData, Date == '2018-08-01')

    # Position of each segment within its show, as a fraction of that show's segment count
    dateData <- mutate(dateData, percentageDone = case_when(
      file == 'Ben Fordham' ~ X / nrow(filter(dateData, file == 'Ben Fordham')),
      file == 'Ray Hadley' ~ X / nrow(filter(dateData, file == 'Ray Hadley')),
      file == 'Chris Smith' ~ X / nrow(filter(dateData, file == 'Chris Smith')),
      file == 'Alan Jones' ~ X / nrow(filter(dateData, file == 'Alan Jones'))
    ))

    p <- ggplot(dateData, aes(x=percentageDone, y = sentiment.