@noamross
Created April 22, 2015 01:13
Create a wordcloud of your Google search history
# Script to make a word cloud of your Google searches. Get your Google search
# history at http://history.google.com. This script assumes the exported JSON
# files are in a 'Searches' subfolder of the working directory
library(jsonlite)
library(rlist)
library(magrittr)
library(stringi)
library(wordcloud)
library(tm)
library(SnowballC)
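# Flatten the nested lists from every exported file down to one element per
# search, pull out the query text, and clean it up
queries = lapply(list.files('Searches', full.names=TRUE), fromJSON, simplifyDataFrame=FALSE) %>%
  do.call("c", .) %>%
  do.call("c", .) %>%
  do.call("c", .) %>%
  list.mapv(.$query) %>%
  tolower %>%
  removePunctuation %>%
  removeWords(stopwords("english"))
# Split queries into words and drop the empty strings left by removeWords.
# Stem after splitting: wordStem() treats each string as a single word, so
# stemming whole queries only affects the end of each query
words = stri_split_regex(queries, "\\s") %>%
  do.call("c", .) %>%
  `[`(., . != "") %>%
  wordStem
word_table = table(words) %>%
  sort(decreasing = TRUE)
pal <- colorRampPalette(c("red","blue"))(10)
wordcloud(names(word_table), word_table, scale=c(3, 1), min.freq=10, colors=pal,
          random.order=TRUE, max.words=200)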
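The cleaning and counting steps can be tried without a Google export by running them on a few made-up queries. This is a minimal sketch under that assumption: the queries are invented, base `strsplit()` stands in for `stri_split_regex()` to keep dependencies down, and stemming is done after splitting because `wordStem()` treats each string it is given as one word.

```r
library(tm)         # removePunctuation(), removeWords(), stopwords()
library(SnowballC)  # wordStem()

# Made-up stand-in for the queries pulled out of the export
queries <- c("How to PLOT in R?", "plotting the data", "plots of the data")

# Same cleaning order as the script: lowercase, strip punctuation, drop stopwords
cleaned <- removeWords(removePunctuation(tolower(queries)), stopwords("english"))

# Split into words, drop the empty strings removeWords leaves behind, then stem
words <- unlist(strsplit(cleaned, "\\s+"))
words <- wordStem(words[words != ""])

word_table <- sort(table(words), decreasing = TRUE)
print(word_table)
```

"plot", "plotting" and "plots" all stem to "plot", so they are counted as one word in the cloud.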
@timelyportfolio

neat, thanks for sharing. I hope some people extend this to other kinds of text analysis. Unimportant, but the title is missing the "g" in google.

@daattali

works perfectly, really cool

@oganm

oganm commented Apr 23, 2015

Below is a crappy way to remove Google Maps searches (they show up as queries containing "->") from the results, since they skew the word counts a little. It drops every query containing "->", so it'll also throw away legitimate searches if you use "->" a lot in your search terms.

queries = lapply(list.files('Searches', full.names=TRUE), fromJSON, simplifyDataFrame=FALSE) %>%
    do.call("c", .) %>%
    do.call("c", .) %>%
    do.call("c", .) %>%
    list.mapv(.$query) %>%
    tolower

# Drop any query containing "->" (Google Maps directions searches)
queries = queries[!grepl("->", queries, fixed = TRUE)]

queries = queries %>%
    removePunctuation %>%
    removeWords(stopwords("english")) %>%
    wordStem
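A quick check of how that filter behaves, using three invented queries. It uses `fixed = TRUE`, which matches the literal string "->" and is equivalent to the `'[-][>]'` regex in the snippet. The third query shows the caveat above: it is not a Maps search but still gets dropped.

```r
# Made-up queries: one looks like a Maps directions search, two do not
queries <- c("home -> office", "magrittr pipe operator", "ggplot -> plotly conversion")

# Flag every query containing the literal "->" and keep the rest
is_directions <- grepl("->", queries, fixed = TRUE)
kept <- queries[!is_directions]
print(kept)
```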

@irichgreen

Hello,

I got an error message when running the queries code below.
Can you help resolve this problem?

queries = lapply(list.files('Searches', full.names=TRUE), fromJSON, simplifyDataFrame=FALSE) %>%
    do.call("c", .) %>%
    do.call("c", .) %>%
    do.call("c", .) %>%
    list.mapv(.$query) %>%
    tolower

Error in do.call("c", .) : second argument must be a list
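`do.call()` raises "second argument must be a list" whenever the object passed to it is not a list. Two plausible causes, neither confirmed in this thread: `list.files('Searches')` found no files (for example, the working directory is not the folder containing `Searches`), so the first `do.call("c", .)` returns `NULL` and the second one fails; or the exported files are nested differently than the three fixed `do.call("c", .)` steps expect. The sketch below avoids a fixed nesting depth by walking the parsed JSON recursively, and stops early when the folder is empty. `collect_field()` and `read_queries()` are hypothetical helpers, not part of any package. Note that `$` does partial matching on lists, so `.$query` also picks up a longer field name that starts with "query"; `[[ ]]` matches exactly, so pass the field name exactly as it appears in your files.

```r
# Hypothetical helper: walk a parsed JSON tree of any depth and return every
# character value stored under the name `field`
collect_field <- function(x, field) {
  if (!is.list(x)) return(character(0))
  here <- x[[field]]  # exact name match; NULL if this level has no such field
  here <- if (is.character(here)) here else character(0)
  children <- unlist(lapply(x, collect_field, field = field), use.names = FALSE)
  c(here, children)
}

# Hypothetical loader: fail with a clear message if no files are found
read_queries <- function(dir = "Searches", field = "query") {
  files <- list.files(dir, full.names = TRUE)
  if (length(files) == 0) {
    stop("no files found in '", dir, "' (working directory: ", getwd(), ")")
  }
  parsed <- lapply(files, jsonlite::fromJSON, simplifyDataFrame = FALSE)
  collect_field(parsed, field)
}

# Toy nested structure standing in for parsed export files
toy <- list(list(event = list(list(query = "first search"),
                              list(query = "second search"))))
print(collect_field(toy, "query"))
```

With real data, `read_queries()` would replace the `lapply(...)` plus `do.call("c", .)` chain, and its result can be piped into `tolower` as before.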
