@dannguyen
Last active September 12, 2018 20:19
Bash function to read Wikipedia Pageviews API and pass it into Google Static Charts API to get 120-day time series of pageviews

An example in Bash of how to use Wikipedia's newly-announced Pageviews API and pass it into Google's hopefully-eternally-functioning Static Charts API. The result is a quick time-series chart that visualizes 120 days of pageviews for any given Wikipedia page. Shout-out to the awesome jq command-line JSON tool.

The charts are crudely labeled...with a little more work you could turn it into a more interesting comparison chart (e.g. a grouped bar chart). But for now, here's what the last 120 days of page visits look like for Donald Trump and Hillary Clinton:

(I've manually edited the URL for Hillary Clinton's results so that it's on the same y-axis scale as Trump's. Again, if I spent a little more time reading Google's API docs, I could make a nicer-looking grouped chart instead of doing 2 different charts) ¯\_(ツ)_/¯
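That manual edit can be sketched as a small helper. This is only an illustration: `set_chart_max` is a hypothetical name, and it assumes the chart URL has the shape that the function further down emits, i.e. a `chds=0,MAX` scaling parameter and y-axis labels of the form `1:||HALF|MAX` inside `chxl`.

```shell
# Hypothetical helper: rewrite a chart URL so its y-axis runs from 0 to a
# new maximum, which lets two charts share the same scale.
# Arguments: chart URL, new maximum (integer).
# Assumes the URL contains "chds=0,<max>" and "1:||<half>|<max>" as emitted
# by the Bash function in this gist.
set_chart_max() {
  local url="$1" max="$2"
  local half=$((max / 2))
  printf '%s\n' "$url" |
    sed -E -e "s/chds=0,[0-9]+/chds=0,${max}/" \
           -e "s/1:\|\|[0-9]+\|[0-9]+/1:||${half}|${max}/"
}
```

Passing the Clinton chart URL together with the Trump chart's maximum would put both charts on the same scale without hand-editing.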

Donald Trump pageviews

Hillary Clinton pageviews

The Bash function

I thought there's a Bash one-liner in there somewhere to read the API and convert it to a Google Static chart...but I haven't had my full cup of coffee this morning. Also, my Bash-fu is weak, so here it is as an ugly function that should obviously not be written in Bash...but I'm sure someone can turn it into a cool Bash one-liner.

Wikipedia (or rather, Wikimedia) has a nice documentation page for their APIs.

The endpoint to get per-article user (i.e. not web spider) pageviews, on a daily basis, from August 1st, 2015 to December 1st, 2015, looks like this:

https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedia/all-access/user/Star_Wars/daily/20150801/20151201
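To make the parts of that URL explicit, here is a minimal sketch that assembles it from its pieces. `pageviews_url` is a hypothetical helper name, not part of any Wikimedia tooling; the `all-access`, `user`, and `daily` segments are simply copied from the example above.

```shell
# Hypothetical helper: build a per-article pageviews URL.
# Arguments: project, article title, start date, end date (dates as YYYYMMDD).
pageviews_url() {
  local project="$1" article="$2" start="$3" end="$4"
  printf 'https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/%s/all-access/user/%s/daily/%s/%s\n' \
    "$project" "$article" "$start" "$end"
}

# Prints the Star_Wars URL shown above
pageviews_url en.wikipedia Star_Wars 20150801 20151201
```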

That URL returns a JSON response that looks like:

{
    "items": [{
        "project": "en.wikipedia",
        "article": "Star_Wars",
        "granularity": "daily",
        "timestamp": "2015080100",
        "access": "all-access",
        "agent": "user",
        "views": 15271
    }]
}

How the bash function described below works: Given a user-supplied argument, such as "Star_Wars", the JSON response from Wikipedia's API is parsed with the jq tool, then wrangled into a series of comma-delimited values to be passed as a string that fits the Google Static Charts API specification.
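The parsing step can be tried in isolation on a saved response. The sketch below assumes jq is installed. `views_csv` and `views_max` are hypothetical helper names, and apart from the first value, the view counts in `SAMPLE` are made up for illustration.

```shell
# Hypothetical helper: read a Pageviews API response on stdin and print
# the daily view counts as one comma-delimited line.
views_csv() {
  jq -r '[.items[].views | tostring] | join(",")'
}

# Hypothetical helper: read a response on stdin and print the largest
# daily view count, which is used for the chart's y-axis maximum.
views_max() {
  jq '[.items[].views] | max'
}

# Trimmed-down sample response; only the first view count comes from the
# example above, the other two are invented.
SAMPLE='{"items":[{"views":15271},{"views":16000},{"views":14900}]}'
```

With these definitions, `printf '%s\n' "$SAMPLE" | views_csv` prints `15271,16000,14900`, and piping the same sample into `views_max` prints `16000`. Joining inside jq avoids the trailing comma that the `tr`-based version has to strip off.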

Sample use

$ chart_wpgviews Star_Wars

Result

A very long URL with the pageview values passed in as a comma-delimited string of numbers.

Rendering this URL in an image tag:

Google static chart
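For completeness, a small sketch of wrapping the generated URL in an image tag. `chart_img_tag` is a hypothetical helper. It escapes `&` as `&amp;`, which HTML attribute values expect, and assumes the URL and alt text contain no double quotes.

```shell
# Hypothetical helper: wrap a chart URL in an HTML <img> tag.
# Arguments: chart URL, alt text.
# Escapes "&" in the URL as "&amp;" for use inside an HTML attribute.
chart_img_tag() {
  local url="$1" alt="$2"
  url=$(printf '%s\n' "$url" | sed 's/&/\&amp;/g')
  printf '<img src="%s" alt="%s">\n' "$url" "$alt"
}
```

Calling `chart_img_tag "$(chart_wpgviews Star_Wars)" "Star Wars pageviews"` would then print a tag ready to paste into a page.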

Here's the bash function in all of its ugliness:

function chart_wpgviews(){
      # First argument is the article name, e.g. Star_Wars.
      # The window is fixed at 120 days, since the API seems to be locked at a max
      # of about 4 months anyway; the project is hard-coded to en.wikipedia.
      # Requires curl, jq, and GNU date (for the --date option).
      local PAGENAME="$1"
      local PROJECT="en.wikipedia"
      local ENDTIME STARTTIME ENDDATE STARTDATE RESPONSE VALS MAXVAL
      ENDTIME=$(date +%s)
      STARTTIME=$((ENDTIME - 120*24*60*60)) # 120 days ago, in epoch seconds
      ENDDATE=$(date --date="@$ENDTIME" +%Y-%m-%d)
      STARTDATE=$(date --date="@$STARTTIME" +%Y-%m-%d)

      # The API wants dates as YYYYMMDD, so strip the hyphens
      RESPONSE=$(curl -s "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/$PROJECT/all-access/user/$PAGENAME/daily/$(tr -d '-' <<< "$STARTDATE")/$(tr -d '-' <<< "$ENDDATE")")
      # One view count per line, joined with commas (leaves a trailing comma)
      VALS=$(jq '.items[] | .views' <<< "$RESPONSE" | tr '\n' ',')
      MAXVAL=$(jq '.items[] | .views' <<< "$RESPONSE" | sort -rn | head -n1)
      # ${VALS%?} drops the trailing comma
      echo "https://chart.googleapis.com/chart?chs=800x200&cht=bvg&chd=t:${VALS%?}&chds=0,$MAXVAL&chbh=5,0,1&chxt=x,y&chxp=0,0&chxl=0:|$STARTDATE|$ENDDATE|1:||$((MAXVAL/2))|$MAXVAL&chtt=Last+120+days+of+pageviews+for+Wikipedia+page+on+$PAGENAME"
}

Obviously, this should be done in Python, as Wikimedia developers have already created a helpful Python wrapper for the API:

https://github.com/mediawiki-utilities/python-mwviews

tomayac commented Dec 16, 2015

Thanks for this fun hack. Using my freshly released pageviews.js JavaScript client library, this could easily be done in a browser as well.