Ritchie King

Visual Journalist, FiveThirtyEight

April 7, 2015

I love D3. I use it all the time in the work that I do. Here are some examples:

FiveThirtyEight's 2014 Senate Forecast

I still get nerdily excited about it. You can use D3 to make an endless array of charts and data visualizations. And the code that underlies it is super elegant.

D3 was written by Mike Bostock, who now works primarily at the New York Times, where he and his colleagues have made some exquisite data visualizations:

##What is D3?

It's JavaScript, for starters. It's an open source JavaScript library, freely available online, and maintained on GitHub.

d3.js

But to understand what D3 actually is, it's important to start with what D3 is not. D3 is not a charting library. That might seem a bit surprising to you... you've probably heard the exact opposite.

To see what I mean, let's take a quick look at how an honest-to-god charting library — ggplot in R — does things.

If you want to make a super simple line chart in ggplot, here's how you do it:

library(ggplot2)

data <- data.frame(x=c(2005, 2006, 2007, 2008), y=c(121, 110, 115, 117))

qplot(x, y, data=data, geom='line')

If D3 were a charting library, it might look pretty similar, maybe something like this:

var x = [2005, 2006, 2007, 2008],
    y = [121, 110, 115, 117];

d3.chart({
  x: x,
  y: y,
  type: 'line',
  width: 400,
  height: 400
});

Just like with qplot, you have a function, which in this case I've called d3.chart(). You pass in some x coordinates and some y coordinates, you specify the type of chart that you want to create, and, since we're dealing with web pages, maybe you specify some dimensions.

BUT, that's not what D3 looks like. In reality, if you wanted to create that basic line chart in D3, you would need to do something like this:

var data = [
  {x: 2005, y: 121},
  {x: 2006, y: 110},
  {x: 2007, y: 115},
  {x: 2008, y: 117},
];

var margin = {top: 20, right: 20, bottom: 30, left: 50},
    width = 400 - margin.right - margin.left,
    height = 400 - margin.top - margin.bottom;

var x = d3.scale.linear()
    .domain(d3.extent(data, function(d) { return d.x; } ))
    .range([0, width]);

var y = d3.scale.linear()
    .domain([0, d3.max(data, function(d) { return d.y ;})])
    .range([height, 0]);

var xAxis = d3.svg.axis()
    .scale(x)
    .orient('bottom');

var yAxis = d3.svg.axis()
    .scale(y)
    .orient('left');

var line = d3.svg.line()
    .x(function(d) { return x(d.x); })
    .y(function(d) { return y(d.y); });

var svg = d3.select('body').append('svg')
    .attr('width', width + margin.left + margin.right)
    .attr('height', height + margin.top + margin.bottom)
  .append('g')
    .attr('transform', 'translate(' + margin.left + ',' + margin.top + ')');

svg.append('g')
    .attr('transform', 'translate(0,' + height + ')')
    .call(xAxis);

svg.append('g')
    .call(yAxis);

svg.append('path')
    .datum(data)
    .attr('d', line);
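
To make the scale and line steps above a little more concrete, here is a minimal plain-JavaScript sketch of the arithmetic they perform. The helper names linearScale and pathData are hypothetical stand-ins written for this example, not D3's API, and the sketch skips everything else D3 handles (ticks, interpolation modes, and so on).

```javascript
// Hypothetical stand-in for a linear scale: maps a value from the
// domain [d0, d1] onto the range [r0, r1] by linear interpolation.
function linearScale(domain, range) {
  var d0 = domain[0], d1 = domain[1],
      r0 = range[0], r1 = range[1];
  return function(value) {
    return r0 + (value - d0) / (d1 - d0) * (r1 - r0);
  };
}

// Hypothetical stand-in for a line generator: turns data points into
// the "d" attribute string of an SVG path (M = move to, L = line to).
function pathData(points, x, y) {
  return points.map(function(d, i) {
    return (i === 0 ? 'M' : 'L') + x(d.x) + ',' + y(d.y);
  }).join('');
}

var data = [
  {x: 2005, y: 121},
  {x: 2006, y: 110},
  {x: 2007, y: 115},
  {x: 2008, y: 117}
];

// Same margin arithmetic as above: a 400 x 400 outer box leaves a
// 330 x 350 plotting area.
var margin = {top: 20, right: 20, bottom: 30, left: 50},
    width = 400 - margin.right - margin.left,
    height = 400 - margin.top - margin.bottom;

var x = linearScale([2005, 2008], [0, width]);
// The range is flipped because SVG's y coordinates grow downward.
var y = linearScale([0, 121], [height, 0]);

console.log(x(2005), x(2008)); // 0 330
console.log(y(0), y(121));     // 350 0
console.log(pathData(data, x, y));
```

The last line prints the kind of string that ends up in the path's d attribute: one "move to" command followed by a "line to" command for each remaining point.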

That's a lot of code. Especially to produce arguably the most basic chart form for a data set of 4 points. How does D3 justify making you go through so much trouble to do something so simple? With software and coding packages there's almost always a tradeoff between ease of use and flexibility and power. Which makes sense. A tool is easy to use if a lot of the decisions have already been made for you — but if it's really hard or impossible to override those decisions, the range of what that tool can do is limited.

D3 is on the extreme end of that spectrum.

Which is why Mike Bostock's introductory tutorial on how to make a bar chart is an extensive 3-part series.

Even though D3 is tough, I'm going to try to convince you that learning it is very much worth it.

Make trees out of data

So, if D3 isn't a charting library, what is it, exactly? The answer lies in its name. D3 stands for data-driven documents. The documents in question here aren't secret FBI files or anything — they're web documents. Basically, web pages. And the data-driven bit refers to the fact that D3 enables you to determine what goes on a web page based entirely on data.

Every web page has a structure that is specified by what's called the Document Object Model, or the DOM. The DOM is organized into a tree-like hierarchy, and every element on the page is a node in that tree. For instance, you might have a web page that looks like this...

Headline

Paragraph

... and the underlying HTML would look something like this ...

<html>
  <head>
  </head>
  <body>
    <h1>Headline</h1>
    <div>
      <p>Paragraph</p>
    </div>
  </body>
</html>

... which would mean the so-called DOM tree looks something like this:

So at the top you have the document, then you have your root HTML element, then the head, and the body of the page, and then your h1's and div's, and paragraphs, etc. Every web page is represented by a similar underlying tree structure.
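
One way to picture that hierarchy is as a nested object. The sketch below models the example page with plain JavaScript objects and walks it depth-first. It is only an illustration: the tree variable and the outline function are made up for this example and are not the browser's real DOM API.

```javascript
// The example page as a plain nested object. Each node has a tag name
// and a list of children.
var tree = {
  tag: 'html',
  children: [
    {tag: 'head', children: []},
    {tag: 'body', children: [
      {tag: 'h1', children: []},
      {tag: 'div', children: [
        {tag: 'p', children: []}
      ]}
    ]}
  ]
};

// Walk the tree depth-first, collecting one indented line per node.
function outline(node, depth) {
  depth = depth || 0;
  var indent = new Array(depth + 1).join('  ');
  var lines = [indent + node.tag];
  node.children.forEach(function(child) {
    lines = lines.concat(outline(child, depth + 1));
  });
  return lines;
}

console.log(outline(tree).join('\n'));
```

Each level of indentation in the output corresponds to one step down the tree, so the paragraph ends up three levels below the root html element.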

What D3 does on a fundamental level is allow you to take some data and use it to shape that DOM tree. Using D3 is like being a data-driven DOM arborist.

Let's take a look at how this works. Let's say you have a really basic set of data, just an array of strings:

var data = [
    'Paragraph 1',
    'Paragraph 2',
    'Paragraph 3',
    'Paragraph 4',
    'Paragraph 5'
];

And you want to create a new paragraph on your web page for each of those strings. D3's way of doing that is through something called a data-join. Say this is what your DOM tree looks like:

show basic structure

And this is what your data looks like:

show diagram of data

With D3, you can grab a node, in this case let's say the body of the page and, using a data-join, you can say, okay, take my array of data and create a new node for every entry in that array. In this case, we want each of those nodes to be a paragraph element.

diagram

So we've created five paragraph elements. But they're still empty. Fortunately, when D3 performs a data-join like this, as the name suggests, it actually joins, or binds the data to the nodes that it creates. We'll see the significance of that later, but for now, suffice it to say, you can take advantage of this fact to set the text in each of those paragraphs equal to the corresponding string in the data set. So you get this:

diagram

Headline

Paragraph

And here is what the code looks like:

d3.select('body')
    .selectAll('p')
    .data(data)
  .enter().append('p')
    .text(function(d) {
      return d;
    });

D3 uses a method called .select() to grab a node, in this case the body of the page. .selectAll() selects all the paragraph elements inside it (there aren't any yet, so the selection is empty). .data() initiates the data-join, which creates an empty placeholder node for each entry that has no paragraph to bind to. .enter() selects those placeholders, .append() turns each one into a paragraph element, and .text() sets the text of each paragraph to the corresponding string in the array of data.
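
To show roughly what "binding" means, here is a small sketch that uses plain objects in place of real paragraph elements. D3 keeps each bound datum on the element's __data__ property; the setText helper and the node objects are hypothetical, written only to mimic how .text() calls your function once per node.

```javascript
var data = ['Paragraph 1', 'Paragraph 2', 'Paragraph 3'];

// Stand-ins for freshly created <p> elements (plain objects, not real
// DOM nodes). Each one gets its datum attached, similar in spirit to
// how D3 stores bound data on an element's __data__ property.
var nodes = data.map(function(d) {
  return {tag: 'p', __data__: d, textContent: ''};
});

// Hypothetical helper that mimics what .text(fn) does: call the
// function once per node, passing the bound datum and the index,
// and use the return value as the node's text.
function setText(nodes, fn) {
  nodes.forEach(function(node, i) {
    node.textContent = fn(node.__data__, i);
  });
  return nodes;
}

setText(nodes, function(d) { return d; });

console.log(nodes.map(function(n) { return n.textContent; }));
```

Because the datum travels with the node, later calls can keep reading it without being handed the original array again.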

So this business of creating new nodes on a page is really great. But it turns out that's not all data-joins are capable of. They can also update the nodes that currently exist on a page.

So let's say we have a page that we've just created with all of those paragraphs and we want to swap out the text of those paragraphs with new text:

var data2 = [
    'Paragraph A',
    'Paragraph B',
    'Paragraph C',
    'Paragraph D',
    'Paragraph E'
];

We can perform another data-join to do that:

show diagram

And here's what the code looks like:

d3.select('body')
    .selectAll('p')
    .data(data2)
    .text(function(d) {
      return d;
    });

This time, since those nodes already exist, we don't need to use .enter() and .append() to create them. We just bind that new data and use it to update the text in each paragraph.

One last example: What if we, yet again, want to change the values of these paragraphs based on a new data set, but that data set has fewer entries?

var data3 = [
    'Paragraph I',
    'Paragraph II',
    'Paragraph III'
];

show diagram

It would be good to have some way to target those extra, now unneeded nodes so we can get rid of them. Fortunately, with D3 you can do that.

d3.select('body').selectAll('p')
    .data(data3)
    .text(function(d) { return d; })
  .exit().remove();

show web page

Data-joins in D3 can do basically three things: they can make new elements enter the page, they can update elements that are currently on the page, and they can tell elements to leave or exit the page. Enter, update, exit. And they do these three things simply by counting. If your data set has more entries than there are nodes, it will tell nodes to enter the page. If a node already exists for the data point in question, then you can use a data-join to update it. And excess nodes? Exit.
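
The counting itself is simple enough to sketch in a few lines of plain JavaScript. The joinByIndex function below is a hypothetical helper, not part of D3; it pairs the first datum with the first node, the second with the second, and so on, and reports what is left over on either side.

```javascript
// Hypothetical helper: split a join into its three groups by index,
// pairing the i-th datum with the i-th existing node.
function joinByIndex(nodes, data) {
  var shared = Math.min(nodes.length, data.length);
  return {
    update: data.slice(0, shared),  // data that found an existing node
    enter: data.slice(shared),      // data with no node yet
    exit: nodes.slice(shared)       // nodes with no datum left
  };
}

// Five paragraphs already on the page, three new strings.
var existing = ['p1', 'p2', 'p3', 'p4', 'p5'];
var shrink = joinByIndex(existing, ['Paragraph I', 'Paragraph II', 'Paragraph III']);
console.log(shrink.update.length, shrink.enter.length, shrink.exit.length); // 3 0 2

// An empty page and two strings.
var grow = joinByIndex([], ['Paragraph 1', 'Paragraph 2']);
console.log(grow.update.length, grow.enter.length, grow.exit.length); // 0 2 0
```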

show diagram

Here's an example of these three phases of a data-join in action:

That gif is taken from an example of Mike Bostock's. With each cycle, a random array of letters that can vary in length is created and bound to the page with a data-join. The entering letters are shown in green, the updated letters in black, and the exiting letters disappear immediately.

Here's another example, also from Bostock, with transitions that make it more apparent just what's going on:

##Using SVG

At this point you're probably wondering: what does all of this have to do with data visualization, exactly? Creating paragraphs and changing their text is a neat party trick, but it's a little hard to see how you can use it to make a line chart.

If you haven't heard of SVG before, I should introduce you. SVG stands for scalable vector graphics, and it's an XML-based web standard for representing graphical elements online. It's a lot like HTML, except the elements, instead of being mostly textual, like headers, paragraphs, and tables, are mostly geometrical: lines, paths, circles, rectangles, and other shapes.

When you use SVG, those shapes exist as nodes in the DOM tree, so you can manipulate them with D3.

For example, here's a gif of an interactive bar chart that I show readers how to build in my intro D3 textbook:

The chart shows a breakdown of the world's population by age group, and you can click on different years to see how it's changed over time. If you take a look at the DOM when the page first loads, this is what you see:

Inside the body of the page, there are a bunch of familiar HTML elements, but then there's this SVG element, with g elements (groups) and rects (rectangles) tucked inside.

show abbreviated diagram

You can also see that the rectangles each have a different, and very precise, width. When D3 creates this bar chart, it joins a data point — which, unlike our paragraph example above, is not a string of text, but a number signifying the proportion of the population within a given age group — with each of those rectangles. Then, it scales the width of that rectangle proportionally so that it's a nice size on the screen.
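
Here is a rough sketch of that idea without D3: take one number per bar, scale it to pixels, and write out one rect per data point. The proportions, the bar thickness, and the pixel width are all made-up values for illustration, not the real chart's data or dimensions.

```javascript
// Made-up proportions for illustration only, not the chart's real data.
var proportions = [0.09, 0.17, 0.12];

var barHeight = 20,   // assumed bar thickness in pixels
    maxWidth = 300;   // assumed pixel width for a proportion of 0.2

// Scale a proportion to pixels: 0 maps to 0 and 0.2 maps to maxWidth.
function x(value) {
  return value / 0.2 * maxWidth;
}

// Build one <rect> per data point, stacked top to bottom.
var rects = proportions.map(function(d, i) {
  return '<rect x="0" y="' + (i * barHeight) + '" width="' +
      x(d).toFixed(1) + '" height="' + (barHeight - 1) + '"></rect>';
});

var svg = '<svg width="' + maxWidth + '" height="' +
    (proportions.length * barHeight) + '">\n  ' +
    rects.join('\n  ') + '\n</svg>';

console.log(svg);
```

D3 does the same job by creating real nodes in the DOM rather than a string of markup, which is what lets it come back and update them later.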

What you can also see from this bar chart is something that's really nice about the update part of conducting a data-join. When you click on one of the years above, a data-join is conducted, binding the data for that year to the rectangles in the DOM. Telling those rectangles to animate, to grow or shrink to their new size, takes only one line of code in D3. Let's say the data for the year you just clicked on is referenced by a variable called newData:

d3.selectAll('rect')
    .data(newData)
    .transition()
    .attr('width', function(d) { return x(d.value); });

That's it: .transition() is all you need to create that animation when you click. (In this case, when we update the bars, instead of changing the text like we did with the paragraph example above, we're changing the width of each of the bars. x is the function that takes the value of each data point and scales it up to the right number of pixels.)
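
Under the hood, an animation like this comes down to computing in-between values over time. The sketch below shows the simplest version of that, plain linear interpolation between an old width and a new one. The widthAt helper is hypothetical, and D3's own transitions also apply an easing curve, which this sketch leaves out.

```javascript
// Hypothetical helper: the width of a bar part-way through a
// transition, where t runs from 0 (start) to 1 (end). This is plain
// linear interpolation with no easing.
function widthAt(startWidth, endWidth, t) {
  return startWidth + (endWidth - startWidth) * t;
}

// A bar growing from 100px to 180px, sampled at five moments.
var frames = [0, 0.25, 0.5, 0.75, 1].map(function(t) {
  return widthAt(100, 180, t);
});

console.log(frames); // [ 100, 120, 140, 160, 180 ]
```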

Extra stuff

Now, of course, a chart or a data visualization is more than just the basic shapes that encode the data. A line chart is more than just a line, a bar chart is more than just a set of rectangles. There are also labels or axes that explain what value is represented by each point along that line or by each of the bars in that set. Fortunately, D3, even though it isn't a charting library, has some functions to help with that stuff.

When it comes to axes, D3 has axis generators...

d3.svg.axis()
    .scale(x);

... which draw an axis, including tick marks and text labels, onto a chart based on a scale function. And once you've created that axis, there are all sorts of ways you can customize it:

d3.svg.axis()
    .scale(x)
    .orient('top')
    .ticks(5)
    .tickFormat(d3.format('%'));

That's the code I used to create the x axis in the population distribution bar chart above. I want that axis to be at the top of the chart, hence .orient('top'), I want there to be tick marks at 0, 5, 10, and so on, and I want the text of the tick mark labels to be formatted so they have percent signs at the end. The axis generator can do all of those things.
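
A tick format is just a function from a tick's value to the label you want to see. As a rough sketch of the idea, here is a hand-written formatter that turns proportions into percent labels. Both formatPercent and the list of tick positions are assumptions made for this example, not what the axis generator computes internally.

```javascript
// Hypothetical tick formatter: turn a proportion into a percent label.
function formatPercent(value) {
  return Math.round(value * 100) + '%';
}

// Assumed tick positions for an axis running from 0 to 20 percent.
var ticks = [0, 0.05, 0.1, 0.15, 0.2];

console.log(ticks.map(formatPercent)); // [ '0%', '5%', '10%', '15%', '20%' ]
```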

##Resources
