@alansmithy
Last active February 11, 2019 17:22
Anscombe's Quartet with simple-statistics and d3js v4
license: mit

Anscombe's Quartet

I use Frank Anscombe's quartet a lot when teaching data visualisation courses. Despite the arrival of the brilliant Datasaurus Dozen, it's still a powerful illustration of the drawback of relying solely on basic descriptive statistics to summarise data. The data in all four graphs in the quartet are virtually identical when summarised with standard descriptive methods; only the graphs reveal the truth. Looking at your data before analysing it is something that Anscombe was passionate about:

"Most kinds of statistical calculation rest on assumptions about the behaviour of the data. Those assumptions may be false, and then the calculations may be misleading. We ought always to try to check whether the assumptions are reasonably correct; and if they are wrong we ought to be able to perceive in what ways they are wrong. Graphs are very valuable for these purposes."

Frank Anscombe, Graphs in Statistical Analysis (1973)

I became increasingly embarrassed that I kept using screengrabs from Wikipedia in my course slides, so I decided to create my own version using d3js and simple-statistics. It is now updated to use d3js v4 and simple-statistics v4.1.
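The claim that the series share near-identical summary statistics can be checked without drawing anything. The following is a minimal sketch in plain JavaScript, using series 1 and series 4 from the data in this gist. The helper functions (`mean`, `sampleCovariance`, `sampleVariance`, `sampleCorrelation`, `summarise`) are hand-written for illustration and are assumptions of this sketch, not the simple-statistics implementations; they use the n - 1 sample denominator.

```javascript
// Series 1 and series 4, copied from the data in this gist.
const series1 = {
  x: [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5],
  y: [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
};
const series4 = {
  x: [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
  y: [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.5, 5.56, 7.91, 6.89]
};

// Arithmetic mean of an array of numbers.
function mean(values) {
  return values.reduce(function (a, b) { return a + b; }, 0) / values.length;
}

// Sample covariance (n - 1 denominator) of two equal-length arrays.
function sampleCovariance(xs, ys) {
  const mx = mean(xs);
  const my = mean(ys);
  let sum = 0;
  for (let i = 0; i < xs.length; i++) {
    sum += (xs[i] - mx) * (ys[i] - my);
  }
  return sum / (xs.length - 1);
}

// Sample variance is the covariance of a variable with itself.
function sampleVariance(values) {
  return sampleCovariance(values, values);
}

// Pearson correlation coefficient.
function sampleCorrelation(xs, ys) {
  return sampleCovariance(xs, ys) /
    Math.sqrt(sampleVariance(xs) * sampleVariance(ys));
}

// Collect the statistics that the chart annotations report.
function summarise(series) {
  return {
    meanX: mean(series.x),
    meanY: mean(series.y),
    varX: sampleVariance(series.x),
    varY: sampleVariance(series.y),
    correlation: sampleCorrelation(series.x, series.y)
  };
}

console.log("series 1", summarise(series1));
console.log("series 4", summarise(series4));
```

Run with Node.js or in a browser console, the two summaries should agree to within rounding, even though series 4 has ten points stacked at x = 8 and a single outlier at x = 19.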

seriesname,x,y
series 1,10,8.04
series 1,8,6.95
series 1,13,7.58
series 1,9,8.81
series 1,11,8.33
series 1,14,9.96
series 1,6,7.24
series 1,4,4.26
series 1,12,10.84
series 1,7,4.82
series 1,5,5.68
series 2,10,9.14
series 2,8,8.14
series 2,13,8.74
series 2,9,8.77
series 2,11,9.26
series 2,14,8.1
series 2,6,6.13
series 2,4,3.1
series 2,12,9.13
series 2,7,7.26
series 2,5,4.74
series 3,10,7.46
series 3,8,6.77
series 3,13,12.74
series 3,9,7.11
series 3,11,7.81
series 3,14,8.84
series 3,6,6.08
series 3,4,5.39
series 3,12,8.15
series 3,7,6.42
series 3,5,5.73
series 4,8,6.58
series 4,8,5.76
series 4,8,7.71
series 4,8,8.84
series 4,8,8.47
series 4,8,7.04
series 4,8,5.25
series 4,19,12.5
series 4,8,5.56
series 4,8,7.91
series 4,8,6.89
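The rows above are loaded by the page that follows, coerced to numbers, and grouped by `seriesname` with `d3.nest()`, which yields one `{ key, values }` entry per series. A rough plain-JavaScript sketch of that coercion and grouping is shown here, assuming a handful of sample rows; `groupBySeries` is a hypothetical helper written for illustration, not part of d3.

```javascript
// A few rows in the shape d3.csv delivers them: every field is a string.
const rows = [
  { seriesname: "series 1", x: "10", y: "8.04" },
  { seriesname: "series 1", x: "8", y: "6.95" },
  { seriesname: "series 4", x: "8", y: "6.58" },
  { seriesname: "series 4", x: "19", y: "12.5" }
];

// Coerce the numeric columns with the unary plus operator, as the page does.
rows.forEach(function (d) {
  d.x = +d.x;
  d.y = +d.y;
});

// Hypothetical helper: group rows by seriesname, keeping first-seen order,
// and return entries shaped like { key: "series 1", values: [...] }.
function groupBySeries(data) {
  const byKey = new Map();
  data.forEach(function (d) {
    if (!byKey.has(d.seriesname)) {
      byKey.set(d.seriesname, []);
    }
    byKey.get(d.seriesname).push(d);
  });
  return Array.from(byKey, function (entry) {
    return { key: entry[0], values: entry[1] };
  });
}

const nested = groupBySeries(rows);
nested.forEach(function (group) {
  console.log(group.key, group.values.length + " rows");
});
```

Each entry then becomes the datum bound to one SVG, which is why the plotting code reads the points from `d.values`.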
<!DOCTYPE html>
<head>
<meta charset="utf-8">
<script src="https://d3js.org/d3.v4.min.js"></script>
<script src="https://unpkg.com/simple-statistics@4.1.0/dist/simple-statistics.min.js"></script>
<style>
body { margin:0;position:fixed;top:0;right:0;bottom:0;left:0;font-family:metric,sans-serif; }
text{font-family:metric,sans-serif;}
circle{fill:#bb6d82}
.background{fill:#fff1e0}
.chartTitle{text-anchor:middle}
.regLine{stroke:#94d7ea;stroke-width:2px;opacity:0.7}
.summary{fill:#aaaaaa;font-size:9.6px}
</style>
</head>
<body>
<script>
//basic layout information
//bl.ocks.org iframe is 960x500, so these should do OK;
const w = 400;
const h = 240;
const margin = {left:50,right:20,top:50,bottom:20}
const dotSize = 5;
const innerW = w-(margin.left+margin.right);
const innerH = h-(margin.top+margin.bottom);
//load the data
d3.csv("anscombe.csv",function(data){
//co-erce the numeric data into floats...
data.forEach(function(d) {
d.x = +d.x;
d.y = +d.y;
});
//...set chart parameters
const xMin = 2;
const yMin = 2;//for consistency with all the other Anscombe charts out there ;-)
const xMax = d3.max(data, function(d) { return d.x; });
const yMax = d3.max(data, function(d) { return d.y; });
//create scales
const xScale = d3.scaleLinear()
.domain([xMin,xMax])
.range([0,innerW])
.nice();//neater axes courtesy of californ-i-a
const yScale = d3.scaleLinear()
.domain([yMin,yMax])
.range([innerH,0])
.nice();
//create axes
const xAxis = d3.axisBottom()
.scale(xScale)
.ticks(7);
const yAxis = d3.axisLeft()
.scale(yScale)
.ticks(5)
//nest the data into the different series
const nested = d3.nest()
.key(function(d) { return d.seriesname; })
.entries(data);
//create an SVG element for each plot
let plots = d3.select("body").selectAll("svg")
.data(nested)
.enter()
.append("svg")
.attr("width",w)
.attr("height",h);
//background rect - FT pink ;-)
plots.append("rect")
.attr("x",margin.left)
.attr("y",margin.top)
.attr("width",innerW)
.attr("height",innerH)
.attr("class","background");
//plot titles
plots.append("text")
.attr("class","chartTitle")
.attr("x",w/2)
.attr("y",margin.top-10)
.text(function(d,i){
return "Series "+(i+1)
})
//put x axis on each plot
plots.append("g")
.attr("transform","translate("+margin.left+","+(h-margin.bottom)+")")
.call(xAxis)
//put y axis on each plot
plots.append("g")
.attr("transform","translate("+margin.left+","+margin.top+")")
.call(yAxis)
//plot the data
plots.append("g")
.attr("transform","translate("+margin.left+","+margin.top+")")
.selectAll("circle")
.data(function(d){return d.values})
.enter()
.append("circle")
.attr("r",dotSize)
.attr("cx",function(d){
return xScale(d.x)
})
.attr("cy",function(d){
return yScale(d.y)
})
//now to show statistical similarity between the series
plots.each(function (d,i){
//extract the x vals and y vals into separate arrays
const xVals = d.values.map(function(e,j){
return e.x;
});
const yVals = d.values.map(function(e,j){
return e.y;
});
//calculate summary stats to correct precision
const meanX = ss.mean(xVals).toFixed(0);
const meanY = ss.mean(yVals).toFixed(2);
const varX = ss.sampleVariance(xVals).toFixed(0);
const varY = ss.sampleVariance(yVals).toFixed(1);
const corCoeff = ss.sampleCorrelation(xVals,yVals).toFixed(3);
//for regression in simple-statistics, we need to generate x,y co-ordinate pairs
const pairs = [];
xVals.forEach(function(d,i){
pairs.push([xVals[i],yVals[i]])
})
//calculate slope and intercept
const linReg = ss.linearRegression(pairs);
//generate line function from slope and intercept
const linRegLine = ss.linearRegressionLine(linReg);
//text content for the summary stats
let summary=[
["Mean of x: "+meanX],
["Mean of y: "+meanY],
["Sample variance of x: "+varX],
["Sample variance of y: "+varY],
["Correlation between x and y: "+corCoeff],
["Linear regression line: y="+linReg.b.toFixed(2)+" + "+linReg.m.toFixed(3)+"x"]
]
d3.select(this).append("g")
.selectAll("text")
.data(summary)
.enter()
.append("text")
.attr("class","summary")
.attr("y",function(d,i){
return 160+(i*10)
})
.attr("x",230)
.text(function(d,i){return d})
//create initial line; x1/2 and y1/2 the same
let line = d3.select(this).append("line")
.attr("transform","translate("+margin.left+","+margin.top+")")
.attr("class","regLine")
.attr("x1",xScale(xMin))
.attr("x2",xScale(xMin))
.attr("y1",yScale(linRegLine(xMin)))
.attr("y2",yScale(linRegLine(xMin)))
//animate line - just to re-inforce the similarity
line.transition().duration(1000).delay(2000)
.attr("x2",xScale(xMax))
.attr("y2",yScale(linRegLine(xMax)))
})//end calc summary stats
})//end data load
</script>
</body>
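The page above delegates the line fit to `ss.linearRegression`, which takes `[x, y]` pairs and returns the slope `m` and intercept `b`. The underlying ordinary least-squares calculation can be sketched in plain JavaScript as follows. This is an illustrative reimplementation under that assumption, not the library's source; `leastSquares` and `lineFrom` are hypothetical names chosen to mirror the roles of `ss.linearRegression` and `ss.linearRegressionLine`.

```javascript
// Series 1 as [x, y] pairs, copied from the data in this gist.
const pairs = [
  [10, 8.04], [8, 6.95], [13, 7.58], [9, 8.81], [11, 8.33], [14, 9.96],
  [6, 7.24], [4, 4.26], [12, 10.84], [7, 4.82], [5, 5.68]
];

// Ordinary least squares for y = b + m * x.
// m = sum((x - meanX) * (y - meanY)) / sum((x - meanX)^2)
// b = meanY - m * meanX
function leastSquares(points) {
  const n = points.length;
  let sumX = 0;
  let sumY = 0;
  points.forEach(function (p) {
    sumX += p[0];
    sumY += p[1];
  });
  const meanX = sumX / n;
  const meanY = sumY / n;

  let sxy = 0; // sum of products of deviations
  let sxx = 0; // sum of squared deviations in x
  points.forEach(function (p) {
    sxy += (p[0] - meanX) * (p[1] - meanY);
    sxx += (p[0] - meanX) * (p[0] - meanX);
  });

  const m = sxy / sxx;
  const b = meanY - m * meanX;
  return { m: m, b: b };
}

// Turn a fitted { m, b } into a function of x, for drawing the line.
function lineFrom(fit) {
  return function (x) {
    return fit.b + fit.m * x;
  };
}

const fit = leastSquares(pairs);
const line = lineFrom(fit);
console.log("y = " + fit.b.toFixed(2) + " + " + fit.m.toFixed(3) + "x");
console.log("fitted y at x = 2:", line(2).toFixed(2));
```

The fitted function is evaluated at both ends of the x domain to get the two endpoints of the regression line, which is the same role `linRegLine(xMin)` and `linRegLine(xMax)` play in the page.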