@mbostock
Forked from jfirebaugh/faithful.js
Last active March 24, 2024 13:43
Kernel Density Estimation
license: gpl-3.0
redirect: https://observablehq.com/@d3/kernel-density-estimation

Kernel density estimation is a method of estimating the probability density function of a random variable from a random sample. In contrast to a histogram, kernel density estimation produces a smooth estimate. The smoothness is tuned via the kernel’s bandwidth parameter: a well-chosen bandwidth reveals important features of the distribution, while a bandwidth that is too small undersmooths (a noisy, spiky curve) and one that is too large oversmooths (features are blurred away).
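
To make the effect of the bandwidth concrete, here is a minimal sketch in plain JavaScript (no D3 required). It reuses the Epanechnikov kernel formula from this gist; the helper name estimateDensity and the eight-value toy sample are assumptions made up for illustration, not the faithful data.

```javascript
// Epanechnikov kernel with bandwidth k (same formula as kernelEpanechnikov in this gist).
function kernelEpanechnikov(k) {
  return function(v) {
    v /= k;
    return Math.abs(v) <= 1 ? 0.75 * (1 - v * v) / k : 0;
  };
}

// Density estimate at x: the mean of kernel(x - v) over every sample value v.
// estimateDensity is a hypothetical helper name, not part of D3.
function estimateDensity(kernel, sample, x) {
  var sum = 0;
  for (var i = 0; i < sample.length; ++i) sum += kernel(x - sample[i]);
  return sum / sample.length;
}

// Made-up toy sample with two clusters (not the faithful data).
var sample = [50, 52, 54, 55, 78, 80, 82, 84];

// Undersmoothing: with k = 2 the gap between the clusters gets zero density.
var narrowGap = estimateDensity(kernelEpanechnikov(2), sample, 67);

// Oversmoothing: with k = 30 the gap scores higher than a point inside a cluster,
// so the two modes are no longer visible.
var wideGap = estimateDensity(kernelEpanechnikov(30), sample, 67);
var wideCluster = estimateDensity(kernelEpanechnikov(30), sample, 52);

// Sanity check: the estimate should integrate to about 1 (Riemann sum, step 0.01).
var area = 0;
for (var t = 0; t <= 130; t += 0.01) {
  area += estimateDensity(kernelEpanechnikov(7), sample, t) * 0.01;
}

console.log(narrowGap, wideGap > wideCluster, area.toFixed(3));
```

Dividing by k inside the kernel is what keeps the area under the curve at about 1 regardless of bandwidth; without it the curve's height would no longer be comparable across bandwidths.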

This example shows a histogram and a kernel density estimation for times between eruptions of Old Faithful Geyser in Yellowstone National Park, taken from R’s faithful dataset. The data follow a bimodal distribution; short eruptions are followed by a wait time averaging about 55 minutes, and long eruptions by a wait time averaging about 80 minutes. In recent years, wait times have been increasing, possibly due to the effects of earthquakes on the geyser’s geohydrology.

This example is based on a Protovis version by John Firebaugh. See also a two-dimensional density estimation of this dataset using d3-contour.

[79,54,74,62,85,55,88,85,51,85,54,84,78,47,83,52,62,84,52,79,51,47,78,69,74,83,55,76,78,79,73,77,66,80,74,52,48,80,59,90,80,58,84,58,73,83,64,53,82,59,75,90,54,80,54,83,71,64,77,81,59,84,48,82,60,92,78,78,65,73,82,56,79,71,62,76,60,78,76,83,75,82,70,65,73,88,76,80,48,86,60,90,50,78,63,72,84,75,51,82,62,88,49,83,81,47,84,52,86,81,75,59,89,79,59,81,50,85,59,87,53,69,77,56,88,81,45,82,55,90,45,83,56,89,46,82,51,86,53,79,81,60,82,77,76,59,80,49,96,53,77,77,65,81,71,70,81,93,53,89,45,86,58,78,66,76,63,88,52,93,49,57,77,68,81,81,73,50,85,74,55,77,83,83,51,78,84,46,83,55,81,57,76,84,77,81,87,77,51,78,60,82,91,53,78,46,77,84,49,83,71,80,49,75,64,76,53,94,55,76,50,82,54,75,78,79,78,78,70,79,70,54,86,50,90,54,54,77,79,64,75,47,86,63,85,82,57,82,67,74,54,83,73,73,88,80,71,83,56,79,78,84,58,83,43,60,75,81,46,90,46,74]
<!DOCTYPE html>
<style>

.axis--y .domain {
  display: none;
}

</style>
<svg width="960" height="500"></svg>
<script src="https://d3js.org/d3.v4.min.js"></script>
<script>
var svg = d3.select("svg"),
    width = +svg.attr("width"),
    height = +svg.attr("height"),
    margin = {top: 20, right: 30, bottom: 30, left: 40};

var x = d3.scaleLinear()
    .domain([30, 110])
    .range([margin.left, width - margin.right]);

var y = d3.scaleLinear()
    .domain([0, 0.1])
    .range([height - margin.bottom, margin.top]);

svg.append("g")
    .attr("class", "axis axis--x")
    .attr("transform", "translate(0," + (height - margin.bottom) + ")")
    .call(d3.axisBottom(x))
  .append("text")
    .attr("x", width - margin.right)
    .attr("y", -6)
    .attr("fill", "#000")
    .attr("text-anchor", "end")
    .attr("font-weight", "bold")
    .text("Time between eruptions (min.)");

svg.append("g")
    .attr("class", "axis axis--y")
    .attr("transform", "translate(" + margin.left + ",0)")
    .call(d3.axisLeft(y).ticks(null, "%"));

d3.json("faithful.json", function(error, faithful) {
  if (error) throw error;

  var n = faithful.length,
      bins = d3.histogram().domain(x.domain()).thresholds(40)(faithful),
      density = kernelDensityEstimator(kernelEpanechnikov(7), x.ticks(40))(faithful);

  svg.insert("g", "*")
      .attr("fill", "#bbb")
    .selectAll("rect")
    .data(bins)
    .enter().append("rect")
      .attr("x", function(d) { return x(d.x0) + 1; })
      .attr("y", function(d) { return y(d.length / n); })
      .attr("width", function(d) { return x(d.x1) - x(d.x0) - 1; })
      .attr("height", function(d) { return y(0) - y(d.length / n); });

  svg.append("path")
      .datum(density)
      .attr("fill", "none")
      .attr("stroke", "#000")
      .attr("stroke-width", 1.5)
      .attr("stroke-linejoin", "round")
      .attr("d", d3.line()
          .curve(d3.curveBasis)
          .x(function(d) { return x(d[0]); })
          .y(function(d) { return y(d[1]); }));
});

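// Returns a function that estimates the density of the sample V at each point in X:
// the mean of kernel(x - v) over all v in V, as an array of [x, density] pairs.
function kernelDensityEstimator(kernel, X) {
  return function(V) {
    return X.map(function(x) {
      return [x, d3.mean(V, function(v) { return kernel(x - v); })];
    });
  };
}

// Epanechnikov kernel with bandwidth k:
// 0.75 * (1 - u * u) / k for |u| <= 1, where u = v / k, and 0 elsewhere.
function kernelEpanechnikov(k) {
  return function(v) {
    return Math.abs(v /= k) <= 1 ? 0.75 * (1 - v * v) / k : 0;
  };
}
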
</script>
@xtitter

xtitter commented Dec 6, 2016

In index.html line 118, the "/ scale" in the true condition should be removed, I believe.

@Abhilash-Chandran

I am new to d3.js. I am not able to understand width = +svg.attr("width") in lines 14 and 15 of index.html. I can only think of it as a resizing measure, but I am unable to understand how it works.

@blindmonkey

blindmonkey commented Nov 28, 2018

@Abhilash-Chandran svg refers to the <svg width="960" height="500"></svg> element, and .attr gets the specified attribute on the element, so svg.attr("width") gets the width attribute that was specified on the element (in this case 960). When used as a getter, .attr returns the attribute value as a string, so the unary + operator is used to coerce the value to a number. If you try this yourself in a console, you should see something like the following:

> +'960'
960

So to summarize, it's getting the width and the height of the HTML element and ensuring they're numeric.
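
For anyone who wants to see why the coercion matters, here is a small sketch that runs in any JavaScript console. The literal "960" stands in for what svg.attr("width") returns; the null case assumes a missing attribute comes back as null, as it does with the DOM's getAttribute.

```javascript
// A DOM attribute value is a string; this literal stands in for svg.attr("width").
var raw = "960";

// Unary + converts the string to a number.
var width = +raw;

// Without the conversion, + concatenates instead of adding.
var concatenated = raw + 30;   // "96030"
var added = width + 30;        // 990

// Edge cases worth knowing:
var withUnit = +"960px";             // NaN, because the whole string must be numeric
var parsed = parseFloat("960px");    // 960, parseFloat stops at the first invalid character
var missing = +null;                 // 0, which can hide a missing attribute

console.log(typeof raw, typeof width, concatenated, added, withUnit, parsed, missing);
```

Subtraction and multiplication coerce strings to numbers on their own, so the problem mostly shows up with +, which is why converting once at the top is a common habit.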
