Skip to content

Instantly share code, notes, and snippets.

@wolfDynamics
Last active December 30, 2016 13:01
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save wolfDynamics/8682a6ff548e820d4acb5cd0e87ca603 to your computer and use it in GitHub Desktop.
Save wolfDynamics/8682a6ff548e820d4acb5cd0e87ca603 to your computer and use it in GitHub Desktop.
k-means clustering

k-means clustering algorithm in action using D3.js and javascript

============================================================

k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells.

source: wikipedia

k-means clustering

See the algorithm in action here.

Source Code Layout

kmeans.css          CSS stylesheet
index.html          webpage demonstrating the algorithm
kmeans.js           JavaScript file with the source code for the algorithm visualization
README.md           README file that appears on the website's github page

Raw Data

Data are randomly generated x, y-coordinates between 0 and the width/height of the canvas.

The Algorithm

  1. The algorithm is initialized with n random points, along with k randomly generated 'means' (centroid) within the data domain.
  2. When the algorithm starts, each point is assigned the color of the closest centroid, forming k clusters. Closest is defined as the smallest Euclidean distance between two points.
  3. The centroids are moved to the center of the cluster.
  4. Steps 2 and 3 are repeated until the maximum number of iterations is reached.
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="description" content="Free Web tutorials">
<meta name="keywords" content="HTML,CSS,XML,JavaScript">
<meta name="author" content="joegi">
<title>D3.JS k-means</title>
<link rel="stylesheet" type="text/css" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/css/bootstrap.min.css">
<script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/jquery/2.1.4/jquery.min.js"></script>
<script type="text/javascript" src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/js/bootstrap.min.js"></script>
<!-- <script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/d3/3.5.9/d3.min.js"></script> -->
<script type="text/javascript" src="https://d3js.org/d3.v3.min.js"></script>
<script type="text/javascript" src="https://labratrevenge.com/d3-tip/javascripts/d3.tip.v0.6.3.js"></script>
<!--
<link rel="stylesheet" type="text/css" href="css/bootstrap.min.css">
<script type="text/javascript" src="js/jquery-2.1.4.min.js"></script>
<script type="text/javascript" src="js/bootstrap.min.js"></script>
<script type="text/javascript" src="js/d3.min.js"></script>
-->
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.4/jquery.min.js"></script>
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.5/css/bootstrap.min.css">
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.5/js/bootstrap.min.js"></script>
<link type="text/css" rel="stylesheet" href="kmeans.css">
<style>
.jumbotron
{
/*background-color: white;*/
/*margin-bottom: 40px;*/
margin-top: 20px;
padding-bottom: 20px;
padding-top: 5px;
}
#chart
{
background-color: #F5F2EB;
border: 1px solid black;
}
</style>
</head>
<body>
<script src="kmeans.js"></script>
<script>
$(document).ready(function()
{
kMeans("#kmeans", 600, 600, 1000, 5, 50);
//kMeans("body", 250, 250, 1000, 5, 10);
});
</script>
<div class="container">
<div class="jumbotron">
<h1 align="center">K-means clustering</h1>
</div>
<div class="row" style="margin-top:10px">
<div id="kmeans" class="kmeans-chart col-md-10" align="center"></div>
<div class="clearfix"></div>
</div>
</div>
</body>
</html>
.kmeans-chart {
font-family: "Open Sans", Arial, Helvetica, sans-serif;
font-size: 9px;
fill: #ccc;
}
/*.kmeans-chart */
.centroid {
/*fill: black;*/
stroke: #000;
stroke-width: 2px;
}
/*.kmeans-chart */
text.label {
font-size: 12px;
fill: black;
}
/*.kmeans-chart */
.axis line, .axis path {
fill: none;
stroke-width: 1px;
stroke: #ccc;
shape-rendering: crispEdges;
}
"use strict";
//Change these values to choose between random points of fixed points
var fix_centroid = "no";
var fix_points = "no"
function kMeans(divname, w, h, numPoints, numClusters, maxIter)
{
//numPoints = 500;
//console.log(elt)
// the current iteration
var iter = 1;
var centroids = [];
var points = [];
//var margin = {top: 30, right: 20, bottom: 80, left: 30}
var margin = {
top: 20,
right: 20,
bottom: 20,
left: 20
};
var width = w - margin.left - margin.right;
var height = h - margin.top - margin.bottom;
var colors = d3.scale.category10().range();
//To set range for random values
var min1 = -4; //-2
var max1 = 3; //2
var min2 = -4; //-3
var max2 = 4; //3
var xScale = d3.scale.linear()
.domain([3, -4])
.range([0, width])
.clamp('true');
//.nice();
var yScale = d3.scale.linear()
.domain([4, -4])
.range([height, 0])
.clamp('true');
//.nice();
//console.log(xScale(2))
var svg = d3.select(divname).append("svg")
.attr("id", "chart")
//var svg = d3.select("#kmeans").append("svg")
.attr("width", width + margin.left + margin.right) //The svg does not have margin
.attr("height", height + margin.top + margin.bottom) //The svg does not have margin
svg.append("g")
.append("text")
.attr("class", "label")
.attr("transform", "translate(" + (width - margin.left - margin.right - 25) +
"," + (height + margin.top + margin.bottom - 10) + ")")
.text("Initialize");
var group = svg.append("g")
.attr("transform", "translate(" + margin.left + "," + margin.top + ")")
/**
* Computes the euclidian distance between two points.
*/
function getEuclidianDistance(a, b)
{
var dx = b.x - a.x,
dy = b.y - a.y;
return Math.sqrt(Math.pow(dx, 2) + Math.pow(dy, 2));
}
/**
* Returns a point with the specified type and fill color and with random
* x,y-coordinates.
*/
function getRandomPoint(type, fill)
{
return {
//x: Math.round(Math.random() * width),
//y: Math.round(Math.random() * height),
//x: Math.round(Math.random() * 2),
//y: Math.round(Math.random() * 2),
x: Math.random() * (max1 - min1) + min1,
y: Math.random() * (max2 - min2) + min2,
type: type,
fill: fill
};
}
/**
* Generates a specified number of random points of the specified type.
*/
function initializePoints(num, type)
{
var result = [];
for (var i = 0; i < num; i++)
{
var color = colors[i];
if (type !== "centroid")
{
color = "#ccc";
}
var point = getRandomPoint(type, color);
point.id = point.type + "-" + i;
result.push(point);
//console.log(point.x, point.y,point.type,point.fill)
//console.log(point)
//console.log(result)
}
return result;
}
/**
* Find the centroid that is closest to the specified point.
*/
function findClosestCentroid(point)
{
var closest = {
i: -1,
distance: width * 2
};
centroids.forEach(function(d, i)
{
var distance = getEuclidianDistance(d, point);
// Only update when the centroid is closer
if (distance < closest.distance)
{
closest.i = i;
closest.distance = distance;
}
});
return (centroids[closest.i]);
}
/**
* All points assume the color of the closest centroid.
*/
function colorizePoints()
{
points.forEach(function(d)
{
var closest = findClosestCentroid(d);
d.fill = closest.fill;
});
}
/**
* Computes the center of the cluster by taking the mean of the x and y
* coordinates.
*/
function computeClusterCenter(cluster)
{
return [
d3.mean(cluster, function(d)
{
return d.x;
}),
d3.mean(cluster, function(d)
{
return d.y;
})
];
}
/**
* Moves the centroids to the center of their cluster.
*/
function moveCentroids()
{
centroids.forEach(function(d)
{
// Get clusters based on their fill color
var cluster = points.filter(function(e)
{
return e.fill === d.fill;
});
// Compute the cluster centers
var center = computeClusterCenter(cluster);
// Move the centroid
d.x = center[0];
d.y = center[1];
//console.log(cluster[0])
//console.log(d)
//console.log(d.x,d.y,d.id)
});
}
/**
* Updates the chart.
*/
function update()
{
var data = points.concat(centroids);
//console.log(points)
// The data join
var circle = group.selectAll("circle")
.data(data);
// Create new elements as needed
circle.enter().append("circle")
.attr("id", function(d)
{
return d.id;
})
.attr("class", function(d)
{
return d.type;
})
//.attr("r", 4);
.attr("r", function(d,i)
{
if (d.type == "centroid")
{
return 4;
}
else
{
return 3;s
}
});
// Update old elements as needed
circle
.transition()
.delay(10).duration(200)
//.ease('bounce')
.attr("cx", function(d)
{
//return d.x;
return xScale(d.x)
//return console.log(xScale(d.x), d.x), xScale(d.x)
})
.attr("cy", function(d)
{
//return d.y;
return yScale(d.y)
//return console.log(yScale(d.y)), yScale(d.y)
})
// .style("fill", function(d)
// {
// return d.fill;
// });
.style("fill", function(d, i)
{
if (d.type == "centroid")
{
return "white";
}
else
{
return d.fill;
}
});
//console.log(data)
// Remove old nodes
circle.exit().remove();
}
/**
* Updates the text in the label.
*/
function setText(text)
{
svg.selectAll(".label").text(text);
}
/**
* Executes one iteration of the algorithm:
* - Fill the points with the color of the closest centroid (this makes it
* part of its cluster)
* - Move the centroids to the center of their cluster.
*/
function iterate()
{
// Update label
setText("Iteration " + iter + " of " + maxIter);
// Colorize the points
colorizePoints();
// Move the centroids
moveCentroids();
// Update the chart
update();
}
/**
* The main function initializes the algorithm and calls an iteration every
* two seconds.
*/
function initialize()
{
//var fix_centroid = "yes";
//var fix_points = "yes"
// Initialize random points and centroids
//centroids = initializePoints(numClusters, "centroid");
//points = initializePoints(numPoints, "point");
if (fix_centroid == "yes")
{
centroids = [
{
fill: "#1f77b4", //blue
id: "centroid-0",
type: "centroid",
x: 2, //2 3.8 crash
y: 1
},
{
fill: "#ff7f0e", //orange
id: "centroid-1",
type: "centroid",
x: -3,
y: 2
},
{
fill: "#2ca02c", //green
id: "centroid-2",
type: "centroid",
x: -0.5,
y: -3.5
},
{
fill: "#d62728", //red
id: "centroid-3",
type: "centroid",
x: 0,
y: -1
}]
}
else
{
centroids = initializePoints(numClusters, "centroid");
}
if (fix_points == "yes")
{
points = [
{
fill: "#ccc",
id: "point-0",
type: "point",
x: 0.204,
y: 2.939
},
{
fill: "#ccc",
id: "point-1",
type: "point",
x: -1.6989,
y: 0
},
{
fill: "#ccc",
id: "point-2",
type: "point",
x: -2.1549,
y: -3
},
{
fill: "#ccc",
id: "point-3",
type: "point",
x: 1,
y: -2.301
},
{
fill: "#ccc",
id: "point-4",
type: "point",
x: -1,
y: 2
},
{
fill: "#ccc",
id: "point-5",
type: "point",
x: 0,
y: 2.929
},
{
fill: "#ccc",
id: "point-6",
type: "point",
x: 0.301,
y: 2.903
},
{
fill: "#ccc",
id: "point-7",
type: "point",
x: -1,
y: 0.4771
},
{
fill: "#ccc",
id: "point-8",
type: "point",
x: -0.3979,
y: 2.929
},
{
fill: "#ccc",
id: "point-9",
type: "point",
x: -2.096,
y: 0
},
{
fill: "#ccc",
id: "point-10",
type: "point",
x: -1.0457,
y: 1
},
{
fill: "#ccc",
id: "point-11",
type: "point",
x: -3,
y: -2.1549
},
{
fill: "#ccc",
id: "point-12",
type: "point",
x: -3,
y: -1.5228
},
{
fill: "#ccc",
id: "point-13",
type: "point",
x: -1,
y: 0
},
{
fill: "#ccc",
id: "point-14",
type: "point",
x: 1,
y: -3
},
{
fill: "#ccc",
id: "point-15",
type: "point",
x: 1.602,
y: -2.301
}
]
}
else
{
points = initializePoints(numPoints, "point");
}
//centroids[0]
//console.log(centroids[0],centroids[1])
//console.log(centroids)
//console.log(points)
// initial drawing
update();
var interval = setInterval(function()
{
if (iter < maxIter + 1)
{
iterate();
iter++;
}
else
{
clearInterval(interval);
setText("Done");
}
}, 7.5 * 50); //time to start iterations
}
// Call the main function
initialize();
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment