Skip to content

Instantly share code, notes, and snippets.

@jsanz
Last active September 14, 2016 09:14
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save jsanz/5f7ac01cc6c720c71610c74917d821aa to your computer and use it in GitHub Desktop.
Save jsanz/5f7ac01cc6c720c71610c74917d821aa to your computer and use it in GitHub Desktop.
SQL: JOIN by proximity

Joining two point datasets using distance as the relation between them.

WITH
-- some cities, just for this example
cities AS (
SELECT
cartodb_id,name,
the_geom,
st_x(the_geom) AS lon,
st_y(the_geom) AS lat
FROM populated_places
WHERE megacity = 1
),
-- same cities slightly moved
moved_cities AS (
SELECT
cartodb_id,
concat(name, ' 2') AS other_name ,
CDB_LatLng(
lat*(1+random()*0.0001), -- randomizing a bit the coordinates
lon*(1+random()*0.0001)
) AS the_geom
FROM cities
),
-- get the id of the closest geometry
with_ids AS (
SELECT
c.*,
(
-- the trick is here,
-- returning the id of the closest geometry
-- important to use the <#> operator so
-- you work with the index, way faster!
SELECT cartodb_id AS moved_id
FROM moved_cities m
ORDER BY c.the_geom <#> m.the_geom
LIMIT 1
)
FROM cities c
)
-- JOIN with the moved query
SELECT
c.cartodb_id,
c.name,
m.other_name
FROM with_ids c
JOIN moved_cities m
ON c.moved_id = m.cartodb_id
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment