Skip to content

Instantly share code, notes, and snippets.

@TDahlberg
Last active August 29, 2015 14:13
Show Gist options
  • Save TDahlberg/92779b66b62769f237ee to your computer and use it in GitHub Desktop.
Save TDahlberg/92779b66b62769f237ee to your computer and use it in GitHub Desktop.
Boundless Geo PostGIS workshop translated to CartoDB

#Boundless Geo Exercise 11 ##What neighborhood and borough is Atlantic Commons in? Solution:

SELECT name, boroname
FROM nyc_neighborhoods
WHERE ST_Intersects(
  the_geom,(SELECT the_geom FROM nyc_streets WHERE name = 'Atlantic Commons'))

In English: Select the name and borough name from New York City neighborhoods where the Street named Atlantic Commons intersects them.

##What streets does Atlantic Commons join with? Solution:

SELECT name
FROM nyc_streets
WHERE ST_Intersects(
  the_geom,(SELECT the_geom FROM nyc_streets WHERE name = 'Atlantic Commons'))

##Approximately how many people live on (within 50 meters of) Atlantic Commons? Solution:

SELECT sum(popn_total)
FROM nyc_census_blocks
WHERE ST_DWithin(
  the_geom_webmercator,(SELECT the_geom_webmercator 
                        FROM nyc_streets 
                        WHERE name = 'Atlantic Commons'),50)

This query first selects the street called Atlantic Commons, and then finds the sum of the population in Census Blocks that lives within 50 meters of the geom. It's important to note that the_geom_webmercator is being used here. This means that the query grabs the projected data (50 meters), instead of what it would grab with the_geom, which is 50 degrees (and would thus include ALL of NYC's population).

#Boundless Geo Exercise 12

SELECT
	subways.name AS subway_name,
	hoods.name AS hood_name,
	hoods.boroname AS boro_name
FROM nyc_neighborhoods AS hoods
JOIN nyc_subway_stations AS subways
ON ST_Contains(hoods.the_geom,
               subways.the_geom)
WHERE subways.name = 'Broad St'

Rather than joining all the subways to all of the neighborhoods and then searching for a specific match, this query looks for whether the 'Broad St' subway is contained in given neighborhood, and then performs the join.

##Join and Summarize Combine JOIN and GROUP BY to replicate analysis done by a traditional GIS system. What is the population and racial make-up of the neighborhoods of Manhattan?

SELECT 
    neighborhoods.cartodb_id,
    neighborhoods.the_geom_webmercator,
    neighborhoods.name AS neighborhood_name,
    SUM(census.popn_total) AS population,
    100.00 * SUM(census.popn_white) /   SUM(census.popn_total) AS white_pct,
    100.00 * SUM(census.popn_black) / SUM(census.popn_total) AS black_pct
FROM nyc_neighborhoods AS neighborhoods
JOIN nyc_census_blocks AS census
ON ST_Intersects(neighborhoods.the_geom, census.the_geom)
WHERE neighborhoods.boroname = 'Manhattan'
GROUP BY neighborhoods.name, neighborhoods.the_geom_webmercator,neighborhoods.cartodb_id
ORDER BY white_pct DESC

This variant calls on the_geom_webmercator to display the results on the map. If the_geom_webmercator was not used, the results would only be visible in the CartoDB table view. This was kind of touchy to get the map to display in CartoDB. Had to toggle some layers on and off before the joined map finally displayed correctly. Was under the impression for an hour or so that the query just didn't work.

From Boundless' explanation of what's going on above:

The JOIN clause creates a virtual table that includes columns from both the neighborhoods and census tables. The WHERE clause filters our virtual table to just rows in Manhattan. The remaining rows are grouped by the neighborhood name and fed through the aggregation function to Sum() the population values. After a little arithmetic and formatting (e.g., GROUP BY, ORDER BY) on the final numbers, our query spits out the percentages.

##Summarize the racial make-up within 200 meters of the A-Train

SELECT
  100.0 * Sum(popn_white) / Sum(popn_total) AS white_pct,
  100.0 * Sum(popn_black) / Sum(popn_total) AS black_pct,
  Sum(popn_total) AS popn_total
FROM nyc_census_blocks AS census
JOIN nyc_subway_stations AS subways
ON ST_DWithin(census.the_geom_webmercator, subways.the_geom_webmercator, 200)
WHERE strpos(subways.routes,'A') > 0

This statement uses the_geom_webmercator again to make sure that the data is translated into meters, rather than degrees. If the_geom is used here, it basically queries the entire US population.

###Join the table of subway lines into the racial make-up query 200m of subway lines:

SELECT
	lines.route,
	100.0 * Sum(popn_white) / Sum(popn_total) AS white_pct,
	100.0 * Sum(popn_black) / Sum(popn_total) AS black_pct,
	Sum(popn_total) AS popn_total
FROM nyc_census_blocks AS census
JOIN nyc_subway_stations AS subways
ON ST_DWithin(census.the_geom_webmercator, subways.the_geom_webmercator, 200)
JOIN subway_lines AS lines
ON strpos(subways.routes, lines.route) > 0
GROUP BY lines.route
ORDER BY black_pct DESC

#Boundless Geo Exercise 13 ##What subway station is in Little Italy? What subway route is it on?

SELECT s.name,s.routes
FROM nyc_subway_stations AS s
JOIN nyc_neighborhoods as n
ON ST_Contains(n.the_geom, s.the_geom)
WHERE n.name = 'Little Italy'

Make sure to note that n.the_geom comes first in the 'ST_Contains' PostGIS statement. Order matters. This operation finds if geometry A contains B, not the other way around. If it was written ST_Contains(s.the_geom,n.the_geom) it would be looking for neighborhoods inside the train station, which is incorrect.

##What are all the neighborhoods served by the '6' train?

SELECT DISTINCT n.name, n.boroname
FROM nyc_neighborhoods AS n
JOIN nyc_subway_stations AS s
ON ST_Contains(n.the_geom, s.the_geom)
WHERE s.routes LIKE '6'
ORDER BY n.name ASC

Pretty straightforward. This query joins neighborhoods and subway stations based on whether or not the neighborhood contains a station for subway line '6'. I wrote LIKE in here so it's less of a pain to match the string postion exactly. This query also uses SELECT DISTINCT rather than a vanilla SELECT so it doesn't return the same neighborhood multiple times if there are multiple '6' stops in it.

##How many people had to be evacuated from the Battery Park neighborhood after 9/11?

SELECT SUM(popn_total) AS pop_evac
FROM nyc_census_blocks AS c
JOIN nyc_neighborhoods AS n
ON ST_Intersects(c.the_geom, n.the_geom)
WHERE n.name = 'Battery Park'

Nothing too hard here. This query adds together the population of the census blocks that spatially intersect the 'Battery Park' neighborhood.

##What are the population densities (people / km^2) of the ‘Upper West Side’ and ‘Upper East Side’?” (Hint: There are 1000000 m^2 in one km^2.) Starting hint: We already know we need meters, so this is a giveaway in CartoDB that we need to call on the_geom_webmercator at some point.

Let's think this through. We need to first find the population of two different neighborhoods by joining the sum of census block populations to them, and then perform some math to divide that population / km^2.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment