@planemad
Last active April 18, 2016 09:41
Refreshing NYC building footprints

NYC had an import into OSM of over 1 million building footprints and 900,000 addresses in 2014, sourced from the New York City Department of Information Technology and Telecommunications (DoITT). DoITT GIS releases an updated shapefile of the footprints every quarter, and the latest versions can be accessed here: Building footprints | Address points

Open datasets like these are a great opportunity to explore how OSM can be used as a bridge between authoritative information and information crowdsourced by citizens. Two years after the import, it is interesting to see how the OSM data compares with the latest official footprints. The interesting questions to ask are:

  • What has improved in the DoITT footprints that can be updated in OSM?
  • What has improved in OSM that can be updated in the DoITT data?

Both of these are pretty challenging questions and require careful data comparison and conflation. user:maning and I have been trying our hand at answering them, and here is our progress.

Preparing the data

Grab the latest NYC footprints from DoITT and the NYC OSM extract from Mapzen.

Use QGIS to filter only the buildings from the OSM extract and save them as a separate shapefile to make the analysis slightly faster.
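We did this filtering in QGIS. If you would rather script it, the same step is a simple tag check. Here is a minimal sketch in Python. It assumes a hypothetical in-memory form where each OSM feature is a dict with a `tags` dictionary. Treating `building=no` as "not a building" is our own assumption about which tags should count.

```python
def is_building(tags):
    """True if the feature has a building=* tag with any value other than "no"."""
    value = tags.get("building")
    return value is not None and value != "no"


def filter_buildings(features):
    """Keep only the features whose tags mark them as buildings."""
    return [f for f in features if is_building(f.get("tags", {}))]


# Hypothetical features: each has a tags dict and a list of (x, y) vertices.
features = [
    {"tags": {"building": "yes"}, "geometry": [(0, 0), (1, 0), (1, 1), (0, 1)]},
    {"tags": {"highway": "residential"}, "geometry": [(0, 0), (5, 0)]},
    {"tags": {"building": "no", "amenity": "parking"}, "geometry": []},
    {"tags": {"building": "apartments"}, "geometry": [(2, 2), (3, 2), (3, 3)]},
]
buildings = filter_buildings(features)
print(len(buildings))  # 2
```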

Visual diff of footprints

A simple way to quickly see the differences between the geometries in the two datasets is a visual diff: overlap the layers and style them in different colors. Geometries that don't overlap will show the color of the underlying layer.

Green = missing in DoITT; Yellow = OSM overlaps DoITT; Red = missing in OSM (interactive map)

This is a purely visual comparison, and with some eyeballing we noticed that not much has changed. The green buildings on the New Jersey side are outside the import area and do not exist in the DoITT dataset. We now have a few more unanswered questions:

  • Why are buildings missing from the latest DoITT data? Is OSM more up to date, or have the buildings been demolished and OSM is outdated?
  • Why are buildings missing from OSM? Were they demolished and deleted, or were they never added in the first place?

Both questions can be answered if we know which dataset is more up to date. The only reliable way to find out is to visit the site and ground truth the information.

Centroid diff of footprints

Next we will try to detect changes in building configurations, where buildings might have been combined or split from their parent. We can do this by comparing the centroids of the two footprint datasets and checking whether they match. First we extract the centroid of every OSM footprint. Then, using a point-in-polygon analysis, we count how many of those centroids fall inside each DoITT footprint.
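QGIS does the counting for us, but the logic is short enough to sketch. The version below is a minimal plain-Python illustration, not the QGIS implementation. It assumes each footprint is a single outer ring of (x, y) coordinates in a projected CRS, with no holes or multipolygons. It computes the area-weighted centroid with the shoelace formula and uses a ray-casting point-in-polygon test. The function names and toy data are hypothetical.

```python
def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon (shoelace formula).

    `ring` is a list of (x, y) vertices; a closing vertex equal to the first is allowed.
    """
    if len(ring) > 1 and ring[0] == ring[-1]:
        ring = ring[:-1]
    n = len(ring)
    twice_area = cx = cy = 0.0
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("degenerate polygon with zero area")
    # Centroid = sum / (6 * area) = sum / (3 * twice_area)
    return cx / (3 * twice_area), cy / (3 * twice_area)


def point_in_polygon(point, ring):
    """Ray casting: count how often a ray from `point` towards +x crosses the ring."""
    x, y = point
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Only edges that straddle the horizontal line through the point can cross the ray.
        if (yi > y) != (yj > y):
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside


def count_centroids(doitt_rings, osm_rings):
    """For each DoITT footprint, count the OSM centroids inside it (analogous to PNTCNT)."""
    centroids = [polygon_centroid(r) for r in osm_rings]
    return [sum(point_in_polygon(c, ring) for c in centroids) for ring in doitt_rings]


# Toy footprints in projected coordinates (hypothetical data).
doitt = [
    [(0, 0), (10, 0), (10, 10), (0, 10)],    # one matching OSM building
    [(20, 0), (40, 0), (40, 10), (20, 10)],  # OSM has it split in two
    [(50, 0), (60, 0), (60, 10), (50, 10)],  # not in OSM yet
]
osm = [
    [(0.5, 0.5), (10, 0.5), (10, 10), (0.5, 10)],
    [(20, 0), (30, 0), (30, 10), (20, 10)],
    [(30, 0), (40, 0), (40, 10), (30, 10)],
]
print(count_centroids(doitt, osm))  # [1, 2, 0]
```

The nested loop is O(n × m). For a million footprints on each side you would want a spatial index, so that each footprint is tested only against the centroids inside its bounding box.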

Black = 0 OSM footprints at location; Grey = 1 OSM footprint at location; Yellow = 2 OSM footprints at location; Red = more than 2 OSM footprints at location

This method can isolate the footprints that need to be added to OSM. It is not a simple insertion, though, because in some cases a single building may need to be split into smaller buildings.

This also detected overlapping polygons in OSM and cases where multiple buildings need to be combined into larger buildings.

Where a building's shape has changed drastically, the OSM centroid may not fall inside the new DoITT footprint. The building then gets flagged as missing.
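A related way for a centroid to miss has nothing to do with the building changing. For concave footprints (L-, U- or courtyard-shaped buildings), the area-weighted centroid can fall outside the polygon itself. That point can then miss the matching DoITT footprint as well. The plain-Python sketch below shows this for a U-shaped outline. It assumes a single outer ring of (x, y) coordinates, and the helper names are hypothetical. If this case turns out to matter, a guaranteed-interior point is a common substitute for the centroid, for example PostGIS `ST_PointOnSurface`.

```python
def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon given as an open ring of (x, y)."""
    n = len(ring)
    twice_area = cx = cy = 0.0
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3 * twice_area), cy / (3 * twice_area)


def point_in_polygon(point, ring):
    """Ray-casting test: an odd number of edge crossings means the point is inside."""
    x, y = point
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        if (yi > y) != (yj > y):
            if x < xi + (y - yi) * (xj - xi) / (yj - yi):
                inside = not inside
        j = i
    return inside


# Hypothetical U-shaped footprint: a 30 x 10 base with two 10 x 20 wings around an open court.
u_shape = [(0, 0), (30, 0), (30, 30), (20, 30), (20, 10), (10, 10), (10, 30), (0, 30)]
c = polygon_centroid(u_shape)
print(tuple(round(v, 2) for v in c))  # (15.0, 13.57)
print(point_in_polygon(c, u_shape))   # False: the centroid lies in the open court
```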

Minor shape changes don't get flagged.

Stats

  • Total footprints in DoITT dataset: 1,082,433
  • No existing OSM footprint at location (PNTCNT=0): 1,223 footprints to add
  • 1 existing OSM footprint at location (PNTCNT=1): 1,072,975 footprints not changed or with minor geometry changes
  • 2 existing OSM footprints at location (PNTCNT=2): 1,661 footprints to be merged or updated
  • More than 2 existing OSM footprints at location (PNTCNT>2): 251 footprints to be merged or updated
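A quick consistency check on these numbers: the four PNTCNT buckets sum to 1,076,110. That is 6,323 fewer than the 1,082,433 DoITT footprints. The write-up does not account for the difference, so treat it as an open question rather than a fifth category. The arithmetic, using only the counts listed above:

```python
# Counts copied from the stats above, keyed by PNTCNT bucket.
buckets = {
    "PNTCNT=0 (add to OSM)": 1_223,
    "PNTCNT=1 (unchanged or minor change)": 1_072_975,
    "PNTCNT=2 (merge or update)": 1_661,
    "PNTCNT>2 (merge or update)": 251,
}
total_doitt = 1_082_433

classified = sum(buckets.values())
unaccounted = total_doitt - classified

for label, n in buckets.items():
    # Share of all DoITT footprints, as a percentage.
    print(f"{label}: {n:,} ({100 * n / total_doitt:.2f}%)")
print(f"classified: {classified:,}; not in any bucket: {unaccounted:,}")
# last line printed: classified: 1,076,110; not in any bucket: 6,323
```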

Shape diff of footprints

The centroid analysis above can still miss minor shape changes. To find out which buildings were modified since the last import, we tried to compare the footprints using the nycdoitt:bin footprint ID. Here we started to hit the limitations of the DBF format (the shapefile attribute table), which could not handle the large numbers. We also found that the previous import had left duplicate IDs in OSM:

  • nycdoitt:bin=1000000: 106 duplicates
  • nycdoitt:bin=2000000: 1,482 duplicates
  • nycdoitt:bin=3000000: 2,052 duplicates
  • nycdoitt:bin=4000000: 10,119 duplicates
  • nycdoitt:bin=5000000: 2,623 duplicates
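Once the ID problems are sorted out, a BIN-based diff could look like the sketch below. It is a minimal illustration, not what we ran. It assumes both datasets have been reduced to hypothetical dicts keyed by BIN as a string (the nycdoitt:bin tag value on the OSM side), with one outer ring per footprint. The sketch first counts duplicated BINs, which cannot be joined one-to-one. The round values above, 1000000 through 5000000, look like per-borough placeholders, since the first digit of a BIN is the borough code. It then compares the remaining footprints by area. The 5% tolerance is an arbitrary choice. Area is also only a crude proxy for shape: an outline that changes while keeping the same area would not be flagged. A real comparison would use something like the symmetric-difference area or the Hausdorff distance.

```python
from collections import Counter


def ring_area(ring):
    """Absolute polygon area via the shoelace formula (open ring of (x, y))."""
    n = len(ring)
    twice_area = 0.0
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0


def duplicate_bins(osm_features):
    """Return {bin: count} for every BIN that appears on more than one OSM footprint."""
    counts = Counter(f["bin"] for f in osm_features)
    return {b: n for b, n in counts.items() if n > 1}


def changed_bins(osm_features, doitt_by_bin, rel_tol=0.05):
    """Flag BINs whose DoITT area differs from the OSM area by more than rel_tol.

    Duplicated BINs and BINs absent from DoITT are skipped, since they cannot be
    matched one-to-one. rel_tol=0.05 (5%) is a hypothetical threshold.
    """
    dupes = duplicate_bins(osm_features)
    flagged = []
    for f in osm_features:
        b = f["bin"]
        if b in dupes or b not in doitt_by_bin:
            continue
        osm_area = ring_area(f["ring"])
        doitt_area = ring_area(doitt_by_bin[b])
        if abs(doitt_area - osm_area) > rel_tol * osm_area:
            flagged.append(b)
    return flagged


# Toy data: two OSM footprints share the placeholder-looking BIN 1000000.
osm = [
    {"bin": "1000000", "ring": [(0, 0), (5, 0), (5, 5), (0, 5)]},
    {"bin": "1000000", "ring": [(10, 0), (15, 0), (15, 5), (10, 5)]},
    {"bin": "1001234", "ring": [(20, 0), (30, 0), (30, 10), (20, 10)]},
    {"bin": "1005678", "ring": [(40, 0), (50, 0), (50, 10), (40, 10)]},
]
doitt = {
    "1001234": [(20, 0), (30, 0), (30, 10), (20, 10)],  # unchanged
    "1005678": [(40, 0), (50, 0), (50, 12), (40, 12)],  # area grew by 20%
}

print(duplicate_bins(osm))       # {'1000000': 2}
print(changed_bins(osm, doitt))  # ['1005678']
```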

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment