Auto-Discovering Open Geo Data Sets

Data sets likely to have geolocation information

Data sets likely to have neighborhood-level classification

Data sets likely to have addresses

Data sets likely to have another type of geographic location


This page lists data sets from Boston's Open Data Hub that are likely to contain geographic coordinates or semi-spatial information such as addresses, neighborhoods, or zip codes.

The list is built programmatically: the CKAN developer API retrieves metadata about the Hub's resources, and heuristics are then run on each data set's field names. The technique is not specific to the Boston data portal and should work equally well on any CKAN-powered site.
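The following is a minimal Python sketch of the general approach. It is not the project's actual code. It assumes the portal exposes the standard CKAN Action API under `/api/3/action/` (`package_list`, `package_show`, `datastore_search`), that DataStore-backed resources carry a `datastore_active` flag, and that calling `datastore_search` with `limit=0` returns the field list without any rows. The keyword sets and the category names (`geolocation`, `neighborhood`, `address`, `other_geographic`) are hypothetical heuristics chosen to mirror the four categories above.

```python
import json
import re
import urllib.parse
import urllib.request

# Hypothetical field-name heuristics, one keyword set per category.
# These are illustrative assumptions, not the original project's rules.
CATEGORY_KEYWORDS = {
    "geolocation": {"lat", "latitude", "lon", "lng", "long", "longitude",
                    "geom", "geometry", "shape", "x", "y"},
    "neighborhood": {"neighborhood", "neighbourhood", "neighborhoods"},
    "address": {"address", "addr", "street"},
    "other_geographic": {"zip", "zipcode", "postal", "ward", "district",
                         "precinct", "tract", "city"},
}


def classify_fields(field_names):
    """Return the set of categories suggested by a data set's field names."""
    found = set()
    for name in field_names:
        # Split e.g. "STREET_NAME" into whole tokens {"street", "name"} so that
        # "longevity" does not match "long" and "latent" does not match "lat".
        tokens = set(re.split(r"[^a-z0-9]+", name.lower())) - {""}
        for category, keywords in CATEGORY_KEYWORDS.items():
            if tokens & keywords:
                found.add(category)
    return found


def ckan_action(base_url, action, **params):
    """Call a CKAN Action API endpoint and return its 'result' payload."""
    query = urllib.parse.urlencode(params)
    url = f"{base_url.rstrip('/')}/api/3/action/{action}?{query}"
    with urllib.request.urlopen(url, timeout=30) as response:
        payload = json.load(response)
    if not payload.get("success"):
        raise RuntimeError(f"CKAN action {action!r} failed: {payload.get('error')}")
    return payload["result"]


def discover(base_url):
    """Yield (data set name, resource id, categories) for likely geo resources."""
    for name in ckan_action(base_url, "package_list"):
        package = ckan_action(base_url, "package_show", id=name)
        for resource in package.get("resources", []):
            # Only DataStore-backed resources expose a field list via the API.
            if not resource.get("datastore_active"):
                continue
            result = ckan_action(base_url, "datastore_search",
                                 resource_id=resource["id"], limit=0)
            fields = [field["id"] for field in result["fields"]]
            categories = classify_fields(fields)
            if categories:
                yield name, resource["id"], sorted(categories)
```

As a usage example, `for row in discover("https://data.boston.gov"): print(row)` would walk every data set on the portal, assuming that URL is the Hub's CKAN base. Because it issues one request per data set plus one per DataStore resource, a large portal may call for caching or rate limiting.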

All code is open source under the Apache License, Version 2.0, and is copyright 2017 IBM Corp. It was developed for the Analyze Boston Open Data Challenge.