Saurabh Datar ssdatar

@kylemcdonald
kylemcdonald / Collect Parler Metadata.ipynb
Last active September 20, 2023 11:45
Collect video URLs and GPS data for Parler videos.
@tmcw
tmcw / optimization.md
Last active February 14, 2021 14:38
Optimization

Correctly prioritizing and targeting performance problems and optimization opportunities is one of the hardest things to master in programming. There are a lot of ways to do it wrong: by prematurely optimizing non-bottlenecks, or preferring fast solutions to clear solutions, or measuring problems incorrectly.

I'll try to summarize what I've learned about doing this right.

First, don't optimize until there's an issue. And issues should be defined as application issues: performance problems that are either detectable by the users (lag) or endanger the platform – i.e. problems that cause downtime, like out-of-memory issues. Until there's an issue, don't think about performance at all: just solve the problem at hand, which is "creating value for the end-user," or some less-corporate translation of the same.

Second, only optimize with instruments. By instruments, I mean technology that lets you decipher which sub-part of the stack is the bottleneck. Let's say you see slowness around fet
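
As one illustration of what "optimizing with instruments" can look like, here is a minimal sketch using Python's built-in `cProfile` and `pstats` modules. The functions `fetch_records`, `transform`, and `handle_request` are hypothetical stand-ins, not anything from the original post; the point is only that the profiler report, rather than intuition, shows which step dominates.

```python
import cProfile
import io
import pstats
import time


def fetch_records():
    # Hypothetical stand-in for a slow I/O step (network or database call).
    time.sleep(0.05)
    return list(range(1000))


def transform(records):
    # Hypothetical CPU-side step that one might suspect without measuring.
    return [r * 2 for r in records]


def handle_request():
    # The code path under investigation: fetch, then transform.
    return transform(fetch_records())


def profile(func):
    """Run func under cProfile; return its result and a text report
    of the top entries sorted by cumulative time."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()


result, report = profile(handle_request)
print(report)
```

In a report like this, the cumulative-time column is usually the first thing to read: it attributes the total time of a call, including everything it calls, to each function, which is what separates a real bottleneck from a merely suspicious-looking one.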

@JoeGermuska
JoeGermuska / 01_readme.md
Last active June 17, 2020 03:22
A cross-reference of ZCTAs by state, and how it was made

A question came up on the US Census slack, leading to the recognition that the US Census Bureau API doesn't support queries for data for "all ZCTAs in a state". Nothing about the Census Bureau's definition of ZCTA requires that they be contained within a single state, which is probably why the API rejects the query with a message, error: unknown/unsupported geography heirarchy.

I've been looking for a general method to answer these kinds of questions for a long time. This Gist demonstrates a workable approach. It's based on data published by the Census LEHD LODES program, which provides, for every Census block in the US, a crosswalk indicating which geographies that block is in. (The set of geographies is limited but still very useful. See the technical doc PDF for more details.)

For any two geography types, one can simply select those two columns from the crosswalk and eliminate dupli
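
The select-and-deduplicate idea can be sketched in a few lines of Python. This is not the Gist's own code: the rows below are made-up placeholder values, and the column names `stusps` (state abbreviation) and `zcta` are assumptions modeled on the crosswalk, so check them against the technical doc before relying on them.

```python
import csv
import io

# Made-up placeholder rows shaped like a block-level crosswalk:
# one row per block, with the geographies that block falls in.
SAMPLE = """tabblk2010,stusps,zcta
block1,AA,00001
block2,AA,00001
block3,AA,00002
block4,BB,00002
block5,BB,00003
"""


def cross_reference(rows, col_a, col_b):
    """Select two columns from the crosswalk rows and drop duplicate pairs."""
    pairs = {(row[col_a], row[col_b]) for row in rows}
    return sorted(pairs)


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
pairs = cross_reference(rows, "stusps", "zcta")
for state, zcta in pairs:
    print(state, zcta)
```

In the placeholder data, ZCTA `00002` appears under both `AA` and `BB`, which is the situation described above: a ZCTA is not guaranteed to sit inside a single state, so the cross-reference is many-to-many.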

@rshorey
rshorey / voterfiles.md
Last active October 30, 2022 02:49
So you want to report using voterfiles

History

In 2002, the Help America Vote Act required (among other things) that states maintain a "computerized statewide voter registration list". These lists (henceforth "voterfiles") contain information about every registered voter and their voting history.

But what about the secret ballot?

When people who have not worked with voterfile data before hear about voterfiles, their first response is almost always "But in my 8th grade civics class, I learned that a critical component of American democracy is the secret ballot! How can states have a list of how you vote?" Voterfiles do NOT include information about how an individual voted. They report whether an individual voted in a specific election.

What information do voterfiles contain?

The exact format and contents of a publicly available voterfile differ from state to state. At a minimum, a file will contain:

@benmarwick
benmarwick / gist:70f92dd61700abab1b590afa0040e3fa
Created April 27, 2018 22:17
using sf for points in polygon spatial join
library(sf)
library(tidyverse)
# read in the shapefile first, that gives us the CRS for the analysis
polygons <- st_read("polygons.shp")
# read in the points
points <- read_csv('points.csv')