Alex Hanna alexhanna
@alexhanna
alexhanna / launch-cliff-gcp.sh
Created November 9, 2020 01:37
Code to get CLIFF working on a GCP instance after installing Tomcat8 using GCP Deployment Manager
#!/bin/sh
## This is copy-pasta from the original Medialab script with some mods
## https://raw.githubusercontent.com/mediacloud/cliff-docker/master/launch.sh
echo "Getting CLIFF..."
echo " downloading Cliff WAR file from GitHub"
wget https://github.com/mitmedialab/CLIFF/releases/download/v2.6.1/cliff-2.6.1.war
sudo mv cliff-2.6.1.war /var/lib/tomcat8/webapps/
echo " done (moved to /var/lib/tomcat8/webapps/)"
alexhanna / sample2013.sql
Created October 28, 2017 14:30
Sample Hive example
insert overwrite local directory '/scratch.1/sample2013_1'
row format delimited
fields terminated by "\t"
select id_str, created_at, regexp_replace(text, "[ \t\r\n]+", " "), user.id_str, regexp_replace(user.name, "[ \t\r\n]+", " "), user.screen_name, retweeted_status.id_str, retweeted_status.created_at, regexp_replace(retweeted_status.text, "[ \t\r\n]+", " "), retweeted_status.user.id_str, regexp_replace(retweeted_status.user.name, "[ \t\r\n]+", " "), retweeted_status.user.screen_name
from gh_rc TABLESAMPLE (10 PERCENT)
WHERE year = 2013 and month = 1;
insert overwrite local directory '/scratch.1/sample2013_2'
row format delimited
fields terminated by "\t"
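The `regexp_replace(..., "[ \t\r\n]+", " ")` calls in the query above collapse every run of spaces, tabs, carriage returns, and newlines into a single space, presumably so that free-text fields such as tweet text and user names cannot break the tab-delimited, one-record-per-line output. A minimal Python sketch of the same normalization follows; the function names `flatten_whitespace` and `to_tsv_row` are hypothetical and not part of the original gist.

```python
import re

# Same character class as in the Hive query: runs of space, tab, CR, or LF.
WHITESPACE_RUN = re.compile(r"[ \t\r\n]+")


def flatten_whitespace(text):
    """Replace each run of whitespace with a single space."""
    return WHITESPACE_RUN.sub(" ", text)


def to_tsv_row(fields):
    """Clean every field, then join the fields with tabs (hypothetical helper)."""
    return "\t".join(flatten_whitespace(field) for field in fields)


# A tweet containing an embedded newline and tab still yields one two-column row.
row = to_tsv_row(["123", "first line\nsecond\tline"])
print(repr(row))
```

Without this step, an embedded tab would add a spurious column and an embedded newline would split one record across two lines of the output file.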
alexhanna / social-science-programming.md
Last active March 14, 2024 11:05
Notes on social science programming principles
  1. Code and Data for the Social Sciences: A Practitioner’s Guide, Gentzkow and Shapiro.
  2. Good enough practices in scientific computing, Wilson et al.
  3. Best Practices for Scientific Computing, Wilson et al.
  4. Principled Data Processing, Patrick Ball.
  5. The Plain Person’s Guide to Plain Text Social Science, Healy.
  6. Avoiding technical debt in social science research, Toor.
#!/usr/bin/env python
# encoding: utf-8
"""
Module for parsing Proquest data.
Only tested on limited bits of the Proquest Ethnic Newswire.
Based loosely on a script by Neal Caren (neal.caren@unc.edu)
Alex Hanna, alex.hanna@gmail.com
2017-05-04
"""
alexhanna / CallForTACCT490.md
Last active August 10, 2016 19:06
Call for TA: CCT 490 (Social Data Analytics)

The Institute of Communication, Culture, Information and Technology at the University of Toronto Mississauga is looking for a teaching assistant for CCT 490 -- Social Data Analytics -- for Fall 2016, taught by Professor Alex Hanna. The course will cover basics of data collection, processing, and analysis for social trace data, such as Twitter and Facebook messages.

The position is for 40 hours a week for the Fall 2016 term, and will involve grading assignments, assisting in labs, and invigilating exams. The position is represented by CUPE 3902, Unit 3.

Applicants must have proficiency in the Python programming language. Knowledge of other programming languages is a plus but not required. Experience with analysis of social media data is preferred. Applicants must live in the Toronto area and be able to travel to the Mississauga campus at least once a week.

To apply, please send a resume or CV to alex.hanna@utoronto.ca, with a short cover letter. The deadline fo

alexhanna / 20_newsgroups.R
Last active November 17, 2017 20:26
20 newsgroups classification with R
## FILE: Classifying 20 Newsgroups Dataset
## For presentation with Computational Sociology course at Duke.
## AUTHOR: Alex Hanna (ahanna@ssc.wisc.edu)
## DATE: October 14, 2015
## load the RTextTools package
## Documentation of this package is available at
## https://cran.r-project.org/web/packages/RTextTools/RTextTools.pdf
library(RTextTools)
alexhanna / split_ln.py
Last active February 19, 2020 03:43
Script for splitting Lexis-Nexis files. Adapted from an original by Neal Caren.
#!/usr/bin/env python
# encoding: utf-8
"""
split_ln.py
Created by Neal Caren on 2012-05-14.
neal.caren@unc.edu
Edited by Alex Hanna on 2015-01-29
alex.hanna@gmail.com
National Science Foundation Research Experience for Undergraduates (REU)
“Constructing and Validating an Automated Coding System for Protest Events in Electronic News Sources.”
Principal Investigators: Pamela Oliver, Professor, oliver@ssc.wisc.edu, Chaeyoon Lim, Associate Professor,
clim@ssc.wisc.edu, Alex Hanna (grad student).
This opportunity is for undergraduates interested in social science or media studies to provide research assistance
on a part-time basis during the spring 2015 semester. Depending on schedules and the flow of work, there may be opportunities to
continue during the summer. REU participants will be paid a stipend of $100 a week with an expectation of 10
hours a week of research assistance, meeting attendance, and background reading. Depending on student needs
and interest, we will consider students who wish to work 5-15 hours a week on the project, with proportional
from __future__ import division
import csv, logging, math, os.path
import pickle, random, re, string
import datetime, time
import numpy as np
import pandas as pd
import scipy as sp
## metrics
p <- ggplot(df.p, aes(x=Margin, y=factor(variable), fill = Class, alpha = value))
p <- p + theme_bw() + geom_tile(color = NA, width = 0.005) + scale_fill_manual(values = wes.palette(2, "Royal1"), labels = c("False Positives", "True Positives"))
p <- p + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())
p <- p + theme(axis.text.y = element_text(size = 7)) + ylab("Feature")
ggsave(p, file = "../img/linearsvc_no-fs_top100_fp-v-tp_20140916.png", width = 16, height = 9)