Python scraper of annual voting summaries from Georgia legislature website.
from bs4 import BeautifulSoup
import csv
import time

from pyvirtualdisplay import Display
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.select import Select

# Option values of the session dropdown on the VoteList page.
sessions = ['27', '25', '24', '23', '21', '20', '18', '14']
urls = ['http://www.legis.ga.gov/Legislation/en-US/VoteList.aspx?Chamber=2',
        'http://www.legis.ga.gov/Legislation/en-US/VoteList.aspx?Chamber=1']
# id of the session <select> element on the page.
session_select_id = "ctl00_SPWebPartManager1_g_f97fdca8_f858_400b_9279_a6a8f76ec618_Session"
# Vote rows are <div>s carrying one of these two inline styles.
row_styles = ["width:100%; background-color:#EEEFCE;",
              "width:100%; background-color:#FFFFFF;"]

for url in urls:
    time.sleep(10)
    for session in sessions:
        time.sleep(10)  # pause between page loads
        print(url, session)
        with Display():
            # we can now start Firefox and it will run inside the virtual display
            driver = webdriver.Firefox()
            try:
                driver.get(url)
                my_selection = Select(driver.find_element(By.ID, session_select_id))
                my_selection.select_by_value(session)
                page = driver.page_source
                soup = BeautifulSoup(page, 'html.parser')
                for style in row_styles:
                    divs = soup.find_all('div', style=style)
                    for row in divs:
                        # each <span> in a row holds one field of the vote summary
                        vote = [span.get_text() for span in row.find_all('span')]
                        print(vote)
                        # append, so every chamber and session lands in one file
                        with open('votes.csv', 'a', newline='') as csvfile:
                            csv.writer(csvfile).writerow(vote)
            finally:
                driver.quit()
This scrapes the annual vote date/time summaries on the Georgia Legislature's website for 2005 through 2019, which are found here:
http://www.legis.ga.gov/Legislation/en-US/VoteList.aspx
It's what underlies the chart here: https://datawrapper.dwcdn.net/XYgAq/2/
It writes results to a .csv. If you'd like that csv, it's here.
My next step was summarizing the data in Excel with pivot tables: sum up the number of votes on a given day, then assign a legislative day to each date. That's here if you want to review.
Then paste the data into Datawrapper. :)
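If you'd rather skip Excel, the pivot-table step could probably be done in Python too. This is only a rough sketch, not what I actually ran: the function names, the assumption that the date/time sits in the first column, and the `"%m/%d/%Y %I:%M %p"` format are all guesses you'd need to check against the real votes.csv. The legislative-day lookup is something you'd have to supply yourself.

```python
from collections import Counter
from datetime import date, datetime


def votes_per_day(rows, date_col=0, fmt="%m/%d/%Y %I:%M %p"):
    """Count votes per calendar date.

    date_col and fmt are assumptions about the scraped rows; adjust both
    to match the actual file.
    """
    counts = Counter()
    for row in rows:
        if len(row) <= date_col:
            continue  # blank or short row
        try:
            stamp = datetime.strptime(row[date_col].strip(), fmt)
        except ValueError:
            continue  # not a date/time in the assumed format
        counts[stamp.date()] += 1
    # return the dates in chronological order
    return dict(sorted(counts.items()))


def label_days(counts, legislative_days):
    """Attach a legislative day number to each date.

    legislative_days is a hypothetical {date: day_number} lookup that you
    build yourself; dates missing from it get None.
    """
    return [(legislative_days.get(day), day.isoformat(), n)
            for day, n in counts.items()]
```

To use it, read votes.csv with `csv.reader`, pass the rows to `votes_per_day`, then write the output of `label_days` to a new csv for Datawrapper.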
There's almost certainly an easier way to do this; there's said to be an API under the GGA site, but I'm not sure how to get at it.
Votes counted here are all floor votes: regular votes, attendance, agree/disagree, etc.