Michele Weigle weiglemc

@weiglemc
weiglemc / grab-cdx.py
Created July 4, 2023 14:03
Python script to grab data from the Internet Archive via the CDX API server; uses a function from Sawood Alam's CDXSummary tool
# grab-cdx.py
from requests import Session
from rich.console import Console
from urllib.parse import urlencode
URIR = "https://www.cnn.com/"
FROM = "20150424"
TO = "20220923"
OTHER_PARAMS = "&from=" + FROM + "&to=" + TO + "&collapse=timestamp:8&filter=statuscode:200" # only one entry per day, 200 OK
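The preview above stops before the request is made. As a rough sketch of how the same parameters could be turned into a query URL, the snippet below uses `urlencode` (already imported by the script). The endpoint `CDX_API` and the helper name `build_cdx_url` are assumptions for illustration; they do not appear in the excerpt, and the original script may build its request differently.

```python
from urllib.parse import urlencode

# Public Wayback Machine CDX endpoint (assumed; the excerpt does not show it).
CDX_API = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(urir, from_ts, to_ts):
    """Return a CDX query URL asking for one 200 OK capture per day (hypothetical helper)."""
    params = [
        ("url", urir),
        ("from", from_ts),
        ("to", to_ts),
        ("collapse", "timestamp:8"),   # collapse on the first 8 digits (YYYYMMDD)
        ("filter", "statuscode:200"),  # keep only captures with status 200
    ]
    # urlencode percent-encodes the values, so ':' and '/' are escaped in the result.
    return CDX_API + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_cdx_url("https://www.cnn.com/", "20150424", "20220923"))
```

Building the query from a list of pairs rather than by string concatenation keeps the escaping in one place, which matters once the URI-R itself contains `?` or `&`.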
@weiglemc
weiglemc / capture-html.py
Created January 24, 2023 19:13
Python script to grab raw HTML from Wayback Machine
# run from untracked/html as
# % python3 ../../capture-html.py < ../../cnn-to-request.txt >> ../../cnn-html-list.txt
import sys
import time
import requests
WAIT = 10 # seconds to wait between requests
TIMEOUT = 60 # seconds to wait for timeout
DONE_URI_LIST = "../../cnn-html-list.txt"
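The rest of this script is not shown in the preview. One plausible shape for it, sketched under assumptions: the Wayback Machine returns the unrewritten capture when the `id_` modifier follows the timestamp in a URI-M, and already-fetched URI-Ms (presumably what `DONE_URI_LIST` tracks) are skipped. The helper names `raw_urim` and `fetch_all` are hypothetical, the 14-digit timestamp is assumed, and the standard library's `urllib.request` stands in for `requests` so the sketch has no third-party dependency.

```python
import re
import time
import urllib.request

WAIT = 10      # seconds between requests, as in the script above
TIMEOUT = 60   # seconds before a request is abandoned

# Assumed URI-M shape: https://web.archive.org/web/<14-digit timestamp>/<original URI>
_URIM = re.compile(r"^(https?://web\.archive\.org/web/\d{14})(?:[a-z]{2}_)?(/.*)$")


def raw_urim(urim):
    """Insert (or replace an existing modifier with) id_ to request raw HTML."""
    match = _URIM.match(urim)
    if match is None:
        raise ValueError(f"not a Wayback URI-M: {urim}")
    return match.group(1) + "id_" + match.group(2)


def fetch_all(urims, done=()):
    """Yield (urim, html) for each URI-M not in done, pausing WAIT seconds after each fetch."""
    done = set(done)
    for urim in urims:
        if urim in done:
            continue  # already captured on an earlier run
        with urllib.request.urlopen(raw_urim(urim), timeout=TIMEOUT) as resp:
            yield urim, resp.read().decode("utf-8", errors="replace")
        time.sleep(WAIT)
```

Writing the fetch loop as a generator lets the caller decide where each page is saved while the politeness delay stays inside the loop.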
@weiglemc
weiglemc / capture-requests.py
Created January 18, 2023 21:36
Python script using selenium-wire to render a webpage and capture specific requests that it generates
# run the script on a set of URI-Ms:
# python3 capture-requests.py < to-request.txt >> requests-log.txt
# process the results and generate a new list of URI-Ms that were requested:
# awk '{if ($1 ~ /cnn\.com(:80)?[\/]+$/ && $2 == "200") print $0}' requests-log.txt | sort -t '/' -k 5 >! requests.txt
# https://pypi.org/project/selenium-wire/#installation
import sys
import time
from seleniumwire import webdriver # Import from seleniumwire
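The awk post-processing step described in the header comments can also be written in Python. This is a minimal sketch, assuming each log line is a whitespace-separated `URI status` pair (inferred from the awk fields `$1` and `$2`; the actual log format is not shown). The name `select_homepage_hits` is hypothetical.

```python
import re

# Same pattern as the awk command: a URI ending in cnn.com/ or cnn.com:80/
HOMEPAGE = re.compile(r"cnn\.com(:80)?/+$")


def select_homepage_hits(log_lines):
    """Keep lines whose first field ends in cnn.com/ and whose second field is 200."""
    kept = []
    for line in log_lines:
        fields = line.split()
        if len(fields) >= 2 and HOMEPAGE.search(fields[0]) and fields[1] == "200":
            kept.append(line.rstrip("\n"))
    # Mirrors sort -t '/' -k 5: order by everything from the fifth '/'-separated field on,
    # which for a URI-M begins with the capture timestamp.
    return sorted(kept, key=lambda entry: "/".join(entry.split("/")[4:]))
```

Python's string ordering may differ from `sort` under some locales, so the two outputs are only guaranteed to agree for plain ASCII lines.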
@weiglemc
weiglemc / grab-tco.sh
Created November 18, 2022 18:51
Command line to grab t.co URLs from a Twitter archive
awk -F '\"' '/\"url\" :/ {print $4}' tweets.js
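For readers who prefer Python, the one-liner above can be approximated as follows. It is a sketch that mirrors the awk logic (split each matching line on double quotes and take the fourth field) rather than parsing `tweets.js` as JSON. The function name `extract_urls` is hypothetical.

```python
def extract_urls(lines):
    """Return the fourth double-quote-delimited field of each line containing '"url" :'."""
    urls = []
    for line in lines:
        if '"url" :' in line:
            fields = line.split('"')
            # awk would print an empty line when the field is missing; here it is skipped.
            if len(fields) >= 4:
                urls.append(fields[3])
    return urls


if __name__ == "__main__":
    import sys
    for url in extract_urls(sys.stdin):
        print(url)
```

Saved under a name of your choice, it would be run as `python3 your_script.py < tweets.js`. Because the match requires a quote immediately before `url`, keys such as `"expanded_url"` are not picked up, which matches the awk pattern.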
@weiglemc
weiglemc / LSU-wbball-URIs.md
Last active October 7, 2021 00:23
LSU women's basketball URIs
@weiglemc
weiglemc / .block
Last active March 6, 2019 21:07
Line Chart with nest, rollup
license: mit
@weiglemc
weiglemc / .block
Last active February 13, 2019 21:45
Blockbuilder/gist/blocks test
license: mit
@weiglemc
weiglemc / test.md
Created February 4, 2019 14:54
Testing

| Attribute | Channels | Data-type |
|:---:|:---:|:---:|
| Letter | horizontal spatial position | categorical |
| Frequency of usage | length | quantitative |

Table

| Left-Aligned | Center Aligned | Right Aligned |
|:---|:---:|---:|
| col 3 is | some wordy text | $1600 |
@weiglemc
weiglemc / estab_per_year.csv
Last active February 26, 2019 23:55
CS725-S19 HW4 data files
year count
1900 3
1901 5
1905 10
1910 15
1920 10
1925 13
1930 3
2000 30
@weiglemc
weiglemc / .block
Last active January 25, 2019 18:44
S19 - HW3 - Scatterplot
license: mit