@sandsfish
Created December 18, 2013 20:09
Very simple example of using Python and Requests to scrape the results from a search interface.
# For non-trivial scraping, better to use Scrapy or Beautiful Soup...
# - http://doc.scrapy.org/en/latest/intro/tutorial.html
# - http://www.crummy.com/software/BeautifulSoup/
import requests
r = requests.get("http://sacbee.com/search_results?aff=1100&q=robot")
r
# <Response [200]>
r.text
# u' \n<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">...
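The query string in the URL above can also be assembled from its parts, which makes it easier to vary the search term. A minimal sketch using only the Python 3 standard library; the helper name `build_search_url` is hypothetical, and the parameter names `aff` and `q` are simply the ones visible in the URL above (their meaning is not documented here).

```python
from urllib.parse import urlencode

BASE_URL = "http://sacbee.com/search_results"

def build_search_url(query, aff="1100"):
    # aff=1100 is copied from the URL above; treat it as an opaque value.
    params = [("aff", aff), ("q", query)]
    # urlencode escapes spaces and other special characters in the query.
    return BASE_URL + "?" + urlencode(params)

print(build_search_url("robot"))
```

The resulting string can be passed to `requests.get`. Requests can also do the encoding itself if given the base URL and a `params` dict.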
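try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2

class PrintLinks(HTMLParser):
    # Called once for every opening tag; attrs is a list of (name, value) tuples.
    def handle_starttag(self, tag, attrs):
        print("Tag: %s" % tag)
        for attr in attrs:
            print(" attr: %s" % (attr,))

parser = PrintLinks()
parser.feed(r.text)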
# Tag: html
# Tag: head
# Tag: script
# Tag: script
# Tag: script
# attr: (u'type', u'text/javascript')
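The parser above prints every tag it sees. To pull out only the links, the same `handle_starttag` hook can filter on `a` tags and keep their `href` values. The following is a sketch under a few assumptions: it targets Python 3, the names `LinkCollector` and `extract_links` are hypothetical, and it runs on a small inline HTML string rather than on the live search page so that it needs no network access.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased from HTMLParser.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(value)

def extract_links(html):
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links

# Made-up sample markup, standing in for r.text.
sample = (
    '<html><body><a href="/story/1">One</a><p>text</p>'
    '<a href="http://example.com/2">Two</a><a name="x">no href</a>'
    '</body></html>'
)
print(extract_links(sample))
```

With a real response, `extract_links(r.text)` would return whatever links the page contains. Relative links such as `/story/1` would still need to be joined to the site's base URL.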