@jrladd
Created February 3, 2016 19:07
Find doubles in Eikon Basilike
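#! /usr/bin/env python
from bs4 import BeautifulSoup
from textblob import TextBlob as tb

# Read the XML transcription and strip the markup, keeping only the text.
with open('eikon.xml', 'r') as f:
    xml = f.read()
soup = BeautifulSoup(xml, 'lxml-xml')
text = soup.get_text()
blob = tb(text)

# Frequency of "and" and the total word count, for context.
print(blob.word_counts['and'])
print(len(blob.words))

# Print every "and" whose neighbours are both tagged as nouns (NN, NNS, NNP, ...).
# Starting at 1 and stopping one short of the end guarantees both neighbours
# exist, and keeps tags[i-1] from wrapping around to the last token when i == 0.
tags = blob.tags
for i in range(1, len(tags) - 1):
    word = tags[i][0]
    if word == 'and' and 'NN' in tags[i-1][1] and 'NN' in tags[i+1][1]:
        print(tags[i-1][0] + ' ' + word + ' ' + tags[i+1][0])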
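
The matching step can be tried without `eikon.xml` or the tagger. The following is a minimal sketch, not part of the original gist: `find_noun_doublets` and the hand-tagged `sample` are hypothetical names and data made up for illustration, shaped like the `(word, tag)` pairs that `blob.tags` returns. The sample deliberately starts with "and" to exercise the first-token edge case.

```python
def find_noun_doublets(tags, conjunction='and'):
    """Return 'X and Y' strings where both neighbours of the conjunction carry a noun tag."""
    doublets = []
    # Start at 1 and stop one short of the end so both neighbours always exist.
    for i in range(1, len(tags) - 1):
        word, _ = tags[i]
        prev_word, prev_tag = tags[i - 1]
        next_word, next_tag = tags[i + 1]
        if word == conjunction and 'NN' in prev_tag and 'NN' in next_tag:
            doublets.append(prev_word + ' ' + word + ' ' + next_word)
    return doublets


# Hypothetical hand-tagged sample (illustrative only, not taken from the text).
sample = [
    ('and', 'CC'), ('Honour', 'NN'), ('and', 'CC'), ('Justice', 'NN'),
    ('were', 'VBD'), ('lost', 'VBN'), ('and', 'CC'), ('Peace', 'NN'),
]
print(find_noun_doublets(sample))
```

Returning a list rather than printing makes the result easy to count or de-duplicate afterwards, for example with `collections.Counter`.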