| I'm interested in this logic ("the url contains the root domain as part of the domain or subdomain"):
| https://github.com/amelehy/email_parse/blob/adc7497de476598f743cb0a83e752407cc069ca0/parser.py#L68-L71
| Can you talk me through what it does (what do you expect ROOT_DOMAIN to be? which urls will pass the check and which fail?) and why you chose to make it work that way?
I expect ROOT_DOMAIN to be the base domain of the initial URL passed by the user. For example, if I were to pass “mit.edu”, “jana.com”, or “drive.google.com” as an argument, ROOT_DOMAIN would be “mit”, “jana”, or “google” respectively.
The section that checks whether ROOT_DOMAIN is part of the domain or subdomain of each parsed URL is trying to determine which URLs gathered from the page actually belong to (or are related to) the original website that was meant to be crawled, and which are “external links.”
So for example if I were to pass “www.jana.com”, this script will gathe
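
The behaviour described above can be sketched roughly as follows. This is not the code from the linked `parser.py`; the function names `root_domain` and `is_internal` are hypothetical, and matching on whole hostname labels (rather than on substrings) is an assumption made for illustration.

```python
from urllib.parse import urlparse


def root_domain(host):
    """Return the second-to-last label of a hostname (hypothetical helper).

    'mit.edu' -> 'mit', 'drive.google.com' -> 'google'.
    """
    labels = host.lower().strip(".").split(".")
    return labels[-2] if len(labels) >= 2 else labels[0]


def is_internal(url, root):
    """True if `root` is one of the hostname labels, ignoring the last label (TLD)."""
    if "//" not in url:
        # urlparse only fills in hostname when the URL has a '//' prefix
        url = "//" + url
    host = (urlparse(url).hostname or "").lower()
    return root in host.split(".")[:-1]


root = root_domain("www.jana.com")
print(root)
print(is_internal("http://blog.jana.com/about", root))
print(is_internal("https://twitter.com/jana", root))
```

Two limitations of this kind of check are worth keeping in mind: a host such as `jana.evil.org` also passes, because the root appears as a subdomain label, and taking the second-to-last label gives the wrong answer for multi-part suffixes such as `co.uk`.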

#Challenge:

Create a command line program that will take an internet domain name (e.g. “jana.com”) and print out a list of the email addresses that were found on that website. It should find email addresses on any discoverable page of the website, not just the home page.

##Examples:

> python find_email_addresses.py jana.com
Found these email addresses:
sales@jana.com
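
One piece of the challenge, pulling addresses out of a fetched page and printing them in the format shown above, might look like the sketch below. The names `EMAIL_RE`, `find_emails`, and `report` are hypothetical, and the regular expression is a deliberately simple assumption, not a complete address grammar. Crawling the site's pages is not covered here.

```python
import re

# Simplified pattern (an assumption for illustration): local part, '@',
# then at least two dot-separated domain labels.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def find_emails(text):
    """Return the set of distinct, lower-cased addresses found in `text`."""
    return {match.lower() for match in EMAIL_RE.findall(text)}


def report(emails):
    """Format the addresses the way the example output does."""
    return "\n".join(["Found these email addresses:"] + sorted(emails))


page = '<a href="mailto:sales@jana.com">Sales@jana.com</a>'
print(report(find_emails(page)))
```

Collecting into a set means an address that appears on several pages is only reported once.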
mystery1 = {'': 0}
mystery2 = {'': 0.}
@parker-jana
parker-jana / index.html
Created March 29, 2016 14:57
A simplified user growth model
<html>
<head>
<title>A simplified user growth model</title>
</head>
</html>
parker-jana / data.json
Last active January 30, 2016 21:13
Comparison of world flags
{
"rows": [
{
"data": [
0,
0,
0,
0,
1,
0,
parker-jana / index.html
Last active January 22, 2016 15:15
POC user visualizer
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<link type="text/css" rel="stylesheet" href="style.css"/>
<script src="https://d3js.org/d3.v3.min.js" charset="utf-8"></script>
</head>
<body>
<div id="controls">
from random import seed, choice
from collections import Counter
from math import sqrt

Z = 1.9599  # 95% confidence level
seed(0)  # for reproducibility

class Sample(Counter):
    """
parker-jana / rename_aws_alarm.py
Created December 15, 2014 16:19
This is a quick script to rename an alarm in aws, as the console doesn't allow it yet.
import sys
import boto

def rename_alarm(alarm_name, new_alarm_name):
    conn = boto.connect_cloudwatch()

    def get_alarm():
        alarms = conn.describe_alarms(alarm_names=[alarm_name])
        if not alarms: