Last active October 31, 2019 14:45
Project Ideas

This is a growing list of project ideas which may be good for a budding programmer to try. The tasks vary in difficulty and length, but I try to make it language agnostic.

Part 1: Rolodex

Create a rolodex program, which stores contact information for people.

  • Use a database to store the data.
    • What database will you use? SQL or NoSQL? For SQL, which one: Postgres, MySQL, SQLite, or something else? For NoSQL: CouchDB, MongoDB, or something else?
    • What will the schema be?
      • What are the possible tables? And what are the relationships among them?
      • What are the primary keys for the tables?
      • What are the foreign keys for the tables?
  • Use the vCard format to store data.
    • How will you handle a person with multiple Twitter handles?
    • How do we differentiate distinct people with identical names?
  • How will you handle loading information into the system? Displaying information?
  • How to handle duplicate entries?
    • Ostensibly, merging duplicates together would be best, but if this is done, how will we "undo" merging?
    • If we delete entries, will these be "soft deletes" (i.e., a column in the table with a deleted flag indicating the row has been deleted and should be ignored) or hard deletes (removing the entries from the database altogether)?
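
One possible answer to the merge and soft-delete questions above, sketched with SQLite from Python's standard library. The table names, column names, and helper functions are all hypothetical, chosen only for illustration. The idea is that merging never moves or deletes rows: the duplicate is flagged as deleted and points at the surviving row, so undoing a merge just clears two columns. The sketch assumes merges are one level deep (no chains of merged rows).

```python
import sqlite3

# Hypothetical schema: one row per person, one row per handle (Twitter, email, ...).
SCHEMA = """
CREATE TABLE person (
    id          INTEGER PRIMARY KEY,
    full_name   TEXT NOT NULL,
    deleted     INTEGER NOT NULL DEFAULT 0,      -- soft-delete flag
    merged_into INTEGER REFERENCES person(id)    -- surviving row, if merged
);
CREATE TABLE handle (
    id        INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL REFERENCES person(id),
    kind      TEXT NOT NULL,                     -- e.g. 'twitter', 'email'
    value     TEXT NOT NULL
);
"""


def open_db(path=":memory:"):
    """Open a database and create the (hypothetical) tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_person(conn, full_name):
    """Insert a person; identical names get distinct ids."""
    with conn:
        cur = conn.execute("INSERT INTO person (full_name) VALUES (?)", (full_name,))
    return cur.lastrowid


def add_handle(conn, person_id, kind, value):
    """Attach one handle to a person; a person may have several of one kind."""
    with conn:
        conn.execute(
            "INSERT INTO handle (person_id, kind, value) VALUES (?, ?, ?)",
            (person_id, kind, value),
        )


def merge(conn, duplicate_id, survivor_id):
    """Soft-delete the duplicate and remember which row it was merged into."""
    with conn:
        conn.execute(
            "UPDATE person SET deleted = 1, merged_into = ? WHERE id = ?",
            (survivor_id, duplicate_id),
        )


def undo_merge(conn, duplicate_id):
    """Restore a merged row; its handles were never moved, so nothing is lost."""
    with conn:
        conn.execute(
            "UPDATE person SET deleted = 0, merged_into = NULL WHERE id = ?",
            (duplicate_id,),
        )


def handles_for(conn, person_id):
    """Handles of a person, including those of rows merged into that person."""
    rows = conn.execute(
        "SELECT h.kind, h.value FROM handle AS h "
        "JOIN person AS p ON p.id = h.person_id "
        "WHERE p.id = ? OR (p.merged_into = ? AND p.deleted = 1) "
        "ORDER BY h.id",
        (person_id, person_id),
    )
    return rows.fetchall()
```

The trade-off is that every read has to account for merged rows, which is the price paid for a cheap and lossless undo.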

Part 2: Extend to include CV

Extend this program to include a person's CV. This would be a list of jobs a person has had, employed by either organizations or people.

Each entry may or may not have start/end dates. They may be partial dates (e.g., only a year, or only a month and year). There may be only one date given.

  • What about extending this to include CV/resume information for individuals?
    • How will you handle fragments of a person's resume? Or incomplete information? (E.g., sometimes we only have years when a person started/ended a job, or a month & year)
    • We also want to store citations, for where we got the data about these relationships.
  • What new tables are needed in the database? What are the relationships?
  • How will we render information for a person? For an organization?
    • We may be interested in asking, "Who has been employed by X at time t?" How would we render this information?
  • How will we store information for an employer? For a job?
  • How will we enter information into this system? How to handle bulk information?
  • Consider what happens to the rows in these new tables when we merge duplicate people together. How do we handle "undoing" a merge?
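
The partial-date and "employed by X at time t" questions above can be made concrete with a small sketch. The class and function names here (PartialDate, Job, employed_at) are hypothetical. The assumption is that a partial date stands for a range of possible days, so a query can answer "yes", "no", or "maybe" rather than forcing a guess. The sketch also assumes a day is only ever given together with a month.

```python
import calendar
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class PartialDate:
    """A date known to year, year+month, or year+month+day precision."""
    year: int
    month: Optional[int] = None
    day: Optional[int] = None   # assumed to be given only when month is given

    def earliest(self) -> date:
        """First calendar day consistent with what is known."""
        return date(self.year, self.month or 1, self.day or 1)

    def latest(self) -> date:
        """Last calendar day consistent with what is known."""
        month = self.month or 12
        day = self.day or calendar.monthrange(self.year, month)[1]
        return date(self.year, month, day)


@dataclass(frozen=True)
class Job:
    person: str
    employer: str
    start: Optional[PartialDate] = None   # None means unknown
    end: Optional[PartialDate] = None     # None means unknown (or ongoing)
    source: Optional[str] = None          # citation for where the data came from


def employed_at(job: Job, t: date) -> str:
    """Return 'yes', 'no', or 'maybe' for whether the job covers day t."""
    # Ruled out: t is before any possible start or after any possible end.
    if job.start is not None and t < job.start.earliest():
        return "no"
    if job.end is not None and t > job.end.latest():
        return "no"
    # Certain: both ends are known and t lies inside the guaranteed interval.
    if (job.start is not None and job.end is not None
            and job.start.latest() <= t <= job.end.earliest()):
        return "yes"
    return "maybe"
```

A rendering layer could then show "maybe" results differently (for example, greyed out) instead of dropping them.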

Part 3: Scraping for Information

Write a scraper to pull this information down from various websites. For example, Politico's "Influencer" newsletter reports when people enter new jobs, leave positions, etc.; this gives us a steady stream of information in a reasonably consistent format. We can set up an email account for our bot and check it daily for the Politico email.

Create a library to generate UUIDs.

This should include creating a UUID (specifically a v1, v2, v3, or v4 UUID), deleting a UUID, comparing two UUIDs, parsing a string to a UUID, and printing a string version of the UUID.

The implementation should try to be memory efficient (i.e., a 128-bit memory footprint, not, say, a 36-character string).

  • What other methods are needed? What are their contracts?
  • How to determine which version a UUID object is?
  • What unit tests are needed?
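
As a starting point for the questions above, here is a minimal sketch that stores a UUID as a single 128-bit integer and reads the version from its bit layout. The class name Uuid and its methods are hypothetical, and only v4 generation is shown. Python integers are not a fixed 128 bits in memory, so this only illustrates the layout; in a language with fixed-width integers the same value would fit in two 64-bit words. Python's standard library already has a uuid module, which is deliberately not used here because writing it by hand is the exercise.

```python
import re
import secrets

# Canonical text form: 8-4-4-4-12 hexadecimal digits.
_PATTERN = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


class Uuid:
    """A UUID held as one integer in the range [0, 2**128)."""
    __slots__ = ("value",)

    def __init__(self, value: int):
        if not 0 <= value < (1 << 128):
            raise ValueError("a UUID must fit in 128 bits")
        self.value = value

    @classmethod
    def random_v4(cls) -> "Uuid":
        """Random UUID with the version and variant bits set for v4."""
        value = secrets.randbits(128)
        value = (value & ~(0xF << 76)) | (0x4 << 76)   # version nibble = 4
        value = (value & ~(0x3 << 62)) | (0x2 << 62)   # variant bits = 10
        return cls(value)

    @classmethod
    def parse(cls, text: str) -> "Uuid":
        """Parse the canonical hyphenated form; reject anything else."""
        if not _PATTERN.fullmatch(text):
            raise ValueError(f"not a canonical UUID string: {text!r}")
        return cls(int(text.replace("-", ""), 16))

    @property
    def version(self) -> int:
        """The version is the 13th hex digit, i.e. bits 76 to 79."""
        return (self.value >> 76) & 0xF

    def __str__(self) -> str:
        h = f"{self.value:032x}"
        return f"{h[:8]}-{h[8:12]}-{h[12:16]}-{h[16:20]}-{h[20:]}"

    def __eq__(self, other) -> bool:
        return isinstance(other, Uuid) and self.value == other.value

    def __lt__(self, other: "Uuid") -> bool:
        return self.value < other.value

    def __hash__(self) -> int:
        return hash(self.value)
```

A round trip through parse and str makes a natural first unit test, as does checking that the version digit of a generated v4 UUID is 4.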

Write a script to count the number of words in a file. Beyond the word count, let the user ask for additional information. First, a couple of definitions:

Definition 1. A contraction is formed by appending an apostrophe followed by at least one letter to a word or number, i.e., it looks like <letter>'<letter> or <digit>'<letter>. (End of definition)

Definition 2. Given a list of numbers, its summary statistics are the tuple: (min, 25th percentile, median, 75th percentile, max, mean, standard deviation, length of input list). (End of definition)
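
Definition 2 maps fairly directly onto Python's statistics module. The sketch below is one way to compute the tuple; the function name summary_statistics is hypothetical. Two choices are assumptions, since the definition leaves them open: quartiles use linear interpolation with both endpoints included, and the standard deviation is the sample (n - 1) version.

```python
import statistics


def summary_statistics(values):
    """Return (min, q1, median, q3, max, mean, standard deviation, n)."""
    data = sorted(values)
    if not data:
        raise ValueError("summary statistics need at least one number")
    if len(data) == 1:
        # Quartiles and sample deviation are degenerate for a single value.
        q1 = median = q3 = data[0]
        deviation = 0.0
    else:
        # Assumption: "inclusive" interpolation treats the data as the whole
        # population range, so min and max are the 0th and 100th percentiles.
        q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
        # Assumption: sample standard deviation (divides by n - 1).
        deviation = statistics.stdev(data)
    return (data[0], q1, median, q3, data[-1],
            statistics.fmean(data), deviation, len(data))
```

Swapping statistics.stdev for statistics.pstdev gives the population version instead.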

The user may ask for:

  • Count the number of contractions, provide summary statistics
  • Measure the length of each sentence, plus summary statistics
  • Count the number of sentences in each paragraph, plus summary statistics
  • Count the number of words in each paragraph, plus summary statistics
  • Number of words between each contraction, plus summary statistics
  • What are the top N most common words? (Count "they're" as "they", in this case.)
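
Definition 1 and the top-N request can be sketched with one regular expression. The names TOKEN, tokenize, count_contractions, and top_words are hypothetical. The sketch assumes ASCII letters and digits only, and it follows Definition 1 literally: possessives such as "John's" count as contractions, and "don't" splits into the base "don" plus the suffix "t".

```python
import re
from collections import Counter

# A word or number, optionally followed by an apostrophe and at least one letter.
# Both the straight (') and the typographic (’) apostrophe are accepted.
TOKEN = re.compile(r"([A-Za-z0-9]+)(?:['’]([A-Za-z]+))?")


def tokenize(text):
    """Yield (base, suffix) pairs in lowercase; suffix is '' if not a contraction."""
    for match in TOKEN.finditer(text):
        yield match.group(1).lower(), (match.group(2) or "").lower()


def count_words(text):
    """Total number of words, counting each contraction as one word."""
    return sum(1 for _ in tokenize(text))


def count_contractions(text):
    """Number of tokens that match Definition 1."""
    return sum(1 for _, suffix in tokenize(text) if suffix)


def top_words(text, n):
    """The n most common words, with "they're" counted as "they"."""
    return Counter(base for base, _ in tokenize(text)).most_common(n)
```

The per-sentence and per-paragraph counts would need a splitting rule on top of this, which is left open here because the text does not define what ends a sentence or a paragraph.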

The VCard 4.0 specification is worth consulting.
