@pqnelson
Last active October 31, 2019 14:45
Project Ideas

This is a growing list of project ideas that may be good for a budding programmer to try. The tasks vary in difficulty and length, but I try to keep them language-agnostic.

Part 1: Rolodex

Create a rolodex program, which stores contact information for people.

  • Use a database to store the data.
    • What database will you use? SQL or NoSQL? For SQL, which one: Postgres, MySQL, SQLite, or something else? For NoSQL: CouchDB, MongoDB, or something else?
    • What will the schema be?
      • What are the possible tables? And what are the relationships among them?
      • What are the primary keys for the tables?
      • What are the foreign keys for the tables?
  • Use VCard format to store data.
    • How to handle a person with multiple Twitter handles?
    • How do we differentiate distinct people with identical names?
  • How will you handle loading information into the system? Displaying information?
  • How to handle duplicate entries?
    • Ostensibly, merging duplicates together would be best, but if this is done, how will we "undo" merging?
    • If we delete entries, then will this be "soft deletes" (i.e., a column in the table with a deleted flag indicating the row has been deleted and should be ignored) or hard deletes (removing the entries from the database altogether)?
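
One possible answer to the schema, duplicate, and soft-delete questions above can be sketched with SQLite. This is only an illustration, not the intended solution: the table names (`person`, `handle`), the `merged_into` column, and every function name are assumptions made up for this sketch. A merge is recorded as a pointer rather than by rewriting rows, so undoing it means clearing the pointer.

```python
import sqlite3

# Hypothetical schema: one row per person, any number of handles per person.
SCHEMA = """
CREATE TABLE person (
    id          INTEGER PRIMARY KEY,
    full_name   TEXT NOT NULL,
    deleted     INTEGER NOT NULL DEFAULT 0,     -- soft-delete flag
    merged_into INTEGER REFERENCES person(id)   -- NULL unless merged
);
CREATE TABLE handle (
    id        INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL REFERENCES person(id),
    service   TEXT NOT NULL,                    -- e.g. 'twitter'
    value     TEXT NOT NULL,
    UNIQUE (service, value)
);
"""


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def add_person(conn, full_name, handles=()):
    """Insert a person and their (service, value) handles; return the new id."""
    cur = conn.execute("INSERT INTO person (full_name) VALUES (?)", (full_name,))
    person_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO handle (person_id, service, value) VALUES (?, ?, ?)",
        [(person_id, service, value) for service, value in handles],
    )
    return person_id


def merge(conn, duplicate_id, keeper_id):
    """Mark one record as a duplicate of another without touching its rows."""
    conn.execute(
        "UPDATE person SET merged_into = ? WHERE id = ?", (keeper_id, duplicate_id)
    )


def undo_merge(conn, duplicate_id):
    conn.execute("UPDATE person SET merged_into = NULL WHERE id = ?", (duplicate_id,))


def soft_delete(conn, person_id):
    conn.execute("UPDATE person SET deleted = 1 WHERE id = ?", (person_id,))


def visible_people(conn):
    """People that are neither soft-deleted nor merged into someone else."""
    return conn.execute(
        "SELECT id, full_name FROM person "
        "WHERE deleted = 0 AND merged_into IS NULL ORDER BY id"
    ).fetchall()


def handles_for(conn, person_id):
    """Handles of a person plus those of any records merged into them."""
    return conn.execute(
        "SELECT h.service, h.value FROM handle h "
        "JOIN person p ON p.id = h.person_id "
        "WHERE p.id = ? OR p.merged_into = ? ORDER BY h.id",
        (person_id, person_id),
    ).fetchall()
```

Two people with identical names stay distinct because they are identified by `id`, never by name. The trade-off of the pointer approach is that every read has to account for `merged_into`, which is the price of a cheap undo.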

Part 2: Extend to include CV

Extend this program to include a person's CV. This would be a list of jobs a person has had, employed by either organizations or people.

Each entry may or may not have start/end dates. They may be partial dates (e.g., only a year, or only a month and year). There may be only one date given.

  • What about extending this to include CV/resume information for individuals?
    • How will you handle fragments of a person's resume? Or incomplete information? (E.g., sometimes we only have years when a person started/ended a job, or a month & year)
    • We also want to store citations, for where we got the data about these relationships.
  • What new tables are needed in the database? What are the relationships?
  • How will we render information for a person? For an organization?
    • We may be interested in asking, "Who has been employed by X at time t?" How would we render this information?
  • How will we store information for an employer? For a job?
  • How will we enter information into this system? How to handle bulk information?
  • Consider what happens to the rows in these new tables when we merge duplicate people together. How do we handle "undoing" a merge?
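
The partial-date problem and the "who has been employed by X at time t?" question can be explored with a small in-memory model before committing to tables. The sketch below is one possible approach, not a prescribed design: `PartialDate`, `Job`, `possibly_employed`, and `employed_by` are hypothetical names. It treats a partial date as a range (earliest to latest possible day) and reports a job as a match whenever the known dates do not rule it out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PartialDate:
    """A date where the month and day may be unknown."""
    year: int
    month: Optional[int] = None
    day: Optional[int] = None

    def earliest(self):
        # Unknown parts default to the start of the period.
        return (self.year, self.month or 1, self.day or 1)

    def latest(self):
        # Unknown parts default to the end of the period. Day 31 is only
        # an upper bound for comparison, not a real calendar date.
        return (self.year, self.month or 12, self.day or 31)


@dataclass(frozen=True)
class Job:
    person: str
    employer: str
    title: str
    start: Optional[PartialDate] = None   # None means "not known"
    end: Optional[PartialDate] = None
    source: Optional[str] = None          # citation for this entry


def possibly_employed(job, when):
    """True unless the known dates rule out holding the job at `when`."""
    if job.start is not None and job.start.earliest() > when.latest():
        return False
    if job.end is not None and job.end.latest() < when.earliest():
        return False
    return True


def employed_by(jobs, employer, when):
    """Names of people who may have worked for `employer` at `when`."""
    return sorted(
        {j.person for j in jobs
         if j.employer == employer and possibly_employed(j, when)}
    )
```

Note the design choice: a job with no dates at all matches every point in time. A stricter variant could return only jobs whose date ranges certainly contain `when`, and a renderer could show the two groups separately.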

Part 3: Scraping for Information

Write a scraper to pull this information down from various websites. For example, Politico's "Influencer" newsletter will give us information about when people enter new jobs, leave positions, etc.; this gives us a steady stream of information in a fairly usable format. We can set up an email account for our bot and check it daily for the Politico email.

Create a library to generate UUIDs.

This should include creating a UUID (specifically a v1, v2, v3, or v4 UUID), deleting a UUID, comparing two UUIDs, parsing a string into a UUID, and printing a string version of the UUID.

The implementation should try to be memory efficient (i.e., a 128-bit memory footprint, not, say, a 36-character string).

  • What other methods are needed? What are their contracts?
  • How to determine which version a UUID object is?
  • What unit tests are needed?
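
As a starting point for the questions above, here is a minimal sketch covering only random (v4) UUIDs, parsing, printing, comparison, and version detection. The class name `Uuid` and its methods are assumptions for illustration. The value is held as a single integer; a Python `int` carries object overhead, so it only models the 128-bit layout, and `to_bytes` shows the 16-byte form a lower-level language would store directly.

```python
import re
import secrets

# Canonical 8-4-4-4-12 hexadecimal text form.
_UUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


class Uuid:
    __slots__ = ("value",)

    def __init__(self, value):
        if not 0 <= value < (1 << 128):
            raise ValueError("a UUID must fit in 128 bits")
        self.value = value

    @classmethod
    def random_v4(cls):
        value = secrets.randbits(128)
        value = (value & ~(0xF << 76)) | (0x4 << 76)   # version nibble = 4
        value = (value & ~(0x3 << 62)) | (0x2 << 62)   # variant bits = 10
        return cls(value)

    @classmethod
    def parse(cls, text):
        if not _UUID_RE.fullmatch(text):
            raise ValueError(f"not a canonical UUID string: {text!r}")
        return cls(int(text.replace("-", ""), 16))

    @property
    def version(self):
        # The version lives in the top 4 bits of the third group.
        return (self.value >> 76) & 0xF

    def to_bytes(self):
        return self.value.to_bytes(16, "big")

    def __str__(self):
        h = f"{self.value:032x}"
        return f"{h[:8]}-{h[8:12]}-{h[12:16]}-{h[16:20]}-{h[20:]}"

    def __eq__(self, other):
        return isinstance(other, Uuid) and self.value == other.value

    def __lt__(self, other):
        return self.value < other.value

    def __hash__(self):
        return hash(self.value)
```

Useful unit tests include a parse/print round trip, rejection of malformed strings, and a check that a generated v4 value reports version 4. The time-based and name-based versions need a clock, a node identifier, or a hash, and are left as the exercise.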

Write a script to count the number of words in a file. Beyond this, allow the user to ask for additional information. First, a couple of definitions:

Definition 1. A contraction is a word or number with an apostrophe and at least one letter appended to it, i.e., it looks like <letter> "'" <letter> or <digit> "'" <letter>. (End of definition)

Definition 2. Given a list of numbers, its summary statistics are the tuple: (min, 25th percentile, median, 75th percentile, max, mean, standard deviation, length of input list). (End of definition)
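
Definition 2 leaves two choices open: how percentiles are interpolated, and whether the standard deviation is the population or the sample one. The sketch below assumes linear interpolation between the closest ranks and the population standard deviation; both are assumptions, and the function names are made up for illustration.

```python
import statistics


def percentile(sorted_values, fraction):
    """Percentile by linear interpolation between the two closest ranks."""
    position = fraction * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    weight = position - lower
    return sorted_values[lower] * (1 - weight) + sorted_values[upper] * weight


def summary_statistics(values):
    """(min, 25th pct, median, 75th pct, max, mean, std dev, length)."""
    if not values:
        raise ValueError("summary statistics need at least one value")
    ordered = sorted(values)
    return (
        ordered[0],
        percentile(ordered, 0.25),
        percentile(ordered, 0.50),
        percentile(ordered, 0.75),
        ordered[-1],
        statistics.fmean(ordered),
        statistics.pstdev(ordered),   # population standard deviation
        len(ordered),
    )
```

Swapping `statistics.pstdev` for `statistics.stdev` gives the sample standard deviation instead, which requires at least two values.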

The user may ask for:

  • Count the number of contractions, provide summary statistics
  • Measure the length of each sentence, plus summary statistics
  • Count the number of sentences in each paragraph, plus summary statistics
  • Count the number of words in each paragraph, plus summary statistics
  • Number of words between each contraction, plus summary statistics
  • What are the top N most common words? (Count "they're" as "they", in this case.)
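
A few of the requests above can be prototyped with a regular-expression tokenizer. This is a rough sketch under stated assumptions: words are runs of ASCII letters and digits, a typographic apostrophe is treated like a plain one, and possessives such as "John's" count as contractions because Definition 1 does not exclude them. All function names are hypothetical.

```python
import re
from collections import Counter

# A word, optionally followed by an apostrophe and at least one letter.
WORD = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?")


def tokenize(text):
    # Normalize the typographic apostrophe (U+2019) before matching.
    return WORD.findall(text.replace("\u2019", "'"))


def is_contraction(token):
    return "'" in token


def count_contractions(text):
    return sum(1 for token in tokenize(text) if is_contraction(token))


def words_between_contractions(text):
    """Number of words separating each pair of consecutive contractions."""
    positions = [i for i, t in enumerate(tokenize(text)) if is_contraction(t)]
    return [b - a - 1 for a, b in zip(positions, positions[1:])]


def top_words(text, n):
    """Most common words, counting a contraction as its stem ("they're" as "they")."""
    stems = [token.split("'")[0].lower() for token in tokenize(text)]
    return Counter(stems).most_common(n)
```

For example, in "They're late. They said they can't come, but it's fine." this finds three contractions, with four and then two words between them, and "they" as the most common word. Sentence and paragraph splitting are harder (abbreviations, quotations) and are left open here.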
@pqnelson
Author

The VCard 4.0 specification is worth consulting.