Skip to content

Instantly share code, notes, and snippets.

@r-barnes
Last active August 29, 2015 14:05
Show Gist options
  • Save r-barnes/a450782655edb8271c5d to your computer and use it in GitHub Desktop.
Save r-barnes/a450782655edb8271c5d to your computer and use it in GitHub Desktop.
Richard's Unambiguous URL Shortener
#!/usr/bin/ruby
require 'digest'
#This function takes an input string, a desired hash length, and an attempt
#number. It returns a hash of the specified length encoded in an alphabet which
#is unambiguous in both upper and lower case. The attempt number should
#initially be 0. If the hash is found to conflict with a previous result, then
#the attempt number should be incremented and the function called again with the
#same input string and hashlen. If you have incremened attempt number an
#unreasonably large number of times (you must figure out what this means), then
#you should increase your hash length.
#Author: Richard Barnes (rbarnes@umn.edu)
def ShortenURL(inputstr, hashlen, attempt_number)
#This alphabet has been chosen to be unambiguous in both upper- and lower-
#case across a variety of fonts. To do this, the following characters have
#been removed: L l I i 0 O o
alphabet = "ABCDEFGHJKMNPQRSTUVWXYZ23456789"
#Convert input string into a hexadecimal hash. Repeat as necessary to generate
#a unique short url. The idea here is that a good hash function distributes
#its inputs uniformly and unpredictably across its hashspace. If this is true,
#then the least-significant digits of the output hash are also distributed
#uniformly and unpredictably across the smaller hash space they represent.
for i in 0..attempt_number
inputstr = Digest::MD5.hexdigest( inputstr )
end
#Convert hexadecimal hash into a (very long) integer
inputstr = inputstr.hex
#Use modulus to wrap very long integer into the hashspace addressable by the
#given alphabet and hash length
inputstr = inputstr % (alphabet.length**hashlen)
#Convert the resuling number into an array of integers representing the number
#in base(hashlen)
s = []
while true do
inputstr, r = inputstr.divmod(alphabet.length)
s<<r
if inputstr == 0
break
end
end
#Reverse s so that the digits are in the appropriate order
s.reverse!
#Convert numeric digits into the characters specified in alphabet
output = ""
for i in s
output += alphabet[i]
end
#Ensure that output is equal to hashlen, in case we happen to spit out a small
#number
output = output.rjust(hashlen, alphabet[0])
#Remember to give back to the world
return output
end
#This function demonstrates how your code should call the URL shortener
#function. You will have to adapt this function to your particular use-case.
def GenHash(string, database_of_hashes)
#Check if the URL has already been hashed. If so, return that hash without any
#further look-ups.
if database_of_hashes.include? string
return database_of_hashes[string]
end
#If the URL has not already been hashed, try to hash it. Try again if the hash
#has already been used for a different URL (if there is a hash collision).
#Stop trying after 10 attempts to avoid and infinite loop in a crowded hash
#space.
i = 0
begin
hash = ShortenURL(string, 3, i)
i += 1
end while database_of_hashes.has_value?(hash) and i<10
if i==10 #Too many attempts: give up
return false
else #Success. Make a note that this URL hashes to this hash
database_of_hashes[string] = hash
return hash
end
end
#This finds conflicts for use in examples
def FindConflicts()
a = Hash.new
for i in 0..5000000
hash=ShortenURL(i.to_s(26), 3, 0)
if a.include? hash
puts "Conflict"
puts i.to_s(26)
puts a[hash]
end
a[hash]=i.to_s(26)
end
end
#This runs some example cases to help you understand how to use the code
def RunExample()
database_of_hashes = Hash.new
puts "For 'hello', the example GenHash function gives " + GenHash("hello", database_of_hashes)
puts "Using 'hello' a second time, the example GenHash function still gives " + GenHash("hello", database_of_hashes)
puts "Notice that the same hash is generated each time!"
puts " "
puts "Using the ShortenURL function for 1l0l gives " + ShortenURL("1l0l", 3, 0)
puts "Using the ShortenURL function for agj gives " + ShortenURL("agj", 3, 0)
puts "Notice how these inputs result in a hash collision!"
puts " "
puts "Therefore, when you call ShortenURL, you must handle collisions."
puts "The example GenHash function handles this."
puts "Passing '1l0l' to GenHash gives " + GenHash("1l0l", database_of_hashes)
puts "passing 'agj' to GenHash gives " + GenHash("agj", database_of_hashes)
end
RunExample()
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment