What follows is a technical test for this job offer at CARTO: https://boards.greenhouse.io/cartodb/jobs/705852#.WSvORxOGPUI
Build the following and make it run as fast as you possibly can using Python 3 (vanilla). The faster it runs, the more you will impress us!
Your code should:
- Download this ~2GB file: https://s3.amazonaws.com/carto-1000x/data/yellow_tripdata_2016-01.csv
- Count the lines in the file
- Calculate the average value of the tip_amount field.
All of that in the most efficient way you can come up with.
That's it. Make it fly!
Hello,
It was fun to play with. There are a lot of possible solutions, but I like these two.
I think streaming is the future of data, and I hate downloading big files.
Solution 1
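The original code for this solution is not reproduced here, so what follows is only a minimal sketch of the streaming idea, using the standard library alone. It reads the HTTP response as it arrives instead of saving the file to disk, and counts lines and sums tip_amount in a single pass. The names summarize and run_on_url are hypothetical, and the sketch assumes the file is UTF-8 with a header row containing tip_amount.

```python
import csv
import io
import urllib.request

URL = "https://s3.amazonaws.com/carto-1000x/data/yellow_tripdata_2016-01.csv"


def summarize(text_lines):
    """Return (line_count, average_tip) for an iterable of CSV text lines.

    The header is expected as the first line and is included in line_count.
    """
    reader = csv.reader(text_lines)
    header = [name.strip() for name in next(reader)]
    tip_index = header.index("tip_amount")

    line_count = 1  # the header itself
    tip_total = 0.0
    tip_rows = 0
    for row in reader:
        line_count += 1
        # Blank or truncated lines are counted as lines but carry no tip value.
        if len(row) > tip_index:
            tip_total += float(row[tip_index])
            tip_rows += 1

    average = tip_total / tip_rows if tip_rows else 0.0
    return line_count, average


def run_on_url(url=URL):
    """Stream the remote file and summarize it without writing it to disk."""
    with urllib.request.urlopen(url) as response:
        # Decode the byte stream lazily so only a small buffer is held in memory.
        text = io.TextIOWrapper(response, encoding="utf-8", newline="")
        return summarize(text)
```

Calling run_on_url() performs the full download-and-compute pass; summarize can also be fed any local iterable of lines, which makes it easy to check on a small sample first.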
I also think people should be lazy and use someone else's computing capacity, in this case AWS and S3. (I cheated and used boto3, but I don't have the time to rewrite an SDK.)
Solution 2
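The original code for this solution is not reproduced here either. As one possible illustration of letting AWS do the work, the sketch below uses S3 Select through boto3's select_object_content, so only the two aggregate values travel over the network. This is an assumption about the approach, not necessarily what was actually used. It also assumes valid AWS credentials, an account where S3 Select is available, and a file with no blank lines. The bucket and key are taken from the download URL; parse_result and run_query are hypothetical names.

```python
import csv
import io

BUCKET = "carto-1000x"
KEY = "data/yellow_tripdata_2016-01.csv"
QUERY = "SELECT COUNT(*), AVG(CAST(s.tip_amount AS FLOAT)) FROM S3Object s"


def parse_result(chunks):
    """Turn the CSV bytes returned by the query into (line_count, average_tip)."""
    text = b"".join(chunks).decode("utf-8")
    rows, average = next(csv.reader(io.StringIO(text)))
    # COUNT(*) skips the header row, so add it back to get the line count.
    return int(rows) + 1, float(average)


def run_query(bucket=BUCKET, key=KEY):
    """Run the aggregation inside S3 and return (line_count, average_tip)."""
    import boto3  # third-party; imported here so parse_result works without it

    s3 = boto3.client("s3")
    response = s3.select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType="SQL",
        Expression=QUERY,
        InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
        OutputSerialization={"CSV": {}},
    )
    # The response is an event stream; only "Records" events carry result bytes.
    chunks = [
        event["Records"]["Payload"]
        for event in response["Payload"]
        if "Records" in event
    ]
    return parse_result(chunks)
```

The trade-off is that nothing is downloaded at all, at the cost of depending on an SDK and on a service feature that may not be enabled for every account.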
I think I will write an article on these solutions.
Edit: I forgot the count in Solution 2.