I am writing a longer blog post about using my Toshiba Chromebook as the primary computer in the Coursera Johns Hopkins Data Science Specialization, but I really felt this was a valuable nugget of information to get on the web.
Using my $10/month Digital Ocean droplet to train an R randomForest model using caret:
real 3m31.409s
user 3m28.461s
sys 0m1.875s
Pretty good for $10/month! However, I’m a big fan of Domino Data Lab for cloud-based data science. They provide access to super fast servers in the cloud. They are also the first folks to package “data science best practices” into a product. The system snapshots all of your analytic runs, along with the program and the data, in one consistent view. That makes it nice and easy to go back, see what changed, and collaborate, and, most importantly, it helps make your research reproducible. But here’s why I really use them:
real 1m42.314s
user 0m4.257s
sys 0m0.251s
The exact same code, run on Domino’s servers, finished in roughly half the wall-clock time. This is using their absolute cheapest instance! This run was billed at $0.0093 a minute. Seriously, less than 1 penny a minute to have access to fast hardware with integrated version control and built-in collaboration tools. Even crazier, of that 1m42s, they only charged me for 1m12s, because the rest of the time was spent transferring my code and data. Measured against that 1m12s of actual compute, their machine is nearly three times as fast as my Digital Ocean droplet, which needed 3m31s.
I paid 1 freaking cent for the privilege of running my code on fast hardware in the cloud. Check them out!
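If you want to check my math, here is a quick back-of-the-envelope sketch in Python. It only uses the timings quoted in this post. The one assumption is that the billing rate is $0.0093 per minute (in dollars, so just under one cent); the variable and function names are mine and purely illustrative.

```python
# Back-of-the-envelope check of the timings and cost quoted in this post.
# Assumption: the rate is $0.0093 per minute (dollars, i.e. just under 1 cent).

def to_seconds(minutes, seconds):
    """Convert a `time`-style reading such as 3m31.409s to seconds."""
    return minutes * 60 + seconds

droplet_real = to_seconds(3, 31.409)   # Digital Ocean wall-clock time
domino_real = to_seconds(1, 42.314)    # Domino wall-clock time, transfer included
domino_billed = to_seconds(1, 12)      # the part of the run that was billed

rate_per_minute = 0.0093               # assumed to be dollars per minute

wall_clock_speedup = droplet_real / domino_real
compute_speedup = droplet_real / domino_billed
cost_dollars = domino_billed / 60 * rate_per_minute

print(f"Wall-clock speedup: {wall_clock_speedup:.2f}x")
print(f"Speedup on billed time: {compute_speedup:.2f}x")
print(f"Cost of the run: ${cost_dollars:.4f}")
```

So the run comes out at about twice as fast end to end, close to three times as fast if you only count the billed compute time, and roughly one cent in total.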
Note: After I had been a Domino client for a while and had pushed their systems and processes to the limits, they asked me to help advise them. I am not a completely unbiased source, but I came to this position pretty darned honestly 🙂 Ask me any questions!