You don’t need a fancy PC to get started with data science and machine learning. In fact, you can run all your code in cloud-based notebooks without even worrying about setting up an environment locally.
Even if you’re new to data science, you’ve probably heard of Jupyter Notebooks. They have become the most popular way to do data science because they make it easy to run your code, write commentary and see your output all in one place. Almost all cloud platforms use some kind of Jupyter-like environment.
In this blog post I am going to share 6 ways you can do data science in the cloud. Each of these platforms lets you do this completely for free, and they all work really well.
In my opinion, the two biggest upsides of using cloud platforms for data science are:
- Speed of setup – You can get set up in just a few minutes and have almost everything you need for machine learning available to you. You don’t need to go through the hassle of setting up a local environment before you start writing code and analysing data.
- Collaboration – Being able to share your work and collaborate on projects is a big upside of any kind of cloud platform. However, not every platform listed here offers collaboration, and even when it is offered, the degree of collaboration differs from platform to platform.
This post covers 6 platforms:
- DataCamp Workspaces
- Kaggle Notebooks
- Google Colab
- Deepnote
- Datalore by JetBrains
- Gradient Notebooks
There are a few things you should note about how I have judged these platforms. I have actually tried each of these platforms myself and am giving my own opinions on what I think of them. My primary use of cloud platforms is to work on personal projects and not for company or enterprise use.
These are the criteria that I am using to compare these platforms:
- Price – they should be free or at least offer a decent free plan (not just a trial)
- Speed of set-up and low admin – I shouldn’t have to ‘babysit’ my projects in case I go over ‘allowed hours’. I’d like to be able to log on, work on a project and log off without worrying about whether I shut the server down.
- Aesthetic and intuitive – the app should look good and it should be intuitive and easy to use
- Collaboration – I’d like to be able to share my work with friends and collaborate on it live. I’d also like the option to share securely if I want to, without my project being publicly available.
Great, now that I’ve cleared all that up, let’s get into it.
DataCamp recently launched their Workspaces feature, which lets you run code in a Jupyter-like environment. There is currently no live collaboration feature, although it is something they are working on. Right now, you can share your notebooks with other people, who can view them and add comments.
DataCamp’s philosophy with the initial launch of Workspaces is that they want it to be as easy to do data science as it is to learn it. They have already created an incredible interactive environment for learning data science and Workspaces seem like a natural next step for anyone who now wants to easily apply their skills and start putting together a portfolio.
Workspaces are completely free and you can choose between R and Python. So far, their notebook editor looks good and is intuitive to use. There is also no admin involved in running or maintaining a workspace, which makes it hassle-free.
To me, Kaggle is like the OG of cloud-based data science platforms, having started way back in 2010 as a machine learning competition platform. Since then it has expanded to offer ways of sharing datasets and notebooks, and it has a huge community in its forums where you can ask questions and get help.
Kaggle notebooks are free to use, let you choose between R and Python, and integrate well with other services. You can even connect your Kaggle notebook to Google Cloud to beef up the hardware if you need it, although this comes at an additional cost, of course. Kaggle does provide access to a GPU, and for personal projects it is usually more than enough.
Collaboration on Kaggle notebooks is limited. Similar to DataCamp, you can share your notebooks with other people (such as your teammates in a competition) but you effectively work on different versions of the notebook so there is no live collaboration feature.
Google Colab is another Google product that works as a natural extension of Google Drive, just like Google Docs or Sheets. If you already use these other products, the Colab UI will feel very familiar.
Naturally, sharing is built into Colab notebooks. However, it does not seem to support live collaboration (i.e., two or more people editing a notebook together in real time). I find this disappointing, since Google basically wrote the book on real-time collaboration with Docs and Sheets.
Colab is also not a particularly pretty app, especially compared with some of the other platforms on this list. However, since almost everyone interested in data science already has at least one Google account, setup is by far the fastest.
Colab is free to use, but resources are not guaranteed and there are several usage limits that change depending on demand. Your usage limits could even be different from mine if your code uses more resources.
Deepnote is by far the best-looking, most full-featured platform I have come across. The UI looks great, the editing experience in their notebook is amazing, and they have live collaboration! On top of all this, it is still free to use (up to a point), and the platform is still in beta, so there is likely much more to come.
I particularly like their publishing feature – you can publish your notebook as an article or as an interactive app or dashboard. I just love the presentation of the articles and dashboards. Your profile on Deepnote also acts as a portfolio and it is a great viewing experience for anyone looking through your work.
You can get up to 750 hours on their standard machines and each notebook comes with a nifty little feature that automatically shuts the machine down after 15 minutes of inactivity. This keeps the admin pretty low on this platform.
Datalore is a platform built by the JetBrains team. The UI looks good and it also comes with live collaboration. There are also a few more options in terms of programming languages with this platform – you can choose between Python, R, Kotlin and Scala.
There is a bit of admin involved with this platform. On the free plan, you get 120 hours/month of basic machine time. However, as long as a notebook is open in a browser tab, it consumes resources and eats into your available hours. Because of this, it is important to either close the tab or manually shut down the machine when you’re done.
Also, if you’ve shared one of your notebooks with someone and they forget to close their browser window at the end of the day, it’ll eat into your quota. So that’s something to keep in mind.
Gradient Notebooks are built by Paperspace. The biggest selling point of Gradient is that they offer free GPUs. On the free plan, instead of restricting the number of hours on the platform, the only limits they impose are that you can have only 1 notebook running at a time and all notebooks must be public.
Getting set up on the platform and launching a notebook can take a good couple of minutes, since the additional hardware they offer needs to be provisioned. There is also limited functionality for publishing notebooks and building a portfolio of projects on their platform. They are currently working on a public profile feature, so at least it’s on the roadmap.
Gradient is the platform to go to if you need more resources and more compute for your project and you don’t want to pay a cloud provider like Google Cloud or AWS to get it.
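If you pick a platform for its GPU (for example Kaggle or Gradient), it can be worth confirming from inside the notebook that a GPU is actually attached before starting a long training run. Here is a minimal sketch, assuming the runtime uses an NVIDIA GPU whose driver provides the `nvidia-smi` command-line tool. That is common on these platforms, but I haven't verified it for every one of them, and the helper name `list_nvidia_gpus` is just made up for illustration:

```python
import shutil
import subprocess


def list_nvidia_gpus() -> list[str]:
    """Return one line per NVIDIA GPU reported by `nvidia-smi -L`.

    Hypothetical helper for illustration. It returns an empty list if
    `nvidia-smi` is not on the PATH (typically a CPU-only runtime) or if
    the command fails or times out.
    """
    # Assumption: the GPU is NVIDIA and its driver ships `nvidia-smi`.
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", "-L"],  # -L lists each attached GPU on its own line
            capture_output=True,
            text=True,
            check=True,
            timeout=30,
        )
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired, OSError):
        return []
    # Drop any blank lines from the command output.
    return [line for line in result.stdout.splitlines() if line.strip()]


gpus = list_nvidia_gpus()
if gpus:
    print("GPU(s) available:")
    for gpu in gpus:
        print(" ", gpu)
else:
    print("No NVIDIA GPU detected; this runtime appears to be CPU-only.")
```

On Kaggle and Colab the GPU usually has to be switched on in the notebook’s settings first, so an empty result often just means the runtime is still running on CPU.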
Choosing which cloud platform to use depends on your own goals and needs.
If you’re looking for a place to build a portfolio and get involved in a large community while you’re still learning and improving your data science skills then I’d recommend Kaggle or DataCamp.
If you’re looking for a platform that’s got all the bells and whistles, lets you build a professional-looking portfolio and offers real-time collaboration, then I’d recommend Deepnote.
If you’ve started branching out into deep learning, NLP, and computer vision then I’d recommend giving Gradient a try.
See this resource for a comprehensive list of almost all available data science notebook platforms.
I hope you found this blog post helpful! Let me know in the comments below what other platforms you’d recommend. I’d love to hear your thoughts!