3 years of Google Colab usage — The good, the bad and the ugly (2024)

Published in Towards AI · 8 min read · May 12

The first time I met Google’s Colab was when we were searching for a serverless solution to train our models. Until that point, our models were small enough to train on our local machines. But once we encountered a use case that required much bigger models, a GPU became mandatory, and with it a different solution. We started searching, and very quickly Colab outshone all the competitors. The main advantage was the ease of use: just add a notebook file to your G-Drive (which most of us are using anyway), and you’re ready to roll, with (almost) no need for any extra configuration.

Later on, what locked us into the Colab platform was the seamless TPU support. At that point our GPU training cycles were quite long, and as we experimented with hyperparameter tuning, the need to shorten them became acute. Colab enabled us to move our training process from GPUs to TPUs by modifying only a few lines of code, and using TPUs significantly reduced the time per training cycle. It was too good to be true. From that point our bond was formed; starting from the free offering, we soon moved to Colab Pro and later to Colab Pro+, shifting more and more of our research efforts into that ecosystem.

Unfortunately, it didn’t take long for the enthusiasm to start fading. At first it was due to a lack of important features (which we managed to solve using workarounds), but in the end the service support was the straw that broke the camel’s back.

This column aims to summarise our journey with Google’s Colab. The target audience is new Colab users: newbies, or those who are still experimenting with it to decide whether it’s worth using at all. Spoiler alert: the bottom line is that Colab is a unique tool with almost no competitors for specific phases of the research/development lifecycle. But once research reaches a critical point, other solutions should be considered.

But let’s not put the cart before the horse. Let’s start with a brief overview of Colab.

Google, in the very first paragraph of its Colab FAQ page, gives a very good introduction to the Colab service:

“Colaboratory, or “Colab” for short, is a product from Google Research. Colab allows anybody to write and execute arbitrary python code through the browser, and is especially well suited to machine learning, data analysis and education. More technically, Colab is a hosted Jupyter notebook service that requires no setup to use, while providing access free of charge to computing resources including GPUs”.

Colab was first introduced in 2017 as a research project by Google. It was initially aimed at researchers and students who needed a platform to work on machine learning projects without specialized hardware or software, but the platform soon became very popular. Since its launch, Google Colab has undergone several updates and improvements, including support for more programming languages, improved hardware options, and integration with Google Drive. While there are many competitors in the serverless notebooks domain (such as Azure Notebooks, IBM Watson Studio, AWS SageMaker, and Kaggle Kernels), the very low entry barrier and the ease of use make Google Colab a popular choice for individuals and small teams who want to get started with machine learning and data analysis and need a flexible, accessible environment in which to experiment and learn. Its main drawbacks are the obvious dependence on Google services, the potential data privacy concerns, and the limited resource allocation, which Google mentions at the very beginning of the Colab FAQ page.

Now that we’re more familiar with Google Colab’s characteristics, let’s drill down into its key properties from the point of view of extensive usage experience. We will look at three main sections: the good (why to consider it), the bad (why to give it a second thought), and the ugly (why to reconsider).

The key differentiator of Google Colab is its ease of use; the distance from starting a Colab notebook to utilizing a fully working TPU cluster is very short. Colab's common usage flow relies heavily on G-Drive integration, making complicated actions like authorization almost seamless. For example, three lines of code are all it takes to gain access to Google services such as G-Drive and BigQuery. As simple as that.
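
The original snippet was an image and is not reproduced here. As a rough sketch of what such an authorization flow typically looks like, the example below uses the `google.colab` helpers `auth.authenticate_user()` and `drive.mount()`. The wrapper functions, the guard for running outside Colab, and the `/content/drive` mount point are assumptions added for illustration, not the author's exact code.

```python
def running_in_colab() -> bool:
    """Return True when the google.colab package can be imported."""
    try:
        import google.colab  # noqa: F401  (only present inside a Colab runtime)
    except ImportError:
        return False
    return True


def authorize_google_services(mount_point: str = "/content/drive") -> bool:
    """Authenticate the session and mount G-Drive; do nothing outside Colab.

    The mount point is an assumed, commonly used default.
    """
    if not running_in_colab():
        return False

    from google.colab import auth, drive

    auth.authenticate_user()   # pops up the Google authorization prompt
    drive.mount(mount_point)   # makes G-Drive files visible under mount_point
    return True


if __name__ == "__main__":
    print("Authorized:", authorize_google_services())
```

Inside a notebook cell, the three lines that matter are the import, `auth.authenticate_user()`, and `drive.mount(...)`; the rest only makes the sketch safe to run elsewhere.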

Colab’s only entry barrier is having a notebook file on your drive. There is no need for notebook server instances, hardware provisioning, or user access management. Moreover, the notebook is always available on the drive, making it easy to share its content or just to review it offline (like any other document on G-Drive). A truly serverless notebook. Colab offers a seamless configuration experience; the only options to choose from are no GPU, GPU, or TPU, and small or big RAM. This provides a real abstraction and removes the need to explicitly describe the required resources. Colab also enables a seamless authentication flow; when a Google service is called (for example, when mounting G-Drive into a notebook session), Colab pops up a request to authorize it. There is no need to define roles, authorization, permissions, or any other entity commonly required by similar services.

To utilize TPUs, we only need to adjust a few lines in the network declaration code. The fact that Colab runs as part of the G-Suite ecosystem makes it very easy to collaborate on the output of Colab notebooks (sharing results and gaining feedback using Google Sheets, collecting user input using Google Forms, or just generating graphs as images and publishing them to the team's Shared Drives). The bottom line is that the very low barrier from having an idea to exploring it, prototyping, starting a feedback loop, and finally publishing an MVP is just too good to be true.
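
The article does not name the framework it used, so the following is only a sketch of what "a few lines" can look like, assuming TensorFlow/Keras. It relies on the `tf.distribute` TPU APIs; `build_model` is a hypothetical caller-supplied function that creates (and compiles) the network.

```python
def make_strategy():
    """Return a TPUStrategy when a TPU runtime is attached, else the default strategy."""
    import tensorflow as tf  # assumed framework; imported lazily

    try:
        # Detects the TPU attached to the runtime; raises ValueError if none is found.
        resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
    except ValueError:
        # No TPU available: fall back to TensorFlow's default CPU/GPU strategy.
        return tf.distribute.get_strategy()

    tf.config.experimental_connect_to_cluster(resolver)
    tf.tpu.experimental.initialize_tpu_system(resolver)
    return tf.distribute.TPUStrategy(resolver)


def build_in_scope(build_model):
    """Create the model inside the strategy scope.

    build_model is a hypothetical zero-argument function returning a Keras model.
    """
    strategy = make_strategy()
    with strategy.scope():
        # Variables created here are placed according to the chosen strategy.
        model = build_model()
    return model
```

With this shape, the only change to existing training code is wrapping the model construction in `strategy.scope()`; the rest of the training loop can usually stay as it was.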

The issues begin to appear once we start expanding our Colab use. The first main drawback is the limited session duration or, more specifically, the ambiguity regarding the total resources one can consume. Theoretically, Colab is always available; just start your notebook, and you’re ready to roll. But in reality, Colab limits the duration of the session, especially for free users, especially at peak times, and especially for the pricier resources. Free-tier users will commonly face a pop-up message verifying that they are real users within the very first minutes of their run. Moreover, Colab will try to verify that the user is interactive and that the session is not just a long processing task (which is quite problematic given that AI applications commonly include long processing parts).

Colab commonly suggests buying pricier licenses in order to gain a smoother experience; Colab Pro and Pro+ provide more resources with less risk of losing them in the middle of a run. The next main drawback is the lack of some critical features, though with possible workarounds. One such example is flow pipelining; commonly, we would like to split our processing (especially for non-trivial cases) into a set of sub-tasks. Colab doesn’t directly support such a need. A workaround is to rely on notebook pipelining instead (a single notebook orchestrating the run and calling the other notebooks in a synchronous flow). The main issue is that all the triggered notebooks use the run configuration of the main notebook. If a single notebook is GPU-based, all of them have to use a GPU backend, regardless of whether they truly need it.
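
The notebook-pipelining workaround can be sketched roughly as follows. This is an illustration under assumptions, not the author's setup: the helper names and notebook paths are hypothetical, and the default runner relies on IPython's `%run` magic, which executes a notebook inside the current kernel. That is also why every step inherits the orchestrating notebook's runtime.

```python
from typing import Callable, Iterable, List


def ipython_runner(path: str) -> None:
    """Run one notebook in the current kernel via the %run magic (IPython only)."""
    from IPython import get_ipython  # available inside Colab/Jupyter kernels

    shell = get_ipython()
    if shell is None:
        raise RuntimeError("not running inside an IPython kernel")
    # Paths containing spaces would need quoting; kept simple here.
    shell.run_line_magic("run", path)


def run_pipeline(
    notebooks: Iterable[str],
    runner: Callable[[str], None] = ipython_runner,
) -> List[str]:
    """Run notebooks one after another; an exception aborts the remaining steps."""
    completed: List[str] = []
    for path in notebooks:
        runner(path)            # blocks until this step has finished
        completed.append(path)
    return completed


if __name__ == "__main__":
    # Hypothetical step names; a stand-in runner lets the sketch run anywhere.
    steps = ["prepare_data.ipynb", "train.ipynb", "report.ipynb"]
    print(run_pipeline(steps, runner=lambda p: print("would run", p)))
```

Injecting the runner keeps the orchestration logic testable outside Colab; in the main notebook one would call `run_pipeline(steps)` with paths on the mounted drive.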

It’s important to note that Colab is a work-in-progress project; new features are constantly added. The issue is that the overall experience is that of a work-in-progress project, not of a finished solution. The run scheduler is a great example: available only for Pro+ users, it theoretically enables scheduled runs. But as it currently lacks the ability to define run parameters or to pre-authorize (both quite critical for automated scheduling), it doesn’t really answer that need, and one has to come up with workarounds to make it work.

As annoying as they might sound, the issues we mentioned so far are not deal breakers; each has a workaround, enabling one to stick with the platform if one wants to. Generally speaking, the main reason to switch away from a service provider is that it has lost our confidence. Commonly this is due to a lack of transparency or, more specifically, a lack of support. The currently advised way to get Colab support is to submit feedback in the app or to open an issue on its GitHub project. Neither truly serves that need. Moreover, looking at the GitHub issues page, many issues are closed as not project related (which makes sense, given that ‘it doesn’t work for me’ requests don’t belong on a project issues page; it’s not meant for that).

Our own critical point came when our Colab account was suddenly blocked. What should we do next? Following the advice mentioned above on how to get support didn’t work. Looking through the project issues, many users seem to face the same scenario without knowing who should assist, whom to talk to, or how. This is when we finally understood it was time to say goodbye, and we started to re-evaluate the available competitors.

Google Colab is probably still the best serverless notebook solution out there. Nevertheless, in many cases, it’s just not good enough. My advice for new users is to try it yourselves. Keep in mind, though, the limitations we mentioned, and keep checking whether it is time to move elsewhere.

In the article by Ori Abramovsky, the author shares a comprehensive journey with Google's Colab, highlighting both the positive and negative aspects of the platform. Let's break down the key concepts discussed in the article:

Introduction to Google Colab:
- Colab, short for Colaboratory, is a product from Google Research introduced in 2017.
- It serves as a hosted Jupyter notebook service, requiring no setup and providing free access to computing resources, including GPUs.
- Initially aimed at researchers and students, Colab gained popularity for machine learning, data analysis, and education.
Advantages of Google Colab:
- Ease of use is a primary advantage, with the ability to execute Python code through the browser.
- Integration with Google Drive simplifies the workflow, and adding a notebook file to G-drive facilitates quick access.
- Colab offers seamless support for TPUs, enabling a transition from GPU to TPUs with minimal code modification.
- Low entry barrier, serverless notebook experience, and collaboration within the G-Suite ecosystem are highlighted.
Limitations and Drawbacks:
- The article discusses the limitations of session duration, especially for free users, and potential interruptions during peak times.
- Colab Pro and Pro+ are introduced as solutions to obtain more resources and avoid interruptions.
- Lack of certain critical features is acknowledged, with workarounds suggested, such as notebook pipelining.
- Colab is described as a work-in-progress project with constant updates, leading to issues that may require workarounds.
Issues with Colab Support:
- The author points out challenges with Colab support, emphasizing a lack of transparency and effective communication.
- Support is primarily advised through submitting feedback on the app or opening issues on GitHub, but the effectiveness is questioned.
- Instances of account blocking and difficulty in obtaining support lead to a re-evaluation of Colab's suitability.
Conclusion and Recommendations:
- While recognizing Google Colab as a valuable tool, the author advises new users to try it while being mindful of its limitations.
- The decision to switch to another service provider is suggested when confidence is lost, particularly due to support issues.

In summary, the article provides a detailed analysis of the Google Colab platform, shedding light on its strengths, weaknesses, and practical considerations for users.