Google Colab to Databricks


Hello Everyone,


Recently I started using a new tool for data analytics (Databricks Community Edition), and this post shares my experience of moving from Google Colab to Databricks.


What is a Databricks Community Edition notebook?

It is a platform for collaboration among data analysts, data scientists, and data engineers. You can think of it as a cloud-based data analytics platform that lets you tap into Spark and other open-source tools.

 

First impressions on moving from Google Colab to Databricks

At first, it seems similar to Colab: the same notebook environment. But when you start working with it, you will notice the differences. The first major difference I noticed was the filesystem. Beyond that, many other features, like Spark, SQL, and SQL Analytics, can be accessed for learning at no cost.
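To give a flavour of this, here is a minimal sketch (my own illustration, not from the original post) of how Spark and SQL sit side by side in a Databricks Python notebook. It assumes the "spark" session that Databricks predefines in every notebook; the view name "numbers" is just a hypothetical placeholder:

    # "spark" (a SparkSession) is predefined in Databricks notebooks.
    # "numbers" is a hypothetical temp view created just for this sketch.
    df = spark.range(5).withColumnRenamed("id", "n")
    df.createOrReplaceTempView("numbers")

    # The same data can now be queried with plain SQL.
    spark.sql("SELECT n, n * n AS n_squared FROM numbers").show()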

Another major difference is that Databricks has two filesystems: the driver's local filesystem and DBFS (the Databricks File System), which lives on AWS storage. If you upload a file to the environment, it lands in DBFS and then has to be copied to the local filesystem using a small set of commands ("%fs", "file:/", "dbfs:/", "cp"). Library installation files, on the other hand, are stored locally and not on AWS.
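Here is a minimal sketch of that copy step, assuming the notebook context where Databricks predefines "dbutils"; the path "dbfs:/FileStore/tables/sales.csv" is a hypothetical example of where the upload UI places files:

    # Copy an uploaded file from DBFS to the driver's local filesystem.
    # The source path is hypothetical; substitute your own upload location.
    dbutils.fs.cp("dbfs:/FileStore/tables/sales.csv", "file:/tmp/sales.csv")

    # Equivalent filesystem magic in a notebook cell:
    # %fs cp dbfs:/FileStore/tables/sales.csv file:/tmp/sales.csv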

 

Why did I start using it?

Most of the time, my first preference for data exploration or ETL is Google Colab. But I recently started working full-time, and Databricks was the tool my team preferred for data exploration. Personally, the start was a bit bumpy, but I eventually got used to the tool. After a month of experience, I can say the environment has more to it than I knew.

 

Initial challenges

Given the obvious differences in accessing external files, processing the uploaded files in the notebook was a bit of a challenge. I had to look around for the commands to transfer the uploaded files (a sketch of the workflow follows below) and also had to get a grasp of the basic workings before I could complete many tasks quickly.
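Continuing the hypothetical paths from the earlier sketch, the workflow looks roughly like this: Spark reads the DBFS path directly, while local-path libraries such as pandas need the copy on the driver's filesystem:

    import pandas as pd

    # Spark reads straight from DBFS; no copy is needed.
    spark_df = spark.read.csv("dbfs:/FileStore/tables/sales.csv",
                              header=True, inferSchema=True)

    # pandas only sees the driver's local filesystem, so it reads the
    # copy made earlier with dbutils.fs.cp / %fs cp.
    pandas_df = pd.read_csv("/tmp/sales.csv")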

Moreover, another observation was that cells sometimes gave inconsistent output or simply would not run. In most of those cases, I had to log out and log back in to get the work done.

Also, the runtime environment (the cluster) would sometimes get detached from the notebook, so I had to reattach it and wait a couple of seconds for it to be ready.

But apart from these issues, the overall experience has been good.

 

Last thoughts

Lastly, I think a comparison between Google Colab and Databricks Community Edition would not be fair, because one is a cloud data analytics platform with SQL and Spark, while the other is mainly a tool for leveraging hosted hardware to get your task done.

 

My opinion

In my opinion, if you are learning and practicing machine learning or deep learning, Colab is a good tool for beginners. But if you are a professional trying out solutions that involve big data, then Databricks would be the better choice.

 

Sources

  1. https://docs.databricks.com/notebooks/index.html
  2. https://databricks.com/product/faq/community-edition

 
