Google Colab to Databricks
Hello Everyone,
Recently I started using a new tool for data analytics, Databricks Community Edition, and this post shares my experience of moving to it from Google Colab.
What is a Databricks Community Edition notebook?
It is a powerful platform for collaboration among data analysts, data scientists, and data engineers. You can think of it as a cloud-based Data Analytics Platform which gives you a chance to tap into Spark and other open-source tools.
First impressions on moving from Google Colab to Databricks
At first, it seems similar to Colab: the same notebook environment.
But when you start working with it, you’ll notice the differences. The first
major difference I noticed was the filesystem. Also, features such as
Spark, SQL, and SQL Analytics can be accessed for learning at no cost.
In more detail, Databricks has two
filesystems: one local and another on AWS. If you upload a file to
the environment, it is stored on AWS and then has to be copied to the
local filesystem using a set of commands ("%fs", "file:", "dbfs:", "cp"). All the library
installation files are stored locally and not on AWS.
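To make the copy step concrete, here is a minimal Python sketch of how I think about it. It rests on a few assumptions that are not stated above: that UI uploads land under `/FileStore/tables` on DBFS, that `/tmp` is a reasonable local target, and that the file is called `data.csv` (a made-up name). The helper `copy_upload_to_local` is my own illustrative function, not part of Databricks; it takes the copy function as an argument so it can be tried outside a notebook as well.

```python
from typing import Callable

# Assumption: files uploaded through the Databricks UI land under this DBFS folder.
DBFS_UPLOAD_DIR = "/FileStore/tables"


def copy_upload_to_local(
    fs_cp: Callable[[str, str], object],
    file_name: str,
    local_dir: str = "/tmp",
) -> str:
    """Copy an uploaded file from DBFS to the local filesystem.

    fs_cp is any function taking (source, destination); inside a Databricks
    notebook that would be dbutils.fs.cp. Returns the plain local path.
    """
    # "dbfs:" addresses the AWS-backed filesystem, "file:" the local one.
    source = f"dbfs:{DBFS_UPLOAD_DIR}/{file_name}"
    local_path = f"{local_dir.rstrip('/')}/{file_name}"
    fs_cp(source, f"file:{local_path}")
    return local_path
```

Inside a notebook the call would look like `copy_upload_to_local(dbutils.fs.cp, "data.csv")`, after which the file can be read from the returned local path. The magic-command form of the same copy should be `%fs cp dbfs:/FileStore/tables/data.csv file:/tmp/data.csv`, again assuming that upload location.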
Why did I start using it?
Most of the time, my first preference for data exploration or
ETL is Google Colab. But I recently started working full-time, and Databricks was the
tool my employer preferred for data exploration. Personally, the start was a bit bumpy, but
eventually I got used to the tool. After a month of experience, I can say that
the environment has more to it than I first realized.
Initial challenges
Given the obvious differences in accessing external files, processing
the uploaded files in the notebook was a bit of a challenge. I had to look
around for the commands to transfer the uploaded files and also had to get a
grasp of the basic workings to complete many tasks quickly.
Another observation was that sometimes the cells either
gave inconsistent output or would not run at all. In most cases, I had
to log out and log back in to get the work done.
Also, sometimes the runtime environment would get detached
from the notebook. So, I had to reattach the environment and wait a couple of
seconds for it to be ready.
But apart from these issues, the overall experience has been
good.
Last thoughts
Lastly, I think a comparison between Google Colab and
Databricks Community Edition would not be fair, because one is a cloud data
analytics platform with SQL and Spark, while the other is a tool used
solely to leverage hardware power for completing your task.
My opinion
In my opinion, if you are learning and practicing Machine
Learning or Deep Learning, Colab is a good tool for beginners. But if you are
a professional trying out solutions that involve Big Data, then Databricks
would be a better choice.
Further readings
- https://databricks.com/product/faq/community-edition
- https://docs.databricks.com/onboarding/index.html
- https://docs.databricks.com/workspace-index.html
Sources
- https://docs.databricks.com/notebooks/index.html
- https://databricks.com/product/faq/community-edition
