Google launches free Gemini-powered Data Science Agent on its Colab Python platform

AI agents are all the rage, but how about one focused specifically on analyzing, sorting and drawing conclusions from vast volumes of data?

Google’s data science agent does just that: The new Gemini 2.0-powered AI assistant, which automates data analysis, is now available for free to users aged 18 and older in select countries and languages.

The assistant is available through Google Colab, the company’s eight-year-old service for running Python code live online atop graphics processing units (GPUs) owned by the search giant and its own, in-house tensor processing units (TPUs).

Initially launched for trusted testers in December 2024, data science agent is designed to help researchers, data scientists and developers streamline their workflows by generating fully functional Jupyter notebooks from natural language descriptions, all in the user’s browser.

This expansion aligns with Google’s ongoing efforts to integrate AI-driven coding and data science features into Colab, building on past updates such as Codey-powered AI coding assistance, announced in May 2023.

It also acts as a kind of advanced and belated rejoinder to OpenAI’s ChatGPT advanced data analysis (previously Code Interpreter), which is now built into ChatGPT when running GPT-4.

What is Google Colab?

Google Colab (short for colaboratory) is a cloud-based Jupyter Notebook environment that enables users to write and execute Python code directly in their browser.

Jupyter Notebook is an open-source web application that enables users to create and share documents containing live code, equations, visualizations and narrative text. Originating from the IPython project in 2014, it now supports more than 40 programming languages, including Python, R and Julia. This interactive platform is widely used in data science, research and education for tasks like data analysis, visualization and teaching programming concepts.

Since its launch in 2017, Google Colab has become one of the most widely used platforms for machine learning (ML), data science and education.

As Ori Abramovsky, data science lead at Spectralops.io, detailed in an excellent Medium post from 2023, Colab’s ease of use and free access to GPUs and TPUs make it a standout option for many developers and researchers.

He noted that the low barrier to entry, seamless integration with Google Drive and support for TPUs allowed his team to dramatically shorten training cycles while working on AI models.

However, Abramovsky also pointed out Colab’s limitations, such as:

  • Session time limits (especially for free-tier users).
  • Unpredictable resource allocation at peak usage times.
  • Lack of critical features, like efficient pipeline execution and advanced scheduling.
  • Support challenges, as Google provides limited options for direct assistance.

Despite these drawbacks, Abramovsky emphasized that Colab remains one of the best serverless notebook solutions available — particularly in the early stages of ML and data analysis projects.

Simplifying data analysis with AI

The data science agent builds on Colab’s serverless notebook environment by eliminating the need for manual setup.

Using Google’s Gemini AI, users can describe their analytical goals in plain English (“visualize trends,” “train a prediction model,” “clean missing values”), and the agent generates fully executable Colab notebooks in response.

It supports users by:

  • Automating analysis: Generates complete, working notebooks instead of isolated code snippets.
  • Saving time: Eliminates manual setup and repetitive coding.
  • Enhancing collaboration: Includes built-in sharing features for team-based projects.
  • Offering modifiable solutions: Users can adjust and customize generated code.

Data science agent is already accelerating real-world scientific research

According to Google, early testers have reported significant time savings when using data science agent.

For instance, a scientist at Lawrence Berkeley National Laboratory working on tropical wetland methane emissions estimated that their data processing time dropped from one week to just five minutes when using the agent.

The tool has also performed well in industry benchmarks, ranking 4th on the DABStep: Data Agent Benchmark for Multi-step Reasoning on Hugging Face, ahead of AI agents such as ReAct (GPT-4.0), DeepSeek, Claude 3.5 Haiku and Llama 3.3 70B.

However, OpenAI’s rival o3-mini and o1 models, as well as Anthropic’s Claude 3.5 Sonnet, all outclassed the new Gemini data science agent.

Getting started

Users can start using data science agent in Google Colab by following these steps:

  1. Open a new Colab notebook.
  2. Upload a dataset (CSV, JSON, etc.).
  3. Describe the analysis in natural language using the Gemini side panel.
  4. Execute the generated notebook to see insights and visualizations.

Google provides sample datasets and prompt ideas to help users explore its capabilities, including:

  • Stack Overflow developer survey: “Visualize most popular programming languages.”
  • Iris Species dataset: “Calculate and visualize Pearson, Spearman and Kendall correlations.”
  • Glass Classification dataset: “Train a random forest classifier.”

Anytime a user wants to use the new agent, they’ll have to navigate to Colab and click “File,” then “New notebook in Drive,” and the resulting notebook will be stored in their Google Drive cloud account.

My own brief demo usage was more mixed

Granted, I’m a lowly tech journalist and not a data scientist, but my own usage of the new Gemini 2.0-powered data science agent in Colab so far has been less than seamless.

I uploaded five CSV files (comma-separated values, a plain-text format that spreadsheet tools such as Excel or Sheets can export) and asked it, “How much am I spending each month and quarter on my utilities?”

The agent went ahead and performed the following operations:

  • Merged datasets, handling date and account number inconsistencies.
  • Filtered and cleaned the data, ensuring only relevant expenses remained.
  • Grouped transactions by month and quarter to calculate spending.
  • Generated visualizations, such as line charts for trend analysis.
  • Summarized findings in a clear, structured report.

Before execution, Colab displayed a confirmation prompt, reminding me that the notebook might interact with external APIs.

It did all this very rapidly and smoothly in the browser, in a matter of seconds. And it was impressive to watch it work through the analysis and programming with visible step-by-step descriptions of what it was doing.

However, it ultimately generated an inaccurate graph showing just one month’s utility spending, failing to recognize the sheets included a full year’s worth broken out by months. When I asked it to revise, it gamely tried, but ultimately couldn’t produce the correct code string to answer my prompt.
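For readers curious what the month-and-quarter grouping step involves, the following is a rough sketch using only Python's standard library. It is not the notebook the agent produced; the column names (date, category, amount), the sample rows and the summarize_spending function are all hypothetical stand-ins for the uploaded utility files.

```python
import csv
import io
from collections import defaultdict
from datetime import date

# Hypothetical sample rows standing in for the uploaded utility CSV files.
SAMPLE_CSV = """date,category,amount
2024-01-15,electric,120.50
2024-01-20,water,40.00
2024-02-15,electric,110.25
2024-04-03,gas,60.00
"""


def summarize_spending(csv_text):
    """Total the amount column per calendar month and per quarter."""
    monthly = defaultdict(float)
    quarterly = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        day = date.fromisoformat(row["date"])
        amount = float(row["amount"])
        monthly[f"{day.year}-{day.month:02d}"] += amount
        # Months 1-3 fall in Q1, 4-6 in Q2, and so on.
        quarter = (day.month - 1) // 3 + 1
        quarterly[f"{day.year}-Q{quarter}"] += amount
    return dict(monthly), dict(quarterly)


monthly, quarterly = summarize_spending(SAMPLE_CSV)
for period, total in sorted(monthly.items()):
    print(period, f"{total:.2f}")
for period, total in sorted(quarterly.items()):
    print(period, f"{total:.2f}")

# Sanity check: how many distinct months ended up in the summary?
print("months covered:", len(monthly))
```

The final line is the kind of check worth running on any generated notebook: if a year of bills yields a single month in the summary, the date parsing or grouping has gone wrong before any chart is drawn.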

I tried from scratch with the exact same prompt on a new notebook in Google Colab, and it produced a far better, yet still odd result.

I’ll have to try troubleshooting it some more, and as I said, the initial erroneous result may be due to my own lack of experience using data science tools.

Colab pricing and AI features

While Google Colab remains free, users who need additional compute power can upgrade to paid plans:

  • Colab Pro ($9.99/month): 100 compute units, faster GPUs, more memory, terminal access.
  • Colab Pro+ ($49.99/month): 500 compute units, priority GPU upgrades, background execution.
  • Colab Enterprise: Google Cloud integration, AI-powered code generation.
  • Pay-as-you-go: $9.99 for 100 compute units, $49.99 for 500 compute units.

In addition to data science agent, Google has been expanding AI capabilities within Colab.

Google collects prompts, generated code and user feedback to improve its AI models. This data is anonymized and stored for up to 18 months, and deletion requests may not always be fulfilled. Users are advised not to submit sensitive or personal information, as human reviewers may process prompts. Additionally, AI-generated code should be reviewed carefully, as it may contain inaccuracies.

Feedback welcome

Google encourages users to provide feedback through the Google Labs Discord community in the #data-science-agent channel.

With AI-driven automation becoming a key trend in data science, Google’s data science agent in Colab could help researchers and developers focus more on insights and less on coding setup. As the tool expands to more users and regions, it will be interesting to see how it shapes the future of AI-assisted analytics.


