Google added more COVID-19 datasets to its free, public repository. The cloud giant also extended its free-querying offer for all of the COVID-19 public datasets by another year, through Sept. 15, 2021.

Google’s COVID-19 Public Dataset program is an effort to fight COVID-19 by making more data freely accessible to researchers and data scientists. Google Cloud pays for the storage of the COVID-19-related datasets. Additionally, researchers can use BigQuery ML to train machine learning models with this data inside BigQuery for free.

The repository includes datasets from The New York Times, the European Centre for Disease Prevention and Control, Google, Global Health Data from the World Bank, and OpenStreetMap. Google also added datasets from BroadStreet, including the U.S. Area Deprivation Index, which measures community vulnerability to public health issues at a highly granular level. Google will also publish aggregated hospital capacity data from the American Hospital Association, as well as the Immune Epitope Database (Vita et al., Nucleic Acids Research, 2018), to help researchers investigating the immune response to the SARS-CoV-2 virus.

How to Build Predictive Models

In addition to the new datasets, Google published a series of articles showing how researchers can build predictive models from this data using Google Cloud AI Platform. It also created the COVID-19 Open Data dataset, which combines numerous publicly available COVID-19 and related datasets at a geographic level and makes them available in BigQuery as well as in CSV and JSON formats. The code used to create this dataset is open source and available on GitHub.
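For readers who want to try the BigQuery route described above, the following is a minimal Python sketch rather than Google's own example. It assumes the COVID-19 Open Data table is published as `bigquery-public-data.covid19_open_data.covid19_open_data` and exposes `location_key`, `date`, and `new_confirmed` columns; the table ID, the column names, and both helper functions are assumptions for illustration and should be checked against the current schema. Running the query also assumes the third-party `google-cloud-bigquery` package and Google Cloud credentials.

```python
from datetime import date

# Assumed table ID; confirm it in the BigQuery public datasets listing.
TABLE = "bigquery-public-data.covid19_open_data.covid19_open_data"


def build_daily_cases_query(country_code: str, start: date, end: date) -> str:
    """Build SQL for daily new confirmed cases in one country (hypothetical helper)."""
    if not (len(country_code) == 2 and country_code.isascii() and country_code.isalpha()):
        raise ValueError("country_code must be a two-letter ASCII code")
    if start > end:
        raise ValueError("start must not be after end")
    # Column names below are assumptions about the table schema.
    return (
        "SELECT date, new_confirmed\n"
        f"FROM `{TABLE}`\n"
        f"WHERE location_key = '{country_code.upper()}'\n"
        f"  AND date BETWEEN '{start.isoformat()}' AND '{end.isoformat()}'\n"
        "ORDER BY date"
    )


def run_query(sql: str) -> list:
    """Run the SQL in BigQuery and return rows as dictionaries.

    Imported lazily so the query builder works without the client library.
    """
    from google.cloud import bigquery  # third-party: google-cloud-bigquery

    client = bigquery.Client()  # uses the default project and credentials
    return [dict(row.items()) for row in client.query(sql).result()]


if __name__ == "__main__":
    # Print the query only; call run_query(sql) once credentials are configured.
    sql = build_daily_cases_query("us", date(2020, 3, 1), date(2020, 3, 31))
    print(sql)
```

Keeping the query builder separate from the network call lets the SQL be inspected or pasted into the BigQuery console before anything is executed.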

“As we strive to continue supporting our users, we want to help ensure that a lack of resources is not a contributing factor in one’s ability to make sense of this data,” Google Cloud’s Michael Hamamoto Tribble and Donny Cheung wrote in a blog post. “That’s why we’re expanding datasets access, and we hope that this will expand the pool of contributors who are finding solutions to this pandemic, whether that’s students and faculty querying these datasets through distance learning in the fall or public decision makers gauging when their communities can safely reopen. We hope that these datasets continue to provide universally accessible and useful information in the fight against COVID-19.”

Google first announced the public COVID-19 datasets in March 2020.

Also in the blog, Hamamoto Tribble and Cheung write that Google will continue releasing new datasets in four areas: epidemiology and health response, such as case and testing statistics and hospital data; government policy response and effects, such as mobility and mask compliance; social determinants of health and community response; and biomedical and other research data.