A review of Google’s search engine for Open Data

An article in Nature described Google’s new search engine for open data. Because geospatial data is a fundamental part of open data and, after all these years, is still challenging to find at times, I was immediately interested in testing it.

The tool, called Google Dataset Search, is accessible at this link. Like Google Scholar and Google Books, both of which I use heavily, it is a “specialized search engine.”

The utility of this tool will depend on metadata tagging. Indeed, as the article points out, “those who own the data sets should ‘tag’ them, using a standardized vocabulary called Schema.org, an initiative founded by Google and three other search-engine giants (Microsoft, Yahoo and Yandex)”. The schema.org Dataset markup is the standard used here, and what can be described with it is broad: tables and CSV files, imagery, and files in proprietary formats all qualify. I had to laugh at the openness of the last line in the list of what could qualify as a dataset: “Anything that looks like a dataset to you.”
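To make the tagging idea concrete, here is a minimal sketch in Python of what a schema.org Dataset description might look like for a geospatial data set. The property names (name, description, url, spatialCoverage, distribution, encodingFormat, contentUrl) come from the schema.org vocabulary. The data set itself, its example.org URLs, the bounding box, and the helper function build_dataset_jsonld are all hypothetical and are here only for illustration.

```python
import json


def build_dataset_jsonld(name, description, url, file_format, download_url, bbox):
    """Return a schema.org Dataset description as a JSON-LD string.

    bbox is (south, west, north, east) in decimal degrees.
    """
    south, west, north, east = bbox
    record = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        # Geographic coverage: a box given as lower corner, then upper corner.
        "spatialCoverage": {
            "@type": "Place",
            "geo": {
                "@type": "GeoShape",
                "box": f"{south} {west} {north} {east}",
            },
        },
        # How the data can be obtained, and in which format.
        "distribution": [
            {
                "@type": "DataDownload",
                "encodingFormat": file_format,
                "contentUrl": download_url,
            }
        ],
    }
    return json.dumps(record, indent=2)


# Hypothetical data set; every value below is a placeholder.
markup = build_dataset_jsonld(
    name="Example stream gauge readings",
    description="Daily stream gauge readings for an example river basin.",
    url="https://example.org/datasets/stream-gauges",
    file_format="CSV",
    download_url="https://example.org/datasets/stream-gauges.csv",
    bbox=(-38.0, 140.0, -28.0, 154.0),
)

# The JSON-LD is typically embedded in the data set's landing page like this.
print(f'<script type="application/ld+json">\n{markup}\n</script>')
```

A data provider would place the printed script element in the HTML of the page that describes the data set, which is where a crawler can pick it up. The format and coverage fields are the kind of information a search result can then display.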

Many data search engines and portals have a vast amount of data but very little geospatial data. With this tool, however, my test searches turned up many useful geospatial data sets, some of which I knew and some of which were new to me. I have had trouble finding stream gauging data services for Australia recently, and this tool gave me some new leads to investigate. Because the tool is Google-based, results were returned quickly, with what I considered enough information to decide whether or not to investigate more fully (see screenshot below). The data format was featured prominently, as was the coverage, both of which I appreciated. NOAA was an early adopter of the indexing, so it makes sense that I could find many NOAA data sets using this search engine.

I wonder if data in GitHub, in Esri’s Living Atlas, or on state, national, and international portals will be findable. I also wonder how the sheer importance of Google will influence how organizations tag their data in the future, and what influence this will have on agencies that did not put as much time into metadata as they perhaps should have. Time will tell, but if Google Scholar and Google Books are any indication, Google Dataset Search could indeed prove to be extremely useful for many of us in GIS research and education.

Result of stream gauge search in the new Google data search engine. 

  1. September 17, 2018 at 9:10 am

    We may see vendors offering datasearch engine optimizations. I have heard that some of the repositories that worked with Google on developing their datasearch had to sign a nondisclosure agreement.

    • josephkerski
      October 31, 2018 at 11:01 pm

      Thanks! Very interesting.

