
Archive for November, 2020

A review of the Big Ten Academic Alliance Geospatial Data Portal

November 30, 2020

Not long ago I had the privilege of presenting at the Big Ten Academic Alliance’s GIS Day event. During the event I became familiar with their geospatial data portal, and after further review, I wanted to share my findings with the readers of this blog. The project is collectively managed by librarians and geospatial specialists at a group of research institutions from across the Big Ten Academic Alliance, a consortium of innovative universities in the north central part of the USA. I think the portal is so useful in part because it is managed by librarians and geospatial specialists, in other words, by people who really understand how data can be used and how it should be accessed. It connects users to digital geospatial resources, including GIS datasets, web services, and digitized historical maps, drawn from multiple data clearinghouses and library catalogs.

The geoportal serves as a search tool that fosters access to externally hosted data, saving researchers time by centralizing regional geospatial data discovery into a single interface. It provides discovery of the most up-to-date resources and allows users to search by What, Where, and When, without needing to know Who or Why. The portal contains GIS data produced in Illinois, Indiana, Iowa, Maryland, Michigan, Minnesota, Nebraska, Pennsylvania, Ohio, and Wisconsin, as well as historical scanned maps from all across the globe.

This has quickly become one of my favorite data portals. The data are easily discoverable and organized by topic, location, publisher, creator, type, and scale. I found, for example, numerous historical aerial photos, such as the one below. Many of the thousands of layers in this portal are data services, eliminating the need to download them, though there is certainly no shortage of layers that you can download. From animal feeding operations to transportation alignments, from historical aerials to quarries, from COVID to dams to sinkholes to cemeteries to population, this portal is a wealth of data. I salute the creators of this portal and I encourage you to give it a try.

Manitowoc, Wisconsin, 1938 aerial photograph.
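If you would rather stream one of the portal’s service layers into your own quick web map than download it, the sketch below shows roughly how that might look using folium in Python. The WMS endpoint, layer name, and attribution string are placeholders, since the service type and URL vary from record to record in the geoportal.

# A minimal sketch of adding a web service layer from a geoportal record to a
# folium map. The WMS URL and layer name are placeholders; copy the real
# endpoint and layer from the record's metadata in the geoportal.
import folium

m = folium.Map(location=[44.09, -87.66], zoom_start=12)  # Manitowoc, Wisconsin

folium.WmsTileLayer(
    url="https://example.org/geoserver/wms",   # placeholder WMS endpoint
    layers="historic_aerials_1938",            # placeholder layer name
    fmt="image/png",
    transparent=True,
    name="1938 aerial photography",
    attr="Big Ten Academic Alliance Geoportal (placeholder attribution)",
).add_to(m)

folium.LayerControl().add_to(m)
m.save("btaa_service_layer.html")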

–Joseph Kerski

Google’s BigQuery Public Datasets Program

November 23, 2020

Another public data resource to consider is the collection of public datasets hosted through Google’s BigQuery program and made available through the Google Cloud Public Dataset program. Under this arrangement, Google hosts the data and provides access to query the data and display the results, provided you create a Google Cloud account and project.

Account holders can query the datasets using SQL in the Cloud Console, the bq command-line tool (a Python-based command-line tool), or calls to the BigQuery REST API through a client library (for example, Java or .NET). The first 1 TB of data processed per month is free; any additional data processing is charged under either an on-demand or a flat-rate pricing model.
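If you prefer to work from a script rather than the Cloud Console, a minimal sketch of a query against one of the public datasets using the google-cloud-bigquery Python client might look like the following. The project ID is a placeholder, the table and column names are assumptions drawn from the NOAA GSOD public dataset (check the schema in the console first), and the maximum_bytes_billed setting is simply a guard to keep an exploratory query well inside the free tier.

# A minimal sketch of querying a BigQuery public dataset with the Python client.
# Assumes `pip install google-cloud-bigquery` and an authenticated Google Cloud
# account; "my-project" is a placeholder project ID.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

# Cap the bytes scanned so a careless query cannot run up charges.
job_config = bigquery.QueryJobConfig(maximum_bytes_billed=10 * 1024**3)  # 10 GB

# Table and column names are assumptions; verify the dataset schema first.
sql = """
    SELECT stn, AVG(temp) AS mean_temp_f
    FROM `bigquery-public-data.noaa_gsod.gsod2020`
    WHERE mo = '07'
    GROUP BY stn
    ORDER BY mean_temp_f DESC
    LIMIT 10
"""

for row in client.query(sql, job_config=job_config).result():
    print(row.stn, round(row.mean_temp_f, 1))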

There are currently over 200 datasets listed, including a number of NOAA resources, USGS Landsat 4, 5, 7, and 8 imagery, and ESA Sentinel-2 data.

Google Cloud Public Datasets

Once you’ve selected your dataset and run your query, you have options to visualise the data using either Data Studio (dashboards for charts, tables, graphs, and so on) or GeoViz for displaying spatial data. You can also save up to 1 GB of data from your queries to Google Drive.

Google Cloud Platform – BigQuery

GeoViz is a fairly limited tool for displaying the results of a BigQuery spatial query on a map, one query at a time. However, it is apparently also possible to display BigQuery spatial data using Google Earth Engine by exporting the results of your BigQuery query to Cloud Storage and then importing them into Earth Engine. I haven’t tried this yet but will have a go at some point. There’s also a fairly useful BigQuery tutorial for working with geospatial data.
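As an illustration of the kind of spatial query GeoViz can map, and of the Cloud Storage hand-off mentioned above, here is a rough sketch using BigQuery’s geography functions. The project, destination dataset, and bucket names are placeholders, and the NOAA stations table and its column names are assumptions to be checked against the published schema.

# A sketch of a BigQuery GIS query plus an export to Cloud Storage, the staging
# step one would use before importing results into Earth Engine.
# Project, dataset, destination table, and bucket names are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

# Geography functions: build points from lon/lat and keep weather stations
# within 50 km of a point of interest (longitude, latitude order).
# Column names assumed from the NOAA stations table; check the schema first.
sql = """
    SELECT name, lat, lon, ST_GEOGPOINT(lon, lat) AS station_geom
    FROM `bigquery-public-data.noaa_gsod.stations`
    WHERE lat IS NOT NULL AND lon IS NOT NULL
      AND ST_DWITHIN(ST_GEOGPOINT(lon, lat), ST_GEOGPOINT(-93.26, 44.98), 50000)
"""

# Write the results to a table in your own dataset (placeholder IDs) ...
dest = "my-project.my_dataset.nearby_stations"
client.query(sql, job_config=bigquery.QueryJobConfig(destination=dest)).result()

# ... then extract that table to Cloud Storage as CSV for downstream use.
client.extract_table(dest, "gs://my-bucket/nearby_stations.csv").result()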

Overall, the metadata for the various hosted datasets seems generally good, and the combination of hosted data and tools provides a useful sandbox for getting started and polishing your geospatial data analysis skills.

Spatial Data from the North American Environmental Atlas

November 16, 2020

The North American Environmental Atlas combines and harmonizes geospatial data from Canada, Mexico and the United States to allow for a continental and regional perspective on environmental issues. The Atlas continues to grow in breadth and depth as more thematic maps are created through their work and partnerships. Scientists and map makers from Natural Resources Canada, the United States Geological Survey, Comisión Nacional para el Conocimiento y Uso de la Biodiversidad, Comisión Nacional Forestal, the Instituto Nacional de Estadística y Geografía and other agencies in Canada, Mexico, and the United States produce the information contained in the Atlas. It is in my judgment an excellent resource for exploring a wide variety of mapped data layers for these three countries. Each data layer can be examined on the Atlas’ interactive mapping interface, and even better, can be downloaded into a GIS in a variety of file formats for further analysis. You may download specific layers from the mapping interface as shown below or go to the data layers page.

The Atlas’ premise, stemming from an agreement on environmental cooperation, is that environmental issues do not stop at national borders, and that a comprehensive international approach is needed for analysis and assessment, and for protection of natural ecosystems. Hence, the data layers in the atlas thankfully do not stop at political boundaries, eliminating the need to deal with appending data, reconciling map projections, and other GIS-related challenges. The atlas themes include climate, biomes, ecosystems, specific species’ extents, land use, and much more. The atlas is in English, French, and Spanish!
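For readers who download a layer rather than explore it in the interactive map, a quick sketch of loading and reprojecting an Atlas layer with geopandas might look like the following. The filename is hypothetical, standing in for whatever you retrieve from the data layers page, and the Albers parameters are just one reasonable continental equal-area choice.

# A minimal sketch of working with a downloaded Atlas layer in geopandas.
# "north_america_protected_areas.shp" is a hypothetical filename standing in
# for whatever layer you download from the Atlas data layers page.
import geopandas as gpd

layer = gpd.read_file("north_america_protected_areas.shp")
print(layer.crs)     # check which coordinate reference system the layer ships in
print(layer.head())

# Reproject to a continent-wide equal-area projection (North America Albers)
# so that area calculations are meaningful across all three countries.
albers = (
    "+proj=aea +lat_1=20 +lat_2=60 +lat_0=40 +lon_0=-96 "
    "+x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs"
)
layer_albers = layer.to_crs(albers)
layer_albers["area_km2"] = layer_albers.geometry.area / 1e6
print(layer_albers[["area_km2"]].describe())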

I have known about the Atlas since my days at the USGS and have great respect for its mission. I wrote about its educational applications here (https://community.esri.com/community/education/blog/2017/06/29/analyzing-the-environment-in-a-gis-with-the-north-american-environmental-atlas-2). I encourage you to give this resource a try!

North American Environmental Atlas.

–Joseph Kerski

Ties between the FAIR Guiding Principles for scientific data management and stewardship and GIS

November 3, 2020

An article about the FAIR guiding principles for scientific data management by Mark Wilkinson, Michel Dumontier, IJsbrand Jan Aalbersberg, and many co-authors, published in Scientific Data (a Nature journal), has thoughtful implications for us as GIS practitioners: how should we manage and serve our GIS data?  What should be included in that data?  The FAIR guiding principles (Findable, Accessible, Interoperable, Reusable) are good ones to keep in mind when setting up sites such as geodata portals, Hubs, and other means to make data available.

These four principles should serve to guide data producers and publishers as they overcome challenges in serving data.  The article also seeks to identify the value gained by contemporary, formal scholarly digital publishing.  The authors state that the FAIR principles apply not only to ‘data’ in the conventional sense, but also to the algorithms, tools, and workflows that led to that data. Interesting.  The authors make the claim that all scholarly digital research objects, from data to analytical pipelines, benefit from application of these principles, since all components of the research process must be available to ensure transparency, reproducibility, and reusability.

All this makes me wonder: what should we include when we serve data?  Is it only the vectors or rasters and metadata?  Should we also think about including our methods as well?  Including the methods would make our research more replicable and, potentially, more widely used and beneficial.  What are the implications if someone copies our methods and claims them as their own?  Or should we care so much about ownership in the face of the community-to-global problems that we face?  So many questions!  But worthy ones to ask.

Of note is the FAIR webinar series (https://www.ands.org.au/working-with-data/the-fair-data-principles/fair-webinar-series), which, while dated, offers additional information in the form of recordings.

The FAIR guiding principles: Findable, Accessible, Interoperable, Reusable.

One of my favorite points the authors make is that “Good data management is not a goal in itself, but rather is the key conduit leading to knowledge discovery and innovation, and to subsequent data and knowledge integration and reuse by the community after the data publication process.”  The authors touch on a problem I have encountered in my own GIS work–that research results are usually published without providing access to data.  Certainly this is understandable when human subjects and other sensitive data are involved, but even then, couldn’t some steps be taken so individual identities are removed?  The authors state that “Partially in response to this, science funders, publishers and governmental agencies are beginning to require data management and stewardship plans for data generated in publicly funded experiments.”  If this were to happen, we would all benefit. Imagine the data we could access to address societal issues and problems if this goal of the authors were realized:  “Beyond proper collection, annotation, and archival, data stewardship includes the notion of ‘long-term care’ of valuable digital assets, with the goal that they should be discovered and re-used for downstream investigations, either alone, or in combination with newly generated data.”

It’s clear to me that the current publishing and scholarly process is increasingly out of step with what society needs from research, particularly if we are going to solve problems in energy, water, human health, climate, economic inequality, biodiversity, agriculture, and other areas.  A research article is valuable, but the data, the methods, and the recommendations are also increasingly needed.  I salute the authors for nudging the community forward in thinking outside the box.

The authors seek to define what good data management actually is, and acknowledge that it is generally left as a decision for the data or repository owner. Therefore, as they put it in the article and the webinar series, bringing “some clarity around the goals and desiderata of good data management and stewardship, and defining simple guideposts to inform those who publish and/or preserve scholarly data, would be of great utility.”  The authors recognize that this isn’t an easy task, because it involves numerous, diverse stakeholders with different interests, and it is intertwined with publishing, credit, data providers, service providers, academics, and others.

Categories: Public Domain Data