Posts Tagged ‘data’

Coupling data with scholarly research

April 26, 2021 4 comments

How many times have you read an article or report and wished you had access to the data that the author used? You’re not alone: a growing concern is being voiced in the research community that data are typically not included with scholarly papers. Why should this matter? Hasn’t publishing always worked this way? Well, in today’s world, where serious issues are growing, as recent global health and natural hazards challenges have made starkly clear, sharing data could be an immense leap forward for researchers and developers working to solve these pressing issues. As we have written about for nearly a decade in this blog, despite crowdsourcing, the Internet of Things, sensors on, above, and below ground, and GIS, statistical, field, and other tools, the task of gathering data, not just spatial data but any data, is still often a very time-consuming endeavor. Research results are important, but should every researcher have to start completely over and gather his or her own data? What if I wanted to take someone’s data and add to it, or use it in another way, in another region, with different models?

Chapter 2 in Open Science by Design: Realizing a Vision for 21st Century Research from the National Academies of Sciences lists several benefits of including data in research studies, framing them as part of the emerging “open science movement”. The benefits include rigor and reliability, the ability to address new questions, faster and more inclusive dissemination of knowledge, broader participation in research, effective use of resources, improved performance of research tasks, and open publication for public benefit. Yet the chapter also recognizes real challenges: costs and infrastructure; a scholarly communication system that is often available only via subscription; a lack of supportive culture, incentives, and training; disciplinary differences; and privacy, security, and proprietary barriers to sharing.

The chapter also states that “the ability to automate the process of searching and analyzing linked articles and data can reveal patterns that would escape human perception, making the process of generating and testing hypotheses faster and more efficient. These tools and services will have maximum impact when used within an open science ecosystem that spans institutional, national, and disciplinary boundaries.” Indeed.

An article in Nature also describes the benefits of providing data, beginning with career benefits but touching on societal benefits. Other articles are appearing. And the National Academies chapter above gives practical recommendations as to how to bring about ‘open science.’ The European Union projects FOSTER Plus and OpenAIRE provide training on open science and open data.

But where are the practical examples? I see the following “papers with code” library as one practical response to these concerns, and I salute these efforts:

The above site can be filtered, and does include spatial data that I uncovered during my testing, on water quality and other variables. I hope it is the beginning of many such efforts.

–Joseph Kerski

Categories: Public Domain Data

How much data is out there?

February 1, 2021 4 comments

As this blog is all about data, and about the advent of the truly “Big Data” world, exactly how much data are we talking about? Below is one source of information about how much data actually exists today and how much is projected to exist in the near future.

How much data exists and is projected to exist?

Aydin, O. (2021). Spatial Data Science: Transforming Our Planet [Conference presentation]. 2021 Los Angeles Geospatial Summit, Los Angeles, CA, United States.

Because our blog and book are also about encouraging people to check data sources, I would like to add that the above information came from Seagate’s annual data report. There is an abridged version in an article from Forbes.

These figures are staggering, and from these figures spring many questions: How much of the above data is geospatial data? How much is not geospatial yet, but is potentially mappable? Which data should be mapped? Take a look at the small percentage, say, of tweets that are geotagged. Should more be geotagged? What would we gain by doing so?

More importantly: What will we do with all this data? Will we be able to sort out the important from the trivial to continue to advance society in health, safety, and sustainability? How must geotechnologies evolve to remain viable in the big data world? I look forward to your comments below.

–Joseph Kerski

Curated list of thousands of ArcGIS server addresses

January 19, 2020 1 comment

Joseph Elfelt recently added many government ArcGIS server addresses to his curated list. The list features over 2,200 addresses for ArcGIS servers from the federal level to the city level. All links are tested by his code once per week, bad links are fixed or flagged, and a new list is posted every Wednesday morning. While we have written about this very useful list in the past, it is a resource that is worth reminding the community about. And, as a geographer, I find the geographic organization of this list quite easy to follow.
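For readers curious about what a weekly link test might involve, here is a minimal sketch in Python. It is not Joseph Elfelt’s code, which is not shown in this post. It assumes only that each entry in a list is the address of an ArcGIS REST services directory, which returns a JSON document with “folders” and “services” entries when “?f=json” is appended. The function names and the (ok, detail) return shape are made up for illustration.

```python
import json
from urllib.error import URLError
from urllib.request import urlopen


def fetch_json(url, timeout=10):
    """Download a URL and parse the body as JSON (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def check_server(services_url, fetch=fetch_json):
    """Return (ok, detail) for one ArcGIS REST services directory address.

    `fetch` is injectable so the check can be exercised without a network.
    """
    query_url = services_url.rstrip("/") + "?f=json"
    try:
        info = fetch(query_url)
    except (URLError, ValueError, OSError) as exc:
        # Covers unreachable hosts, timeouts, and bodies that are not JSON.
        return False, f"unreachable or not JSON: {exc}"
    if not isinstance(info, dict):
        return False, "unexpected response"
    if "error" in info:
        # ArcGIS servers often report errors inside a normal JSON reply.
        return False, f"server reported an error: {info['error']}"
    if "services" not in info and "folders" not in info:
        return False, "response does not look like a services directory"
    count = len(info.get("folders", [])) + len(info.get("services", []))
    return True, f"{count} folders/services listed"
```

Running check_server over every address in a list and flagging the ones that return False would give a rough weekly health report. A real checker would likely also need retries and rate limiting.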

While browsing the list recently, I found, among many other things, an Amtrak train route feature service (shown below), resources at the Wisconsin historical society, and water resources data from the USGS Oklahoma Water Sciences Center.

Joseph is also actively maintaining his “GISsurfer” application, which allows the user community to examine GIS data in a map-centric manner.


Amtrak routes data service, which I found to be fascinating and which I discovered on Joseph Elfelt’s server listing.

I highly recommend that you browse this list if you are in need or anticipate being in need of geospatial data!

–Joseph Kerski

New free course on teaching and learning with the ArcGIS Living Atlas of the World

December 8, 2019 Leave a comment

I am very pleased to announce that the course that my colleague and I created on Teaching with ArcGIS Living Atlas of the World is now available!

The course is free, fun, and rigorous! It provides skills and perspectives for making effective use of the wonderful resources that are in the Living Atlas in teaching and learning. We first wrote about the Living Atlas on this blog, here. Yes, the course is geared toward educators, but it could be useful for non-educators who love data as well.

–Joseph Kerski

Front page to Living Atlas course.

Climate GIS Data from WorldClim

Some of the most sought-after GIS data sets are those on climate, and rightly so, given its importance. WorldClim is one of my favorite sources. WorldClim’s data sets include minimum and maximum temperature, average temperature, precipitation, solar radiation, wind speed, water vapor pressure, plus 19 bioclimatic variables (including such items as minimum temperature of the coldest month). The following link explains the variables:

The following link provides access to the data, at a variety of spatial resolutions from 30 arc-seconds to 10 arc-minutes, all in grid format, as zipped GeoTIFF files:
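The resolutions quoted above are angular units (arc-seconds and arc-minutes), so the ground size of a grid cell depends on latitude. The sketch below gives a rough conversion to kilometers. It assumes a spherical Earth with a mean radius of about 6,371 km, which is a simplification, and the function name is invented for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical approximation


def cell_size_km(arc_seconds, latitude_deg=0.0):
    """Approximate (north-south, east-west) size in km of one grid cell."""
    angle_rad = math.radians(arc_seconds / 3600.0)
    north_south = EARTH_RADIUS_KM * angle_rad
    # East-west extent shrinks with the cosine of latitude.
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


if __name__ == "__main__":
    for label, seconds in [("30 arc-seconds", 30), ("10 arc-minutes", 600)]:
        ns, ew = cell_size_km(seconds, latitude_deg=40.0)
        print(f"{label}: about {ns:.2f} km north-south, {ew:.2f} km east-west at 40 degrees")
```

At the equator, a 30 arc-second cell works out to a little under 1 km on a side and a 10 arc-minute cell to roughly 18.5 km. That is a useful check when deciding which resolution suits a study area.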

WorldClim is supported by Feed the Future to the Geospatial and Farming Systems Consortium of the Sustainable Intensification Innovation Lab. You will need to dig for metadata on WorldClim, since the site is extremely spartan, and note that it contains some ads. But don’t let that put you off: if you want a no-nonsense, quick way of accessing specific types of climate data, this is a valuable resource.


Speaking of climate, ah! – the skies above Wellington, New Zealand, on the Autumnal Equinox there, March 2019.  Photo by Joseph Kerski. 

–Joseph Kerski

Categories: Public Domain Data

Updated Public Domain Data Exercises

January 13, 2016 3 comments

When the GIS Guide to Public Domain Data was published in 2012, we produced an accompanying set of exercises to help illustrate some of the issues that could be encountered when locating, manipulating and analysing public domain spatial data. Among the issues we discussed were the problems of data sources disappearing or data portals that were no longer maintained.

As a number of the online resources we used for the original exercises have not been immune to such changes, we have updated the exercises to provide modified or alternate data resources for the activities. The new exercises and the answer key (.doc format) are available to download from Google Drive (no password required).

PDD Exercises on Google Drive


Be Critical of the Data–Especially When it is Your Own!

July 26, 2015 4 comments

A theme running throughout our book The GIS Guide to Public Domain Data is to be critical of the data that you are using, even data that you are creating. Thanks to mobile technologies and the evolution of GIS to a Software as a Service (SaaS) model, anyone can create spatial data, even from a smartphone, and upload it into the GIS cloud for anyone to use. This has led to incredibly useful collaborations such as OpenStreetMap, but this ease of data creation means that caution must be employed more than ever before, as I explain in this video.

For example, analyze a map that I created using Motion X GPS on an iPhone and mapped using ArcGIS Online. It is shown below, or you can interact with the original map if you prefer. To do so, access ArcGIS Online, search for the map entitled “Kendrick Reservoir Motion X GPS Track”, and open the map. This map shows a track that I collected around Kendrick Reservoir in Colorado, USA. The track points are symbolized by the time of GPS collection, from yellow to gradually darker blue dots as time passed.


GPS track around Kendrick Reservoir.

Note the components of the track to the northwest of the reservoir. These pieces were generated when the smartphone was just turned on and the track first began, as indicated by their yellow color. They are erroneous segments and track points. Notice how the track cuts across the terrain and does not follow city streets or sidewalks. Change the base map to a satellite image: cutting across lots would not have been possible on foot, given the fences and houses obstructing the path. When I first turned on the smartphone, not many GPS satellites were in view of the phone. As I kept walking and remained outside, the phone recorded a greater number of GPS satellites, and as the number of satellites increased, the position fix was strengthened and the positional accuracy improved until the mapped track points closely represented my true position on the Earth’s surface.

Use the distance tool in ArcGIS Online to answer the following question: How far were the farthest erroneous pieces from the lake? Although it depends on where you measure from, some of the farthest erroneous pieces were 600 meters from the lake.  Click on each dot to access the date and time each track point was collected.  How long did the erroneous collection continue?  Again, it depends on which points you select, but the erroneous components lasted about 10 minutes.  At what time did the erroneous track begin correctly following my walk around the lake? This occurred at 11:12 a.m. on the day of the walk.  [Take note of the letters I drew along the southwest shore of the reservoir!]
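If you export the track points, the same two questions (how far, and for how long) can be checked outside of ArcGIS Online. The sketch below is one way to do it, assuming each point has a latitude, longitude, and timestamp. The coordinates and times in the example are hypothetical placeholders, not the actual Kendrick Reservoir data, and the haversine formula treats the Earth as a sphere, so distances are approximate.

```python
import math
from datetime import datetime

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters (spherical model)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def elapsed_minutes(start, end):
    """Minutes between two datetime values."""
    return (end - start).total_seconds() / 60.0


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    stray_point = (39.000, -105.000)   # an erroneous track point
    shoreline = (38.996, -104.996)     # nearest point on the lake shore
    print(f"Distance: {haversine_m(*stray_point, *shoreline):.0f} m")

    first_fix = datetime(2015, 1, 1, 11, 2)    # hypothetical first track point
    good_track = datetime(2015, 1, 1, 11, 12)  # hypothetical first reliable point
    print(f"Erroneous period: {elapsed_minutes(first_fix, good_track):.0f} minutes")
```

Comparing a computed distance with the one measured interactively on the map is itself a small exercise in being critical of the data.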

This simple example points to the serious concern about the consequences of using data without being critical of its source, spatial accuracy, precision, lineage, date, collection scale, methods of collection, and other considerations.  Be critical of the data, even when it is your own!

Karen Payne’s list of Geospatial Data Resources

April 5, 2015 2 comments

In the past, we have written about Robin Smith’s free geospatial data listing. Dr Karen Payne at the University of Georgia has published a geospatial data list which is also quite useful.  Her list of geodata links is published on Google Spreadsheets and contains over 1,000 links to different portals, data types, and services.  Because it is listed in a spreadsheet, make sure you pay attention to and investigate each of the tabs.  Categories include scales (global, regional, country), thematic (disaster, imagery, physical, conservation), data types (web apps, tabular, live services), and more.   The list’s focus is on freely available data sets used in international humanitarian work, which is Dr Payne’s major concentration in her work.  The challenge with all GIS data listings, as we point out in our book, is the updating and curation of such lists, but Dr Payne is committed to updating this one as is evident in the breadth and scope of the listing and in my conversations with her.

Dr Payne is also working with the United Nations, converting their Common Operational datasets into services; specifically, populated places, admin boundaries, their names and codes.  This effort could prove to be very helpful to all of us in the geospatial technology community.


Part of Dr Karen Payne’s geospatial data listing.

Open Data Institute Nodes

December 8, 2014 1 comment

The Open Data Institute (ODI), founded by Sir Tim Berners-Lee and Prof. Nigel Shadbolt, has been working collaboratively with many partners around the globe to develop a network of open data ‘Nodes’. Nodes, which aim to bring individuals and organisations together to collaborate and promote the use of open data in business, government and education, are split into three levels:

  • Country: Independent NGOs building national centres of excellence, working across public and private sectors, NGOs, educational institutions and other Nodes within a country.
  • City or Regional: Deliver projects, and can provide training, research, and development. For example, ODI Dubai, ODI Chicago, ODI North Carolina, ODI Paris, ODI Trento, ODI Brighton, ODI Manchester, and ODI Leeds.
  • Communications: Promoting global open data case studies. For example ODI Moscow, ODI Buenos Aires and ODI Gothenburg.

Although not a data portal, the ODI provides a variety of resources for those who work with open data, including research into how open data is used, how it is published, and how to certify open data. Given the current plethora of data sites and portals, not all of which are well thought out and useful, as we have commented before on this blog, this invaluable resource on data trends and issues provides many useful references for those working with the various types of open data, including location-based data. For example, a recent blog post from ODI North Carolina discussed how important quality is for open data.

It is always helpful for those who are considering working with open data, or who are in the process of collecting and publishing open data, to benefit from the experiences of others. Given the ease with which data can be published online these days, the next challenges are to provide data that are easy to find, well documented, current, accurate and ultimately... useful. As Charlie Ewen (UK Met Office) remarked, ‘Digital isn’t done once you have a website’.

Playas and Wetlands of the Southern Ogallala Aquifer Data Released

September 1, 2014 2 comments

A new web resource from Texas Tech University on playas and wetlands for the southern High Plains region of Texas, Oklahoma and New Mexico offers a wide variety of spatial data on this key resource and region. The playa and wetlands GIS data are available for download here, in shapefile, geodatabase, and layer package formats. The data include 64,726 wetland features, of which 21,893 are identified as playas and another 14,455 as unclassified wetlands; in other words, they appear to be playas but have no evidence of a hydric soil. The remaining 28,378 features include impoundments, riparian features, lakes, and other wetlands.

As we discuss in our book, (1) many spatial data depositories seem to have been created without the GIS user in mind. Not this one. Careful attention has been paid to the data analyst. That’s good news! (2) Resources such as this don’t appear without a great deal of time and expertise invested. Here, approximately 5,000 person-hours were dedicated to creating the geodatabase and website. This project was made possible by Texas Tech University with funding from the USDA Agricultural Research Service – Ogallala Aquifer Program.

For users who only wish to view playas and other wetlands, a web map application exists and can be launched via the playa viewer.  A “citizen science” feature is that the map viewer allows interactive comments to be added to the map for future consideration.

Southern Ogallala Aquifer Playa and Wetlands Geodatabase

Southern Ogallala Aquifer Playa and Wetlands Geodatabase.