Posts Tagged ‘data formats’

Make scientific data FAIR – article review

In a new article in Nature, author Shelley Stall and her colleagues argue that all disciplines should follow the geosciences and demand best practice for publishing and sharing data.  These authors make bold statements that I believe are long overdue, and the statements touch on many of the themes of this blog and our book, including those below.  I created a video with my thoughts about this, here.

(1) Although the amount of scientific data generated is enormous and growing each year, the authors write that these data are “not being used widely enough to realize their potential” and that “most researchers come up against obstacles when they try to get their hands on data sets.”  The authors present evidence that typically only 1/5 of published papers post the supporting data in scientific repositories.  While I do not have the figures at hand, the need seems even more acute in research that makes use of GIS and remote sensing: how often are links to the data sets provided?  Very seldom.  The authors give several key reasons why researchers do not share their data.

(2)  The authors state that “Too much valuable, hard-won information is gathering dust on computers, disks and tapes.”  I spent much of my career in federal data gathering agencies, and while much data has been digitized, not all of it has.  And what happens when technology changes, including media (such as specific types of physical storage; see our post “Tossing the Floppies” for an example) and means of access (such as the demise of most FTP sites where data were stored)?  On a related theme, we have documented in these essays the demise of useful geospatial data portals.  Such a change may sound good in that the new and improved portals perform better, but often the data are not ported over, or are ported in a way that leaves researchers unable to find them.  Two of the many examples are the National Atlas of the USA and the Global Land Cover Facility.  I have spent many hours this year alone trying to obtain data that were on both of these sites.

On a positive note, the central theme of the article is to encourage disciplines to follow the leadership of Earth Sciences and adopt the “Enabling FAIR Data Project’s Commitment Statement in the Earth, Space, and Environmental Sciences for depositing and sharing data” principles.  Doing so helps ensure that data are “findable, accessible, interoperable and reusable”.  Perhaps we will see the day when the majority of articles provide the reader with access to the data behind them.  I do hope, however, that if authors cannot share the data for reasons of confidentiality or safety, or for another valid reason, provisions will be made for the research to still be published if peer review deems it to be of value to the scientific community.


A new article in Nature discusses why data is seldom shared in published scholarly research, and what might be done about the situation.

Maps as representations of reality: The deciduous-coniferous tree “line”

November 5, 2012

One of the themes running through our book The GIS Guide to Public Domain Data is that maps are representations of reality.  While almost everyone reading this statement is likely to agree with it, in the fast-paced world of GIS analysis and map creation, it is easy to lose sight of this fact when staring at tables, maps, and imagery.  In a recent video, I discuss just one place where care needs to be taken in making decisions based on spatial data.  In the video, observe my surroundings as I stand near the traditional “line” that divides the deciduous forest to the south from the coniferous forest to the north in North America.  Is the “line” really a line at all, or is it better described as a gradual change from deciduous to coniferous as one travels north?  Is that vector line then better symbolized as a “zone”, or is vegetation better mapped as a raster data set, with each cell representing the percentage of deciduous and coniferous trees?
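To make the vector-versus-raster question concrete, here is a minimal sketch in Python.  Everything in it is hypothetical: the grid size, the rows where the transition zone begins and ends, and the 50 percent threshold are invented for illustration and are not measured vegetation data.  It builds a small raster in which each cell stores a percentage of deciduous trees, then shows how collapsing those percentages into two classes creates a crisp “line” that hides the mixed cells.

```python
# Hypothetical illustration only: all values are invented, not field data.
# Row 0 is the northern edge of the grid; the last row is the southern edge.

ZONE_START_ROW = 3   # assumed last fully coniferous row
ZONE_END_ROW = 7     # assumed first fully deciduous row


def deciduous_percent(row):
    """Percent of deciduous trees in a cell, rising linearly from north to south."""
    if row <= ZONE_START_ROW:
        return 0.0
    if row >= ZONE_END_ROW:
        return 100.0
    return 100 * (row - ZONE_START_ROW) / (ZONE_END_ROW - ZONE_START_ROW)


def build_raster(n_rows=11, n_cols=4):
    """Raster model: every cell keeps its own percentage."""
    return [[deciduous_percent(r) for _ in range(n_cols)] for r in range(n_rows)]


def classify(raster, threshold=50.0):
    """Crisp model: each cell becomes 'D' (deciduous) or 'C' (coniferous)."""
    return [["D" if cell >= threshold else "C" for cell in row] for row in raster]


def mixed_rows(raster):
    """Rows holding at least one cell that is neither 0 nor 100 percent deciduous."""
    return [r for r, row in enumerate(raster)
            if any(0.0 < cell < 100.0 for cell in row)]


raster = build_raster()
classes = classify(raster)

# Compare the two representations down one column, north to south.
for r in range(len(raster)):
    print(f"row {r:2d}: {raster[r][0]:5.1f}% deciduous -> class {classes[r][0]}")

print("rows inside the transition zone:", mixed_rows(raster))
```

In this toy grid the classified version places its “line” between two adjacent rows, while the percentage raster shows that several rows are actually mixed.  Which representation is appropriate depends on the decision being made with the data.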

How many other data sets do we tend to see as having firm boundaries, when in reality the boundaries are not firm at all?  How does that affect the decisions we make with them?  Even the boundary between wetlands and open water was originally interpreted from land cover data or from a satellite or aerial image.  As we state in the book, even contour lines were often originally interpreted from aerial stereo pairs.  And each data set was collected at a specific scale, with certain equipment and software, at a specific date, and within certain margins of error that the organization established.  Maps are representations of reality.  They are incredibly useful representations to be sure, but care needs to be taken when using these or any other abstracted data.