Be critical of and aware of default settings in GIS software

June 7, 2021

We recently wrote about another reason to be critical of the data, especially imagery: it can be misinterpreted, and it can be deliberately faked. That essay included a brief paragraph encouraging the community to also be aware and critical of default settings in GIS software when rendering and analyzing imagery. Why? Default settings exist to accommodate a wide variety of users, but they can lead to conclusions that are, at best, not as rigorous or as accurate as they could be or, at worst, in error.

Below are images from my Esri colleague (and one of my favorite people in all of geospatial), cartographer John Nelson, showing a set of 8 NASA images. Says John, “They are designed to mosaic together and the native images match each other perfectly. But because of the image appearance default settings assigned by, in this case, ArcPro from Esri, you can easily see seams between them. The default ‘percent clip’ stretch type eliminates 5% of each end of the image histogram, throwing out 10% of the data. Because each image histogram is slightly different, this inherently introduces variability between them. The default ‘gamma’ setting is dynamic based on image and is different for each, in an attempt to find an ideal visual contrast. A gamma of 1 renders the image in its native value. Most of the eight images in this map were given different gamma values (ranging from 1.4 to 1.6) so the visual variability between images is especially stark.” See one of John’s videos, here, which illustrates the benefits of overriding the defaults and how to do so quickly and powerfully in one imagery example.
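To see why per-image defaults produce seams, consider the minimal Python sketch below. It applies a percent clip stretch (5% clipped from each end, as John describes) and then a gamma adjustment to two hypothetical, overlapping tiles. The pixel values are invented. The gamma convention used here, where the output is proportional to the scaled value raised to 1/gamma, is an assumption, and ArcGIS Pro's internal formulas may differ. Even so, the sketch shows the effect: the same raw value of 100 ends up with a different display brightness in each tile.

```python
def percentile(values, pct):
    """Linearly interpolated percentile (pct in 0-100) of a list of numbers."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no values")
    rank = (len(ordered) - 1) * pct / 100.0
    lo = int(rank)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def percent_clip_gamma(pixels, clip_pct=5.0, gamma=1.0):
    """Stretch pixels to 0-255 after clipping clip_pct percent at each end of
    the histogram, then apply gamma (assumed convention: out = x ** (1/gamma))."""
    low = percentile(pixels, clip_pct)
    high = percentile(pixels, 100.0 - clip_pct)
    span = (high - low) or 1.0  # guard against a perfectly flat tile
    out = []
    for p in pixels:
        scaled = min(max((p - low) / span, 0.0), 1.0)  # clip, then scale to 0-1
        out.append(round(255 * scaled ** (1.0 / gamma)))
    return out


# Two hypothetical tiles whose value ranges overlap, as adjacent scenes might.
tile_a = list(range(0, 200))
tile_b = list(range(50, 250))

# Each tile gets its own clip limits and its own "dynamic" gamma (1.4 vs 1.6).
shade_a = percent_clip_gamma(tile_a, clip_pct=5.0, gamma=1.4)[tile_a.index(100)]
shade_b = percent_clip_gamma(tile_b, clip_pct=5.0, gamma=1.6)[tile_b.index(100)]
print(shade_a, shade_b)  # same raw value, two different display values
```

Because each tile computes its clip limits from its own histogram, a shared ground value is pushed to different positions on the display scale. The differing gammas then widen that gap.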

If a GIS user were to manually reset all of these overrides, which in this case are not ideal, the eight images would render cohesively, as designed. Says John, “There is no way to opt out of default rendering overrides, and there is no way to multi-select the image layers and re-set their parameters all at once. If (the GIS user) wants to now adjust their stretch and gamma settings in unison, they have to do that individually or create a new mosaic.”
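The Python sketch below is a rough illustration of why setting the same parameters on every layer (or building a new mosaic) restores a seamless look. It derives one set of clip limits from all tiles pooled together, then applies those limits and a single gamma to each tile. The tiles are hypothetical lists of pixel values. The nearest-rank percentile and the gamma convention are my assumptions, and none of this is ArcGIS Pro's own code.

```python
def clip_limits(pixels, clip_pct):
    """Nearest-rank (floor) values bounding the middle (100 - 2*clip_pct) percent."""
    ordered = sorted(pixels)
    n = len(ordered)
    low = ordered[int((n - 1) * clip_pct / 100.0)]
    high = ordered[int((n - 1) * (100.0 - clip_pct) / 100.0)]
    return low, high


def apply_stretch(pixels, low, high, gamma=1.0):
    """Scale to 0-255 using fixed limits; gamma convention assumed: x ** (1/gamma)."""
    span = (high - low) or 1
    return [round(255 * min(max((p - low) / span, 0.0), 1.0) ** (1.0 / gamma))
            for p in pixels]


def stretch_uniformly(tiles, clip_pct=5.0, gamma=1.0):
    """Compute one set of limits from all tiles pooled, then apply it to every tile."""
    pooled = [p for tile in tiles for p in tile]
    low, high = clip_limits(pooled, clip_pct)
    return [apply_stretch(tile, low, high, gamma) for tile in tiles]


# Hypothetical overlapping tiles; value 100 appears in both.
tiles = [list(range(0, 200)), list(range(50, 250))]
stretched = stretch_uniformly(tiles, clip_pct=5.0, gamma=1.0)
```

Once the limits and gamma are shared, a given raw value maps to the same display value in every tile. This is the cohesive rendering John describes, obtained here by computation rather than by editing each layer by hand.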

This applies to any GIS and remote sensing software: all software has default settings, and those settings need to be understood. Certainly, in many cases, the defaults are useful, time saving, and appropriate. But knowing what they are, whether in smart mapping symbology, imagery rendering, or many other GIS workflows, is a critical part of the central message of our book and this blog: be critical of the data. Part of the misinterpretation of imagery stems from a lack of knowledge of the electromagnetic spectrum and of images rendered in specific bands. I would therefore also argue that this is all the more reason to include remote sensing in educational curricula!

–Joseph Kerski

Categories: Public Domain Data

Faked Satellite Imagery: Another opportunity to be critical of the data

As we have written about frequently in this blog, all geospatial data should be viewed critically. The user needs to carefully assess the attributes, resolution, date, source, and other characteristics before deciding whether the data are fit for use. The same is true of satellite imagery, for reasons we have described here (Be critical of the data–imagery too!) and here (Imagery–It is what it is. Well, not always).

More recently, a new and disturbing reason for critical thinking has appeared: faked imagery. One of a growing number of articles about this issue is entitled A Growing Problem of ‘deepfake geography’: How AI (Artificial Intelligence) Falsifies Satellite Images. That article refers to a research article, Deep fake geography? When geospatial data encounter Artificial Intelligence, by Bo Zhao, Shaozeng Zhang, Chunxue Xu, Yifan Sun, and Chengbin Deng in Cartography and Geographic Information Science, in which the authors describe their study. The goal of the study was not to show that geospatial data can be falsified, but rather, “the authors hoped to learn how to detect fake images so that geographers can begin to develop the data literacy tools, similar to today’s fact-checking services, for public benefit.” They call for timely detection of deep fakes in geospatial data and for proper coping strategies when necessary. Their goals are to cultivate critical geospatial data literacy and to “understand the multi-faceted impacts of deep fake geography on individuals and human society.”

Fake satellite images of a neighborhood in Tacoma with landscape features of other cities. (a) The original CartoDB basemap tile; (b) the corresponding satellite image tile. The fake satellite image in the visual patterns of (c) Seattle and (d) Beijing, from Zhao et al. article in Cartography and Geographic Information Science.

This very useful article by Pierre Markuse places purposefully falsified images in a broader context. Pierre argues that users need to differentiate among three ways an image could be understood (or really debunked) as being fake: 1. Perceived as fake but in fact just a different representation of the data, 2. Perceived as fake but just a misrepresentation of facts, and 3. Actually faked satellite images. He provides excellent illustrations of each of these three cases, including a supposed fire in Central Park in New York City and a “pollution plume” spilling from a river into a sea. Pierre’s very helpful concluding section on how to determine whether an image is faked or out of context focuses on the themes of this blog, offering practical advice on what questions to ask as you examine and work with images. I highly recommend both of these articles for students, instructors, and researchers.

Along these lines, I would also urge you, as a user of GIS or remote sensing software, to pay close attention to the defaults applied when images are brought into your software and displayed and rendered. These defaults are not nefarious, to be sure, but they are designed to encompass the needs of a wide variety of users. Your needs might very well be different. Make sure you understand what the defaults are and how to change them, so that you do not misunderstand your data or inadvertently lead others to misunderstand it.

These developments are not unexpected. While the deliberately faked images are unfortunate, they provide more opportunities to help the students and colleagues around us remain vigilant and critical of the data, including, and perhaps especially, geospatial data.

Joseph Kerski

Coupling data with scholarly research, Part 2

Recently we wrote about coupling data with scholarly research as a means to enable researchers and practitioners to avoid “starting over” when they wish to tackle a problem with GIS or any other set of tools. This is part of an important and much wider discussion, and any blog essay by its very nature will not do it sufficient justice. But it is worth extending that discussion with at least one further essay now, and opening the topic to the wider community through your comments below.

I spoke about this with our Esri Chief Scientist, Dr Dawn Wright, who said that in her view, “there are presently two separate but related discussions: (1) the publishing of data either with papers or on its own vs. (2) the publishing of software code, or workflows/methods within software, either with papers or on its own. This in turn begs the question of how to properly cite that data or software once it is published on its own or along with a paper. These are long-standing issues as tackled by the Earth Science Information Partners (ESIP; e.g., https://www.esipfed.org/esip-endorsed ); the NSF-funded EarthCube initiative (for example, https://www.esipfed.org/data-help-desk), and the many working and interest groups of the Research Data Alliance (RDA), for example: https://rd-alliance.org/groups/data-citation-wg.html.” Dr Wright also mentioned her essay Making Story Maps Citable (e.g., with Digital Object Identifiers). It takes these discussions to the practical level and has implications beyond story maps.

Along these lines, another colleague of mine here at Esri, Dr Kevin Butler, reminded me that the Nature Research journal Scientific Data is a ‘peer-reviewed, open-access journal for descriptions of scientifically valuable datasets, and research that advances the sharing and reuse of scientific data.’ Kevin believes it strikes a nice balance between two extremes. One is a bucket full of data (with no paper) that only a few will understand. The other is taking space from an analysis-based paper to describe the data and processing in detail. I agree. Publication costs are reasonable, and the journal hosts the data, too. MDPI has a similar type of journal that fits into this discussion, as do the Data Science Journal (e.g., https://datascience.codata.org/about/research-integrity/ ) and Elementa.

Coupling data with scholarly research, Part 2. Photograph by Joseph Kerski.

–Joseph Kerski

Coupling data with scholarly research

April 26, 2021

How many times have you read an article or report and wished you had access to the data that the author used? You’re not alone: A growing concern is being voiced in the research community that data are typically not included with scholarly papers. Why should this matter? Hasn’t this been the way publishing has always worked? Well, in today’s world, serious problems are growing, as recent global health and natural hazard challenges have made starkly clear. Providing data could be an immense leap forward in helping researchers and developers solve these pressing issues. As we have written about for nearly a decade in this blog, gathering data is still often very time consuming, and this applies not just to spatial data but to any data. That remains true despite crowdsourcing, the Internet of Things, sensors on, above, and below ground, and GIS, statistical, field, and other tools. Research results are important, but should every researcher have to start completely over and gather his or her own data? What if I wanted to take someone’s data and add to it, or use it in another way, in another region, with different models?

Chapter 2 in Open Science by Design: Realizing a Vision for 21st Century Research from the National Academies of Sciences lists several benefits of including data in research studies. The chapter frames these benefits as part of the emerging “open science movement”. The benefits listed by the authors include rigor and reliability, the ability to address new questions, faster and more inclusive dissemination of knowledge, broader participation in research, effective use of resources, improved performance of research tasks, and open publication for public benefit. Yet the chapter also recognizes real challenges. These include costs and infrastructure; a structure of scholarly communication in which work is often available only by subscription; a lack of supportive culture, incentives, and training; disciplinary differences; and privacy, security, and proprietary barriers to sharing.

The chapter also states that “the ability to automate the process of searching and analyzing linked articles and data can reveal patterns that would escape human perception, making the process of generating and testing hypotheses faster and more efficient. These tools and services will have maximum impact when used within an open science ecosystem that spans institutional, national, and disciplinary boundaries.” Indeed.

An article in Nature also describes the benefits of providing data, beginning with career benefits but also touching on societal benefits. Other such articles are appearing. And the National Academies chapter above gives practical recommendations for bringing about ‘open science.’ The European Union projects FOSTER Plus and OpenAIRE provide training on open science and open data.

But where are the practical examples? I see the following “papers with code” library as one practical response to these concerns, and I salute these efforts:

https://www.paperswithcode.com/datasets

The site can be filtered. During my testing I found that it does include spatial data, on water quality and other variables. I hope it is the beginning of many such efforts.

–Joseph Kerski

A story map as a data and information one-stop shop

April 12, 2021

A variety of land management, science, government, and other agencies combined to create this story map, “GIS Day at the Capitol 2021”. I believe it provides an excellent model of how a story map can be used effectively to (1) showcase what these agencies do, (2) explain why they matter to the people living in the lands that these agencies manage (in this case, Oklahoma, USA), and (3) show how their data can be accessed. As the story map explains, this is “a virtual tour of some of the innovative ways Oklahoma agencies are using Geographic Information Systems to further their missions.” At the time of this writing, 17 agencies are featured, including federal agencies with a presence in the state, cities, and state agencies. I salute each of the agencies and people here for the work they do each day to improve the land and the lives of the people living there.

Story maps, a tool I frequently teach in workshops, can be used effectively to point people to data. As the Oklahoma story map shows, REST services and data layers can be linked, and web maps and web mapping applications can be embedded, giving the user a rich experience. An excellent example of embedding is the City of Ardmore’s web mapping application, about one-fifth of the way through the story map. The story map also includes many dashboards, including one on transportation, one on seismicity, another on environmental variables in each legislator’s district, and more. They reflect the rising popularity of dashboards over the past two years, particularly during the COVID-19 pandemic.
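Because REST services are linked directly from the story map, a reader can often query those layers programmatically rather than only viewing them. The sketch below builds a request against the query operation of an ArcGIS REST feature layer, using the standard where, outFields, and f parameters. The layer URL is a hypothetical placeholder, not one of the Oklahoma services. Supported output formats (such as GeoJSON) vary by server, so check each service's own page before relying on them.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical placeholder; substitute a layer URL copied from the story map.
LAYER_URL = "https://example.com/arcgis/rest/services/Hypothetical/FeatureServer/0"


def build_query_url(layer_url, where="1=1", out_fields="*", fmt="geojson"):
    """Build a query URL for an ArcGIS REST feature layer (all rows, all fields by default)."""
    params = {"where": where, "outFields": out_fields, "f": fmt}
    return f"{layer_url.rstrip('/')}/query?{urlencode(params)}"


def fetch_features(layer_url, **kwargs):
    """Send the query and decode the JSON response; needs network access and a real URL."""
    with urlopen(build_query_url(layer_url, **kwargs), timeout=30) as resp:
        return json.load(resp)


print(build_query_url(LAYER_URL))
```

Keeping URL construction separate from the network call makes it easy to inspect or log exactly what is being requested. That fits the theme of knowing where your data came from.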

Furthermore, as we have discussed in this blog many times, such as here, using data is often not just a matter of accessing websites, streaming, and downloading: It means understanding the data, including reading the metadata, and often it means contacting the data providers. This site is a wonderful example of agencies being transparent about who provides the data, and very helpfully provides contact information for those data providers–even email addresses, photos of real people, and phone numbers!

Perhaps the most useful resource and starting point for data for the state is OK Maps, the Geographic Information Clearinghouse for Oklahoma. The site is embedded in the story map, but it can also be opened in a separate web browser tab for easier access to its many data layers, including Lidar data, historical aerial photographs, and much more. One of my favorite segments of the story map is the new state GIS warehouse that uses ArcGIS Hub technology, pictured below and on this link.

Overall, the story map gives the distinct message that a great deal of effort is required to serve a diverse jurisdiction such as Oklahoma, that GIS serves as a “common language” upon which problems can be solved, and that dedicated people are the ones who are making these things happen.

As someone who has been actively involved in supporting and promoting GIS Day since 1999, I was also pleased to see that one of the central ways this story map is promoted is through these agencies’ annual “GIS Day at the State Capitol.” And as someone who works with social studies educators, I was also pleased to see the legislative districts map included in this story map. As a geographer who works with citizen science programs, I was happy to see highlights of the Blue Thumb public education and outreach program from the Oklahoma Conservation Commission’s Water Quality Division. And finally, as someone who gives many career-oriented presentations each year, I will use this story map to encourage students to investigate the different career paths of the people featured on this page. For these reasons and many more, I encourage you to do your own exploration of this story map!

The State of Oklahoma’s GIS Data Warehouse, maps and data, sponsored by the Center for Spatial Analysis, University of Oklahoma.
One of the data portals in the GIS Day at the Capitol 2021 story map, that of OKMAPS – the Geographic Information Clearinghouse for the State of Oklahoma.

Field testing of offsets on interactive web maps

March 29, 2021

A few years ago we wrote about intentional offsets on interactive web maps. The purpose was to encourage people to think critically about information provided even in commonly used maps such as Google's. Given the interest that post generated for teaching about data quality, I decided that some field testing would be instructive. Because the offsets we highlighted in that essay were in China, I enlisted a colleague of mine who is teaching there to take some GPS readings at locations identifiable on the map. The test was designed to answer one question about selected web mapping services. Are the vector (street) data offset, is the imagery offset, or are both partially offset, so that neither is spatially accurate relative to one's position on the ground? To recap the situation, see Figure 1 below: the vectors are offset from the imagery by about 557 meters to the southeast.

Figure 1: Given that the vector street data and the satellite imagery are offset for the same features, which is spatially accurate–the streets, the imagery, or neither?

For the field test, my colleague stood at the intersection of two roads and collected two points, as follows:

Point #1: 32°05’03.89”N, 118°54’54.02”E, or 32.084414, 118.915006

Point #2: 32°05’02.62”N, 118°55’00.68”E, or 32.084061, 118.916856
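The decimal values above follow from the usual conversion: decimal degrees = degrees + minutes/60 + seconds/3600, with a negative sign for south latitudes or west longitudes. The short Python sketch below reproduces them. The parsing pattern is my own and accepts both straight and curly quote marks, since coordinates copied from web pages often use either.

```python
import re

# Degrees, minutes, seconds, hemisphere; accepts ' or ’ and " or ” as marks.
DMS_PATTERN = re.compile(r"(\d+)°\s*(\d+)['’′]\s*([\d.]+)[\"”″]\s*([NSEW])")


def dms_to_decimal(dms):
    """Convert a string such as 32°05’03.89”N into signed decimal degrees."""
    match = DMS_PATTERN.search(dms)
    if match is None:
        raise ValueError(f"not a DMS coordinate: {dms!r}")
    degrees, minutes, seconds, hemisphere = match.groups()
    value = int(degrees) + int(minutes) / 60 + float(seconds) / 3600
    return -value if hemisphere in "SW" else value


for text in ("32°05’03.89”N", "118°54’54.02”E", "32°05’02.62”N", "118°55’00.68”E"):
    print(text, round(dms_to_decimal(text), 6))
```

Rounded to six decimal places (roughly 0.1 m of latitude), the results match the decimal forms reported for both points.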

I first mapped these points in ArcGIS Online, where they aligned well with the default imagery basemap, the OpenStreetMap layer, and the world streets layer, as shown below and on the map shared here.

Figure 2: The two collected points on the ground align well with the default imagery, OpenStreetMap, and the default streets layer in ArcGIS Online.

However, the results in Google Maps were different. On the Google street map, Point 1 plots about 1,779.94 feet (542.53 meters) northwest of the intersection where my colleague was standing.

Figure 3: Position 1 on Google Maps with Streets base.

On the same street map, Point 2 plots about 1,747.89 feet (532.76 meters) northwest of the intersection where my colleague was standing.
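Two quick checks help put these numbers in perspective. The sketch below converts the reported offsets from feet to meters (1 international foot = 0.3048 m). It also uses the haversine formula to measure how far apart the two collected points are, assuming a spherical Earth with a mean radius of 6,371 km. By this calculation the two points are well under 200 meters apart, considerably less than the 530 to 545 meter offsets on the street map.

```python
import math

FT_TO_M = 0.3048               # international foot, exact by definition
EARTH_RADIUS_M = 6_371_000     # mean radius; spherical-Earth assumption


def feet_to_meters(feet):
    return feet * FT_TO_M


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points (spherical approximation)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# The two field points reported above, in decimal degrees.
point1 = (32.084414, 118.915006)
point2 = (32.084061, 118.916856)

spacing = haversine_m(*point1, *point2)
offsets_m = [round(feet_to_meters(ft), 2) for ft in (1779.94, 1747.89)]
print(f"points are {spacing:.1f} m apart; offsets in meters: {offsets_m}")
```

The feet-to-meters results agree with the meter values quoted in the text. The spherical approximation is adequate at this scale; survey-grade work would use an ellipsoidal (geodesic) distance instead.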

Figure 4: Position 2 on Google Maps with street base.

However, Google's imagery basemap matches the actual test sites well. The positions, therefore, are “offset” from the streets layer but not from the imagery. I also tested Bing Maps and MapQuest; the results are below.

Figure 5: Position 1 on Google’s imagery base: Note that the position above, at the T intersection of the streets, at the west end of the tree-lined lane, was where the test location was collected.
Figure 6: Position 2 on Google’s imagery base: Note that the position above, at the T intersection of the streets, at the east end of the tree-lined lane, was where the test location was collected.
Figure 7: The 2 collected positions lined up well with the streets base map in MapQuest. The satellite imagery also lined up well and underlaid the streets layer.
Figure 8: Position 1 in Bing Maps: Again, an offset by about 532 m to the northwest. The satellite imagery was similar to Google’s in that it is offset from the streets.

Therefore, three conclusions follow. (1) Whether the streets and imagery layers are offset depends on the layer(s) used. (2) On Google Maps, at this location, the offset between the imagery and the streets layer is consistent in both distance (roughly 530 to 545 meters) and direction: the collected positions plot northwest of the corresponding street features. This is similar to the offset of about 557 meters noted a few years ago. (3) These mapping services change over time and are likely to change in the future.
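A statement such as “the same direction (northwest)” can be checked by computing the initial bearing from a collected position to the place where a web map draws the corresponding feature. The sketch below uses the standard forward-azimuth formula and buckets the result into eight compass points. The web-map coordinates of the offset features were not recorded here, so as a sanity check it computes the bearing between the two collected points instead.

```python
import math

COMPASS_POINTS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]


def initial_bearing_deg(lat1, lon1, lat2, lon2):
    """Forward azimuth from point 1 to point 2, in degrees clockwise from north."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    y = math.sin(dlmb) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb)
    return (math.degrees(math.atan2(y, x)) + 360.0) % 360.0


def compass_point(bearing_deg):
    """Bucket a bearing into one of eight 45-degree sectors centered on N, NE, E, ..."""
    return COMPASS_POINTS[int(((bearing_deg % 360.0) + 22.5) // 45) % 8]


# Sanity check: bearing from Point #1 to Point #2 of this field test.
b = initial_bearing_deg(32.084414, 118.915006, 32.084061, 118.916856)
direction = compass_point(b)
print(f"{b:.1f} degrees, roughly {direction}")
```

With the street-map coordinates of each offset feature in hand, the same two functions would show whether the offsets at several sites really share one distance and direction.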

But let’s not be too hasty to assume that the satellite imagery is correct, either. One cannot assume that the satellite images here, or anywhere, are the most spatially accurate layers available. They often are, and they are extremely useful, to be sure. However, satellite images are processed, tiled data sets, and like other data sets, they need to be critically scrutinized as well. First, they should not be considered “reality” despite their appearance of being the “actual” Earth’s surface. They too contain error. They may have been captured on different dates or in different seasons (as we wrote about here), they may be referenced to a different datum, and other issues could also come into play. Second, as you will note just south of the study area, the default satellite imagery is from different dates in ArcGIS Online and Google Maps. The Google imagery shows (as of the time this essay was being written) a major east-west street being constructed just south of the study area.

Another difference among these maps is a modest variation in the level of detail in the street data, in China or anywhere else. OpenStreetMap is sometimes the most complete, though not always; the other web mapping platforms offered varying levels of detail. The imagery in each platform is compiled and mosaicked from a variety of sources and reflects different acquisition dates and sometimes different spatial resolutions as well.

It all comes back to identifying your end goals in using any sort of GIS. Being critical of the data can and should be part of your decision-making process, including your choice of tools and maps. By the time you read this, the image offset problem may be a thing of the past. But are there new issues of concern at the time you are reading this? Data sources, methods, and quality vary considerably among countries, platforms, and services. Thus, being critical of the data is not something to practice just once; rather, it is fundamental to everyday work with GIS.

We look forward to your comments below.

–Joseph Kerski

Democratizing Data: Environmental Data Access and its Future: Special Issue

March 15, 2021

Frontiers is a leading Open Access Publisher and Open Science Platform. Given the data-centric theme of this blog, we wanted to let you know about a planned Frontiers special issue called “Democratizing Data: Environmental Data Access and its Future”. Reproducibility is one of the tenets of Frontiers, and hence, articles will be open access. One of my Esri colleagues is a co-editor, along with colleagues from NOAA and the US Integrated Ocean Observing System. I highly recommend that you consider contributing to this special issue or pass this along to a colleague. See this link for the planned structure and questions. Many of its themes, including data usability, discoverability, and access, are central to this blog and our book.

An intriguing notion brought up in the call for papers is data equity: “Improved access to data also supports data equity – ‘The term “data equity” captures a complex and multi-faceted set of ideas. It refers to the consideration, through an equity lens, of the ways in which data is collected, analyzed, interpreted, and distributed.’ By making data more easily accessed and used we also make the ability to use data more equitable.” I suspect this theme will appear again in this blog in the future.

Joseph Kerski

Government removes hurdles for mapping and serving of geospatial data in India

A recent essay describes the announcement by the government of India that it will remove hurdles for mapping and serving geospatial data in India. Firms can now acquire, collect, generate, disseminate, store, share, distribute and create geospatial data with fewer regulations and approvals. Some restrictions will certainly remain, especially surrounding sensitive information. Chief among the arguments used to support this development was that many sectors of society, from agriculture to finance and construction, can benefit from geospatial technologies. The Department of Science and Technology was behind the announcement, according to this post in Geospatial World.

We have always focused on practical applications of geospatial data in our book and in this blog. In keeping with our theme of “being critical of the data”, we have described numerous instances where a bold proclamation was made but its impact on the data user was minimal. I truly hope that is not the case here. I hope that at long last, geospatial data for India will be much more open than it was in the past. The GIS community will have to wait and see whether the announcement eventually translates into the standing up of geospatial data portals. And certainly, once restrictions are lifted, it will take a while, possibly several years, for data to be acquired and served. We have written about similar developments elsewhere in the world (in Europe, for example) on numerous occasions. But no matter what happens, the announcement is a welcome addition to the gradual loosening of restrictions on GIS data acquisition around the globe.

Joseph Kerski

Creative Commons data licensing: 2021–2025

February 22, 2021

Among the main themes in The GIS Guide to Public Domain Data were the issues of copyright, publicly available data versus public domain data, and the range of licensing arrangements available to data publishers.

Public Domain Mark

Our review of the Creative Commons (CC) licensing options in 2012 concluded that although the CC licenses were becoming increasingly popular for geospatial data, the various license categories were never intended for combined geospatial data stores and the volumes of derived data generated from those stores. Many geospatial data publishers opted for the copyright protections provided under such arrangements as the Open Database Licence from the Open Knowledge Foundation or the Open Government Licence (OGL) in the UK.

In December 2020, CC announced a new strategy for 2021–2025. Its primary emphasis is on better, rather than simply more, knowledge sharing, and on a comprehensive approach to open sharing. CC has recognised the need to consider economic and ethical issues in addition to the existing copyright licensing arrangements.

Although the new strategy is based on a sector-by-sector analysis of content sharing requirements, there’s no specific mention of support for geospatial data publishers and the diverse, integrated data sources they manage. In the absence of licensing arrangements tailored to geospatial data, it is hard at present to see the new strategy significantly changing how geospatial data publishers license their data.