Archive

Posts Tagged ‘data quality’

Creating fake data on web mapping services

February 16, 2020

In keeping with this blog’s theme of “be critical of the data,” consider a fascinating recent story: an artist pulled 99 smartphones around Berlin in a little red wagon in order to create fake traffic jams on Google Maps. On each phone, he opened the Google Maps interface. As we discuss in our book, traffic and other real-time layers depend in large part on data contributed by a network of ordinary people, citizen scientists of a sort, who send data to the cloud, often without intending to. Wherever the wagon of phones traveled, Google Maps showed a traffic jam for a while, displaying a red line and routing users around the area.

It wasn’t difficult to do, and it shows several things: (1) the Google Maps traffic layer was doing what it was supposed to do, reflecting what it perceived as true local conditions; (2) it can sometimes be easy to create fake data using web mapping tools, hence the need to be critical of data, including maps, as we have been stating on this blog for eight years; (3) the Internet of Things includes people, and at 7.5 billion strong, people have a great influence over the sensor network.
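
To make the mechanism concrete, here is a minimal Python sketch of how a naive, speed-based congestion classifier could be fooled by a cluster of slow-moving phones. The thresholds and classifier are my own invented illustration; Google’s actual pipeline is proprietary and far more sophisticated.

```python
import random

# Hypothetical sketch only: a naive, speed-based congestion classifier of
# my own invention. It illustrates why 99 slow-moving phones on one street
# can look exactly like a traffic jam to a crowdsourced traffic layer.

def congestion_level(speeds_kmh, free_flow_kmh=50.0):
    """Classify a road segment from the mean speed reported by devices."""
    if not speeds_kmh:
        return "no data"
    ratio = (sum(speeds_kmh) / len(speeds_kmh)) / free_flow_kmh
    if ratio < 0.25:
        return "heavy (red)"
    if ratio < 0.6:
        return "moderate (orange)"
    return "free flow (green)"

# 99 phones in a hand-pulled wagon, all reporting walking speed:
wagon_of_phones = [random.uniform(3.0, 6.0) for _ in range(99)]
print(congestion_level(wagon_of_phones))  # -> "heavy (red)"
```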

His amusing video, showing him toting the red wagon around, is linked from the story; the full URL of the story is below:
https://www.businessinsider.com/google-maps-traffic-jam-99-smartphones-wagon-2020-2

I just wonder how he obtained permission from 99 people to use their smartphones.  Or did he buy 99 phones on sale somewhere?

–Joseph Kerski


A graphical aid in deciding whether geospatial data meets your needs

November 26, 2018

The following graphic from an Esri course may be helpful when you are deciding whether to use a specific GIS data set in your analysis.  Though simple, it contains several key elements of fitness for use, a central topic in our blog and book: metadata, scale, and currency.

Another graphic and essay I have found helpful is Nathan Heazlewood’s 30 checks for data errors.  A dated though still useful set of text and graphics comes from the people at PBCGIS, who review the process from abstracting a problem situation, to considering the data model and the fitness of the data, to understanding information needs, and to examining the dichotomies of concise vs. confusing, credible vs. unfounded, and useful vs. not useful.  PBCGIS also created a more detailed and useful set of considerations.  My article published in Directions Magazine about search strategies might also be helpful.

Do you use graphical aids when making decisions about data, or when teaching this topic to others? If so, which are the most useful for you?

Decision tree graphic on deciding whether to use a GIS data set.

Best Available Data: “BAD” Data?

August 14, 2017

You may have heard it said that the “Best Available Data” is sometimes “BAD” data. Why?  As the acronym implies, BAD data is often used “just because it is right at your fingertips,” and is often of lower quality than the data that could be obtained with more time, planning, and effort.  We have made the case in our book and on this blog for five years now that data quality actually matters, not just as a theoretical concept, but in day-to-day decision-making.  Data quality is particularly important in the field of GIS, where so many decisions are made based on analyzing mapped information.

All of this daily-used information hinges on the quality of the original data. Compounding the issue, the temptation to settle for the easily obtained grows as the web GIS paradigm, with its ease of use and plethora of data sets, makes it ever easier to quickly add data layers and be off on your way.  To be sure, there are times when the easily obtained is also of acceptable or even high quality.  Judging whether it is acceptable depends on the data user and that user’s needs and goals: “fitness for use.”

One intriguing and important resource for determining the quality of your data is The Bad Data Handbook, published by O’Reilly Media, by Q. Ethan McCallum and 18 contributing authors.  They wrote about their experiences, their methods, and their successes and challenges in dealing with datasets that are “bad” in some key ways.   The resulting 19 chapters and roughly 250 pages may tempt you to put this on your “would love to but don’t have time” pile, but I urge you to consider reading it.  The book is written in an engaging manner; many parts are even funny, evident in chapter titles such as “When Databases Attack” and “Is It Just Me, or Does This Data Smell Funny?”

Despite the lively and often humorous approach, there is much practical wisdom here.  For example, many of us in the GIS field can relate to being somewhat perfectionist, so the chapter “Don’t Let the Perfect Be the Enemy of the Good” is quite pertinent.   The authors also provide a helpful “Four Cs of Data Quality Analysis”; a minimal code sketch applying them follows the list:
1. Complete: Is everything here that’s supposed to be here?
2. Coherent: Does all of the data “add up?”
3. Correct: Are these, in fact, the right values?
4. aCcountable: Can we trace the data?
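
Here is a minimal Python sketch applying the Four Cs to a hypothetical CSV of point data. The file name, the column names, and the specific checks are my assumptions, not the book’s; real data would call for checks tailored to its schema.

```python
import csv

# Sketch of the "Four Cs" applied to an assumed CSV of points with
# id, lat, lon, and source columns (file and fields are hypothetical).

def four_cs_report(path):
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))

    # Complete: is everything here that is supposed to be here?
    incomplete = [r for r in rows
                  if not all(r.get(k) for k in ("id", "lat", "lon"))]

    # Coherent: does the data "add up" (coordinates in valid ranges)?
    def in_range(r):
        try:
            return -90 <= float(r["lat"]) <= 90 and -180 <= float(r["lon"]) <= 180
        except (KeyError, TypeError, ValueError):
            return False
    incoherent = [r for r in rows if not in_range(r)]

    # Correct: are these the right values? A trusted reference is needed;
    # here we can only flag suspicious placeholders such as (0, 0).
    suspicious = [r for r in rows
                  if r.get("lat") == "0" and r.get("lon") == "0"]

    # aCcountable: can we trace each record back to a source?
    untraceable = [r for r in rows if not r.get("source")]

    return {"rows": len(rows), "incomplete": len(incomplete),
            "incoherent": len(incoherent), "suspicious": len(suspicious),
            "untraceable": len(untraceable)}

print(four_cs_report("points.csv"))
```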

Unix administrator Sandra Henry-Stocker wrote a review of the book here.  An online version of the book exists on it-ebooks.info, but in keeping with the themes of this blog, you might wish to consider whether reading it from that site, rather than purchasing the book, is fair to the author.  I think that purchasing the book would be well worth the investment.  Don’t let the 2012 publication date, the fact that it is not GIS-focused per se, or the frequent inclusion of code put you off; this really is essential reading, or at least skimming, for everyone in the field of geotechnology.


The Bad Data Handbook, by Q. Ethan McCallum and others.


Data Quality on Live Web Maps

June 19, 2017

Modern web maps and the cloud-based GIS tools and services upon which they are built continue to improve in richness of content and in data quality.  But as we have noted many times in this blog and in our book, maps are representations of reality.  They are extremely useful representations, to be sure, particularly so in the cloud, but they are still representations.   These representations depend on the data sources, accuracy standards, map projections, completeness, processing and rendering procedures used, regulations and policies in place, and much more.  A case in point is the offset between street data and satellite image data that I noticed in mid-2017 in Chengdu, in south-central China.  The streets are drawn about 369 meters southeast of where they appear on the satellite image (below):

Google Maps: streets offset from the satellite image in Chengdu, China.

Puzzled, I panned the map to other locations in China.  The offsets varied, but they appeared everywhere in the country; for example, note the offset of 557 meters where a highway crosses the river at Dongyang, again to the southeast:

Google Maps: 557-meter offset where a highway crosses the river at Dongyang, China.
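
One way to quantify such an offset is to read the coordinates of the same feature, say a bridge or an intersection, from the street layer and from the imagery, and compute the great-circle distance between the two readings. Here is a minimal Python sketch using the standard haversine formula; the coordinate pairs below are made-up illustrations, not my actual measurements:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points."""
    r = 6371000.0  # mean Earth radius in meters
    p1, p2 = radians(lat1), radians(lat2)
    dphi, dlmb = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlmb / 2) ** 2
    return 2 * r * asin(sqrt(a))

# Hypothetical readings of the same intersection near Chengdu:
street = (30.6586, 104.0648)   # as drawn by the street layer
imagery = (30.6560, 104.0674)  # as seen on the satellite image
print(round(haversine_m(*street, *imagery)))  # a few hundred meters
```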

As of this writing, the offset appears in the same cardinal direction, and only in China; indeed, after examining border towns with North Korea, Vietnam, and other countries, the offset appears to stop at those borders.  No offsets exist in Hong Kong or Macao.  Yahoo Maps and Bing Maps both show the same types of offsets in China (Bing Maps example, below):

Bing Maps: the same type of offset in China.

MapQuest, which uses an OpenStreetMap base, showed no offset.  I then tested ArcGIS Online with a satellite image base and the OpenStreetMap base, and there was no offset there, either (below).  This offset is a datum issue related to national security, documented in this Wikipedia article.  The same data restriction issues that we discuss in our book and in our blog touch other aspects of geospatial data as well, such as fines for unauthorized surveys, the lack of geotagging on many cameras when the GPS chip detects a location within China, and the seeming unlawfulness of crowdsourced mapping efforts such as OpenStreetMap.
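
The transformation behind this offset, from WGS-84 to the Chinese GCJ-02 datum, is not officially published; community reverse-engineered approximations, however, circulate widely. The sketch below follows one commonly circulated approximation (often called “eviltransform”) and should be treated as an assumption rather than an authoritative algorithm:

```python
import math

# Approximation of the unpublished WGS-84 -> GCJ-02 obfuscation, following
# widely circulated community reverse-engineering; not an official formula.
A = 6378245.0                  # Krasovsky 1940 semi-major axis
EE = 0.00669342162296594323    # eccentricity squared

def _tlat(x, y):
    ret = (-100.0 + 2.0 * x + 3.0 * y + 0.2 * y * y + 0.1 * x * y
           + 0.2 * math.sqrt(abs(x)))
    ret += (20 * math.sin(6 * math.pi * x) + 20 * math.sin(2 * math.pi * x)) * 2 / 3
    ret += (20 * math.sin(math.pi * y) + 40 * math.sin(math.pi * y / 3)) * 2 / 3
    ret += (160 * math.sin(math.pi * y / 12) + 320 * math.sin(math.pi * y / 30)) * 2 / 3
    return ret

def _tlon(x, y):
    ret = (300.0 + x + 2.0 * y + 0.1 * x * x + 0.1 * x * y
           + 0.1 * math.sqrt(abs(x)))
    ret += (20 * math.sin(6 * math.pi * x) + 20 * math.sin(2 * math.pi * x)) * 2 / 3
    ret += (20 * math.sin(math.pi * x) + 40 * math.sin(math.pi * x / 3)) * 2 / 3
    ret += (150 * math.sin(math.pi * x / 12) + 300 * math.sin(math.pi * x / 30)) * 2 / 3
    return ret

def wgs84_to_gcj02(lat, lon):
    """Approximate the shifted GCJ-02 position of a WGS-84 coordinate."""
    dlat = _tlat(lon - 105.0, lat - 35.0)
    dlon = _tlon(lon - 105.0, lat - 35.0)
    rad = math.radians(lat)
    magic = 1 - EE * math.sin(rad) ** 2
    dlat = dlat * 180.0 / (A * (1 - EE) / (magic * math.sqrt(magic)) * math.pi)
    dlon = dlon * 180.0 / (A / math.sqrt(magic) * math.cos(rad) * math.pi)
    return lat + dlat, lon + dlon

print(wgs84_to_gcj02(30.66, 104.06))  # Chengdu: shifted by a few hundred meters
```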

Furthermore, as we have noted, the satellite images are themselves processed, tiled data sets, and like other data sets, they need to be critically scrutinized as well.  They should not be considered “reality” despite their appearance of being the “actual” Earth’s surface.  They too contain error; they may have been taken on different dates or in different seasons, or reprojected on a different datum, and other data quality aspects need to be considered.

ArcGIS Online with an OpenStreetMap base: no offset.

Another difference among these maps is the wide variation in the amount of detail in the streets data for China.  OpenStreetMap was the most complete; the other web mapping platforms offered varying levels of detail, some seriously lacking, surprisingly so even in 2017, in almost every type of street except major freeways.  The streets content was much more complete in other countries.
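
One rough way to gauge street completeness, at least on the OpenStreetMap side, is to count highway features in a bounding box via the public Overpass API. A minimal sketch follows; the bounding box around Chengdu is my assumption, and the proprietary basemaps expose no comparable public query, so this only quantifies the OSM side of the comparison:

```python
import requests

# Count OpenStreetMap "highway" ways in a small bounding box around
# Chengdu (bbox is south, west, north, east; values are assumptions).
QUERY = """
[out:json][timeout:60];
way["highway"](30.60, 104.00, 30.70, 104.15);
out count;
"""

resp = requests.post("https://overpass-api.de/api/interpreter",
                     data={"data": QUERY})
resp.raise_for_status()
print(resp.json()["elements"][0]["tags"])  # e.g. {'ways': '...', ...}
```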

It all comes back to identifying your end goals in using any GIS or mapping package.  Being critical of the data can and should be part of your decision-making process and of your choice of tools and maps.  By the time you read this, the image offset problem could have been resolved.  Great!  But are there now new issues of concern? Data sources, methods, and quality vary considerably among countries. Furthermore, the tools and data change frequently, along with the processing methods, so being critical of the data is not something to practice just once; rather, it is fundamental to everyday work with GIS.