
Best Available Data: “BAD” Data?

You may have heard the phrase that the “Best Available Data” is sometimes “BAD” data. Why? As the acronym implies, BAD data is often used “just because it is right at your fingertips,” and it is frequently of lower quality than the data that could be obtained with more time, planning, and effort. We have made the case in our book and on this blog for five years now that data quality actually matters, not just as a theoretical concept, but in day-to-day decision-making. Data quality is particularly important in the field of GIS, where so many decisions are made based on analyzing mapped information.

All of the information behind those daily decisions hinges on the quality of the original data. Compounding the issue, the temptation to settle for the easily obtained grows as the web GIS paradigm, with its ease of use and plethora of data sets, makes it easier and easier to quickly add data layers and be on your way. To be sure, there are times when the easily obtained is also of acceptable or even high quality. Judging whether it is acceptable depends on the data user and that user’s needs and goals: in other words, “fitness for use.”

One intriguing and important resource for determining the quality of your data is the Bad Data Handbook, published by O’Reilly Media and written by Q. Ethan McCallum and 18 contributing authors. They wrote about their experiences, their methods, and their successes and challenges in dealing with datasets that are “bad” in some key way. The resulting 19 chapters and roughly 250 pages may tempt you to put this on your “would love to but don’t have time” pile, but I urge you to consider reading it. The book is written in an engaging manner; many parts are even funny, evident in chapter titles such as “When Databases Attack” and “Is It Just Me, or Does This Data Smell Funny?”

Despite the lively and often humorous approach, there is much practical wisdom here. For example, many of us in the GIS field can relate to being somewhat perfectionist, so the chapter “Don’t Let the Perfect Be the Enemy of the Good” is quite pertinent. In another example, the authors provide a helpful “Four Cs of Data Quality Analysis” (a brief code sketch after the list suggests how such checks might look in practice). These are:
1. Complete: Is everything here that’s supposed to be here?
2. Coherent: Does all of the data “add up?”
3. Correct: Are these, in fact, the right values?
4. aCcountable: Can we trace the data?
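
To make the Four Cs concrete, here is a minimal, hypothetical sketch in Python (using pandas) of how such checks might be scripted against a simple attribute table. The column names, the allowed land-use codes, and the four_cs_report function are invented for illustration; they are not taken from the book.

    # Hypothetical sketch: applying the "Four Cs" to a simple parcel table.
    # Column names (parcel_id, area_acres, land_use, source) are invented.
    import pandas as pd

    def four_cs_report(df: pd.DataFrame) -> dict:
        report = {}

        # Complete: is everything here that's supposed to be here?
        required = ["parcel_id", "area_acres", "land_use", "source"]
        report["complete"] = {
            "missing_columns": [c for c in required if c not in df.columns],
            "rows_with_nulls": int(df.isnull().any(axis=1).sum()),
        }

        # Coherent: does the data "add up"? Here: areas must be positive
        # and parcel IDs must be unique.
        report["coherent"] = {
            "non_positive_areas": int((df["area_acres"] <= 0).sum()),
            "duplicate_ids": int(df["parcel_id"].duplicated().sum()),
        }

        # Correct: are these the right values? A spot check against an
        # assumed set of valid land-use codes.
        allowed = {"residential", "commercial", "agricultural", "vacant"}
        report["correct"] = {
            "unknown_land_use": int((~df["land_use"].isin(allowed)).sum()),
        }

        # aCcountable: can we trace each record back to its source?
        report["accountable"] = {
            "rows_without_source": int(df["source"].isna().sum()),
        }
        return report

    # Example usage with a tiny made-up table:
    parcels = pd.DataFrame({
        "parcel_id": [101, 102, 102],
        "area_acres": [1.5, -0.2, 3.0],
        "land_use": ["residential", "unknown", "commercial"],
        "source": ["county assessor", None, "county assessor"],
    })
    print(four_cs_report(parcels))

Run on the made-up table above, the report flags the row with a missing source, the duplicate parcel ID, the negative area, and the unrecognized land-use code; real checks would of course be tailored to the dataset and the user’s “fitness for use.”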

Unix administrator Sandra Henry-Stocker wrote a review of the book here.  An online version of the book is available here, from it-ebooks.info, but in keeping with the themes of this blog, you might wish to consider whether it is fair to the author to read it from that site rather than purchasing the book.  I think that purchasing the book would be well worth the investment.  Don’t let the 2012 publication date, the fact that it is not GIS-focused per se, or the frequent inclusion of code put you off; this really is essential reading, or at least skimming, for everyone in the field of geotechnology.

Bad Data Handbook by Q. Ethan McCallum and others.

 

  1. August 26, 2017 at 12:06 pm

    A certain degree of national security concern is associated with geospatial data in most countries. Is there any study that gives some understanding of what types of geospatial data should not be put in the public domain? I look forward to your views.

