Archive for November, 2022

Book Review: Stat Spotting: A Field Guide to Identifying Dubious Data

November 21, 2022

I recently read Stat Spotting: A Field Guide to Identifying Dubious Data (University of California Press, 2013). The book encourages readers to think critically about the information they receive, particularly statistical data. Because this blog shares that aim, I believe the book will be especially useful to the readers of Spatial Reserves. The author, Dr Joel Best, a professor of sociology and criminal justice at the University of Delaware, adheres to the book’s “field guide” format by providing short, useful examples. The book’s central theme, that data should be examined critically, should sound familiar to the readers of this blog, as we have examined it from multiple angles for over a decade. The examples alternate between grim topics and fun ones, but all remain highly relevant even though the book is nearly a decade old. Each makes an excellent case study for discussion with colleagues or with students in class. The topics include “How important are family dinners?” (in increasing student achievement and reducing risky behavior), “Is autism an epidemic?”, and “Do we have runaway health spending?”

One of my favorite tenets running throughout the book is the author’s observation that “all statistics are the products of people’s choices. If they’d made different choices, the figures would be different, but with enough information, we should be able to evaluate those choices.” I also love the author’s counsel that “we need to be very careful when we can’t tell who produced the figures, why or how, and when we can’t be sure whether consistent choices were made in the measurements at different times and places.” Indeed!

I believe one of the sections most useful to the readers of this blog is the one entitled “common signs of dubious data.” Dr Best advises looking into the background of the data: whether the numbers seem inconsistent with benchmark figures, or whether severe examples are being used to illustrate supposedly common problems. He also says to look for blunders, where the numbers just seem too high or too low. Again germane to this blog is the author’s advice to “Look for the sources of the data.” Dr Best also says to examine the definitions, because broad definitions lead to big numbers. It also matters whether the definitions changed over the course of the issue being studied. Examine how the measure was created in the first place: surveys may use loaded questions that encourage particular responses. Look at the packaging of the results: generalizations may be based on a biased or misleading sample. Be on the lookout for rhetoric, sometimes evident in phrases such as “long term trends.” Finally, examine debates: consider rival explanations and identify different possible causes of the problem.

The Stat Spotting book. As is evident in this photo, I checked it out from a library but you could also buy the book. Either way, it is an insightful and relevant encouragement to think critically about data.

I highly recommend that instructors, GIS practitioners, and anyone else working with data read and use this book.

–Joseph Kerski

Categories: Public Domain Data

The Water Point Data Exchange

November 7, 2022

Data about water is critical to urban and rural planning, sustainability, human health, and much more. Water data is often gathered, managed, and served in disparate ways. The mission of the Water Point Data Exchange (WPdx), https://data.waterpointdata.org/, is to address these challenges by unlocking “the potential of water point data to improve rural water services through evidence-based decision-making.” At the time of this writing, the repository had more than 406,566 records from 54 countries. More than just a data repository, the WPdx seeks to gather a community of contributors and collaborators. The WPdx has three main components:

WPdx Data Standard

The WPdx Data Standard was collaboratively designed for data collection from rural areas at the water point or small water scheme level. Its core parameters are those commonly measured by governments, non-governmental organizations, and researchers, which enables easy sharing without changing the types of data typically collected. The WPdx Data Standard is managed and updated on an as-needed basis by a Global Working Group. The site provides a link to the entire WPdx Data Standard.

Global Data Repositories

The WPdx Data Repository is a cloud-based data library that enables sharing of global data that is compliant with the WPdx Data Standard. Data is fully open, free to access, and machine readable via an API. The repository includes an online data playground for analysis and visualization; I tested the visualization tools with success. The WPdx Global Data Repository is linked from the site. An enhanced subset of the data, WPdx+, is also available, and the site provides more information about the two datasets.

Decision Support Tools

WPdx currently offers four decision support tools to improve evidence-based decision making. The tools were designed in response to the most frequently asked questions in a survey of government water experts. More information about the tools is available on the site. The tools include:

  1. Measure Water Access By District. How many people lack basic access per district? In which districts should investments be focused?
  2. Prioritize Locations for Rehabilitation. Which rehabilitation would reach the most people? Where are people currently unserved due to a broken water point?
  3. Prioritize Locations for New Construction. Which new locations would reach the most people? Where are people currently unserved by an existing water point?
  4. Predict Current Water Point Status. Which water points are at highest risk of failure? Why? Where should preventative maintenance be focused?
The Water Point Data Exchange.

Because the theme of this blog is thinking critically about data, readers will be interested in the following statements posted on the site:

Functionality datasets represent a snap-shot at one point in time. They do not indicate whether sources identified as non-functional will be fixed the next day, the next week or never – sources identified as non-functional are not necessarily permanently out of service.  Further, it is difficult to arrive at a useful comparison of different data sets without considering the context at play in each country/district where data sets originate. See “What factors can influence the long-term functionality of water points?” for more on this. Data is provided “as-is” from data contributors, noted in the “source” attribute. Any attributes reformatted by the WPdx team are noted in the “converted” field. No additional validation or verification is done by WPdx. Lastly, data on WPdx has been uploaded by multiple sources and may not be statistically representative of national water point functionality.
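One way to act on these caveats is to check which contributors dominate a downloaded table before drawing conclusions from it. The following is a minimal Python sketch, not a WPdx tool. The “source” attribute is named in the statement above, but the `status_id` column name and the sample rows are assumptions made up for illustration; check the header of the file you actually download.

```python
from collections import Counter

SOURCE_COL = "source"      # attribute named in the site's statement
STATUS_COL = "status_id"   # hypothetical column name; verify against your download


def tally_by_source(rows):
    """Count records for each (source, status) pair."""
    counts = Counter()
    for row in rows:
        # Treat missing or blank values as "unknown" rather than dropping the row.
        source = (row.get(SOURCE_COL) or "").strip() or "unknown"
        status = (row.get(STATUS_COL) or "").strip() or "unknown"
        counts[(source, status)] += 1
    return counts


def source_shares(counts):
    """Return each source's share of all records, as a fraction of the total."""
    total = sum(counts.values())
    if total == 0:
        return {}
    per_source = Counter()
    for (source, _status), n in counts.items():
        per_source[source] += n
    return {source: n / total for source, n in per_source.items()}


# Made-up rows for illustration only; these are not real WPdx records.
sample_rows = [
    {"source": "Example NGO", "status_id": "Yes"},
    {"source": "Example NGO", "status_id": "Yes"},
    {"source": "Example NGO", "status_id": "No"},
    {"source": "Example Agency", "status_id": "Yes"},
]

sample_counts = tally_by_source(sample_rows)
sample_shares = source_shares(sample_counts)
for source, share in sorted(sample_shares.items()):
    print(f"{source}: {share:.0%} of records")
```

If one contributor supplies most of the records for a district, the functionality figures for that district largely reflect where and when that contributor surveyed, which is the point the site's statement makes.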

The data site is straightforward: tables are fairly easy to filter and download. There is a mapping tool on the site, but I suspect most users will want to filter and download the data tables and bring the data into their own GIS.
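For readers who take the download route, here is a minimal sketch of converting a downloaded CSV table into GeoJSON, which most GIS packages can read. It uses only the Python standard library. The column names (`lat_deg`, `lon_deg`, `country_name`) and the sample rows are assumptions for illustration, so compare them with the header of your own download before using it.

```python
import csv
import io
import json

# Hypothetical column names; adjust to match the header of your downloaded file.
LAT_COL = "lat_deg"
LON_COL = "lon_deg"
COUNTRY_COL = "country_name"


def rows_to_geojson(rows, country=None):
    """Build a GeoJSON FeatureCollection of points, optionally for one country."""
    features = []
    for row in rows:
        if country is not None and row.get(COUNTRY_COL) != country:
            continue
        try:
            lat = float(row[LAT_COL])
            lon = float(row[LON_COL])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with missing or non-numeric coordinates
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            continue  # skip coordinates outside the valid range
        properties = {k: v for k, v in row.items() if k not in (LAT_COL, LON_COL)}
        features.append({
            "type": "Feature",
            # GeoJSON stores coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": properties,
        })
    return {"type": "FeatureCollection", "features": features}


# Made-up sample standing in for a downloaded table; not real WPdx records.
SAMPLE_CSV = """\
country_name,lat_deg,lon_deg,status_id,source
Kenya,-1.2921,36.8219,Yes,Example NGO
Kenya,,36.9,No,Example NGO
Uganda,0.3476,32.5825,Yes,Example Agency
"""

sample_geojson = rows_to_geojson(csv.DictReader(io.StringIO(SAMPLE_CSV)), country="Kenya")
sample_text = json.dumps(sample_geojson, indent=2)
print(len(sample_geojson["features"]), "feature(s) kept")
```

With a real download, replace `io.StringIO(SAMPLE_CSV)` with an opened file, for example `open("my_download.csv", newline="", encoding="utf-8")`, where the file name is a placeholder, and write `sample_text` to a `.geojson` file. Rows without usable coordinates are skipped silently here; in practice you may prefer to count and review them.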

I encourage you to give this resource a try.

–Joseph Kerski

Categories: Public Domain Data