Search Results

Keyword: ‘big data’

Era of Big Data is Here: But Caution Is Needed

September 25, 2017 Leave a comment

As this blog and our book are focused on geospatial data, it makes sense that we discuss trends in data–such as laws, standards, attitudes, and tools–that are gradually helping more users find the data they need more quickly.  But with all of these advancements, we continue to implore decision makers to think carefully about and investigate the data sources they are using.  This becomes especially critical–and at times difficult–when that data falls into the “big data” category.  The difficulty arises because big data is often seen as so complex that it is cited and used in an unquestioned manner.

Equally challenging, and at times troublesome, is when the algorithms based on that data go unchallenged, and when access to those algorithms is blocked to those who seek to understand who created them and what data and formulas they are based on.  As these data and algorithms increasingly affect our everyday lives, this becomes a major concern, as data scientist Cathy O’Neil explains in her TED talk: “the era of blind faith in big data must end.”

In addition, the ability to gain information from mapping social media is amazing and has the potential to help in many sectors of society.  This was clearly evident in the usefulness of the social media posts that emergency managers in Texas and Florida, USA, mapped during the August-September 2017 hurricanes there.  However, with mapping social media comes an equal if not greater need for caution, as this article points out in describing the limitations of such data for understanding health and mitigating the flu.  And from a marketing standpoint, Paul Goad cautioned here against relying on data alone.

It is easy to overlook an important point in all this discussion of data, big data, and data science. We tend to refer to these phenomena in abstract terms, but these data largely represent us: our lives, our habits, our shopping preferences, our choice of route on the way to work, the companies and organisations we work for, and so on. Perhaps we need less data and data science, and more humanity and humanity science.  As Google’s Eric Schmidt has said, “We must remember that technology remains a tool of humanity.”  How can we, and corporate giants, then use these big data archives as a tool to serve humanity?

Understanding your data

Use caution in making decisions from data–even if you’re using “Big Data” and algorithms derived from it.  Photograph by Joseph Kerski.

Categories: Public Domain Data

Findings of the Big Data and Privacy Working Group

May 11, 2014 2 comments

US White House senior counselor John Podesta recently summarized an extensive review of big data and privacy that he led.  Over 90 days, he met with academic researchers, privacy advocates, regulators, technology industry representatives, advertisers, and civil rights groups.  The findings were presented to the President on 1 May 2014 and summarized by Mr Podesta; the full 79-page report is also available.  In the report, geospatial data is recognized as an important contributor to big data but does not receive special attention over other types of data.  Nevertheless, the report provides a useful overview of the current opportunities of big data and the challenges it poses to privacy.

After discussing some of the technological trends making big data possible, the report details the opportunities it presents: saving lives (by monitoring infections in newborns), making the economy work better (through sensors in jet engines and monitoring of peak electrical demand), and making government work better (by predicting insurance reimbursement fraud, for example).   Next, the report raises some of the serious concerns that accompany big data, such as how to protect our privacy and how to ensure that big data does not enable civil rights protections to be circumvented.

Recommendations from the report include advancing the proposed Consumer Privacy Bill of Rights, passing National Data Breach legislation, extending privacy protections to non-US persons, ensuring data collected on students in school is used for educational purposes, expanding technical expertise to stop discrimination, and amending the electronic communications privacy act.  In short, the report recognizes the immense benefit that big data brings, but also the challenges, and makes specific recommendations for governments to deal with those challenges.

Categories: Public Domain Data

Geospatial Advances Drive the Big Data Problem but Also its Solution

In a recent essay, Erik Shepard claims that geospatial advances drive the big data problem but also its solution:  http://www.sensysmag.com/article/features/27558-geospatial-advances-drive-big-data-problem,-solution.html.  The expansion of geospatial data is estimated at 1 exabyte per day, according to Dr. Dan Sui.  Land use data, satellite and aerial imagery, transportation data, and crowd-sourced data all contribute to this expansion, but GIS also offers tools to manage the very data it is contributing to.

We discuss these issues in our book, The GIS Guide to Public Domain Data.  These statements from Shepard are particularly relevant to the reflections we offer in our book:  “Today there is a dawning appreciation of the assumptions that drive spatial analysis, and how those assumptions affect results.  Questions such as what map projection is selected – does it preserve distance, direction or area? Considerations of factors such as the modifiable areal unit problem, or spatial autocorrelation.”

Indeed!  Today’s data users have more data at their fingertips than ever before.  But with that data comes choices about what to use, how, and why.  And those choices must be made carefully.

Categories: Public Domain Data

Best Available Data: “BAD” Data?

August 14, 2017 3 comments

You may have heard the phrase that the “Best Available Data” is sometimes “BAD” data. Why?  As the acronym implies, BAD data is often used “just because it is right at your fingertips,” and is often of lower quality than data that could be obtained with more time, planning, and effort.  We have made the case in our book and on this blog for five years now that data quality actually matters, not just as a theoretical concept but in day-to-day decision-making.  Data quality is particularly important in the field of GIS, where so many decisions are based on analyzing mapped information.

All of the information used in these daily decisions hinges on the quality of the original data. Compounding the issue, the temptation to settle for the easily obtained grows as the web GIS paradigm, with its ease of use and plethora of data sets, makes it ever easier to quickly add data layers and be off on your way.  To be sure, there are times when the easily obtained is also of acceptable or even high quality.  Judging whether it is acceptable depends on the data user and that user’s needs and goals: “fitness for use.”

One intriguing and important resource for determining the quality of your data is The Bad Data Handbook, published by O’Reilly Media, by Q. Ethan McCallum and 18 contributing authors.  They wrote about their experiences, their methods, and their successes and challenges in dealing with datasets that are “bad” in some key way.   The resulting 19 chapters and 250-ish pages may tempt you to put this on your “would love to but don’t have time” pile, but I urge you to consider reading it.  The book is written in an engaging manner; many parts are even funny, evident in chapter titles such as “When Databases Attack” and “Is It Just Me or Does This Data Smell Funny?”

Despite the lively and often humorous approach, there is much practical wisdom here.  For example, many of us in the GIS field can relate to being somewhat perfectionist, so the chapter “Don’t Let the Perfect Be the Enemy of the Good” is quite pertinent.   In another example, the authors provide a helpful “Four Cs of Data Quality Analysis.”  These include:
1. Complete: Is everything here that’s supposed to be here?
2. Coherent: Does all of the data “add up?”
3. Correct: Are these, in fact, the right values?
4. aCcountable: Can we trace the data?
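The Four Cs lend themselves to simple automated checks. As a minimal sketch (not code from the book; the field names, toy records, and valid range below are invented for illustration), each C can be expressed as a small function run against a tabular dataset:

```python
# Hypothetical illustration of the "Four Cs" as automated checks.
# The records, field names, and thresholds are invented for this sketch.

def check_complete(rows, required_fields):
    """Complete: is everything here that's supposed to be here?"""
    return all(row.get(f) is not None for row in rows for f in required_fields)

def check_coherent(rows):
    """Coherent: does the data "add up"? Here: parts must sum to the stated total."""
    return all(abs(sum(row["parts"]) - row["total"]) < 1e-9 for row in rows)

def check_correct(rows, valid_range):
    """Correct: are these plausible values? Here: totals fall in a known range."""
    lo, hi = valid_range
    return all(lo <= row["total"] <= hi for row in rows)

def check_accountable(rows):
    """aCcountable: can we trace each record back to its source?"""
    return all(row.get("source") for row in rows)

# Two toy records, e.g. land-area figures with their component parts.
rows = [
    {"parts": [40.0, 60.0], "total": 100.0, "source": "county_parcels_2017"},
    {"parts": [25.0, 25.0], "total": 50.0,  "source": "county_parcels_2017"},
]

report = {
    "complete":    check_complete(rows, ["parts", "total", "source"]),
    "coherent":    check_coherent(rows),
    "correct":     check_correct(rows, (0.0, 1000.0)),
    "accountable": check_accountable(rows),
}
print(report)
```

A real fitness-for-use review would of course go beyond such mechanical tests, but even this level of scripted checking catches the gaps and inconsistencies that “right at your fingertips” data so often contains.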

Unix administrator Sandra Henry-Stocker wrote a review of the book here.  An online version of the book is available from it-ebooks.info but, in keeping with the themes of this blog, you might wish to consider whether reading it from that site, rather than purchasing the book, is fair to the authors.  I think that purchasing the book would be well worth the investment.  Don’t let the 2012 publication date, the fact that it is not GIS-focused per se, or the frequent inclusion of code put you off; this really is essential reading–or at least skimming–for all who are in the field of geotechnology.


Bad Data book by Q. Ethan McCallum and others. 

 

Connections between Geospatial Data and Becoming a Data Professional

September 25, 2016 Leave a comment

Dr. Dawn Wright, Chief Scientist at Esri, recently shared a presentation she gave on the topic of “A Geospatial Industry Perspective on Becoming a Data Professional.”

How can GIS and Big Data be conceptualized and applied to solve problems?  How can the way we define and train data professionals move the integration of Big Data and GIS simultaneously forward?  How can GIS as a system and GIS as a science be brought together to meet the challenges we face as a global community?   What is the difference between a classic GIS researcher and a modern GIS researcher?   How and why must GIS become part of open science?

These issues and more are examined in the slides and in the thought-provoking text beneath each slide.  Geographic Information Science has long welcomed strong collaborations among computer scientists, information scientists, and other Earth scientists to solve complex scientific questions, and it therefore parallels the emergence, as well as the acceptance, of “data science.”

But the researchers and developers in “data science” need to be encouraged and recruited from somewhere, and once they have arrived, they need to blaze a lifelong learning pathway.  Therefore, germane to any discussion of emerging fields such as data science is how students are educated, trained, and recruited–here, as data professionals within the geospatial industry.  Such discussion needs to include certification, solving problems, critical thinking, and subscribing to codes of ethics.

I submit that the integration of GIS and open science not only will be enriched by immersion in the issues that we bring up in this blog and in our book, but actually depends in large part on researchers and developers who understand such issues and can put them into practice.  What issues?  Issues of understanding geospatial data and knowing how to apply it to real-world problems, of scale, of data quality, of crowdsourcing, of data standards and portals, and others that we frequently raise here.  Nurturing these skills and abilities in geospatial professionals is a key way of helping GIS become a key part of data science, and of moving GIS from being a “niche” technology or perspective to one that all data scientists use and share.


This presentation by Dr. Dawn Wright touches on the themes of data and this blog from a professional development perspective.

 

2015 and Beyond: Who will control the data?

November 17, 2015 1 comment

Earlier this year Michael F. Goodchild, Emeritus Professor of Geography at the University of California at Santa Barbara, shared some thoughts about current and future GIS-related developments in an article for ArcWatch. It was interesting to note the importance attached to the issues of privacy and the volume of personal information that is now routinely captured through our browsing habits and online activities.

Prof. Goodchild sees the privacy issue as essentially one of control: what control do we as individuals have over the data that are captured about us, and how are those data used? For some, the solution may be to create their own personal data stores and retreat from public forums on the Internet. For others, an increasing appreciation of the value of personal information to governments and corporations may offer a way to reclaim some control over their data. The data could be sold or traded for access to services, a trend we also commented on in a previous post.

Turning next to big data, the associated issues were characterised as the three Vs:

  • Volume—Capture, management and analysis of unprecedented volumes of data
  • Variety—Multiple data sources to locate, access, search and retrieve data from
  • Velocity—Real-time or near real-time monitoring and data collection

Together the three Vs bring a new set of challenges for data analysts, and new tools and techniques will be required to process and analyse the data. These tools will be required not only to better illustrate patterns of current behaviour but also to predict future events more accurately, such as extreme weather, the outbreak and spread of infectious diseases, and socio-economic trends. In a recent post on GIS Lounge, Zachary Romano described one such initiative from Orbital Insight, a ‘geospatial big data’ company based in California. The company is developing deep learning processes that will recognise patterns of human behaviour in satellite imagery; he cited examples such as the number of cars in a car park as an indicator of retail sales, or the presence of shadows as an indicator of construction activity. As the author noted, ‘Applications of this analytical tool are theoretically endless‘.

Will these new tools use satellite imagery to track changes at the level of individual properties? Potentially yes, and if so, the issue of control over personal data comes to the fore again–only this time most of us won’t know which satellites are watching us, which organisations or governments control those satellites, and who is doing what with our data.

 

Spatial Agent: Highlighting Public Domain Datasets

October 19, 2015 1 comment

The World Bank recently announced the release of its new Spatial Agent app for iOS and Android (a web version is also available). The app curates an already impressive collection of public domain spatial datasets in a variety of formats from over 300 web services, with the developers promising to add more iconic datasets. App users can choose among the following data sources:

  • Indicators (for example % of female employees in agriculture or % of forested land areas)
  • Map layers
  • Other (for example the Nepalese major river system or hydro power plants in Malawi)

The data can be displayed against a back-drop of one of four base map sources:

  • Shaded relief (NOAA)
  • Street map
  • Topographic map
  • World imagery

with the option to set the area of interest by Country, Basin or Region.

In this example a layer of CIESIN’s earthquake hazard frequency and distribution data is displayed against a backdrop of world imagery.

Spatial Agent: Earthquake Hazards

Each dataset is accompanied by a short description of its source and intended purpose and, as the datasets are public domain, they may be shared through email and/or social media.

The World Bank hope that the app will help spread the news about public domain data and go some way to organising the ‘current big data cosmos’.


Data Drives Everything (But the Bridges Need a Lot of Work)

September 14, 2014 1 comment

A new article in Earthzine entitled “Data Drives Everything, but the Bridges Need a Lot of Work” by Osha Gray Davidson seems to encapsulate one of the main themes of this blog and our book.

Dr Francine Berman directs the Center for a Digital Society at Rensselaer Polytechnic Institute, in Troy, New York, and as the article states, “has always been drawn to ambitious ‘big picture’ issues” at the “intersection of math, philosophy, and computers.”  Her project, the Research Data Alliance (RDA), has a goal of changing the way in which data are collected, used, and shared to solve specific problems around the globe.  Those large and important tasks should sound familiar to most GIS professionals.

And the project seems to have resonated with others, too: 1,600 members from 70 countries have joined the RDA.  Reaching across boundaries and breaking down barriers that make data sharing difficult or impossible is one of the RDA’s chief goals.  Finding solutions to real-world problems is accomplished through Interest Groups, which then create more focused Working Groups.  I was pleased to see Interest Groups such as Big Data Analytics, Data In Context, and Geospatial, but at this point a Geospatial Working Group is still needed.  Perhaps someone from the geospatial community needs to step up and lead that effort.   I read the charter for the Geospatial Interest Group and, though brief, it seems solid, identifying some of the chief challenges and the major organizations to work with to make its vision a reality.

I wish the group well, but simply wishing isn’t going to achieve data sharing for better decision making.  As we point out in our book with regard to this issue, geospatial goals for an organization like this are not going to be realized without the GIS community stepping forward.  Please investigate the RDA and consider how you might help their important effort.

Research Data Alliance.

Open Government Data book by Joshua Tauberer

February 2, 2014 1 comment

An online e-book entitled Open Government Data by Joshua Tauberer is, according to the author, “the culmination of several years of thinking about the principles behind the open government data movement in the United States.”  In the book, he “frame[s] the movement as the application of Big Data to civics. Topics include principles, uses for transparency and civic engagement, a brief legal history, data quality, civic hacking, and paradoxes in transparency.”

The author is the creator of the US Congress-tracking tool GovTrack.us, which launched in 2004, helping to spur the national open government data community. He was also a co-founder of POPVOX, a platform for advocacy, providing a means for citizens to communicate with Congress about the issues they care about.

Tauberer mentions GIS data in part 2.2, where he uses Google Transit Feed Specification data as an example (three-quarters of the way down the page, in Figure 8) to visualize ridership in the Washington, DC area.  Despite the lack of other overt GIS references, I believe this book could be useful to readers of our book and this blog.  Its chapters include “Big Data Meets Open Government”, “Civic Hacking by Example”, “Applications to Open Government”, “A Brief Legal History of Open Government Data”, “Paradoxes in Open Government”, and “Example Policy Language”.  In particular, the chapter “A Brief Legal History of Open Government Data” provides useful additional reading after Chapter 1 of our book, The GIS Guide to Public Domain Data.  Through reading Tauberer’s book, one can better understand how spatial data can and should fit into larger open data and open government initiatives.

Open Government Data book.