Archive for July, 2018

Using Kaggle big data in a GIS

Kaggle is a platform for predictive modelling and analytics competitions in which statisticians and data miners compete to produce the best models for predicting and describing the datasets uploaded by companies and users. This crowdsourcing approach relies on the fact that there are countless strategies that can be applied to any predictive modelling task and it is impossible to know beforehand which technique or analyst will be most effective [Wikipedia]. More than half a million people belong to the Kaggle community, from nearly every country in the world. Google acquired Kaggle in 2017. You can also learn about R, SQL, machine learning, and other topics on the site. Why mention Kaggle in our geospatial data blog? Kaggle hosts data sets on its site, some of which are spatial in nature and some of which are truly “big data” (such as 9 million Open Images URLs). As such, it represents a source of information for the GIS analyst, researcher, and instructor.

Because the data posted to Kaggle come from a global community with diverse interests, expect an unusual array of data sets, from chest x-rays and superheroes to air quality and birdsongs. Some data are from surveys. Many intriguing gems exist; for example, one data set on the Kaggle site that interests me as a geographer is the world happiness data. It is available as a CSV for three different years. The only unfortunate aspect of these tables is the lack of a country code: relying only on the country name can cause problems when joining the data to a map, because the same country may be spelled or named differently in the table and in the map layer.
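To illustrate the join problem, here is a minimal Python sketch of one way to attach country codes to a name-only table before joining it to a map. Everything in it is an assumption for illustration: the column names (Country, Score), the placeholder scores, the tiny lookup of map-layer names to ISO 3166-1 alpha-3 codes, and the hand-built alias table. It is not the structure of the actual Kaggle file, so check your own CSV and map layer before adapting it.

```python
import csv
import io

# Hypothetical excerpt of a name-only table. Column names and scores are
# placeholders for illustration, not values from the real data set.
HAPPINESS_CSV = """Country,Score
Norway,1.0
United States,2.0
Congo (Kinshasa),3.0
Atlantis,4.0
"""

# Hypothetical lookup built from a map layer's attribute table:
# lower-cased country name as stored in the map -> ISO 3166-1 alpha-3 code.
MAP_NAME_TO_ISO3 = {
    "norway": "NOR",
    "united states of america": "USA",
    "democratic republic of the congo": "COD",
}

# Hand-maintained aliases for names that differ between the two sources.
ALIASES = {
    "united states": "united states of america",
    "congo (kinshasa)": "democratic republic of the congo",
}


def normalize(name):
    """Lower-case, collapse whitespace, then apply any known alias."""
    key = " ".join(name.strip().lower().split())
    return ALIASES.get(key, key)


def attach_codes(csv_text):
    """Return (rows with an ISO3 code added, names that found no match)."""
    matched, unmatched = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        code = MAP_NAME_TO_ISO3.get(normalize(row["Country"]))
        if code is None:
            unmatched.append(row["Country"])
        else:
            matched.append({**row, "ISO3": code})
    return matched, unmatched


matched, unmatched = attach_codes(HAPPINESS_CSV)
print([row["ISO3"] for row in matched])
print("Needs manual review:", unmatched)
```

The point of returning the unmatched names separately is that a name-based join fails silently: rows that do not match simply drop off the map. Listing them lets you extend the alias table by hand and then join on the code rather than the name.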

One can also learn about data sources by spending time on the Kaggle site. For example, I learned about Uber Movement, which contains data from selected cities and points of departure; Sports Reference, from which someone scraped 120 years of Olympic history data; and a cancer imaging archive that someone used to obtain disease type and location. Given the nature of the site, expect all sorts of oddities: my search on mountains of the world resulted in lots of “404 Not Found” errors; some data sets are documented and others not so much; and obtaining some of the data requires the user to be a programmer. Still, Kaggle is a useful and unusual source worthy of attention, and given the rapid evolution in big data and crowdsourcing, which we frequently write about on this blog, I expect that we will be seeing many more sites like this in the future.

A section of the Kaggle listing of data sets, showing the diversity of themes, scales, and sizes. 

Historical Imagery for the entire world now available via Wayback Service in ArcGIS from Esri

I know that many of you regularly want to examine changes over space and time with imagery and GIS for research or instruction purposes. As of last week, 81 different dates of historical imagery for the past 5 years now reside in ArcGIS via the World Imagery Wayback service. For more information, see: https://www.esri.com/arcgis-blog/products/arcgis-living-atlas/imagery/wayback-81-flavors-of-world-imagery/

You can access this imagery in ArcGIS Online, ArcMap, and ArcGIS Pro. A great place to start is the World Imagery Wayback app, which needs only a web browser: https://livingatlas.arcgis.com/wayback/ It is a fascinating and incredible resource for examining land use and land cover change, changes in water levels of reservoirs, coastal erosion, deforestation, regrowth, urbanization, and much more. This resource covers the entire globe.

However, in keeping with the theme of our book The GIS Guide to Public Domain Data and of this blog, which is to be critical of the data, caution is needed. The dates represent when the Esri World Imagery service was updated. This service is fed by multiple sources, private and public, local and global. Thus, the date does not mean that every location you examine on the image is current as of that date. I verified this in several locations in my local area: for example, my ground observations show construction as of June 2018, but that construction does not appear on the image for that date. In addition, several other places I examined on Northern Hemisphere wintertime dates were clearly “leaf-on,” meaning the images were taken during the previous summer, or even the summer before that. Therefore, as always, know what you are working with. Despite these cautions, the imagery still represents an amazing and useful resource.

Samples from this imagery set for 30 July 2014 (top) and four years later, 27 June 2018 (bottom), for an area outside Denver, Colorado, USA.