A group of people at the Civic Analytics Network recently wrote “An Open Letter to the Open Data Community,” which focuses on topics central to this blog and to our book. The Civic Analytics Network is “a consortium of Chief Data Officers and analytics principals in large cities and counties throughout the United States.” They state that their purpose is to “work together to advance how local governments use data to be more efficient, innovative, and in particular, transparent.”
The letter contained eight guidelines that the group believed, if followed, would “advance the capabilities of government data portals across the board and help deliver upon the promise of a transparent government.” The guidelines were:
- Improve accessibility and usability to engage a wider audience.
- Move away from a single dataset centric view.
- Treat geospatial data as a first class data type.
- Improve management and usability of metadata.
- Decrease the cost and work required to publish data.
- Introduce revision history.
- Improve management of large datasets.
- Set clear transparent pricing based on memory, not number of datasets.
It is difficult to imagine a letter more germane to what we have been advocating on the Spatial Reserves blog. Over the past five years, we have openly praised data portals that are user-friendly and criticized those that miss the mark. We have noted the impact that the open data movement has had on the portals themselves, which in many cases have become more user-friendly and have encouraged adoption of GIS beyond its traditional departmental boundaries. The letter also echoes principles we have adhered to, such as portals being intuitive, data-driven, and guided by metrics. It highlights a continuing need: the ability to tie together and compare related data sets, which is at times difficult because of “data silos.”
One of my favorite points in the letter is the authors’ admonition to “treat geospatial data as a first class data type.” The authors argue that geospatial data is an underdeveloped and undervalued asset that “needs to be an integral part of any open data program,” and they cite Chicago’s OpenGrid and Los Angeles’ GeoHub as forward-thinking models.
On the topic of metadata, the authors call for portals and managers to allow “custom metadata schemes, API methods to define and update the schema and content, and user interfaces that surface and support end-user use of the metadata.” Hear, hear! Equally welcome is the authors’ call to decrease the cost and work required to publish data. In point #6, on revision history, they advocate that data sets be curated and updated while still allowing historical versions to be accessed.
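To make the revision-history idea concrete, here is a minimal Python sketch, assuming a portal that stores each published version of a data set rather than overwriting it. The `VersionedDataset` and `Revision` classes are hypothetical and not drawn from any particular portal product; the free-form `metadata` dict stands in loosely for the custom metadata schemes the letter calls for.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List


@dataclass
class Revision:
    """One published version of a data set (hypothetical structure)."""
    version: int
    published: datetime
    records: List[Dict[str, Any]]
    note: str = ""


@dataclass
class VersionedDataset:
    """Keeps every published revision instead of overwriting the last one.

    `metadata` is a free-form dict standing in for a custom metadata schema.
    """
    name: str
    metadata: Dict[str, Any] = field(default_factory=dict)
    revisions: List[Revision] = field(default_factory=list)

    def publish(self, records: List[Dict[str, Any]], note: str = "") -> Revision:
        # Append a new revision; older revisions remain untouched and retrievable.
        rev = Revision(
            version=len(self.revisions) + 1,
            published=datetime.now(timezone.utc),
            records=[dict(r) for r in records],  # shallow copy of each record
            note=note,
        )
        self.revisions.append(rev)
        return rev

    def latest(self) -> Revision:
        return self.revisions[-1]

    def as_of(self, version: int) -> Revision:
        # Versions are numbered from 1; reject anything outside the history.
        if not 1 <= version <= len(self.revisions):
            raise IndexError(f"no version {version} of {self.name}")
        return self.revisions[version - 1]
```

Because `publish` appends rather than replaces, a user can always retrieve an earlier release with `as_of`, which is the behavior the letter asks portals to expose. A real system would also need persistent storage and would probably store differences between versions rather than full copies.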
What are your reactions to this letter? What do we need to do as the geospatial community to realize these aims?
In this blog, we have reviewed many international, national, regional, and local data portals over the past five years, both those that are useful and those that still need “some work.” One of the oldest US state data portals is from Utah’s Automated Geographic Reference Center (AGRC). I remember, as a young US Census Bureau geographer, working with the AGRC on building the TIGER system back in the late 1980s; they were thinking about data distribution even then. As GIS has evolved, so has the AGRC, and it remains one of the organizations I respect most in GIS. Besides the wide variety of raster and vector data sets it offers for download, the AGRC also provides geocoding and point-in-polygon map queries through its own APIs at api.mapserv.utah.gov. In addition, the AGRC provides access to Utah’s TURN high-resolution GPS base station network.
Utah has also established an open data site offering a wide variety of data sets in a multitude of formats, with extensive metadata, both for download and in GeoJSON and GeoService formats. In short, the Utah portal is everything a geodata portal should be: modern, responsive, linked to web-based GIS services, and designed with the data user in mind. I am not surprised by this, as I have long admired the way that people in academia, nonprofit organizations, government agencies, private companies, and even primary and secondary schools work closely together in the Utah GIS community, as I document in this video from one of my trips there.
This Utah story map, created by my colleague at Esri, shows how some of these data sources can be used to tell the story of demographic change and the natural resources of Utah. Scroll down to the links at the end of the map to explore the data sources behind it. I encourage you to give the Utah AGRC data portal a try.
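The AGRC API performs its point-in-polygon queries on its own servers, and its request format is not covered here. As a sketch of what such a query computes, the classic ray-casting (even-odd) test counts how many polygon edges a horizontal ray from the point crosses; an odd count means the point is inside. This version assumes a single ring in planar coordinates, with no holes or multipart polygons.

```python
from typing import List, Tuple


def point_in_polygon(x: float, y: float, ring: List[Tuple[float, float]]) -> bool:
    """Ray-casting (even-odd) test: is (x, y) inside the polygon ring?

    `ring` is a list of (x, y) vertices; repeating the first vertex at the
    end is optional. Points lying exactly on an edge may go either way.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            # Count crossings to the right of the point (the ray points east).
            if x < x_cross:
                inside = not inside
    return inside
```

For example, with a hypothetical rectangle `box = [(-112.0, 40.5), (-111.5, 40.5), (-111.5, 41.0), (-112.0, 41.0)]`, `point_in_polygon(-111.9, 40.76, box)` returns `True`, while `point_in_polygon(-110.0, 40.76, box)` returns `False`. Server-side services add spatial indexes and proper handling of projections, holes, and multipart features on top of this core idea.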
Open Data continues to make progress, as manifested in data portals, in the organizations adopting it, and in the associated literature. Are private companies also involved in Open Data? Yes. As early as two years ago, we wrote about Esri’s initiatives in ArcGIS Open Data. Imagery and geospatial data company DigitalGlobe has created an open data portal as part of its efforts to provide “accurate high-resolution satellite imagery to support disaster recovery in the wake of large-scale natural disasters.” The portal includes pre-event imagery, post-event imagery, and a crowdsourced damage assessment. The associated imagery and crowdsourcing layers are released into the public domain under a Creative Commons 4.0 license, allowing for rapid use and easy integration with existing humanitarian response technologies. For example, their imagery for areas affected by Hurricane Matthew in 2016 is available here.
On a related note, I have worked with DigitalGlobe staff for years on educational initiatives. They provided me with high-resolution imagery of an area in Africa where I was conducting a workshop and, more recently, with imagery of Southeast Asia that I needed to help Penn State prepare exercises for its GEOINT MOOC (Massive Open Online Course in Geointelligence). They have always been generous and wonderful to work with, and I salute their Open Data Portal initiative. In the MOOC we also used their Tomnod crowdsourcing platform with great success, and it generated strong interest among the course participants.
Despite the growing volume of geospatial data available, and the ease of use of much of this data, finding and using data remains a challenge. To help data users meet these ongoing challenges, I have written a new activity entitled “Key Strategies for Finding Content and Understanding What You’ve Found.” The goal of this activity (“Key Strategies for Finding and Using Spatial Data”) is to enable GIS data users to understand what spatial analysis is, effectively find spatial data, use spatial data, and become familiar with the ArcGIS platform in the process. I tested the activity with a group of GIS educators and would now like to share it with the broader GIS community.
The document makes it clear that we are still in a hybrid world: we still need to download some data for our work in GIS, but we are increasingly able to stream data from online data services such as those in ArcGIS Online. These concepts only make full sense when one actually practices them, hence the activity.
In the activity, I ask users first to practice search strategies in ArcGIS Online using tags and keywords. Then I guide them through downloading and using a CSV file containing real-time data. After a brief review of data types and resources, users download data from a local government agency to solve a problem about flood hazards. The next step asks them to compare this download process with streaming the same data from the same local government’s site (in this case, Boulder County, Colorado) into ArcGIS Online. The activity concludes with resources for discovering more about these methods of accessing data.
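The activity itself uses the ArcGIS Online web interface, but the same tag-and-keyword strategy can also be expressed as a request to the public search operation of the ArcGIS REST sharing API. The Python sketch below only builds the URL; it does not send it. It assumes that API's `q`, `num`, and `f` parameters and its `tags:` and `type:` field qualifiers, and the search terms are purely illustrative.

```python
from typing import List, Optional
from urllib.parse import urlencode

# Public search operation of the ArcGIS REST sharing API.
SEARCH_URL = "https://www.arcgis.com/sharing/rest/search"


def build_search_url(keywords: str, tags: Optional[List[str]] = None,
                     item_type: Optional[str] = None, num: int = 10) -> str:
    """Combine free-text keywords with tag and item-type filters into one query URL."""
    clauses = [keywords] if keywords else []
    for tag in tags or []:
        # Quote each tag so multi-word tags such as "Boulder County" stay together.
        clauses.append(f'tags:"{tag}"')
    if item_type:
        clauses.append(f'type:"{item_type}"')
    params = {"q": " AND ".join(clauses), "num": num, "f": "json"}
    # urlencode percent-escapes quotes and colons and turns spaces into '+'.
    return f"{SEARCH_URL}?{urlencode(params)}"
```

For example, `build_search_url("floodplain", tags=["Boulder County"], item_type="Feature Service", num=5)` yields a URL whose `q` parameter decodes to `floodplain AND tags:"Boulder County" AND type:"Feature Service"`. Pairing a broad keyword with a narrow tag or item type is the same narrowing strategy the activity practices by hand.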
Jill Clark and I have created other hands-on activities on this theme of finding and understanding data as well, available here. We look forward to hearing your comments and I hope this new activity is useful.
One of the exercises in our book involves accessing the GIS site of Boulder County, Colorado, to make decisions about flood hazards. We chose Boulder County for this activity largely because its data covers a wide variety of themes, is quite detailed, and is easy to download and use. Recently, Boulder County went even further by launching a new geospatial open data platform. This development follows other open data initiatives we have written about in this blog, such as ENERGIC OD, ArcGIS Open Data, EPA flood risk, the Australian national map initiative, and Open Data Institute nodes. Other open data nodes are linked to a live web map on the ArcGIS Open Data site.
Accessible here, Boulder County’s open data platform expands the usability of the data, for example by providing previews of the data in both map and tabular form. The new platform also makes additional data themes available, such as lakes and reservoirs, the 2013 flood channel, floodplains, and streams and ditches, all of which are returned by a search on “hydrography,” shown below. Subsets of large data sets can also be accessed. In addition, each data set is now offered as a service, for example in GeoJSON and GeoService formats, which allows the data to be streamed directly into platforms such as ArcGIS Online and thus avoids downloading the data sets altogether.
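Once a data set is offered as GeoJSON, a client can work with it directly instead of downloading and unpacking a file. The Python sketch below, which uses only the standard library, parses a GeoJSON FeatureCollection, keeps the features whose attribute matches a given value (a client-side stand-in for requesting a subset), and computes their bounding box. The property name and values in the usage note are hypothetical; Boulder County’s actual field names are not given here.

```python
import json
from typing import Any, Dict, Iterator, List, Sequence, Tuple


def select_features(geojson_text: str, prop: str, value: Any) -> List[Dict[str, Any]]:
    """Return features of a FeatureCollection whose property `prop` equals `value`."""
    collection = json.loads(geojson_text)
    if collection.get("type") != "FeatureCollection":
        raise ValueError("expected a GeoJSON FeatureCollection")
    return [
        feat for feat in collection["features"]
        # "properties" may be null in valid GeoJSON, so fall back to an empty dict.
        if (feat.get("properties") or {}).get(prop) == value
    ]


def _positions(coords: Sequence[Any]) -> Iterator[Sequence[float]]:
    # Coordinates nest to different depths per geometry type (Point, LineString,
    # Polygon, ...); a position is the innermost list of numbers.
    if coords and isinstance(coords[0], (int, float)):
        yield coords
    else:
        for part in coords:
            yield from _positions(part)


def bbox(features: List[Dict[str, Any]]) -> Tuple[float, float, float, float]:
    """(min_x, min_y, max_x, max_y) over all positions of the given features.

    Null geometries and GeometryCollections are not handled in this sketch.
    """
    points = [p for f in features for p in _positions(f["geometry"]["coordinates"])]
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)
```

For example, given a FeatureCollection whose features carry a hypothetical `theme` property set to `"stream"` or `"lake"`, `bbox(select_features(text, "theme", "stream"))` returns the extent of the stream features alone. For large data sets, a server-side subset is preferable, since it avoids transferring features that would be discarded anyway.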
Why did the county do this? Boulder County says they are “committed to ensuring that geospatial data is as open, discoverable and usable as possible in order to promote community engagement, stimulate innovation and increase productivity.” The county is providing an incredibly useful service to the community through their newest innovative efforts, and I congratulate them. I also hope that more government agencies follow their lead.
The UK Government’s Department for Environment, Food and Rural Affairs (Defra) recently announced the release of a LIDAR point cloud, the raw data used to generate a number of digital terrain models (DTMs) that were released last year. In addition to providing terrain models for flood modelling and coastline management, the LIDAR data have also revealed much about long-buried Roman roads and buildings, such as the Vindolanda fort just south of Hadrian’s Wall in northern England.
Environment Agency/Defra LIDAR data
The point cloud data have been released as part of the #OpenDefra project, which aims to make 8,000 datasets publicly available by mid-2016. The first release of point cloud data contains over 16,000 km² of survey data and is available to download from:
The data are licensed under version 3.0 of the Open Government Licence.
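The announcement does not describe Defra’s own processing chain from point cloud to DTM. As a much-simplified illustration of how a terrain grid can be derived from raw points, the Python sketch below bins (x, y, z) returns into square cells and keeps the lowest elevation in each cell as a crude proxy for bare ground. This is not Defra’s method: production DTMs classify ground returns, remove noise, and interpolate empty cells.

```python
import math
from typing import Dict, Iterable, Tuple


def grid_minimum(points: Iterable[Tuple[float, float, float]],
                 cell_size: float) -> Dict[Tuple[int, int], float]:
    """Bin (x, y, z) points into square cells, keeping the lowest z per cell.

    Cell (i, j) covers [i * cell_size, (i + 1) * cell_size) in x and the same
    range in y. The lowest return in a cell is only a rough stand-in for ground.
    """
    if cell_size <= 0:
        raise ValueError("cell_size must be positive")
    grid: Dict[Tuple[int, int], float] = {}
    for x, y, z in points:
        # floor() keeps cell indices consistent for negative coordinates too.
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        if cell not in grid or z < grid[cell]:
            grid[cell] = z
    return grid
```

For example, `grid_minimum([(0.2, 0.3, 10.0), (0.7, 0.9, 8.5), (1.4, 0.1, 12.0)], 1.0)` returns `{(0, 0): 8.5, (1, 0): 12.0}`: the two returns in the first cell collapse to the lower one. Choosing the minimum rather than the mean is a simple way to suppress returns from vegetation and buildings, though it also picks up low noise points.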
Data discoverability, accessibility, and integration are frequent barriers for scientists and a major obstacle to favorable results in environmental research. To tackle this issue, which is raised in our book and in this blog, the Group on Earth Observations (GEO) is leading the development of the Global Earth Observation System of Systems (GEOSS), a voluntary effort that connects Earth Observation resources worldwide, acting as a gateway between producers and users of environmental data.
Barbara Ryan, Director of the GEO Secretariat, says, “The primary goal is the assurance of Earth observations so that we can address society’s environmental problems. While many of our activities are targeted toward monitoring global change, we’re actually more concerned about the assurance, continuity, sustainability and interoperability of observing systems, so that monitoring across multiple domains can be done. Governments, research organizations and others actually do the monitoring. We just want to make sure that the assets are in place, and that the data from these monitoring efforts is shared broadly. One of GEO’s primary objectives is to advocate broad, open data sharing, particularly if the data was collected at taxpayer expense—the citizens of the world should have access to that information.”
“In this regard, during the first part of GEO, 2004-2009, we looked at the GEO mission as a massive cataloging effort. Then, about two years ago, we changed strategies. We transitioned to a brokering approach whereby interoperability agreements were established with institutions that have datasets and/or databases, rather than us seeking out individual datasets. An example of this approach is illustrated with our agreement with the World Meteorological Organization (WMO). WMO members have generally registered their data in the WMO Information System (WIS). So we worked on an interoperability arrangement between GEOSS and the WIS resulting in data from one system being discovered by the other system. We are now hearing, particularly from some members in the developing world, that they are getting access to information that they didn’t know existed.”
“WMO members are getting biodiversity and ecosystem information that wouldn’t normally be delivered through the WIS that focuses on weather, climate and water, and GEO members are gaining increased visibility to information in the WIS. It’s a win-win story, and we’d like to have interoperability brokering agreements with any institution that wants its environmental information broadly viewed and accessible throughout the world.”
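GEO’s brokering approach is easiest to see in miniature. The following is a toy Python sketch, not GEO’s actual software: each catalog is registered with a broker once, and a single query is then fanned out to every registered catalog, with duplicate titles merged. The catalogs built with the hypothetical `make_catalog` helper are in-memory stand-ins; real brokers must also translate between metadata standards and access protocols, which this sketch skips.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Record:
    title: str
    source: str  # name of the catalog that supplied the record


SearchFn = Callable[[str], List[Record]]


class Broker:
    """Fans one query out to every registered catalog and merges the results."""

    def __init__(self) -> None:
        self._catalogs: Dict[str, SearchFn] = {}

    def register(self, name: str, search: SearchFn) -> None:
        # Registering a catalog plays the role of an interoperability agreement:
        # the broker learns how to query it once, on behalf of all users.
        self._catalogs[name] = search

    def search(self, text: str) -> List[Record]:
        results: List[Record] = []
        seen = set()
        for search in self._catalogs.values():
            for rec in search(text):
                key = rec.title.casefold()
                # Keep only the first copy of a title listed in several catalogs.
                if key not in seen:
                    seen.add(key)
                    results.append(rec)
        return results


def make_catalog(name: str, titles: List[str]) -> SearchFn:
    """Hypothetical in-memory catalog matching titles by case-insensitive substring."""
    def search(text: str) -> List[Record]:
        needle = text.casefold()
        return [Record(t, name) for t in titles if needle in t.casefold()]
    return search
```

With a weather catalog and a biodiversity catalog both registered, a single search for “biodiversity” surfaces records that a user of the weather catalog alone would never have seen, which mirrors the win-win Ryan describes.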
“Many of the 25 countries that produce 80% of the world’s crops have global forecasting capabilities. GEO is advocating that information from these countries be shared more broadly and openly, and that algorithms be harmonized so that forecasts are improved around the world. Global transparency will help create more stability and a more food-secure world. A related aspect of the security issue is that governments do not want another government having easy access to what is happening over their domain with the fear that this information will be used against them. While this concern is recognized, most of the information that GEO is interested in transcends national boundaries. Atmospheric, oceanic and many terrestrial processes do not respect national boundaries, and actions in one part of the world often have widespread consequences. The benefits of broader data sharing almost always outweigh the risks associated with not sharing data.”
These are welcome words to us as authors of Spatial Reserves, and they will most likely be welcome words for the entire geospatial community. I look forward to being able, someday soon, to search for and use data through GEOSS.
Over the last three years we’ve written about a few of the problems associated with some data portals, which, although well-intentioned, haven’t always provided the level of access to geospatial information that they promised. Interoperability issues, interface design, and a lack of ongoing support have caused many such initiatives to fail to deliver. With the experience gained from those earlier efforts and perhaps the benefit of hindsight, new initiatives are being developed to provide better access to the plethora of public domain and open geospatial information available online.
Among those new initiatives is the ENERGIC OD project (European NEtwork for Redistributing Geospatial Information to user Communities – Open Data). Launched at the end of 2014, the project aims to address some of the problems that have resulted from the evolution of disparate and heterogeneous GI systems and technologies by providing what are referred to as Virtual Hubs. These hubs will provide a single point of access to geospatial datasets, including access to INSPIRE compliant systems and Copernicus satellite and sensor data (Copernicus was previously known as GMES). The brokering framework at the centre of the solution will allow the hubs to connect to a wide range of European data sources making it easier for end users, public authorities and private organisations, and developers alike to access the data without having to resolve the interoperability and standardisation issues themselves.
The ENERGIC OD project will run for three years and deploy five national virtual hubs in France, Germany, Italy, Poland and Spain.
According to Esri’s 2014 Open Data year in review, over 763 organizations around the world have joined ArcGIS Open Data, publishing 391 public sites and sharing 15,848 open data sets. These organizations include over 99 cities, 43 countries, and 35 US states. At the beginning of 2015, they comprised 390 organizations from North America, 157 from Europe, 121 from Africa, 39 from Asia, and 22 from Oceania. Over 42,000 shapefiles, KML files, and CSV files have been downloaded from these sites since July 2014. Recently, we wrote about one of these sites, the Maryland Open Data Portal, in this blog. Another is the set of layers from the city of Launceston, in Tasmania, Australia.
While these initiatives all use one set of methods and tools for sharing, those of ArcGIS Open Data, the implications for the data user community are profound. First, the adoption of ArcGIS Open Data increases availability for the entire user community, not just Esri users. This is because of the increased number of portals that result, and also because the data sets shared, such as raster and vector data services, KMLs, shapefiles, and CSVs, are in formats that many online and desktop GIS tools can consume. Second, as we have expressed in our book and in this blog, while there have been noble attempts over 30 years by regional, national, and international government organizations to establish standards, to share data, and to encourage a climate of sharing, and while many of those attempts were and will continue to be successful, the involvement of private industry (in this case, Esri), nonprofit organizations, and academia will lend an enormous boost to government efforts.
Third, the advent of cloud-based GIS enables these portals to be fairly easily established, curated, and improved. Using the ArcGIS Open Data platform, organizations can leave their data where it is, whether on ArcGIS for Server or in ArcGIS Online, and simply share it as Open Data. Esri uses Koop to transform data into different formats, to access APIs, and to get data ready for discovery and exploration. Organizations add their nodes to the Open Data list, and their data can then be accessed, explored, and downloaded in multiple formats without “extraneous exports or transformations.” Specifically, organizations using ArcGIS Open Data first enable the open data capabilities, then specify the groups for open data, then configure their open data site, and finally make the site public.
I believe one of the chief ways that tools like ArcGIS Open Data will advance the open data movement is by being easy to use and by evolving over time. Nobody has unlimited time to spend figuring out how best to serve their organization’s data and then building the tools to do so. The ability of data-producing organizations to use these common tools and methods represents, I believe, an enormous time savings. As more organizations realize and adopt this, all of us in the GIS community, and beyond, will benefit.
The signing of the Open Data Charter by G8 leaders in 2013 promised to make public sector data open, free of charge, and available to all in reusable formats. However, despite the attention open data subsequently received, a recent report by the World Wide Web Foundation (featured in a BBC article) highlighted some ongoing problems in making the pledges enshrined in the Open Data Charter a reality. Many countries have failed to deliver what the report referred to as a policy framework for open data.
Although the UK and USA were at the top of the global rankings for countries providing access to open data, they and many other countries still have a lot of work to do before they can claim to have fully open government. Of particular note in the UK is the ongoing debate over access to the Royal Mail’s Postcode Address File (PAF). Although the PAF dataset is cited as the ‘definitive source of postal address information’ in the UK and is used in many digital mapping applications, the current charges and licensing arrangements deter many potential users. Many commentators have argued that the PAF could become the standard address resource for commercial and non-commercial uses in the UK if it were made available in an easy-to-use, open format. This would encourage much wider adoption of the dataset and prevent the further proliferation of alternative sources of address information. With the spotlight back on open access to address data, will 2015 be the year the PAF joins the growing list of open, and free of charge, spatial datasets?