The Biodiversity Information Serving Our Nation (BISON) resource from the USGS was re-launched recently and, as the name implies, provides access to biodiversity records. Researchers collect species occurrence data, records of an organism at a particular time in a particular place, as a primary or ancillary function of many biological field investigations. These data reside in numerous distributed systems and formats (including publications) and are consequently not being used to their full potential. As a step toward addressing this challenge, and to serve the data through a single portal, the Core Science Analytics and Synthesis (CSAS) program of the USGS developed BISON, an integrated and permanent resource for biological occurrence data from the United States.
BISON currently provides access to more than 110 million U.S. primary biodiversity records, many of which are made available through over 300 data providers of the Global Biodiversity Information Facility. The list of data providers is impressive, ranging from universities to government agencies to other organizations, which together contribute millions of additional records to those the USGS began with.
The primary search mechanism is an easy-to-use map interface. The Help file is concise and useful. The legends take a while to understand, but much of the data can be visualized on the map. I had just a bit of trouble selecting and downloading information on a selected species from the site, but was very pleased to see that the output formats include CSV, KML, and zipped SHP. About 20 data fields are provided through the data download, including, of course, latitude and longitude. My search for prairie dog netted me point locations where the species had been collected, while I was expecting to find locations where they occurred. My search for the western red-tailed hawk forced me into state polygons rather than points, and the choropleth map shading interfered with my map visualization. All of my download choices were too large to download from this interface; when I selected too many records, it directed me to a different interface (though I would have thought 485 records were not excessive).
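Once a CSV export is in hand, working with it locally is straightforward. The sketch below filters occurrence records to a latitude/longitude bounding box; note that the column names (`scientificName`, `decimalLatitude`, `decimalLongitude`) and the sample rows are assumptions for illustration, not the actual BISON export schema.

```python
import csv
import io

# Illustrative sample mimicking an occurrence-data CSV export;
# the real BISON download has roughly 20 fields, and the column
# names used here are assumptions.
sample_csv = """scientificName,decimalLatitude,decimalLongitude
Cynomys ludovicianus,40.585,-105.084
Cynomys ludovicianus,39.739,-104.990
Buteo jamaicensis calurus,34.052,-118.244
"""

def filter_bbox(csv_text, min_lat, max_lat, min_lon, max_lon):
    """Return the occurrence rows whose point falls inside a bounding box."""
    reader = csv.DictReader(io.StringIO(csv_text))
    hits = []
    for row in reader:
        lat = float(row["decimalLatitude"])
        lon = float(row["decimalLongitude"])
        if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon:
            hits.append(row)
    return hits

# A Colorado-area box catches the two prairie dog points but not the hawk.
colorado = filter_bbox(sample_csv, 37.0, 41.0, -109.0, -102.0)
print(len(colorado))  # 2
```

The same pattern scales to the full ~20-field download: read with `csv.DictReader`, pull the coordinate fields, and filter on whatever criteria the map interface would not let you express.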
I found the portal helpful for small samples but finding out how to download larger amounts of data will take further investigation. With persistence and time, this could be a valuable resource for people needing information that previously required digging through many databases held by different agencies.
The Open Geoportal (OGP) project is ‘… a collaboratively developed, open source, federated web application to rapidly discover, preview, and retrieve geospatial data from multiple organizations’. The project, led by Tufts University in conjunction with a number of partner organisations including Harvard, MIT, Stanford and UCLA, was established to provide a framework for organizations to share geospatial data layers, maps, metadata, and development resources through a common interface. Version 2.0 of the OGP was released in April 2013, providing an improved interface and interoperability with a number of web mapping environments.
OGP currently supports four production geoportal instances:
- Harvard Geospatial Library: Geospatial data catalog from the Harvard University libraries
- UC Berkeley Geospatial Data Repository: Geospatial data from UC Berkeley Library
- MIT GeoWeb: Geospatial data from the MIT Geodata Repository, MassGIS, and Harvard Geospatial Library
- GeoData@Tufts: Geoportal developed and maintained by Tufts University Information Technology, providing search tools for data discovery and for use in teaching, learning, and research.
The data may be streamed, downloaded or shared as required. Although many of the data layers are publicly available, access to some of the layers is restricted and requires registration with the geoportal.
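For the streamed layers, geoportals of this kind typically rely on OGC web services. As a minimal sketch, the snippet below builds a WMS 1.1.1 GetMap request URL of the sort a map client would issue to stream a layer; the endpoint and layer name are hypothetical, and whether any particular OGP instance serves WMS at such an address is an assumption.

```python
from urllib.parse import urlencode

def wms_getmap_url(endpoint, layer, bbox, width=512, height=512,
                   srs="EPSG:4326", fmt="image/png"):
    """Build an OGC WMS 1.1.1 GetMap request URL for a streamed layer."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "BBOX": ",".join(str(v) for v in bbox),  # minx,miny,maxx,maxy
        "WIDTH": width,
        "HEIGHT": height,
        "SRS": srs,
        "FORMAT": fmt,
    }
    return endpoint + "?" + urlencode(params)

# Hypothetical endpoint and layer name, for illustration only.
url = wms_getmap_url("https://example.edu/geoserver/wms",
                     "campus:land_use",
                     (-71.2, 42.3, -71.0, 42.5))
print(url)
```

A restricted layer would reject such a request until the client authenticates with the geoportal, which is consistent with the registration requirement noted above.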
A number of geoportals are currently in development, including those from Columbia, the University of Washington, and Yale.
In our book on GIS and Public Domain Data, we describe several court cases that illustrate the ongoing debates and ways of thinking about the value of public domain spatial data, who should pay for it, and who should have access to it. One of the most famous cases is that of the Sierra Club vs. Orange County, California. To recap, the Sierra Club is suing Orange County for access to its GIS-compatible digital parcel basemap database under terms of the California Public Records Act that include paying no more than the direct cost of duplication. Orange County has been requiring users of its “OC Landbase” to pay $475,000, plus sign a license that restricts sharing or redistribution of its database.
Although Orange County abruptly reduced its price late in December 2011, the case has been going on since 2009. At stake is whether the public has unfettered access to the GIS-compatible data that its government agencies use to conduct “the public’s business,” in the same geodatabase format that the agencies themselves use, or whether the government can license, restrict and charge high prices for such access. As more and more governmental decisions and actions are based on GIS analysis, the issue is central to governmental transparency and accountability to citizens.
The California Public Records Act states in §6253.9 that any agency that has information which constitutes identifiable public records in electronic format, shall make the information available in the electronic format in which it holds the information, and that the agency shall provide a copy of the electronic records if the requested format is one that has been used by the agency to create copies for its own use, or for provision to other agencies. Further, the section states that the cost of duplication shall be limited to the direct cost of producing a copy of the records in the electronic format. The crux of Orange County’s argument is that its GIS-formatted database is exempted under §6254.9, the so-called “software exemption.”
Sierra Club, joined by 212 individual GIS professionals and 23 professional GIS organizations who co-signed one amicus brief among seven supportive amicus briefs, contends that “computer mapping systems” refers only to software, not to the data on which the software operates. Further, it has asserted that .pdf files are not equivalent to a GIS-compatible database, and that the public’s right to inspect and review the exact same data that Orange County uses to make its decisions would be curtailed by .pdf-only data.
Keep watching this blog for updates on this and other issues in the rapidly changing landscape of public domain spatial data. How do you think this case will turn out?
In The GIS Guide to Public Domain Data we devoted one chapter to a discussion of the Free versus Fee debate: should spatial data be made available for free, or should individuals, companies and government organisations charge for their data? In a recently published article, Sell your data to save the economy and your future, Jaron Lanier argues that a ‘monetised information economy’, where information is a commodity traded to the advantage of both the information provider and the information collector, is the best way forward.
Lanier argues that although the movement for making data available for free is now well established, with many claiming it can democratise the digital economy through access to open software, open spatial data, open education resources and the like, insisting that data be available for free will ultimately mean a small digital elite thrives at the expense of the majority. Data, and the information products derived from them, are the new currency of the digital age, and those who cannot take advantage of this source of remuneration will lose out. Large IT companies with the best computing facilities, who collect and re-use our information, will be the winners, with their ‘big data’-crunching computers ‘… guarded like oilfields’.
In one vision of an alternative information economy, people would be paid when data they made available via a network were accessed by someone else. Could selling the data that are collected by us, and about us, be a viable option and would it give us more control over how the data are used? Or is the open approach to data access and sharing the best way forward?
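Purely as a thought experiment, that pay-per-access vision can be sketched as a toy ledger that credits a contributor each time their dataset is accessed. Everything here (the class, the names, the fee) is invented for illustration and stands in for no real payment system.

```python
from collections import defaultdict

class MicropaymentLedger:
    """Toy pay-per-access ledger: each access to a published
    dataset credits the dataset's contributor a small fee."""

    def __init__(self, fee_per_access=0.01):
        self.fee = fee_per_access
        self.balances = defaultdict(float)  # contributor -> credit
        self.datasets = {}                  # dataset id -> contributor

    def publish(self, dataset_id, contributor):
        self.datasets[dataset_id] = contributor

    def access(self, dataset_id):
        owner = self.datasets[dataset_id]
        self.balances[owner] += self.fee
        return owner

ledger = MicropaymentLedger(fee_per_access=0.01)
ledger.publish("parcel_basemap", "alice")
for _ in range(3):
    ledger.access("parcel_basemap")
print(round(ledger.balances["alice"], 2))  # 0.03
```

Even this toy makes the practical questions visible: who runs the ledger, who sets the fee, and who audits the access counts, which are exactly the kinds of governance issues the free-versus-fee debate turns on.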