Coupling data with scholarly research

How many times have you read an article or report and wished you had access to the data that the author used? You’re not alone: a growing concern is being voiced in the research community that data are typically not included with scholarly papers. Why should this matter? Hasn’t publishing always worked this way? In today’s world, where serious problems are growing, as recent global health and natural hazards challenges have made starkly clear, providing data could be an immense leap forward for researchers and developers working to solve these pressing issues. As we have written in this blog for nearly a decade, despite crowdsourcing, the Internet of Things, sensors on, above, and below ground, and GIS, statistical, field, and other tools, gathering data (not just spatial data, but any data) is still often a very time-consuming endeavor. Research results are important, but should every researcher have to start completely over and gather his or her own data? What if I wanted to take someone’s data and add to it, or use it in another way, in another region, with different models?

Chapter 2 in Open Science by Design: Realizing a Vision for 21st Century Research from the National Academies of Sciences, Engineering, and Medicine lists several benefits of including data in research studies. The chapter frames these benefits as part of the emerging “open science movement”. The benefits listed by the authors include rigor and reliability, ability to address new questions, faster and more inclusive dissemination of knowledge, broader participation in research, effective use of resources, improved performance of research tasks, and open publication for public benefit. Yet the chapter also recognizes real challenges: costs and infrastructure; a scholarly communications structure in which work is often available only via subscription; lack of supportive culture, incentives, and training; disciplinary differences; and privacy, security, and proprietary barriers to sharing.

The chapter also states that “the ability to automate the process of searching and analyzing linked articles and data can reveal patterns that would escape human perception, making the process of generating and testing hypotheses faster and more efficient. These tools and services will have maximum impact when used within an open science ecosystem that spans institutional, national, and disciplinary boundaries.” Indeed.

An article in Nature also describes the benefits of providing data, beginning with career benefits but touching on societal benefits as well. Other articles are appearing, and the National Academies chapter above gives practical recommendations on how to bring about ‘open science.’ The European Union projects FOSTER Plus and OpenAIRE provide training on open science and open data.

But where are the practical examples? I see the following “papers with code” library as one practical response to these concerns, and I salute these efforts:

https://www.paperswithcode.com/datasets

The above site can be filtered, and does include spatial data that I uncovered during my testing, on water quality and other variables. I hope it is the beginning of many such efforts.

–Joseph Kerski

Categories: Public Domain Data
  1. Mark Parsons
    April 26, 2021 at 10:11 pm

    You may find this paper interesting
    Parsons, M. A., Duerr, R. E., & Jones, M. B. (2019). The History and Future of Data Citation in Practice. Data Science Journal, 18. https://doi.org/10.5334/dsj-2019-052

    • josephkerski
      April 26, 2021 at 10:16 pm

      Thank you Mark! Your paper indeed adds much breadth and depth to this discussion. Congratulations on it and keep in touch. I hope this data blog provides some ongoing discussion for the community you are impacting.

  2. June 16, 2021 at 4:06 pm

    https://www.tandfonline.com/doi/abs/10.1080/24694452.2020.1806027 is an article, “Motivations and Methods for Replication in Geography: Working with Data Streams,” that exemplifies this approach.
