Published 15 February 2018
Scientific projects spend billions of dollars on research each year, yet most of the resulting data is not easily reusable, findable or shareable. To help remedy this situation, NIH has endorsed a set of FAIR principles for research data going forward: Findability, Accessibility, Interoperability, and Reusability. Adhering to these principles requires new systems and tools.
Leading the charge, the Informatics Systems Research Division (ISRD) of the Information Sciences Institute at the USC Viterbi School of Engineering, helmed by Dr. Carl Kesselman, has been developing DERIVA, a pioneering, comprehensive system for managing scientific data that focuses on organizing data to conform with FAIR principles. (Learn more about DERIVA here: isrd.isi.edu/deriva/)
One of DERIVA’s newest features is “Data Collections”. ISRD has been building new DERIVA collections in collaboration with USC Stem Cell at the Keck School of Medicine of USC (directed by Dr. Andy McMahon) to enhance the findings from three new groundbreaking papers on embryonic development of the human kidney published in the Journal of the American Society of Nephrology (JASN) in March:
A kidney organoid on Day 16 of differentiation. The staining depicts the segmentation of nephron-like structures in the organoid at an early developmental stage. Image courtesy of Tracy Tran/Andy McMahon Lab, USC Stem Cell
Each paper will cite a Digital Object Identifier (DOI), a permanent link that takes the reader directly to a page on the GUDMAP repository that links to all of the supporting and referenced data in its original, high-resolution state. In one click, readers have access to everything they need to explore the results and reproduce the experiments themselves.
Citations are based on the data citation format of the Nature journal Scientific Data. An example of a citation for a DERIVA collection would be:
McMahon, A. GUDMAP Consortium. https://doi.org/10.25548/BURB-6P44 (2018)
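As a rough illustration of how such a citation is put together, the short Python sketch below assembles the same string from its parts. The function name and its parameters are assumptions made for this example and are not part of DERIVA or GUDMAP; the only thing taken from the article is the shape of the example citation above (creator, publisher, DOI link, year).

```python
def format_data_citation(creator: str, publisher: str, doi: str, year: int) -> str:
    """Build a citation string shaped like the example in the article.

    Hypothetical helper for illustration only; not a DERIVA API.
    """
    # Accept either a bare DOI ("10.25548/BURB-6P44") or a full resolver URL.
    prefix = "https://doi.org/"
    url = doi if doi.startswith(prefix) else prefix + doi
    # Pattern: "<creator> <publisher>. <DOI link> (<year>)"
    return f"{creator} {publisher}. {url} ({year})"


citation = format_data_citation(
    "McMahon, A.", "GUDMAP Consortium", "10.25548/BURB-6P44", 2018
)
print(citation)
```

Because the DOI is written as a link through the doi.org resolver, the citation itself doubles as the one-click route to the collection page.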
In a nutshell, a Collection is a customized grouping of data elements within a repository (or database), organized for a scientific purpose. For each DERIVA deployment, different slots are designated for the types of data that may be gathered, much like a drawer organizer that can be configured for the items you want to include.
The most immediate use cases have been for scientific publications. For these new kidney papers, instead of searching individually for each figure and data element, scientists can find all of the referenced data at full quality, ready to view, download, cite and share.
Another important use case is informal collaboration. In this case, a collection may or may not have a DOI associated with it, but a member of GUDMAP can still create a collection of data to share easily with collaborators and to help them reproduce results.
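To make the drawer-organizer idea concrete, here is a minimal Python sketch of a collection whose slots are fixed per deployment. Everything in it is hypothetical: the class, the slot names and the record identifiers are invented for illustration and do not reflect DERIVA's actual data model or API.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Collection:
    """Hypothetical model of a collection: named slots holding record IDs."""
    title: str
    allowed_slots: frozenset            # slot names configured for a deployment
    doi: Optional[str] = None           # informal collections may have no DOI
    items: dict = field(default_factory=lambda: defaultdict(list))

    def add(self, slot: str, record_id: str) -> None:
        # Only the slots designated for this deployment accept data.
        if slot not in self.allowed_slots:
            raise ValueError(f"slot {slot!r} is not configured for this deployment")
        self.items[slot].append(record_id)


# Made-up slot names and record IDs, purely for illustration.
kidney = Collection("Example kidney collection", frozenset({"images", "sequencing"}))
kidney.add("images", "record-001")
kidney.add("sequencing", "record-002")
print(dict(kidney.items))
```

Keeping the DOI optional reflects that a collection may be grouped for publication or simply for sharing among collaborators.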
Thanks to DERIVA collections, scientists can now spend less time hunting for the elements they need from work that’s already been done and more time making new scientific discoveries.