A model for capturing provenance of assertions about chemical substances
dataset
posted on 2018-12-04, 10:26 authored by Michel Dumontier, Amrapali Zaveri, 0000-0001-5666-1658 Moodley, 0000-0002-2629-6124 Wu

Chemical substance resources on the Web are often made accessible to researchers through public APIs (Application Programming Interfaces). When data are extracted and integrated through such APIs, provenance information is frequently missing. Even when provenance is stated, it usually does not follow any prescribed template or terminology. This creates a burden on data producers and makes it challenging for API developers to automatically extract and analyse this information. Downstream, these consequences hinder efforts to automatically determine the veracity and quality of extracted data, which is critical for proving the integrity of associated research findings. In this paper, we propose a model for capturing provenance of assertions about chemical substances by systematically analysing three sources: (i) Nanopublications, (ii) Wikidata and (iii) selected Minimal Information Standards (MISTS) for reporting biomedical studies, as reported in FAIRsharing.org (https://fairsharing.org). We analyse the provenance terms used in these sources along with their frequency of use, and synthesize our findings into a preliminary model for capturing provenance.
Funding
NCATS Biomedical Data Translator
National Center for Advancing Translational Sciences