While data are becoming increasingly easy to find and access on the Web, significant effort and skill are still required to process the volume and diversity of available data into convenient, user-friendly formats. Moreover, these efforts are often duplicated and are hard to reuse. Here, we describe Data2Services, a new framework to semi-automatically process heterogeneous data into target data formats, databases, and services. Data2Services uses Docker to faithfully execute data transformation pipelines. These pipelines automatically convert the input data into a semantic knowledge graph that can be further refined to conform to a particular data standard. The data can then be loaded into a number of databases and made accessible through native and autogenerated APIs. We describe the architecture and a prototype implementation for data in the life sciences.
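To make the first pipeline step concrete, the sketch below shows one generic way heterogeneous tabular input can be turned into a raw RDF knowledge graph. It is a hypothetical illustration under stated assumptions, not the Data2Services implementation. Each CSV row becomes a subject IRI, each column becomes a predicate IRI, and each non-empty cell becomes a literal, serialized as N-Triples. The base namespace `http://example.org/` and the function names are placeholders. The later refinement of this generic graph toward a particular data standard is not shown.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical base namespace used only for this illustration.
BASE = "http://example.org/"


def escape_literal(value: str) -> str:
    """Escape a string so it is valid inside an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def csv_to_ntriples(csv_text: str, table: str) -> list[str]:
    """Map each CSV row to a subject IRI and each column to a predicate IRI.

    This is a generic, direct-mapping-style conversion (hypothetical helper);
    the resulting graph still needs to be transformed to a target data model.
    """
    triples = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for i, row in enumerate(reader):
        subject = f"<{BASE}{quote(table)}/row/{i}>"
        for column, value in row.items():
            # Skip surplus fields (key None) and empty or missing cells.
            if column is None or not value:
                continue
            predicate = f"<{BASE}{quote(table)}/column/{quote(column)}>"
            triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


if __name__ == "__main__":
    sample = "drug,target\naspirin,PTGS1\n"
    for line in csv_to_ntriples(sample, "drugs"):
        print(line)
```

Keeping the first pass schema-agnostic means the same conversion works for any tabular source, while domain-specific mapping to a standard can be expressed separately and reused.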
Funding
NCATS Biomedical Data Translator
National Center for Advancing Translational Sciences