A Semantic Data Integration Methodology for Translational Neurodegenerative Disease Research
Journal contribution, posted on 05.12.2018 by Sumit Madan, Maksims Fiosins, Stefan Bonn, Juliane Fluck
The advancement of omics technologies and the execution of large-scale clinical studies have led to the production of large, heterogeneous patient datasets. Researchers at DZNE (German Center for Neurodegenerative Diseases) and Fraunhofer SCAI (Fraunhofer Institute for Algorithms and Scientific Computing), located at several sites, focus on the generation, integration, and analysis of such data, especially in the field of neurodegenerative diseases. To extract meaningful biological insights, they analyze these datasets both separately and, more importantly, in combination. Combining such datasets, which are often located at different sites and lack semantic annotation, requires the development of novel data integration methodologies. We use the concept of federated semantic data layers to disseminate different types of datasets and to create a unified view of them. In addition to the semantically enriched data in these data layers, we propose a further level of condensed information that provides only derived results and is integrated into a central integration platform. Furthermore, we implement a semantic lookup platform that contains all semantic concepts needed for the data integration. This eases the creation of links between datasets and hence improves their interoperability. Integrating biologically relevant relationships from public databases between entity classes such as genes, SNPs, drugs, and miRNAs also makes it possible to exploit existing knowledge. In this paper, we describe the semantic-aware, service-oriented infrastructure, including the semantic data layers, the semantic lookup platform, and the integration platform. We also give examples of how data can be queried and visualized. The proposed architecture makes such an infrastructure easier to support and to adapt to new use cases. Furthermore, the semantic data layers containing derived results can be used for data publication.
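A minimal Python sketch can illustrate the role of a semantic lookup. The idea is to map dataset-specific labels onto shared concepts so that derived results held in separate data layers can be connected. The sketch assumes a simple in-memory synonym dictionary. The class `SemanticLookup`, the helpers `annotate` and `join_on_concept`, the identifiers such as `CONCEPT:0001`, and all example records are hypothetical. They do not reflect the services, APIs, or data of the infrastructure described in the paper.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SemanticLookup:
    """Hypothetical in-memory stand-in for a semantic lookup service.

    Maps labels and synonyms (compared case-insensitively) to a shared
    concept identifier. A real platform would query curated ontologies.
    """
    synonyms: dict[str, str] = field(default_factory=dict)

    def register(self, concept_id: str, *labels: str) -> None:
        # Every label or synonym points to the same concept identifier.
        for label in labels:
            self.synonyms[label.casefold()] = concept_id

    def resolve(self, label: str) -> Optional[str]:
        # Returns None when the label is unknown to the lookup.
        return self.synonyms.get(label.casefold())


def annotate(records: list[dict], key: str, lookup: SemanticLookup) -> list[dict]:
    """Attach a 'concept' field to each record based on its label in `key`."""
    return [{**r, "concept": lookup.resolve(r[key])} for r in records]


def join_on_concept(left: list[dict], right: list[dict]) -> list[tuple[dict, dict]]:
    """Inner join of two annotated tables on their shared concept identifier.

    Records that could not be resolved (concept is None) are skipped.
    """
    by_concept: dict[str, list[dict]] = {}
    for r in right:
        if r["concept"] is not None:
            by_concept.setdefault(r["concept"], []).append(r)
    return [
        (l, r)
        for l in left
        if l["concept"] is not None
        for r in by_concept.get(l["concept"], [])
    ]


# Usage with made-up derived results from two hypothetical sites that
# label the same genes differently.
lookup = SemanticLookup()
lookup.register("CONCEPT:0001", "APP", "amyloid beta precursor protein")
lookup.register("CONCEPT:0002", "MAPT", "tau")

expression = [{"gene": "APP", "log2fc": 1.2}, {"gene": "GFAP", "log2fc": 0.4}]
variants = [
    {"target": "Tau", "snp": "snp_a"},
    {"target": "amyloid beta precursor protein", "snp": "snp_b"},
]

pairs = join_on_concept(
    annotate(expression, "gene", lookup),
    annotate(variants, "target", lookup),
)
for left_rec, right_rec in pairs:
    print(left_rec["gene"], "<->", right_rec["target"], right_rec["snp"])
```

The two tables share no literal label, but they still join on `APP`, because both labels resolve to the same concept. `GFAP` is dropped because the lookup does not know it. Keeping the vocabulary in one shared lookup, rather than inside each dataset, is what lets independently maintained data layers be linked without changing the data themselves.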