Visual Analytics and Guided Search Empowered by Semantic Data Ingestion (SWAT4LS 2018)

Pattyn, Filip; Constandt, Hans

doi:10.6084/m9.figshare.7411277.v1



poster

posted on 2018-12-04, 08:32 authored by Filip Pattyn, Hans Constandt

Semantic web technologies are gaining renewed interest now that data indexing, layered on top of a traditional semantic triplestore, has been adopted. This has greatly improved the speed of semantic applications and opened up new opportunities.

DISQOVER (http://www.disqover.com) is a web-based semantic search, exploration and analysis platform for linked data sources. The platform allows users to ingest and harmonize a wide spectrum of public, private and third-party data, which are glued together via an overarching DISQOVER configuration ontology. The system supports data federation between different DISQOVER installations and can prepare and create visual analytics dashboards directly from the data.

We will zoom in on binning and quantifying semantified data, which results in lightning-fast visual analytics. Slicing and dicing a multitude of data sets has become simple. The latest 5.0 release extends traditional text search with alternatives such as chemical or protein structure search.
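As a rough illustration of what binning and quantifying can mean in practice, the sketch below groups a numeric field into ranges and counts the records per range, so that a dashboard could read a handful of aggregates instead of the raw records. It is not DISQOVER's implementation; the function names, the record layout and the bin edges are all assumptions made for this example.

```python
from collections import Counter

def bin_label(value, edges):
    """Return a label for the half-open bin [lo, hi) that contains value."""
    for lo, hi in zip(edges, edges[1:]):
        if lo <= value < hi:
            return f"{lo}-{hi}"
    return "out of range"

def facet_counts(records, field, edges):
    """Count records per bin of a numeric field (one bar per bin in a chart)."""
    return Counter(bin_label(record[field], edges) for record in records)

# Hypothetical records; in a real setting these would come from the indexed data.
trials = [
    {"id": "T1", "enrollment": 12},
    {"id": "T2", "enrollment": 250},
    {"id": "T3", "enrollment": 80},
    {"id": "T4", "enrollment": 99},
]

counts = facet_counts(trials, "enrollment", [0, 100, 1000])
for label, n in sorted(counts.items()):
    print(label, n)
```

Precomputing such counts at indexing time, rather than at query time, is one plausible way to keep this kind of chart responsive on large data sets.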

In this demonstration we introduce another development focusing on lowering the threshold for semantically integrating, enriching and linking new data sources.

ONTOFORCE filed a patent on a new data ingestion engine, as this is a novel concept in the field: it not only allows data conversion processes to be managed in a visually attractive web-based application but also eliminates the need to write and maintain data conversion scripts.

This new semantic data ingestion framework comes with a huge gain in speed, stability and scalability. It consists of data processing components that perform atomic data manipulations traditionally done via scripting or as an inferencing step via a SPARQL INSERT statement in a triplestore. Combining these components makes it easy to create and maintain data conversion pipelines. The engine is specifically designed to process complex many-to-many relationships more efficiently than traditional ETL (Extraction, Transformation and Loading) pipeline tools that process data line by line. Every process step can be fully inspected, and the upstream or downstream impact of processing can be monitored. This level of transparency contributes to efficient handling of data provenance and to a full quality assessment of the data processing.
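The idea of combining atomic components into an inspectable pipeline can be sketched as follows. This is a minimal illustration and not the patented engine: the triple representation, the component names (`rename_predicate`, `link_via_shared_value`, `pipeline`) and the sample identifiers are hypothetical. Each component transforms a whole set of triples at once, the linking component builds a many-to-many relation through a shared value instead of looping row by row, and the pipeline records the size of every intermediate result so each step can be inspected.

```python
# A triple is a (subject, predicate, object) tuple; a data set is a set of triples.

def rename_predicate(old, new):
    """Component: replace one predicate name by another."""
    def step(triples):
        return {(s, new if p == old else p, o) for s, p, o in triples}
    return step

def link_via_shared_value(pred_a, pred_b, link_pred):
    """Component: link each subject with pred_a = v to every subject with pred_b = v."""
    def step(triples):
        # Index the pred_b side once, then join set-wise (many-to-many).
        by_value = {}
        for s, p, o in triples:
            if p == pred_b:
                by_value.setdefault(o, set()).add(s)
        links = {
            (s, link_pred, target)
            for s, p, o in triples if p == pred_a
            for target in by_value.get(o, ())
        }
        return triples | links
    return step

def pipeline(*steps):
    """Chain components and keep the output size of each step for inspection."""
    def run(triples):
        current, trace = set(triples), []
        for step in steps:
            current = step(current)
            trace.append(len(current))
        return current, trace
    return run

# Hypothetical input data.
triples = {
    ("drug:1", "targets_gene", "BRCA1"),
    ("drug:2", "targets_gene", "BRCA1"),
    ("trial:8", "gene", "BRCA1"),
    ("trial:9", "gene", "BRCA1"),
}

run = pipeline(
    rename_predicate("gene", "studies_gene"),
    link_via_shared_value("targets_gene", "studies_gene", "relevant_trial"),
)
result, trace = run(triples)
print(trace)
```

Because every component takes and returns a complete data set, a step can be swapped, reordered or inspected without touching the others, which is the property the abstract attributes to the component-based design.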

The plugin and pipeline architecture strategically fits the need of life science companies to harness their data assets better and more easily, and to create a structured data foundation ready for artificial intelligence (AI) applications. A typical life science use case will be presented, integrating a set of data and metadata to create a data catalog of research data that can be combined with a downstream AI application.