BioKB - Text Mining and Semantic Technologies for Biomedical Content Discovery

The ever-increasing number of publicly available biomedical articles calls for automatic information extraction from digitized publications. We have implemented a pipeline which, by exploiting text mining and semantic technologies, helps researchers easily access the semantic content of thousands of abstracts and full-text articles from PubMed and Elsevier. The text mining component analyzes the articles' content and extracts relations between a wide variety of concepts, extending the scope from proteins, chemicals and pathologies to biological processes and molecular functions. Moreover, the relations are extracted along with their context, which specifies the localization of the detected events, preconditions, temporal and logical order, and mutual dependency and/or exclusion. The extracted knowledge is stored in a knowledge base that is publicly available for both human and machine access, via a web interface and a SPARQL endpoint. To address data accessibility, reusability and interoperability, all the extracted relations are standardized using uniform resource identifiers (URIs) and a custom ontology based on the Genia ontology.
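
To make the idea of a URI-standardized relation with context more concrete, the following is a minimal sketch of how one extracted relation might be serialized as N-Triples. It is an illustration only: the namespace `http://example.org/biokb/`, the property names (`hasAgent`, `hasType`, `hasTarget`, `localizedIn`), the `Relation` class and the sample relation itself are hypothetical and are not taken from the BioKB ontology or knowledge base.

```python
from dataclasses import dataclass, field

# Hypothetical namespace: the actual BioKB URIs and ontology terms
# are not specified in the text.
BASE = "http://example.org/biokb/"


@dataclass
class Relation:
    """One extracted relation plus its optional context (all names assumed)."""
    relation_id: str
    subject: str    # local name of the acting concept, e.g. a protein
    predicate: str  # local name of the relation type
    obj: str        # local name of the affected concept
    context: dict = field(default_factory=dict)  # e.g. localization


def uri(local_name: str) -> str:
    """Wrap a local name in the hypothetical namespace as an N-Triples IRI."""
    return f"<{BASE}{local_name}>"


def to_ntriples(rel: Relation) -> list:
    """Serialize the relation as an event node with one triple per role."""
    event = uri(f"event/{rel.relation_id}")
    lines = [
        f"{event} {uri('hasAgent')} {uri(rel.subject)} .",
        f"{event} {uri('hasType')} {uri(rel.predicate)} .",
        f"{event} {uri('hasTarget')} {uri(rel.obj)} .",
    ]
    # Context entries are emitted in sorted key order for stable output.
    for key, value in sorted(rel.context.items()):
        lines.append(f"{event} {uri(key)} {uri(value)} .")
    return lines


# Illustrative relation, not an actual result from the pipeline.
example = Relation(
    relation_id="e1",
    subject="protein/TP53",
    predicate="relation/regulates",
    obj="process/apoptosis",
    context={"localizedIn": "location/nucleus"},
)

for line in to_ntriples(example):
    print(line)
```

Modelling the relation as its own event node, rather than as a single subject-predicate-object triple, is one way to attach context such as localization to the relation itself. Whether BioKB uses this modelling is not stated here.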