Recent developments in machine learning have led to a large number of
methods for extracting features from structured data. The features are
represented as vectors and may encode some semantic aspects of the
data. They can be used in machine learning models for different tasks
or to compute similarities between the entities in the data.
SPARQL is a query language for structured data, originally developed
for querying Resource Description Framework (RDF) data. It has been in
use for over a decade as a standardized NoSQL query language. Many
different tools have been developed to enable data sharing with
SPARQL. For example, SPARQL endpoints make data interoperable and
publicly available, and a single SPARQL query can be executed across
multiple endpoints.
We have developed Vec2SPARQL, a general framework for integrating
structured data and their vector space representations. Vec2SPARQL
allows vector functions, such as computing similarities (cosine,
correlation) or classifying with machine learning models, to be
queried jointly with structured data within a single SPARQL query. We
demonstrate applications of our approach for biomedical and clinical
use cases.
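
As an illustration of the kind of vector function mentioned above, the
following is a minimal sketch of cosine similarity used to rank entities
by their feature vectors. It is not the Vec2SPARQL implementation: the
entity identifiers, the example vectors, and the helper names
cosine_similarity and most_similar are all hypothetical and chosen only
for this example.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Treat a zero vector as having no similarity to anything.
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical feature vectors keyed by made-up entity identifiers.
embeddings = {
    "ex:entityA": [1.0, 0.0, 1.0],
    "ex:entityB": [1.0, 0.0, 0.9],
    "ex:entityC": [0.0, 1.0, 0.0],
}


def most_similar(query_id, vectors, top_k=2):
    """Rank all other entities by cosine similarity to the query entity."""
    query = vectors[query_id]
    scored = [
        (other_id, cosine_similarity(query, vec))
        for other_id, vec in vectors.items()
        if other_id != query_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


ranking = most_similar("ex:entityA", embeddings)
```

In this sketch, ranking lists ex:entityB ahead of ex:entityC, because
the vector for ex:entityC is orthogonal to that of ex:entityA. A
framework of the kind described here would expose such a function
inside the query itself rather than in a separate script.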