A Provenance Assisted Roadmap for Life Sciences Linked Open Data Cloud

Ali Hasnain, Qaiser Mehmood, Syeda Sana E Zainab, Stefan Decker

6th International Conference, KESW 2015
Conference Paper
Multiple datasets that add high value to biomedical research have been exposed on the web as part of the Life Sciences Linked Open Data (LSLOD) Cloud. The ability to navigate easily through these datasets is crucial for personalized medicine and for improving the drug discovery process. Different initiatives have been proposed for navigating through these datasets, with or without vocabulary reuse. Provenance information is especially significant for life sciences data compared with other domains. In previous work, we proposed an approach for creating an active Linked Life Sciences Data Roadmap that catalogues and links concepts and properties from 137 public SPARQL endpoints. In this work, we extend the Roadmap with provenance information collected directly by querying the datasets. The extended Roadmap is used to dynamically assemble queries that retrieve data, along with its provenance, from multiple SPARQL endpoints simultaneously. We also demonstrate its use in conjunction with other tools for selective SPARQL querying, semantic annotation of experimental datasets, and visualization of the LSLOD cloud. We have evaluated the performance of our approach in terms of time taken and entity capture.
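
To illustrate the idea of assembling a query over several endpoints from a catalogue, the following is a minimal sketch and not the implementation described in the paper. It assumes a hypothetical in-memory catalogue (`ROADMAP`) that maps a concept URI to the endpoints exposing it, and it treats "provenance" simply as the source endpoint recorded with each result. The example URIs and the function name `build_federated_query` are placeholders. The only standard feature relied on is the SPARQL 1.1 `SERVICE` clause for federated queries.

```python
from typing import Dict, List

# Hypothetical catalogue: concept URI -> SPARQL endpoints that expose it.
# All URIs here are placeholders, not real LSLOD endpoints.
ROADMAP: Dict[str, List[str]] = {
    "http://example.org/vocab/Drug": [
        "http://example.org/endpointA/sparql",
        "http://example.org/endpointB/sparql",
    ],
}


def build_federated_query(concept: str,
                          roadmap: Dict[str, List[str]],
                          limit: int = 10) -> str:
    """Build one SPARQL 1.1 query that asks every catalogued endpoint
    for instances of `concept` and tags each row with its source."""
    endpoints = roadmap.get(concept, [])
    if not endpoints:
        raise KeyError(f"concept not in catalogue: {concept}")

    blocks = []
    for endpoint in endpoints:
        # One group per endpoint: fetch instances remotely, then bind
        # the endpoint URI as a simple provenance marker (assumption).
        blocks.append(
            "  {\n"
            f"    SERVICE <{endpoint}> {{\n"
            f"      ?entity a <{concept}> .\n"
            "    }\n"
            f"    BIND(<{endpoint}> AS ?source)\n"
            "  }"
        )

    body = "\n  UNION\n".join(blocks)
    return f"SELECT ?entity ?source WHERE {{\n{body}\n}}\nLIMIT {limit}"


if __name__ == "__main__":
    print(build_federated_query("http://example.org/vocab/Drug", ROADMAP))
```

The sketch only builds the query string; sending it to a federation-capable SPARQL engine is left out, since the choice of engine and the shape of the real provenance metadata are not specified here.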