Biology’s Data Barrier: Europe’s à la carte solution

Confidentiality and a sense of proprietorship and exclusivity have long been a bottleneck in research dissemination. Happily, the open science movements of recent years have managed to make some impact on that front.

Thanks to various open science initiatives (e.g. the ISCB policy statement), an increasing number of funding bodies now require open-access articles and accessible data. New publishers and research consortia are emerging with offers to host large data sets, which would otherwise be a burden on the authors. Despite these incentives and pressures, open data sharing is not yet common practice.

Image: Data Sharing Concept (Image is based on Puzzly Sharing artwork from Wikimedia Commons. Distributed under CC-BY-SA)

One initiative, led by KU Leuven, offers an “à la carte” approach to this issue. In a highlights-track presentation at the recently concluded ISMB/ECCB 2015, KU Leuven researcher Amin Ardeshirdavani presented a scheme called “NGS-Logistics”, in which researchers can start by freely sharing results, without compromising the confidentiality of the underlying data.

He points out that many researchers are reluctant to publish their data or make it accessible, while others remain completely in the dark about the data’s very existence. (see video talk)

If we can’t share the data, can we at least share the results? (a slide from the NGS-Logistics talk at ISMB/ECCB 2015)

The framework was published last year and can be accessed from this link.

NGS-Logistics components (Source: Ardeshirdavani et al. doi:10.1186/s13073-014-0071-9)

In the schema above (borrowed from the original paper), we can see that the core data does not need to travel, because a query manager takes care of querying and fetching results. I think this can reduce the data-administration load on the end user (the researcher).
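To make the “data stays put, queries travel” idea concrete, here is a minimal Python sketch. It is not the NGS-Logistics implementation or its API; the `Site`, `QueryManager` and `Variant` names, the variant coordinates and the sample data are all hypothetical, chosen only to illustrate the pattern under the assumption that each centre answers with summary counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical variant key: chromosome, position and alternate allele."""
    chrom: str
    pos: int
    alt: str


class Site:
    """A data-holding centre. Sample-level records stay inside this object."""

    def __init__(self, name, samples):
        self.name = name
        self._samples = samples  # {sample_id: set of Variant}, never returned

    def count_carriers(self, variant):
        # Only an aggregate leaves the site: how many samples carry the variant.
        carriers = sum(1 for found in self._samples.values() if variant in found)
        return {"site": self.name, "carriers": carriers, "total": len(self._samples)}


class QueryManager:
    """Sends the same query to every site and collects the summaries."""

    def __init__(self, sites):
        self.sites = sites

    def query(self, variant):
        return [site.count_carriers(variant) for site in self.sites]


# Toy data (invented for illustration only).
v = Variant("chr1", 12345, "T")
site_a = Site("site_a", {"s1": {v}, "s2": set(), "s3": {v}})
site_b = Site("site_b", {"s4": set(), "s5": {v}})

results = QueryManager([site_a, site_b]).query(v)
for row in results:
    print(row)
```

The point of the design is that the caller only ever sees per-site counts; the genotypes themselves never cross the site boundary.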

Currently, when we receive data from other labs or collaborators, varying formats often force us to apply several methods before we can parse the data into a useful form. This is primarily because we first need to go through an (often) painstaking process of understanding the datasets before we can use them. In most cases this work is repeated, as somebody at another lab may already have done the same job. The NGS-Logistics framework promises to cut this unnecessary overhead (in terms of processing hours). In this framework, we request only the information or results we need, and the data owner contributes to our research by sharing that information.

For example, suppose I would like to know about the relationship between a particular SNP (Single Nucleotide Polymorphism) and its resultant phenotype. Let’s also assume that to establish the true relationship we need a large sample size (e.g. 1 million people). A dataset that large may not be affordable for a single lab. However, if a number of collaborating labs agree to mine their local databases and share just that specific information, this can remove the bottleneck holding back the research.
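The SNP example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each lab is assumed to report four counts (carriers and non-carriers among cases and controls), the field names and numbers are made up, and the crude pooled odds ratio stands in for whatever statistic a real study would choose.

```python
def pooled_odds_ratio(site_counts):
    """Pool per-lab 2x2 counts and return the crude odds ratio.

    Each entry is a dict with hypothetical keys: case_carriers,
    case_noncarriers, control_carriers, control_noncarriers.
    """
    a = sum(c["case_carriers"] for c in site_counts)
    b = sum(c["case_noncarriers"] for c in site_counts)
    c_ = sum(c["control_carriers"] for c in site_counts)
    d = sum(c["control_noncarriers"] for c in site_counts)
    if b == 0 or c_ == 0:
        raise ValueError("odds ratio is undefined when a pooled cell is zero")
    return (a * d) / (b * c_)


# Invented counts from two collaborating labs; no individual records are shared.
lab_reports = [
    {"case_carriers": 30, "case_noncarriers": 70,
     "control_carriers": 10, "control_noncarriers": 90},
    {"case_carriers": 20, "case_noncarriers": 80,
     "control_carriers": 15, "control_noncarriers": 85},
]

odds_ratio = pooled_odds_ratio(lab_reports)
print(f"pooled odds ratio: {odds_ratio:.2f}")
```

Simply summing counts ignores differences between labs, so a real analysis would probably prefer a stratified estimate (for instance Mantel-Haenszel); the sketch only shows that summary counts are enough to start asking the question.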

The scheme has already been implemented through a collaboration of several Belgian universities. Anyone with a valid institutional email address can request an account at https://ngsl.esat.kuleuven.be/

I registered for an account, which was promptly activated, and from what I have seen the system appears to be a neat one. I understand that the datasets currently collected on their system may or may not be useful for all of us. Nevertheless, the concept could support a multitude of national and regional (if not global) collaborations on various fronts. I, for one, am very impressed by the concept.

I would highly recommend that researchers working with Next Generation Sequencing data have a look at the system.
