Integrated tools

To maximize opportunities for valorization and technology transfer, the project has put a strong focus on the design of use cases and on the integration of the different WPs (data collection, sharing infrastructure, knowledge discovery) into an online portal, the BRiDGEIris portal.

The portal provides access to the project's main products, which include:

  • Three databases:
     1. Highlander for genomics data,
     2. CliniPhenome for clinical and phenotypic data,
     3. DIDA for digenic diseases.
  • A series of tools:
     1. DiGeST: Variant/gene ranking,
     2. GeVaCT: Variant/gene classifier,
     3. Protein prioritization tool,
     4. TuneSim: Pipeline comparison/simulator.

All products are designed to work independently and are integrated into the portal by means of RESTful interfaces. This modular design maximizes the reusability of each component of the portal.


Product overview and links


BRiDGEIris portal:

The BRiDGEIris Portal is the gateway to the databases and tools developed within the BRiDGEIris project. To access the portal, a user must be registered with the portal validation system. After credential validation, the user can access the following databases and tools: Highlander Client (genomic database), CliniPhenome (clinical/phenotypic database), DIDA (digenic diseases database), DiGeST (gene/variant ranking tool), GeVaCT (variant pathogenicity classification tool) and a messaging utility. Some of these resources have their own credential management system, with which a user must register separately to use their features. The BRiDGEIris Portal is developed in PHP and runs on an Apache server.


CliniPhenome:

CliniPhenome is a web-based application coupled to a database, developed primarily to collect data on patients being analyzed for cardiac arrhythmia syndromes. The database schema has nevertheless been designed in a generalized form, so that it can also be used for patients with other diseases. To access its features, a user needs to be registered in CliniPhenome. The application has three kinds of users based on "Label Based Access", defined as per the Health Level 7 (HL7) Healthcare Privacy Guidelines. Its interface gives users access to different features depending on the user type; for example, a privileged user can insert information along with clinical measures (obtained from various clinical diagnostic processes) or access the clinical history of a patient. In addition, based on the resulting observations, a patient can be annotated against a combination of phenotypic ontologies (HPO, OMIM, Orphanet) and a clinical ontology (SNOMED). There are also APIs that integrate it with the Highlander database and with analytical tools such as DiGeST. The application is developed in PHP (CodeIgniter web MVC framework) and MySQL (version 5.5), and runs on an Apache server.


Highlander:

Highlander is Java software coupled to a local database, designed to centralize all variant data and annotations from the lab and to provide powerful filtering tools that are easily accessible to the biologist. Data can be generated by any NGS machine (such as Illumina's HiSeq or MiSeq, or Life Technologies' SOLiD or Ion Torrent) and most variant callers (such as Broad Institute's GATK). Variant calls are annotated using dbNSFP (providing predictions from 6 different programs, splicing predictions, prioritization scores from CADD and VEST, and MAF from 1000G and ESP), ExAC, GoNL and SnpEff, and are subsequently imported into the database. The database is used to compute global statistics, which allow variants to be discriminated based on their representation in the database. The Highlander GUI makes complex queries to this database easy, using shortcuts for certain standard criteria, such as "sample-specific variants", "variants common to specific samples" or "combined-heterozygous genes". Users can browse through query results using sorting, masking and highlighting of information. Highlander also gives access to useful additional tools, including visualization of the alignment, an algorithm that checks all available alignments for allele calls at specific positions, and a module to explore the ‘variant burden’ gene by gene.
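The two shortcut criteria "sample-specific variants" and "variants common to specific samples" can be pictured as set operations over per-sample variant calls. The sketch below is only an illustration in Python, not Highlander's actual implementation (which is Java software querying a database); the variant key (chromosome, position, ref, alt) and both function names are assumptions made for the example.

```python
from typing import Dict, Iterable, Set, Tuple

# Assumed variant key for this sketch: (chromosome, position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


def sample_specific(calls: Dict[str, Set[Variant]], sample: str) -> Set[Variant]:
    """Variants called in `sample` and in no other sample of `calls`."""
    # Union of the calls of every other sample (empty set if there are none).
    others = set().union(*(v for s, v in calls.items() if s != sample))
    return calls[sample] - others


def common_to(calls: Dict[str, Set[Variant]], samples: Iterable[str]) -> Set[Variant]:
    """Variants called in every one of the given samples."""
    sets = [calls[s] for s in samples]
    return set.intersection(*sets) if sets else set()


# Hypothetical toy data: three samples sharing one variant.
calls = {
    "S1": {("1", 100, "A", "G"), ("2", 50, "C", "T")},
    "S2": {("2", 50, "C", "T"), ("3", 10, "G", "A")},
    "S3": {("2", 50, "C", "T")},
}
only_s1 = sample_specific(calls, "S1")      # variants unique to S1
shared = common_to(calls, ["S1", "S2"])     # variants present in both S1 and S2
```

In a real deployment these filters would run as database queries rather than in memory; the set view is only meant to make the semantics of the shortcuts explicit.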



DIDA (DIgenic diseases DAtabase):

DIDA is a novel database that provides for the first time detailed information on genes and associated genetic variants involved in digenic diseases, the simplest form of oligogenic inheritance.

DIDA was published in the Nucleic Acids Research Database issue 2016 and was selected as a NAR 2016 Breakthrough paper. The manuscript can be accessed here.



CGS (Central Genomics System):

CGS stands for Centralized Genomics System, a project for efficiently managing data from VCF files. It provides an API to upload VCF files to a Big Data ecosystem (Hadoop), which then uses multiple technologies to retrieve variants as fast as possible while allowing multi-criteria searches. The variants from the VCF files are annotated with multiple databases (gonl, dbnsfp, …), then stored in HBase and Impala in different data structures, and an archive file is stored in Avro format. Note that the inner workings of the Big Data ecosystem and its data structures are not visible to users.
The API follows the structure of the Google Genomics API, while also providing an interface for raw SQL queries for the Highlander software.
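
As an illustration of the kind of record such a system ingests, the following Python sketch parses the eight fixed columns of one VCF data line (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) into a dictionary. It is not part of the CGS code base: the function name and the output field names are assumptions, and genotype columns, header lines and annotation are deliberately left out.

```python
from typing import Any, Dict


def parse_vcf_line(line: str) -> Dict[str, Any]:
    """Parse the 8 fixed tab-separated columns of a VCF data line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt, qual, flt, info = fields[:8]

    # INFO is a ';'-separated list of KEY=VALUE pairs or bare flags.
    info_dict: Dict[str, Any] = {}
    if info != ".":
        for item in info.split(";"):
            key, sep, value = item.partition("=")
            info_dict[key] = value if sep else True  # bare flag -> True

    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": None if vid == "." else vid,          # "." means missing
        "ref": ref,
        "alt": alt.split(","),                      # several ALT alleles are comma-separated
        "qual": None if qual == "." else float(qual),
        "filter": flt,
        "info": info_dict,
    }


# Hypothetical example line (values invented for illustration).
record = parse_vcf_line("1\t12345\trs1\tA\tG,T\t50\tPASS\tAF=0.5;DB\n")
```

Records of this shape could then be annotated and written to whichever storage back-end is in use; how CGS maps them onto HBase, Impala and Avro is not described here.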

DiGeST (Distributed Gene/variant Scoring Tool):

DiGeST is a gene/variant scoring tool that takes as input two populations (case/control) of SNPs and outputs a ranking of genes/variants according to their suspected pathogenicity. It is composed of a front-end, which allows a user to browse, filter and create groups of case and control SNPs, and a back-end that performs scoring and ranking in a distributed way, using the Map/Reduce programming model. Scalable storage and computing rely on Parquet files and the Spark distributed computing framework, respectively.
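
The Map/Reduce pattern behind such a case/control ranking can be sketched in a few lines of plain Python. This is a single-machine illustration under stated assumptions, not DiGeST's code: the scoring function (case frequency minus control frequency of variant records per gene) is a placeholder chosen for the example, and all names are hypothetical. In a Spark deployment the map and reduce steps would run in parallel over partitions of the data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Set, Tuple

Record = Tuple[str, str]  # (sample_id, gene) for one observed variant


def map_phase(records: Iterable[Record], cases: Set[str],
              controls: Set[str]) -> Iterator[Tuple[str, Tuple[int, int]]]:
    """Emit (gene, (case_hit, control_hit)) for each variant record."""
    for sample, gene in records:
        if sample in cases:
            yield gene, (1, 0)
        elif sample in controls:
            yield gene, (0, 1)
        # Records from samples in neither group are ignored.


def reduce_phase(pairs: Iterable[Tuple[str, Tuple[int, int]]]) -> Dict[str, Tuple[int, int]]:
    """Sum the per-gene case and control counts."""
    totals: Dict[str, Tuple[int, int]] = defaultdict(lambda: (0, 0))
    for gene, (c, k) in pairs:
        tc, tk = totals[gene]
        totals[gene] = (tc + c, tk + k)
    return dict(totals)


def rank_genes(records: Iterable[Record], cases: Set[str],
               controls: Set[str]) -> List[Tuple[str, float]]:
    """Rank genes by a placeholder score; both groups must be non-empty."""
    totals = reduce_phase(map_phase(records, cases, controls))
    scores = {g: c / len(cases) - k / len(controls) for g, (c, k) in totals.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy input: two cases, one control.
records = [("c1", "G1"), ("c2", "G1"), ("k1", "G2"), ("c1", "G2")]
ranking = rank_genes(records, cases={"c1", "c2"}, controls={"k1"})
```

Because the reduce step only sums counts, it is associative and commutative, which is what allows the same computation to be distributed across workers.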




GeVaCT (Genomic Variant Classifier Tool):

GeVaCT is a Java-based tool coupled to a SQLite database that can be used for pathogenic classification of genomic variants (single nucleotide and short insertion/deletion). The underlying algorithm uses two different approaches: one to classify missense variants and another to classify nonsense/frameshift variants. The tool executes in two phases: pre-processing and classification. In the pre-processing phase, the annotated tab-delimited variant file (.vcf.ann) produced by Alamut Batch is restricted to the gene list for the disease of interest and then filtered, so as to reduce the number of variants to be classified. In the classification phase, the filtered variants are subjected to parametric cumulative scoring. Finally, based on its score, each variant is classified into one of five categories: Class I - Non-Pathogenic; Class II - VUS1 (unlikely pathogenic); Class III - VUS2 (unclear); Class IV - VUS3 (likely pathogenic); Class V - Pathogenic.
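
The final step, mapping a cumulative score onto the five classes, can be illustrated with a short Python sketch. The class labels are those listed above; everything else is an assumption made for the example: the threshold values, the weighted-sum form of the cumulative score and the function names are hypothetical and do not reflect GeVaCT's actual parameters.

```python
from bisect import bisect_right
from typing import Dict

CLASSES = [
    "Class I - Non-Pathogenic",
    "Class II - VUS1 (unlikely pathogenic)",
    "Class III - VUS2 (unclear)",
    "Class IV - VUS3 (likely pathogenic)",
    "Class V - Pathogenic",
]

# Hypothetical cut-offs separating the five classes (four boundaries).
THRESHOLDS = [1.0, 2.0, 3.0, 4.0]


def cumulative_score(evidence: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of per-criterion evidence values (assumed scoring form)."""
    return sum(weights[name] * value for name, value in evidence.items())


def classify(score: float) -> str:
    """Return the class whose score interval contains `score`."""
    # bisect_right counts how many thresholds are <= score, i.e. the class index.
    return CLASSES[bisect_right(THRESHOLDS, score)]


# Hypothetical criteria and weights, for illustration only.
score = cumulative_score({"conservation": 1.0, "frequency": 2.0},
                         {"conservation": 0.5, "frequency": 1.0})
label = classify(score)
```

Keeping the thresholds in a single sorted list makes the class boundaries easy to adjust per disease or per variant type without touching the classification logic.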


GitHub: (private)



TuneSim:

TuneSim allows you to visualize the results of Escaliere et al. (ref tba) in a user-friendly format (from Panel Simulations cov10 to Panel Illumina Platinum Filtered). It also allows you to test your own pipeline through a very simple procedure (see the step-by-step procedure on the Tutorial Panel).

You can evaluate your own pipeline on FastQs from:

  • simulated 100 bp paired-end reads with realistic insertion of 1KG and dbSNP haplotypic variants and realistic base quality scores (generated with an improved version of dwgsim)
  • curated human data from the NA12878 individual (from the GiAB or Illumina Platinum projects)

The simulated data come at different mean coverages, allowing you to dynamically stress your pipeline and reveal its possible strengths and weaknesses on 4 different types of variants.
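
Evaluating a pipeline against a simulated or curated truth set usually comes down to counting true positives, false positives and false negatives per variant type. The Python sketch below shows one way to compute precision and recall under that assumption; it is not TuneSim's code, and the function name, the variant key and the type labels ("snp", "indel") are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Tuple

Variant = Tuple[str, int, str, str]  # assumed key: (chromosome, position, ref, alt)


def evaluate(truth: Dict[Variant, str],
             called: Dict[Variant, str]) -> Dict[str, Dict[str, float]]:
    """Per-variant-type precision and recall of `called` against `truth`.

    Both arguments map a variant key to its variant type.
    """
    stats = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for key, vtype in called.items():
        stats[vtype]["tp" if key in truth else "fp"] += 1
    for key, vtype in truth.items():
        if key not in called:
            stats[vtype]["fn"] += 1

    report = {}
    for vtype, s in stats.items():
        tp, fp, fn = s["tp"], s["fp"], s["fn"]
        report[vtype] = {
            # Ratios default to 0.0 when the denominator is zero.
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "recall": tp / (tp + fn) if tp + fn else 0.0,
        }
    return report


# Hypothetical toy truth set and pipeline output.
truth = {("1", 100, "A", "G"): "snp", ("1", 200, "A", "AT"): "indel", ("2", 50, "C", "T"): "snp"}
called = {("1", 100, "A", "G"): "snp", ("2", 60, "G", "A"): "snp"}
report = evaluate(truth, called)
```

Running the same comparison on data sets of increasing mean coverage shows how precision and recall for each variant type respond to sequencing depth.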