LEON-BIS
A standard procedure in many areas of bioinformatics is to use a multiple sequence alignment (MSA) as the basis for various types of homology-based inference. Applications include 3D structure modelling, protein functional annotation, prediction of molecular interactions, etc. These applications, however sophisticated, are generally highly sensitive to the alignment used, and neglecting non-homologous or uncertain regions in the alignment can lead to significant bias in the subsequent inferences.
LEON-BIS is a new method that uses a robust Bayesian framework to estimate the homologous relations between sequences in a protein multiple alignment. Sequences are clustered into subfamilies and relations are predicted at different levels, including conserved 'core blocks', 'regions' and the full-length proteins. The accuracy and reliability of the predictions have been demonstrated in large-scale comparisons using well-annotated alignment databases, where the homologous sequence sections were detected with very high sensitivity and specificity.
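As a rough illustration of the reasoning a Bayesian framework relies on, the sketch below applies Bayes' rule to two competing hypotheses, "homologous" and "unrelated", for a single aligned segment. It is not the LEON-BIS model: the function name `posterior_homology`, the two likelihood inputs and the default prior are all assumptions made for this example, and the likelihoods and priors actually used by LEON-BIS are not described here.

```python
def posterior_homology(lik_homolog, lik_unrelated, prior=0.5):
    """Return P(homologous | observation) for a two-hypothesis Bayes update.

    lik_homolog   -- P(observation | segment is homologous)   (hypothetical input)
    lik_unrelated -- P(observation | segment is unrelated)    (hypothetical input)
    prior         -- prior probability of homology, between 0 and 1
    """
    if not 0.0 <= prior <= 1.0:
        raise ValueError("prior must lie in [0, 1]")
    if lik_homolog < 0 or lik_unrelated < 0:
        raise ValueError("likelihoods must be non-negative")

    # Total probability of the observation under both hypotheses.
    evidence = lik_homolog * prior + lik_unrelated * (1.0 - prior)
    if evidence == 0:
        raise ValueError("observation has zero probability under both hypotheses")

    return lik_homolog * prior / evidence


if __name__ == "__main__":
    # An observation four times more likely under homology, with an even prior.
    print(posterior_homology(0.8, 0.2, prior=0.5))
```

In a scheme of this kind, a segment would be reported as homologous only when the posterior exceeds a chosen threshold, which is how uncertainty in the alignment can be carried into the final prediction rather than ignored.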
LEON-BIS can be used to distinguish sections in multiple sequence alignments that are conserved across the whole family or within subfamilies, and should be useful for automatic, high-throughput genome annotations, 2D/3D structure predictions, protein-protein interaction predictions, etc.
Download an archive of the test set of protein sequences.
Download an archive of the source code.