A "protein family" is generally understood to be a set of protein sequences sharing the same function. It has also been shown that the protein structures within a family share a common fold. These structural and functional characteristics are reflected in the conservation of residues observed in a multiple sequence alignment of the protein family. The traces of the functional and structural features are embedded in the residues common to all sequences, or in stretches of residues defining motifs or patterns. Conserved residues may also specify protein sub-families or, by changing some properties of the whole family, add new features. It is therefore essential to develop tools able to identify such residues. As more and more sequences become available from very diverse origins, the variability observed in a given protein family increases, and the simple equation "conserved residue means identical residue" no longer holds.
Several prediction methods have been developed to identify conserved residues. Many of them are based on, for example, entropy calculations, similarity matrix comparisons, evolutionary tree traces, or free energy. In Ordalie, two types of methods are implemented: score-based methods and a physico-chemical-based one.
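To make the entropy-based approach mentioned above concrete, here is a minimal sketch of a per-column Shannon entropy score. It illustrates the general idea only and is not a description of how Ordalie computes its scores. The function names and the choice to count a gap ("-") as an ordinary symbol are assumptions made for this example.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the symbols in one alignment column.

    Assumption: every character, including the gap symbol, is counted
    as an ordinary symbol. A value of 0 means the column is identical
    in all sequences; higher values mean more variability.
    """
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def alignment_entropies(sequences):
    """Entropy of each column of an alignment given as equal-length strings."""
    if not sequences:
        return []
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("aligned sequences must all have the same length")
    # zip(*sequences) yields the alignment column by column.
    return [column_entropy(col) for col in zip(*sequences)]
```

A column with low entropy is a candidate conserved position. Such a score only measures identity, so it does not capture conservation of physico-chemical properties between different residues.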
A special method, called Three Dimensional Cluster, associates three scores with each column: the free energy score, the mean distance score, and the norm ratio score. The clustering is then performed in a three-dimensional space, and has been found to give the best results.
The Threshold method is based on the physico-chemical nature of the amino acids appearing in a column. Basically, given a conservation threshold x, a column is considered conserved if at least x% of its residues, gaps included, are identical. Three types of conservation are considered:
When the alignment has been divided into sub-families, all the above methods try to identify conserved residues within each sub-family. A background color is associated with each sub-family. In a sub-family context, only the first cluster of highly conserved residues is displayed. For the Threshold method, residues within a sub-family that are 100% present, or 80% present if another sub-family also displays a conserved residue, are shown.
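The basic identity test of the Threshold method, applied to the whole alignment or separately to each sub-family, can be sketched as follows. This is an illustration under stated assumptions, not Ordalie's actual implementation. The function names, the "-" gap symbol, and the reading of the cut-off as "at least x%" are hypothetical choices made for this example. The distinction between conservation types and the 100%/80% display rule are not modelled.

```python
from collections import Counter

GAP = "-"  # assumed gap symbol


def conserved_columns(sequences, threshold_percent):
    """Return the indices of columns passing the identity threshold.

    A column passes when its most frequent residue accounts for at
    least threshold_percent of all rows. Gaps are counted in the
    total, as in the text, but a gap is never reported as the
    conserved residue (an assumption of this sketch).
    """
    if not sequences:
        return []
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("aligned sequences must all have the same length")
    conserved = []
    for index, column in enumerate(zip(*sequences)):
        residues = Counter(c for c in column if c != GAP)
        if not residues:
            continue  # column made of gaps only
        _, count = residues.most_common(1)[0]
        # Integer comparison avoids floating-point rounding at the cut-off.
        if count * 100 >= threshold_percent * len(column):
            conserved.append(index)
    return conserved


def conserved_by_subfamily(subfamilies, threshold_percent):
    """Apply the same test inside each sub-family.

    subfamilies maps a sub-family name to its list of aligned sequences.
    """
    return {
        name: conserved_columns(seqs, threshold_percent)
        for name, seqs in subfamilies.items()
    }
```

For example, with the five sequences "ACDE-", "ACDEK", "ACGEK", "AC-EK" and "TCDEK", a threshold of 100 keeps only columns 1 and 3 (counting from 0), while a threshold of 80 also keeps columns 0 and 4.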