NCBI search: multiple sequence alignment editor


Three points:
- multi-scale analysis: i) taxa - cluster - all sequences, ii) large regions - motifs - residue, iii) alignment - structure - phylogeny environment.
- alignment management provides recording of alignment snapshots
- ease of use, multi-platform -> benefits all researchers, even students


ABSTRACT
OBJECTIVE:
Multiple sequence alignments play a key role in modern bioinformatics, having become cornerstones of several types of investigation such as protein family analysis, evolutionary studies or comparative genomics. Depending on the study, the alignment is used to provide or host information. This information ranges from domains, motifs and protein secondary structure to hydrophobicity, phylogenetic trees, etc. Numerous standalone programs can access this information, but they generate disconnected output files.

FINDINGS :
We present here ORDALIE, an integrated workbench for manipulating and exploring the informational content of a multiple sequence alignment. ORDALIE is arranged around an internal SQLite database \cite{sqlite} that allows storage and retrieval of information. User interactions with ORDALIE create modified alignments (different clusterings, different ways of aligning sequences) that can be stored and retrieved at any time. Analysis tools such as a tree builder, a 3D model viewer, feature mapping, sequence clustering and sequence conservation computation are also provided to decipher the informational content of the alignment, and their associated data are also stored in the database.


CONCLUSION:
Although many programs already exist to handle multiple sequence alignments or extract information from them, these programs remain separate entities and do not give access to the global information sphere attached to a given alignment. By embedding a database system with alignment manipulation and exploration tools, the ORDALIE platform makes the data deciphered from alignment analysis persistent. It also provides interconnected tools allowing a broad variety of information mining for alignment exploitation.


the end

ORDALIE takes advantage of its underlying database to store any alignment snapshot. The "snapshot" table contains the name of the snapshot and its description.

An alignment consists of a description, a set of sequences, a set of features associated with it, one or several clusterings, and one or several conservation score calculations. When an alignment is loaded for the first time, a hard copy of it is inserted into the database. This copy cannot be changed in any way and represents the reference alignment. A second copy is also created as a working copy. The user can create as many copies as desired, which may correspond to different analyses. The "Alignment" combobox in the main window allows switching between registered alignments.
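The reference/working-copy scheme described above can be sketched as follows. This is only an illustration using Python's standard sqlite3 module: the table layout, the column names and the function names are assumptions made for the example and are not ORDALIE's actual schema (only the existence of a "snapshot" table holding a name and a description comes from the text).

```python
import sqlite3


def open_store(path=":memory:"):
    """Open a hypothetical snapshot store (illustrative schema, not ORDALIE's)."""
    db = sqlite3.connect(path)
    db.executescript(
        """
        CREATE TABLE IF NOT EXISTS snapshot (
            id          INTEGER PRIMARY KEY,
            name        TEXT NOT NULL UNIQUE,
            description TEXT,
            read_only   INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE IF NOT EXISTS sequence (
            snapshot_id INTEGER NOT NULL REFERENCES snapshot(id),
            seq_name    TEXT NOT NULL,
            aligned     TEXT NOT NULL,
            PRIMARY KEY (snapshot_id, seq_name)
        );
        """
    )
    return db


def save_snapshot(db, name, description, sequences, read_only=False):
    """Store a full copy of the alignment under a snapshot name."""
    cur = db.execute(
        "INSERT INTO snapshot (name, description, read_only) VALUES (?, ?, ?)",
        (name, description, int(read_only)),
    )
    snapshot_id = cur.lastrowid
    db.executemany(
        "INSERT INTO sequence (snapshot_id, seq_name, aligned) VALUES (?, ?, ?)",
        [(snapshot_id, n, s) for n, s in sequences.items()],
    )
    db.commit()
    return snapshot_id


def load_first_time(db, sequences):
    """First load: one immutable reference copy plus one editable working copy."""
    save_snapshot(db, "reference", "Alignment as first loaded", sequences, read_only=True)
    save_snapshot(db, "working", "Editable working copy", sequences)


def update_sequence(db, snapshot_name, seq_name, aligned):
    """Edit one aligned sequence; refuse to touch a read-only snapshot."""
    row = db.execute(
        "SELECT id, read_only FROM snapshot WHERE name = ?", (snapshot_name,)
    ).fetchone()
    if row is None:
        raise KeyError(snapshot_name)
    if row[1]:
        raise PermissionError(f"snapshot {snapshot_name!r} is read-only")
    db.execute(
        "UPDATE sequence SET aligned = ? WHERE snapshot_id = ? AND seq_name = ?",
        (aligned, row[0], seq_name),
    )
    db.commit()


def get_alignment(db, snapshot_name):
    """Return {sequence name: aligned string} for one snapshot."""
    rows = db.execute(
        "SELECT seq_name, aligned FROM sequence "
        "JOIN snapshot ON snapshot.id = sequence.snapshot_id "
        "WHERE snapshot.name = ?",
        (snapshot_name,),
    ).fetchall()
    return dict(rows)
```

In this sketch the read-only flag is enforced in application code; a real implementation could equally rely on database triggers.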


ORDALIE's core consists of an SQLite database. The first time ORDALIE is run with a given alignment, a read-only copy of this alignment and of all associated features, if any, is stored, and a working copy is generated.


###########################################


ABSTRACT
BACKGROUND :
Multiple sequence alignments play a key role in modern bioinformatics, having become cornerstones of several types of investigation such as protein family analysis, evolutionary studies or comparative genomics. As a result, their analysis is no longer confined to bioinformaticians and is entering every wet lab. Although many tools exist for visualising and editing alignments, building phylogenetic trees or clustering sequences, these tools generally remain disconnected from each other and hard to handle for researchers or students outside the bioinformatics world.

FINDINGS :
We present here ORDALIE, an integrated workbench for manipulating and exploring the informational content of a multiple sequence alignment. ORDALIE is meant to provide tools in a user-friendly environment in order to 
All information is kept inside a SQL database that 

CONCLUSIONS :
ORDALIE has been tailored for non-bioinformatician users and will benefit researchers or students wishing to decipher the information content of their protein family. Several investigation landscapes are provided, ranging from structural to phylogenetic tree contexts.


KEYWORDS :


FINDINGS :
BACKGROUND :
For some decades now, Multiple Sequence Alignment (MSA) has played a crucial role in many aspects of modern bioinformatics studies such as protein family annotation, evolutionary analysis, comparative genomics or orthology studies. Indeed, as an MSA is made of homologous genes, it intrinsically contains sequence - structure - function - evolution relationships that should be deciphered by the biologist. We highlight here some points that arise when exploring the information in an MSA.
Firstly, although algorithms are becoming more and more accurate at building MSAs, manual curation is still required to ensure maximal quality. The higher the MSA quality, the better the information retrieved from it.

Secondly, mining information inside an MSA can be a multi-scale and multi-context search. At the sequence level, important information ranges from domain presence or absence, as in comparative genomics, down to the residue level, for example when investigating the impact of point mutations. At the taxa level, the search may concern all taxa or groups of taxa, depending on the study type. For example, sequences can be grouped according to their phylogeny (eukaryotes, bacteria, ...), by their physicochemical nature (thermophiles, psychrophiles, mesophiles), or in any way defined by the investigator. Finally, the MSA information can be exploited in the context of the alignment, or in the context of the structure of the sequences when known, in order to find sequence - structure - function relationships such as spatial functional patches.

Thirdly, MSA exploitation nowadays extends beyond the bioinformaticians' world to become part of the bench for a broad audience. Indeed, MSAs are used in secondary school to introduce phylogeny, at university, and by researchers to get an overall view of a protein under study.

FINDINGS :
We present here ORDALIE (ORDered ALignment Information Explorer), an integrated platform to manage MSAs and to extract and add information. ORDALIE tries to fulfil the three points highlighted above. A description of the alignment management, editing, and available tools is provided below.


Mining an MSA can reveal, for example, conserved patterns or motifs, or the presence or absence of domains or regions that may be involved in molecular recognition or function. Mapping information retrieved at the sequence level onto the 3D structure may also give insights into the recognition or functional mechanisms of the protein.


Numerous tools already exist to visualize, edit, and infer information from an MSA, to compute a phylogenetic tree, to map MSA features onto a 3D structure, to calculate residue conservation or to cluster sequences inside an MSA.

Such tools were originally developed by bioinformaticians and usually carry out one specific task. Since the advent of the -omics era, bioinformatics has spread into biology labs and such tools have become part of the biologist's bench.


MANAGING MSA


The main philosophy of ORDALIE rests on three points: i) giving user-friendly access to a bioinformatics toolbox within the context of a given MSA. The software is arranged around "modes", each one dedicated to a specific task. The information deduced from the MSA or imported into the software can be accessed in many modes to help understand the protein family under study.


There are nowadays more than 40 MSA viewers/editors (https://en.wikipedia.org/wiki/List_of_alignment_visualization_software), offering varying sets of features and capabilities.


The Modes :
The following is a brief description of some of the most important modes available in ORDALIE.

Alignment editor:
Clustering mode:
Conservation analysis mode:
Tree mode:
Structure mode:
Other tools:

ORDALIE focuses on ease of use and on the interconnection of the available tools.

ORDALIE is a desktop application written in Tcl/Tk and C, available for Linux, Windows and macOS operating systems. Installers can be downloaded at http://www.lbgi.fr/ordalie. It does not require a web connection to run, although internet access is required for some functionalities.

Editing facilities :
The "Editor" mode is an extended emulation of the famous SeqLab editor that was part of the GCG Wisconsin package. 

Tree mode : 
ORDALIE allows building a phylogenetic tree based on all or part of the aligned sequences, and all or part of the alignment columns. ORDALIE computes a distance matrix using the selected set of sequences and columns. The phylogenetic tree is inferred from this distance matrix using the FastME program. Although likelihood-based trees are more reliable, the speed and reliability of FastME are sufficient for a first insight into the protein phylogeny. The robustness of the tree nodes can be assessed through bootstrap scoring.
The computed tree is then displayed in a separate window. The tree can be viewed as a dendrogram or as a radial tree. The user can re-root the tree, swap branches, display bootstrap values, show nodes above a bootstrap threshold, change branch labelling, print the tree and more.
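The distance-matrix step described above can be illustrated with a short sketch. The text does not state which distance ORDALIE computes, so the example assumes a simple p-distance (fraction of differing residues over the compared columns, ignoring positions where either sequence has a gap). The function names and the gap characters are also assumptions made for the example.

```python
GAPS = "-."  # assumed gap characters


def p_distance(a, b, columns):
    """Fraction of mismatches between two aligned sequences over the given columns.

    Columns where either sequence has a gap are skipped. Returns NaN when
    no column can be compared.
    """
    compared = 0
    mismatches = 0
    for c in columns:
        x, y = a[c], b[c]
        if x in GAPS or y in GAPS:
            continue
        compared += 1
        if x != y:
            mismatches += 1
    return mismatches / compared if compared else float("nan")


def distance_matrix(alignment, names=None, columns=None):
    """Pairwise p-distances for a subset of sequences and columns.

    alignment: dict {sequence name: aligned string}, all of equal length.
    names:     sequences to include (default: all, in dict order).
    columns:   0-based alignment columns to use (default: all).
    """
    if names is None:
        names = list(alignment)
    if columns is None:
        columns = range(len(alignment[names[0]]))
    columns = list(columns)
    n = len(names)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = p_distance(alignment[names[i]], alignment[names[j]], columns)
            matrix[i][j] = matrix[j][i] = d
    return names, matrix
```

Restricting `names` or `columns` mirrors the "all or part of the sequences and columns" selection; the resulting matrix is the kind of input a distance-based tree builder works from.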

Clustering mode :
The analysis of the differences between sequences inside an MSA is usually an important source of information. Some sequence clusterings may be obvious, such as partitioning sequences according to their catalytic activities if several exist, or according to the life domain to which the sequences belong. ORDALIE provides a clustering mode that allows clustering sequences on all or part of the columns, with several criteria and 4 clustering algorithms. The criteria are identity percentage, isoelectric point, sequence length, hydrophobicity and amino acid composition. These criteria can be combined. The clustering algorithms are the ones provided by the Cluspack package, i.e. hierarchical clustering using Secator, k-means clustering with DPC (Density Point Clustering), and mixture model clustering with AIC or BIC criteria for group definition. The special "Life Domain" criterion clusters sequences into Eukaryota, Archaea, Bacteria, viruses or unknown. The clustering can then be saved and retrieved later.
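As an illustration of how some of the criteria listed above could be turned into numbers for a clustering algorithm, the sketch below computes sequence length, amino acid composition and mean hydrophobicity for one aligned sequence, optionally restricted to a set of columns. The text does not say which hydrophobicity scale ORDALIE uses; the Kyte-Doolittle scale is assumed here, and the function names are hypothetical. Isoelectric point and identity percentage are left out for brevity.

```python
from collections import Counter

# Kyte-Doolittle hydropathy values (assumed scale, see lead-in).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
GAPS = "-."  # assumed gap characters


def ungapped(aligned, columns=None):
    """Residues of an aligned sequence over the chosen columns, gaps removed."""
    if columns is None:
        columns = range(len(aligned))
    return "".join(aligned[c] for c in columns if aligned[c] not in GAPS).upper()


def criteria(aligned, columns=None):
    """Compute per-sequence clustering criteria (length, hydrophobicity, composition)."""
    seq = ungapped(aligned, columns)
    length = len(seq)
    counts = Counter(seq)
    # Fraction of each of the 20 standard amino acids.
    composition = {
        aa: (counts[aa] / length if length else 0.0) for aa in KYTE_DOOLITTLE
    }
    # Mean hydropathy over residues that have a value on the scale.
    scored = [KYTE_DOOLITTLE[aa] for aa in seq if aa in KYTE_DOOLITTLE]
    hydrophobicity = sum(scored) / len(scored) if scored else 0.0
    return {
        "length": length,
        "hydrophobicity": hydrophobicity,
        "composition": composition,
    }
```

Combining criteria then amounts to concatenating the chosen values into one feature vector per sequence before clustering.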

Other tools:
The Overview tool represents the current alignment as a pixel map onto which features can be drawn. This gives a schematic representation of the feature distribution along the alignment.
The Search tool finds motifs inside the alignment. The motif follows the FindPattern syntax and may be degenerate.
The Fetch Information tool queries the UniProt and RefSeq databases using the sequence IDs to retrieve relevant information, such as organism, description, lineage, etc.
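The idea behind the Search tool (finding a possibly degenerate motif in a gapped sequence and reporting where it falls in the alignment) can be sketched as follows. The FindPattern syntax itself is not reproduced here: the example assumes the motif has already been expressed as a Python regular expression, with degenerate positions written as character classes, and the function name is hypothetical.

```python
import re

GAPS = "-."  # assumed gap characters


def find_motif(aligned, pattern):
    """Search a motif in the ungapped sequence and map hits to alignment columns.

    aligned: one aligned sequence (with gaps).
    pattern: a Python regular expression standing in for the motif.
    Returns a list of (first_column, last_column, matched_residues), 0-based.
    """
    # Alignment column of every residue, in sequence order.
    cols = [i for i, ch in enumerate(aligned) if ch not in GAPS]
    seq = "".join(aligned[i] for i in cols)
    hits = []
    for m in re.finditer(pattern, seq):
        if m.end() > m.start():  # ignore empty matches
            hits.append((cols[m.start()], cols[m.end() - 1], m.group()))
    return hits
```

Searching on the ungapped residues lets a motif be found even when gaps interrupt it in the alignment, while the returned columns allow the hit to be highlighted in alignment coordinates.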


1)McLachlan, G.J.; Peel, D. (2000). Finite Mixture Models. Wiley. ISBN 0-471-00626-2.

2)Wicker, N.; Perrin, G.R.; Thierry, J.C.; Poch, O. (2001). "Secator: A Program for Inferring Protein Subfamilies from Phylogenetic Trees". Molecular Biology and Evolution, 18 (8): 1435-1441.

3)Akaike, H. (1974). "A new look at the statistical model identification". IEEE Transactions on Automatic Control, 19 (6): 716-723. MR 0423716. doi:10.1109/TAC.1974.1100705.

4)Schwarz, G.E. (1978). "Estimating the dimension of a model". Annals of Statistics, 6 (2): 461-464.