5 Alignment Tools

Ordalie contains a collection of tools that can be called at any time, and that can stay alive whatever happens in the main window.

5.1 Snapshot Overview

The 'Snapshot Overview' is one of the available schematic representations of a snapshot. When launched, a window appears with the schematic alignment at the top and a control panel at the bottom. Any number of Overviews can be launched for a given snapshot.

Figure 7: The Overview tool window

5.1.1 The snapshot frame

In this frame, the alignment is schematized by replacing every residue with a grey pixel on a white background. Any feature can then be mapped on top of this scheme. Clicking anywhere on the scheme automatically centres the main snapshot window on the corresponding position. On the scheme, a stippled rectangle encloses the region shown in the main window.

5.1.2 Control panel

Below the alignment frame, a control panel lets the user interact with the scheme.

The combobox on the left is a feature selector. By default, it is set to 'automatic', meaning that all features drawn or removed in the main window will automatically be drawn or removed in the Overview window. Selecting any feature in the combobox will simply display it on the Overview.

The '+' and '-' buttons zoom the scheme in and out.

The 'Print' button will output a PNG file of the schematic alignment in its current state. The user is prompted to give an output file name.

The 'Close' button will close the 'Overview' window.

5.2 Editor

Fruitful exploitation of a multiple sequence alignment (MSA), whether by analysis tools or by feature mapping, depends directly on the accuracy of the alignment. Although research on sequence alignment algorithms is still intensive and the resulting software is increasingly accurate, manual MSA inspection, curation and editing remain necessary. This is why Ordalie integrates a high-performance sequence editor. It is written in C to ensure speed and fluidity, and is inspired by SeqLab [11], the editor of the GCG Wisconsin Package.

Entering the Editor tool will first clear any displayed feature from the sequences and colour the sequences according to their physico-chemical properties. The default colouring scheme is :

Figure 8: Amino acids colouring scheme in the Editor

(The default scheme can be changed inside the 'Edit -> Preferences' menu item).

Attention: Because the sequences change during editing, some functionalities are not available in the Editor.

5.2.1 Control panel

At the bottom of the Ordalie window, the Control panel will display the following buttons from left to right.

Figure 9: The control panel of the Editor tool.
Clear

Clears any current sequence selection.

Group

Groups the selected sequences. The names of the grouped sequences are shown in a colour that is unique to the group. If one or more of the selected sequences already belong to a group, a dialog box asks whether the sequences should be merged into the existing group or whether a new group should be created.

Grouped sequences behave as a single sequence.

Ungroup

Removes the selected sequences from any group. If a group consists of only one sequence, the group is automatically destroyed.

Lock/Unlock

By default, only gaps ('.') can be inserted or deleted; inserting or deleting amino acids is not allowed. Unlocking the sequences allows the insertion and deletion of residues.

Rem. Col. Gap.

Ordalie runs through all alignment columns and removes those containing only gaps. A 'Rem. Col. Gap.' is automatically performed when leaving the 'Editor' mode.

Temp. Save

Creates a TFA file copy of the alignment being edited. The user is asked for a file name the first time; successive saves reuse this file name to output the alignment.

Save & Return

Leaves the 'Editor' and keeps all the changes made. A dialog box asks the user whether the current alignment should be overwritten or a new alignment should be created with the current changes.

Cancel

Leaves the 'Editor' and restores the original alignment.
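The effect of the 'Rem. Col. Gap.' action described above can be illustrated with a short Python sketch. This is not Ordalie's own code (the editor itself is written in C); the function name and the representation of the alignment as a list of equal-length strings are assumptions made for the illustration. Only the gap character '.' is taken from the text.

```python
def remove_gap_columns(sequences, gap="."):
    """Return the aligned sequences with every all-gap column removed.

    `sequences` is assumed to be a list of equal-length strings.
    """
    if not sequences:
        return []
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all aligned sequences must have the same length")
    # Keep a column as soon as one sequence has a residue in it.
    keep = [i for i in range(length) if any(seq[i] != gap for seq in sequences)]
    return ["".join(seq[i] for i in keep) for seq in sequences]


alignment = ["MK..L.V", "M...LAV", "MR..L.V"]
cleaned = remove_gap_columns(alignment)
print(cleaned)
```

In this example the third and fourth columns contain only gaps and are dropped, while the sixth column is kept because one sequence has a residue there.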

5.2.2 Editing actions

Clicking inside the alignment frame will show a yellow blinking cursor. The following actions are then available :

To speed up editing, pressing keypad or keyboard digits [0-9] stores the number in a buffer, e.g. pressing '1' then '2' stores the number 12 in the buffer. Pressing the <Left> arrow will then move the cursor 12 characters to the left and empty the buffer. All the actions previously described can take advantage of this mechanism.
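The digit buffer can be pictured with the following minimal Python sketch. It is only an illustration of the mechanism, not the editor's implementation: the class and function names are hypothetical, and two behaviours are assumptions not stated in the text, namely that an empty buffer counts as 1 and that the cursor stops at position 0.

```python
class RepeatBuffer:
    """Accumulates typed digits and hands them out as a repeat count."""

    def __init__(self):
        self._digits = ""

    def press_digit(self, digit):
        if len(digit) != 1 or digit not in "0123456789":
            raise ValueError("expected a single digit 0-9")
        self._digits += digit

    def consume(self):
        # Assumption: with an empty buffer the action is applied once.
        count = int(self._digits) if self._digits else 1
        self._digits = ""  # the buffer is emptied once it has been used
        return count


def move_left(cursor, buffer):
    # Assumption: the cursor cannot move past the first column (index 0).
    return max(0, cursor - buffer.consume())


buffer = RepeatBuffer()
buffer.press_digit("1")
buffer.press_digit("2")
cursor = move_left(40, buffer)   # moves 12 characters to the left
cursor_again = move_left(cursor, buffer)  # buffer is empty: moves 1 character
```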

5.3 Clusters

There are two main ways of making groups/clusters in Ordalie.

5.3.1 Manual Clustering

The first method consists of adding or deleting empty lines, hereafter called 'separators', between sequence names. Separators can be added using the corresponding icon on the Icons bar or through the 'Add separator' item of the 'Alignment' menu, and deleted using the corresponding icon or menu item. A separator is added just below the selected sequence. Similarly, the separator just below the selected sequence is removed on request. Sequences enclosed by separators then constitute a new group. The 'Remove all separators' item of the 'Alignment' menu removes all groups in the current snapshot.
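The way separators delimit groups can be sketched in a few lines of Python. The representation is an assumption made for the illustration: the ordered list of sequence names is a list of strings in which an empty string stands for a separator. The function name is hypothetical, and ignoring consecutive separators rather than creating empty groups is also an assumption.

```python
def groups_from_separators(names, separator=""):
    """Split an ordered list of sequence names into groups at each separator."""
    groups, current = [], []
    for name in names:
        if name == separator:
            if current:  # consecutive separators do not create empty groups
                groups.append(current)
                current = []
        else:
            current.append(name)
    if current:
        groups.append(current)
    return groups


names = ["seqA", "seqB", "", "seqC", "", "seqD", "seqE"]
groups = groups_from_separators(names)
print(groups)
```

Removing every empty string from the list corresponds to 'Remove all separators': all sequences then fall into a single group.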

5.3.2 Clustering Tool

The 'Clustering' tool allows the creation of clusters (or groups) based on numerical criteria characterizing the sequences to be clustered. The computation can be done using all or part of the sequences as well as all or part of the snapshot columns. The user chooses one or more numerical criteria as the basis for the computation, and a clustering algorithm. The computation can then be launched, and the newly created sequence clusters are automatically displayed in the main window.

5.3.3 Criteria

At present, the available criteria are :

5.3.4 Algorithms

Ordalie clusters the sequences and automatically determines the number of groups. The clustering algorithms, along with the algorithms that determine the number of clusters, are taken from the Cluspack package. The available methods are :

5.3.5 Control panel

Figure 10: The control panel of the Clustering tool.
Selections

The left part of the Control panel deals with sequence and residue range selection.

If no sequence names are selected, the clustering will use ALL sequences. If some sequences are selected (more than 3), the clustering will apply only to the selected sequences. The remaining ones will be kept as a separate group.

Clustering criteria

The pull-down menu allows the selection of the criteria to be used for the computation. Several criteria can be selected at the same time.

Attention: The 'Life domain' criterion clusters the sequences into Eukaryota, Archaea, Prokaryota and Other groups. This criterion cannot be combined with another one.

Clustering methods

The 'Method' pull-down menu selects the algorithm to be used for the clustering computation.

The 'Compute' button launches the computation. The newly computed sequence clusters are displayed directly in the main Ordalie window.

Other buttons

The 'Reset' button erases any clustering done so far and shows the original clustering, if any. The 'No Clusters' button removes all groups and leaves all the sequences as a single group.

The 'Clusters Names' button pops up a window in which a name can be given to each cluster. This cluster name may be used in subsequent analyses to identify the clusters, for example in the 'Tree' display or the 'Barcode' tool.

The 'Save' button will leave the clustering tool and the current clustering will be saved. The user is prompted whether to overwrite the current snapshot or to create a new one. The 'Return' button leaves the Clustering tool and displays the snapshot in its original state.

5.4 The Tree tool

The 'Tree' tool can be divided into two parts. The first part is the tree building, which is done through the main Ordalie window. Once the tree is computed, it is explored in a dedicated new window.

The tree is computed by the FastME program with default parameters. Ordalie first computes a distance matrix based on identity percentages calculated over the selected residue range. Although Bayesian-based algorithms seem to produce more accurate trees, FastME is a good compromise between speed and accuracy.
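To make the idea of an identity-based distance matrix concrete, here is a minimal Python sketch. It is not Ordalie's implementation: the text only states that the matrix is based on identity percentages over the selected residue range, so the details below are assumptions. In particular, columns where either sequence has a gap are skipped, the distance is taken as 1 minus the identity fraction, and the function names and the half-open column range are hypothetical.

```python
def percent_identity(seq_a, seq_b, start, end, gap="."):
    """Percent identity between two aligned sequences over columns [start, end)."""
    compared = identical = 0
    for res_a, res_b in zip(seq_a[start:end], seq_b[start:end]):
        if res_a == gap or res_b == gap:
            continue  # assumption: gapped columns are not compared
        compared += 1
        if res_a == res_b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def distance_matrix(sequences, start, end):
    """Symmetric matrix of distances, assumed here to be 1 - identity fraction."""
    n = len(sequences)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            identity = percent_identity(sequences[i], sequences[j], start, end)
            matrix[i][j] = matrix[j][i] = 1.0 - identity / 100.0
    return matrix


sequences = ["MKLV", "MKIV", "MR.V"]
matrix = distance_matrix(sequences, 0, 4)
```

A matrix of this kind is the sort of input a distance-based tree builder such as FastME works from; how Ordalie actually passes it to FastME is not covered here.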

5.4.1 Control panel for tree building

The tree can be computed on a subset of sequences and on a given residue range, for example, a region or a domain.

Figure 11: The Control panel of the Tree building tool
Selections

The left part of the Control panel deals with sequence and residue range selection.

Options

The following buttons can be used to control the tree computation :

Draw / Return

The 'Draw' button launches the computation, and draws the resulting tree in a separate and dedicated window. The 'Return' button leaves the Tree tool.

5.4.2 The Tree window

Each newly computed tree appears in a new, dedicated window that allows the exploration of the tree characteristics.
Ordalie is able to render two types of trees : dendrograms and radial trees. Some of the following options are specific to one or the other tree representation (see below).

Figure 12: The Tree rendering window displaying a radial tree. The circle at each node indicates whether the bootstrap value for the node is higher (green) or lower (red) than the defined threshold. The sequences are coloured according to their cluster.

The upper part of the tree rendering window is the drawing area, and the bottom part is the control panel area.

The Drawing area

The drawing area displays the current tree. The tree can be moved in all directions by simply dragging the mouse with <Button-1> down. A radial tree (see below) can be scaled by using the mouse wheel, while using the mouse wheel on a dendrogram will scroll the tree up and down. Finally, a right click <Button-3> will make a contextual menu appear, which allows changing the dimensions of the tree. If the tree is a dendrogram, the branch length and the height separating branches can be changed. If the tree is a radial tree, the tree can be rotated.

The Control panel

The control panel is divided into several parts. From left to right:

Adding information to the tree representation :

Tags and tree annotation :

Leaf labels :

Buttons :

5.5 The Conservation tool

Traces of the evolutionary pressure that maintains the structure and function of a protein family can be found by examining residue conservation along the alignment. Both global and group conservation may help in deciphering functional sites such as binding sites, interaction patches, or specialization coupled with intra-group organization.

Ordalie offers several methods to compute residue conservation, and the user can try them within this tool. The results are kept temporarily until they are saved. A saved residue conservation computation then becomes a new feature attached to the current snapshot and can, like any other feature, be used in any tool that allows feature display.

5.5.1 Methods

Many methods exist to compute conservation, and they have been tested and compared extensively [12,4]. Ordalie implements some of them, as well as two home-made conservation methods.

The 'Threshold' method

This method is essentially a counting method. Two thresholds define different levels of conservation. At the global level, a column that is 100% conserved ('identity threshold') is assigned to 'identity conservation', and a column that is >= 80% conserved ('global threshold') is considered a 'conserved' column. Inside a group, only 'identity conservation' is considered. The thresholds can be changed through the 'Preferences' menu.

The automatic methods.

In these methods, only columns containing more than 5 residues are considered, and the computation proceeds in two steps. First, all the columns are scored with the chosen method. Second, the columns with their associated scores are clustered, and the clusters are ranked according to their mean conservation scores. The two clusters with the highest scores are considered to contain the columns corresponding to 'strictly conserved' and 'globally conserved' residues. The same computation is repeated for each group, but only the cluster with the highest scores is retained.
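By contrast with the automatic methods, the counting performed by the 'Threshold' method described earlier in this section is simple enough to sketch in a few lines of Python. The sketch covers the global level only and is not Ordalie's code. The default thresholds (100% and 80%) come from the text; the function name, the returned labels, and the choice to compute the percentage over all sequences in the column (so that gaps count against conservation) are assumptions.

```python
from collections import Counter


def classify_column(column, identity_threshold=100.0, global_threshold=80.0, gap="."):
    """Classify one alignment column as 'identity', 'conserved' or 'none'."""
    residues = [res for res in column if res != gap]
    if not residues:
        return "none"
    # Count of the most frequent residue in the column.
    top_count = Counter(residues).most_common(1)[0][1]
    # Assumption: the percentage is taken over every sequence, gaps included.
    percent = 100.0 * top_count / len(column)
    if percent >= identity_threshold:
        return "identity"
    if percent >= global_threshold:
        return "conserved"
    return "none"


labels = [classify_column(col) for col in ("AAAAA", "AAAAG", "AAAGG", ".....")]
print(labels)
```

Lowering `global_threshold` in the call mimics changing the threshold in the 'Preferences' menu.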

The available automatic methods are :

5.5.2 The Control panel

Figure 13: The control panel of the Conservation tool

From left to right in the Control panel :

5.6 Superposition tool

One of the strengths of Ordalie resides in its ability to link/map features to the 3D models (when available) of proteins. To make the best use of feature mapping, it is essential to take into account the structural differences observed between proteins. To this end, Ordalie can superpose the structures according to a feature and/or a user-defined residue range.

5.6.1 The superposition algorithm

A protein structure can be made of several chains, which may or may not be identical. A chain is usually composed of an amino acid polymer and ligands (in Ordalie, water molecules are considered as ligands). It is important to understand that, although superposition computations are done using the polymer sequences, the entities that are moved (superposed) in Ordalie are the entire chains.

Attention: When applying a superposition to a chain, all residues of this chain (polymer AND ligands) are moved.

The chain superposition is done in three steps :

  1. Selection of the superposition zones. Depending on the structure, the zones may consist of a domain or of some selected secondary structures, for example.
  2. Selection of the chains to be superposed.
  3. Selection, among the chains selected for superposition, of the reference chain. The reference chain will not move; all the other selected chains will be superposed onto it.

The detailed superposition algorithm is presented in Appendix 7.5.

5.6.2 Control panel

Figure 14: The Control panel of the Superposition tool.
Selection

From left to right the superposition Control panel is made of :

Attention: The 'All Helices' and 'All Strands' selections will take, for each secondary structure type position, the minimal common part of all existing secondary structures present at that position.

Control

5.6.3 Example : dealing with a homodimer.

Suppose the loaded alignment concerns a protein known to be a homodimer (an α2 structure) under biological conditions, and for which 3D structures have been solved for proteins from several different organisms. By examining the PDB entries (say 1abc and 1def), it is also known that all structures are made of two chains, A and B.

When loading the alignment, Ordalie will recognize the two PDB IDs through the sequence names PDB_1abc_A and PDB_1def_B, download the two structures with their atomic coordinates from the PDB web site, and store them in a dedicated database. Note that Ordalie knows the coordinates of all the atoms of ALL chains of each structure, not only those of chain A. Several cases may be encountered when performing a superposition :

Only one chain present in the snapshot

The snapshot contains the sequences named PDB_1abc_A and PDB_1def_A. When superposing PDB_1def_A on PDB_1abc_A, only the atoms of 1def chain A will move. Thus, in the 3D Viewer, the whole structure of 1abc will be correct (it is the non-moving molecule), while 1def will have chain A on top of 1abc chain A and chain B somewhere in space. The symmetry of the dimer is broken, as only chain A has moved.

Moving a dimer.

Ordalie doesn't know anything about monomers, dimers, multimers in general. It is up to the user to provide the information, by giving Ordalie the sequences of the chains of interest.

Tip: To manipulate a multimer in Ordalie, all the sequences corresponding to the chains of the reference AND the sequences of the chains of the target structure should be present in the alignment.

If the alignment contains PDB_1abc_A, PDB_1abc_B, PDB_1def_A and PDB_1def_B, it is then possible to superpose the two dimers. A first superposition step where only PDB_1abc_A and PDB_1def_A are selected will bring PDB_1def_A on top of PDB_1abc_A. A second superposition step where only PDB_1abc_B and PDB_1def_B are selected will bring PDB_1def_B on top of PDB_1abc_B.
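Before superposing multimers, it can be useful to check which chains of each structure are actually present in the alignment. The following Python sketch does this from the sequence names alone. It is an illustration, not part of Ordalie: the name pattern (PDB_, a four-character ID, an underscore, a one-character chain) is inferred from the examples in this section, and the function names and the non-PDB sequence name Q9XYZ1_HUMAN are hypothetical.

```python
import re
from collections import defaultdict

# Pattern inferred from names such as PDB_1abc_A (an assumption, not a spec).
PDB_NAME = re.compile(r"^PDB_([0-9][A-Za-z0-9]{3})_([A-Za-z0-9])$")


def chains_by_structure(names):
    """Map each PDB ID found in the sequence names to its list of chains."""
    chains = defaultdict(list)
    for name in names:
        match = PDB_NAME.match(name)
        if match:
            chains[match.group(1).lower()].append(match.group(2))
    return dict(chains)


def missing_chains(names, expected):
    """Return, per PDB ID, the expected chains absent from the alignment."""
    found = chains_by_structure(names)
    missing = {}
    for pdb_id, chain_ids in expected.items():
        absent = sorted(set(chain_ids) - set(found.get(pdb_id, [])))
        if absent:
            missing[pdb_id] = absent
    return missing


names = ["PDB_1abc_A", "PDB_1abc_B", "PDB_1def_A", "Q9XYZ1_HUMAN"]
present = chains_by_structure(names)
absent = missing_chains(names, {"1abc": "AB", "1def": "AB"})
```

Here chain B of 1def is reported as missing, which corresponds to the situation where superposing the dimers chain by chain is not possible until PDB_1def_B is added to the alignment.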

5.7 3D Viewer

The 3D Viewer is one of the most useful tools in Ordalie. Although it does not offer all the features and functionalities of a dedicated molecular visualization program such as VMD or PyMOL, it can be of great help in understanding protein features in the framework of protein structures.

5.7.1 Molecules and Objects

The Ordalie 3D Viewer is organized around the 'Molecule' and 'Object' notions. A 'Molecule' consists of all the chains (and consequently residues and atoms) that are present in a given PDB entry. An 'Object' belongs to a Molecule, and can be composed of several elements (full chains, parts of chains, residues, ligands, etc.) belonging to that molecule. Objects can be painted with several colours and can contain several kinds of representation. Feature mapping applies only to objects.

By default, Ordalie will create 3 objects per molecule :

At present, Ordalie does not handle hydrogen atoms.

5.7.2 Representation types

Ordalie is able to represent a structure in several ways.

5.7.3 The 3D Viewer window

The 3D Viewer window can be divided into four parts. The top of the window displays information about picked atoms. Below it is the 'Quick Mapping' panel. Below this panel are, from left to right, the 'Molecular Objects' frame, the main 3D window, and the 'Actions' panel. The 3D window can be maximized by pressing the <F1> key; pressing <F1> again restores the window's original geometry. All panels can be switched on or off by pressing the <F2> key.

Quick Mapping

The four comboboxes of this panel provide a quick way to map features onto a given molecular object. The leftmost combobox selects the molecule, and the next one selects the object onto which the features will be mapped. There are then two feature selectors. Two features can be mapped onto the same object by selecting one feature with the 'Feature 1' combobox and a second feature with the 'Feature 2' combobox. The features are drawn in order, Feature 1 before Feature 2. Care should be taken when selecting them, as Feature 2 can completely cover Feature 1. For example, if Feature 1 is set to conservation, which implies residue colouring, and Feature 2 is set to PFAM-A, much of the conservation will not be visible, because a PFAM domain extends over a large range of residues. In this case, Feature 1 should be set to PFAM, and Feature 2 to conservation.

Molecular Objects frame

Below the 'All On' and 'All Off' buttons, which switch on and off all objects defined in all Molecules, is the list of all 3D molecules present in the alignment. Beside each molecule name is the 'New' button, which allows the definition of new objects for that molecule (see section 5.7.4, 'The Object Editor', below). Clicking on a molecule name opens or closes the list of the objects defined for that molecule. An object name shown in green means the object is switched on and displayed on the screen; a red object name means the object is switched off. Each object name is followed by the 'Edit' and 'Del' buttons, used to redefine and delete the object respectively.

The 3D window

This window contains the 3D objects themselves. The objects can be manipulated with the mouse through an arcball system, that is, a virtual trackball. All the objects of the scene are enclosed in a sphere, and the objects are moved by dragging the sphere up and down and left to right, the mouse mimicking a hand rolling the sphere. The mouse wheel zooms the scene in and out. A drag with <Button-3> translates the scene in the x-y plane. A <Control-B1> click shows the label of the atom under the mouse pointer.

The Actions panel

Although the Ordalie 3D Viewer tool is not intended to be a complete Molecular graphics program, it still offers some functionalities which are, from top to bottom :

5.7.4 The Object Editor

An object is a set of residues and/or ligands belonging to one or several chains, displayed in given styles with given colours. The Object Editor can be invoked to create a new object (the 'New' button) or to edit an existing object (the 'Edit' button).

Making / Editing an object

When creating a new object, the new object name should be entered in the top entry box. Two objects cannot have the same name.

Object editing is then done in a five-step process :

  1. select the chain of interest and the type of residues in the chain : polymer residues (amino acids or nucleotides) or ligands,
  2. select a representation type,
  3. select the residues to which the selected style applies,
  4. select a colour,
  5. select the residues to which the selected colour applies.
This process is repeated until all pieces of the object are set up.

Finally, it is possible to add the molecular surface surrounding the object atoms.

Residue selection

When applying a colour or a representation style, the user should specify the residues it applies to. There are three ways to do so :

The object is then finished by clicking on the 'OK' button. The new object will be added to the object list on the corresponding molecule.

5.8 The Features Summary

This representation can render several selected sequences and features on the same page. The sequences are not schematized as in the Snapshot Overview representation, but are shown as they appear in the alignment window.

Figure 15: The Features Summary window, with the Drawing Area at the top and the Control panel at the bottom

5.8.1 Drawing Area

The top of the 'Features Summary' window is made of a listbox containing the sequence names on the left, and a drawing area on the right. Clicking on a name selects or unselects the corresponding sequence. Multiple sequences can be selected by holding the <Control> key down while clicking on their names with the mouse.

In the drawing area, for each sequence the sequence ID is written on the left, followed by the position of the first residue in the current sequence line, the amino acid sequence itself as present in the alignment, and the position of the last residue in the line. When a feature is selected, each sequence line is followed by a feature line, with the feature name beneath the sequence ID, and rectangles below sequence positions where the feature is present.

The Features Summary can be moved around by dragging the mouse while holding <Button-1>.

5.8.2 Control panel

Below the Drawing Area is the Control panel. On the left is a spinbox that selects the type of name used to reference each sequence, i.e. its sequence name, its accession number or its bank ID, when available. This choice applies to both the listbox and the drawing area. Next come the font size selector and the 'Features' selector. Any number of features can be selected by checking the button corresponding to each desired feature. The 'Notes as Balloon' checkbutton controls whether the note attached to each feature is shown as a balloon when the mouse pointer is over the feature. The 'Print' button asks for the name of a file in which a PNG image of the current drawing area will be saved, and the 'Close' button closes the window.

5.9 Barcode alignment

The Barcode alignment tool is another schematic representation of the alignment.

moumou 2019-03-25