7 Appendix


7.1 The command line options

Option        Values              Description
-convert      <tfa|msf|xml|ord>   Convert the alignment into the format indicated by -convert. The converted output file name has the form <alignment file>.<format>
-precompute   <0|1>               Precompute clustering and conservation for each PFAM domain
-threshold    <x>                 Set the conservation threshold level. <x> should be set between 51 and 100
-batch        <0|1>               Run Ordalie without windows and exit when finished


7.2 The Ordalie database scheme

The core of Ordalie is built around an in-memory SQLite database [3] whose schema is given in figure XXX. Ordalie takes advantage of this underlying database to store snapshots of alignments and their associated features.

The "ordalie" table contains the settings saved at exit, allowing the user to find the same state when launching Ordalie again. The "seqinfo" table contains sequence information that is not linked to amino acid positions (length, molecular weight, isoelectric point, ...). The "seqfeat" table stores feature data mapped onto the residue sequence.

Upon loading a new alignment file, Ordalie creates a first snapshot, a read-only copy of this alignment, stored as the "original alignment" in the "snapshot" table. This table contains all snapshots created so far, along with their names and descriptions. The "seqali" table records the amino acid sequences as they appear in the snapshots. A link table, "ln_snapshot_seqali", binds a given set of sequences to a given snapshot. Accordingly, the "featali" table stores the features attached to aligned sequences in a given snapshot, and a link table, "ln_seqali_featali", couples these two tables.

The "clustering" and "cluster" tables define, respectively, a clustering attached to a snapshot (its name, and the method and residue zones used to compute it) and the resulting clusters with their names. The set of sequences defining a given cluster is available through the "ln_seqali_clluster" link table. The "colmeasure" and "colscore" tables correspond to conservation computations: "colmeasure" holds the column measurements with their name and method, and "colscore" holds the conservation groups with their name and the value for each column of the group. The conservation score for a given cluster is available through the link table "ln_cluster_colscore". Finally, the "annotation" table contains all information relative to the annotations the user adds to a given snapshot.

The Ordalie file format (.ord file extension) consists of a database dump.
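
The snapshot mechanism described above can be illustrated with a minimal sketch using Python's standard sqlite3 module. Only the table names "snapshot", "seqali" and "ln_snapshot_seqali" come from the description; every column name (pk_snapshot, seqname, ...), the helper functions and the toy sequences are assumptions made for illustration and may not match the actual Ordalie schema.

```python
import sqlite3

# Hypothetical columns: only the three table names come from the text.
SCHEMA = """
CREATE TABLE snapshot (
    pk_snapshot INTEGER PRIMARY KEY,
    name        TEXT,
    description TEXT
);
CREATE TABLE seqali (
    pk_seqali INTEGER PRIMARY KEY,
    seqname   TEXT,
    sequence  TEXT
);
CREATE TABLE ln_snapshot_seqali (
    pk_snapshot INTEGER REFERENCES snapshot(pk_snapshot),
    pk_seqali   INTEGER REFERENCES seqali(pk_seqali)
);
"""


def open_database():
    """Create an in-memory database holding the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_snapshot(conn, name, description, sequences):
    """Store a snapshot and bind its aligned sequences through the link table."""
    cur = conn.execute(
        "INSERT INTO snapshot (name, description) VALUES (?, ?)",
        (name, description),
    )
    snapshot_id = cur.lastrowid
    for seqname, sequence in sequences.items():
        cur = conn.execute(
            "INSERT INTO seqali (seqname, sequence) VALUES (?, ?)",
            (seqname, sequence),
        )
        conn.execute(
            "INSERT INTO ln_snapshot_seqali VALUES (?, ?)",
            (snapshot_id, cur.lastrowid),
        )
    conn.commit()
    return snapshot_id


def snapshot_sequences(conn, snapshot_id):
    """Return (name, aligned sequence) pairs of one snapshot, sorted by name."""
    rows = conn.execute(
        "SELECT s.seqname, s.sequence FROM seqali AS s "
        "JOIN ln_snapshot_seqali AS l ON l.pk_seqali = s.pk_seqali "
        "WHERE l.pk_snapshot = ? ORDER BY s.seqname",
        (snapshot_id,),
    )
    return rows.fetchall()


def dump(conn):
    """Return the whole database as SQL text, in the spirit of a database dump."""
    return "\n".join(conn.iterdump())
```

Calling add_snapshot(conn, "original alignment", "read-only copy", {...}) followed by dump(conn) yields SQL text that recreates the tables and their rows, which is the general idea behind saving a session as a database dump.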

7.3 The Vector Norm scoring method

This method is based on a vectorial representation of the 20 amino acids. This representation can be the same as the one used in the VRP representation or can be, for example, a volume/polarity couple. The score for a given column k can then be computed by:

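\[ S(k) = \frac{n_c}{n_t} \cdot \frac{\left| \sum_{i=1}^{n} V_i \right|}{\sum_{i=1}^{n} \left| V_i \right|} \]

where n_c is the number of residues in the column, n_t is the total number of sequences, and V_i is the vector representing the i-th residue of the column.
This function is bounded by 0 and N, where N is the number of sequences in the alignment.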
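
As an illustration, the score of a single column could be computed as in the sketch below. It rests on several assumptions that are not stated in the text: the sums run over the residues actually present in the column (so n equals the number of non-gap residues), gaps are written "-", and the two-dimensional vectors in the usage note are toy placeholders rather than real volume/polarity values. The function name vector_norm_score is hypothetical.

```python
import math


def vector_norm_score(column, vectors, gap="-"):
    """Score one alignment column.

    column  -- string with one character per sequence (gaps allowed)
    vectors -- dict mapping a residue letter to its vector (tuple of floats)
    """
    n_t = len(column)                                  # total number of sequences
    present = [vectors[r] for r in column if r != gap]
    n_c = len(present)                                 # residues in the column
    if n_c == 0:
        return 0.0
    dim = len(present[0])
    # Norm of the vector sum (numerator of the second factor)
    total = [sum(v[d] for v in present) for d in range(dim)]
    norm_of_sum = math.sqrt(sum(x * x for x in total))
    # Sum of the individual norms (denominator of the second factor)
    sum_of_norms = sum(math.sqrt(sum(x * x for x in v)) for v in present)
    if sum_of_norms == 0.0:
        return 0.0
    return (n_c / n_t) * norm_of_sum / sum_of_norms
```

With the toy vectors {"A": (1.0, 0.0), "D": (-1.0, 0.0)}, a column "AAAA" scores 1.0, a half-gapped column "AA--" scores 0.5, and a column "AD" whose vectors cancel scores 0.0.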


7.4 The Feature File Format

Features can be imported into Ordalie through a feature file. It is thus possible to add items to an existing feature or to create completely new ones.

The feature file format looks like:

# This is an example of a feature file format
#
# Declare the feature
FEATURE MyFeat ?PROPAGATE? ?all|group?

# A line starting with # is a comment line and can be inserted anywhere
#
# Structure of the feature item :
# seq. name; coord. system; start; stop; color; score; note
Q65P3D;LOCAL;23;57;red;0.0;first item
Q65P3D;GLOBAL;212;345;blue;0.0;second one
FLK14Q;local;123;234;red;0.0;one more

# Then go to another feature
FEATURE STRUCT
P12345;global;2112;2541;green;0.0;add one

To add some items to an existing feature, the feature name should be exactly identical to the one already present in the alignment as feature names are case-sensitive.
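
To make the layout of the format concrete, here is a minimal reader sketch for such a file. It is not Ordalie's own parser. The names FeatureItem and parse_feature_file are hypothetical, the words following the feature name are kept as uninterpreted options, and the coordinate system is normalised to upper case on the assumption (suggested by the example, which mixes LOCAL and local) that it is case-insensitive. Feature names are left untouched since they are case-sensitive.

```python
from collections import namedtuple

FeatureItem = namedtuple(
    "FeatureItem", "seq_name coord_system start stop color score note"
)


def parse_feature_file(text):
    """Return {feature name: {"options": [...], "items": [FeatureItem, ...]}}."""
    features = {}
    current = None
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        tokens = line.split()
        if tokens[0] == "FEATURE":
            if len(tokens) < 2:
                raise ValueError(f"line {line_no}: FEATURE needs a name")
            # Re-declaring a feature name appends items to the same feature
            current = features.setdefault(
                tokens[1], {"options": tokens[2:], "items": []}
            )
            continue
        if current is None:
            raise ValueError(f"line {line_no}: item found before any FEATURE")
        # Seven fields; the note is last and may itself contain semicolons
        fields = [f.strip() for f in line.split(";", 6)]
        if len(fields) != 7:
            raise ValueError(f"line {line_no}: expected 7 fields")
        name, system, start, stop, color, score, note = fields
        current["items"].append(
            FeatureItem(name, system.upper(), int(start), int(stop),
                        color, float(score), note)
        )
    return features
```

Applied to the example above, the function would return two features, "MyFeat" with three items and "STRUCT" with one.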


7.5 The superposition algorithm

Given two sets of atomic coordinates A and B, the following algorithm tries to minimize the RMS distance between A and B by moving B onto A. The algorithm can be separated into the following steps:

Along with the RMS, the algorithm outputs the orientation matrix, the translation vector, and the rotations between the two molecules in different forms.
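
For readers who want to experiment with this kind of least-squares superposition, the sketch below uses one well-known approach, Horn's closed-form quaternion method, with a shifted power iteration to find the dominant eigenvector. It is an illustration under stated assumptions and not necessarily the procedure implemented in Ordalie: the function names are hypothetical, atoms are assumed to be paired one-to-one in the same order, and the fixed iteration count is a simplification that may converge slowly for nearly degenerate point sets.

```python
import math


def centroid(points):
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def superpose(a, b, iterations=500):
    """Return (rotation, translation, rms) that move point set b onto a.

    a, b -- equally long lists of paired (x, y, z) tuples.
    A moved point is rotation * p + translation.
    """
    ca, cb = centroid(a), centroid(b)
    a0 = [tuple(p[k] - ca[k] for k in range(3)) for p in a]
    b0 = [tuple(p[k] - cb[k] for k in range(3)) for p in b]
    # Correlation sums s[u][v] = sum_i b0_i[u] * a0_i[v]
    s = [[sum(q[u] * p[v] for p, q in zip(a0, b0)) for v in range(3)]
         for u in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    # Horn's symmetric 4x4 matrix; its dominant eigenvector is the quaternion
    n = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    # Shift the diagonal so all eigenvalues are positive, then power-iterate
    shift = 1.0 + sum(abs(x) for row in n for x in row)
    for i in range(4):
        n[i][i] += shift
    q = [1.0, 0.5, 0.25, 0.125]
    for _ in range(iterations):
        q = [sum(n[i][j] * q[j] for j in range(4)) for i in range(4)]
        norm = math.sqrt(sum(x * x for x in q))
        q = [x / norm for x in q]
    q0, qx, qy, qz = q
    rot = [
        [q0*q0 + qx*qx - qy*qy - qz*qz, 2*(qx*qy - q0*qz), 2*(qx*qz + q0*qy)],
        [2*(qy*qx + q0*qz), q0*q0 - qx*qx + qy*qy - qz*qz, 2*(qy*qz - q0*qx)],
        [2*(qz*qx - q0*qy), 2*(qz*qy + q0*qx), q0*q0 - qx*qx - qy*qy + qz*qz],
    ]

    def rotate(p):
        return tuple(sum(rot[r][c] * p[c] for c in range(3)) for r in range(3))

    rcb = rotate(cb)
    trans = tuple(ca[k] - rcb[k] for k in range(3))
    # RMS distance between a and the moved copy of b
    sq = 0.0
    for p, m in zip(a, b):
        moved = rotate(m)
        sq += sum((p[k] - moved[k] - trans[k]) ** 2 for k in range(3))
    return rot, trans, math.sqrt(sq / len(a))
```

The function returns the rotation matrix, the translation vector and the RMS, which corresponds to part of the output listed above. A production implementation would more likely rely on a linear algebra library for the eigenvector step.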