Julie Thompson

    Julie Thompson    thompson@unistra.fr    Fed DBGS    /~julie    /wikili/index.php/Julie_Thompson

Responsible for the following 6 workpackages :

AlexSys

    Responsible : Julie Thompson
    Participants : Julie Thompson
    Description : The last decade has provided access to a large amount of data resulting from high-throughput genomic technologies, such as transcriptomics, proteomics or interactomics. However, this large-scale accumulation of data is only a necessary first step towards understanding the principles and fundamental mechanisms of life. Comparative sequence analyses and phylogenetic inferences play an essential role in the biological systems studies aimed at understanding these new data.
In this context, we are developing an expert system, AlexSys, for the construction, refinement, analysis and exploitation of multiple sequence alignments, combining diverse, complementary algorithms and incorporating additional information (structural, functional and evolutionary). A prototype platform is now available, built on the Unstructured Information Management Architecture (UIMA). The alignment methodology is based on ClustalW, but integrates several more powerful algorithms that significantly increase the efficiency and the quality of the multiple alignment. AlexSys also includes a number of diagnostic tests that allow us to automatically select the most suitable alignment algorithm, depending on the set of sequences to be aligned and the biological application.
The modular design of AlexSys facilitates the incorporation of new algorithms and will allow its continued evolution. In the future, we will incorporate more diverse components, covering aspects of genomic and protein data mining and the validation and integration of structural/functional data, combined with a range of different algorithms.
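
To make the selection step concrete, here is a minimal Python sketch of a rule-based aligner selector of the kind described above; the diagnostics, thresholds and strategy names are illustrative assumptions and do not reproduce the actual AlexSys components.

# Minimal sketch (not the actual AlexSys implementation): a rule-based
# selector that inspects an input sequence set and picks an alignment strategy.
# The diagnostics, thresholds and strategy names are illustrative assumptions.

from statistics import mean, pstdev

def diagnose(sequences):
    """Compute simple descriptors of the input sequence set."""
    lengths = [len(s) for s in sequences]
    return {
        "n_seqs": len(sequences),
        "mean_len": mean(lengths),
        "len_spread": pstdev(lengths) / mean(lengths) if mean(lengths) else 0.0,
    }

def select_aligner(sequences):
    """Pick an alignment strategy from the diagnostics (illustrative rules)."""
    d = diagnose(sequences)
    if d["n_seqs"] > 1000:
        return "fast_progressive"      # favour speed on very large sets
    if d["len_spread"] > 0.5:
        return "local_then_global"     # large length variation: anchor on local motifs
    return "iterative_refinement"      # default: slower but higher quality

if __name__ == "__main__":
    seqs = ["MKTAYIAKQR", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", "MKTAYIA"]
    print(select_aligner(seqs))        # -> "local_then_global"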

EvolHHuPro

    Responsible : Julie Thompson
    Participants : Benjamin Linard, Olivier Poch, Julie Thompson
    Description : The genetic information encoded in the genome sequence contains the blueprint for the potential development and activity of an organism. This information can only be fully comprehended in the light of the evolutionary events (duplication, loss, recombination, mutation…) acting on the genome, which are reflected in changes in the sequence, structure and function of the gene products (nucleic acids and proteins) and, ultimately, in the biological complexity of the organism.
The recent availability of the complete genome sequences of a large number of model organisms means that we can now begin to understand the mechanisms involved in the evolution of the genome and their consequences in the study of biological systems. This is illustrated by the evolutionary analyses and phylogenetic inferences that play an important role in most functional genomics studies, e.g. of promoters (‘phylogenetic footprinting’), of interactomes (notion of ‘interologs’ based on the presence and degree of conservation of counterparts of interactive proteins), and also, in comparisons of transcriptomes or proteomes (notion of phylogenetic proximity and co-regulation/co-expression).
At the same time, theoretical advances in information representation and management have revolutionised the way experimental information is collected, stored and exploited. Ontologies, such as Gene Ontology (GO) or Sequence Ontology (SO), provide a formal representation of the data for automatic, high-throughput data parsing by computers. These ontologies are being exploited in the new information management systems to allow large scale data mining, pattern discovery and knowledge inference.
Unfortunately, the vast number and complexity of the events shaping eukaryotic genomes means that a complete understanding of evolution at the genomic level is not currently feasible. At the lowest level, point mutations affect individual nucleotides. At a higher level, large chromosomal segments undergo duplication, lateral transfer, inversion, transposition, deletion and insertion. Ultimately, whole genomes are involved in processes of hybridization, polyploidization and endosymbiosis, often leading to rapid speciation.
We will characterise and study the evolutionary histories of the human proteome, defined as the impact on the human proteins (extensions, insertions, deletions…) of the cascade of genetic events (duplication, lateral transfer, inversion, transposition, deletion, insertion…) that occurred during the evolution of the vertebrate genomes. This ambitious objective is now possible thanks to the emergence of formal descriptions of biological data and to recent developments in accurate phylogenetic reconstruction and genome analysis (Partner 1: Figenix platform) and in automated, reliable and exploitable protein sequence alignments (TCOFFEE, PipeAlign, MAO, MACSIMS…). These methodologies will be combined into a multi-agent expert system for the construction of evolutionary histories.

In order to facilitate the automatic definition of the important genetic events shaping a single protein, and of their potential causal relationships at the genome level, a new ontology will be developed. In a subsequent step, the evolutionary histories of the complete human proteome will be reconstructed, followed by their classification into protein sets sharing typical evolutionary histories, and the functional analysis of these sets. An analysis at the genomic level will then be carried out for a selected set of proteins identified in the classification and functional analysis step.
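
As an illustration of the kind of data structure involved, the following Python sketch represents each protein's history as a list of inferred genetic events and groups proteins sharing the same event signature; the event types, lineages and grouping criterion are assumptions made for the example, not the EvolHHuPro design.

# Minimal sketch (illustrative, not the EvolHHuPro system): represent the
# genetic events inferred for each protein as a simple history, then group
# proteins that share the same pattern of event types.
# Event names and example data are assumptions for illustration only.

from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    kind: str      # e.g. "duplication", "insertion", "deletion"
    lineage: str   # vertebrate lineage in which the event is inferred

def history_signature(events):
    """Reduce a protein's event list to a hashable signature for grouping."""
    return tuple(sorted((e.kind, e.lineage) for e in events))

def group_by_history(histories):
    """histories: {protein_id: [Event, ...]} -> {signature: [protein_id, ...]}"""
    groups = defaultdict(list)
    for protein, events in histories.items():
        groups[history_signature(events)].append(protein)
    return dict(groups)

if __name__ == "__main__":
    demo = {
        "PROT_A": [Event("duplication", "mammals"), Event("insertion", "primates")],
        "PROT_B": [Event("insertion", "primates"), Event("duplication", "mammals")],
        "PROT_C": [Event("deletion", "rodents")],
    }
    for signature, members in group_by_history(demo).items():
        print(members, "share", signature)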

MyoNet

    Responsible : Julie Thompson
    Participants : Julie Thompson
    Description : Our team will characterize the complete set of mouse proteins involved directly or indirectly in transcriptional processes. This will require an in-depth sequence, structural, evolutionary (SSE) and functional analysis of the mouse proteome, with the major objective of defining and delineating any conserved domains or regions that might be associated with known transcriptional modules. This work will be performed in collaboration with M. Andrade’s team (Ottawa, Canada) in the context of the International Regulome Consortium (http://www.internationalregulomeconsortium.ca/).

In the framework of the proposed Decrypthon project, the SSE analysis of the entire human/mouse proteome (~60 000 proteins, including splice variants and the human- or mouse-specific proteins) will involve a pipeline of processes starting with homology identification, followed by multiple sequence alignment, structural and functional subfamily classification, orthology/paralogy analysis and phylogenetic reconstruction. We will take advantage of previous developments performed on the Decrypthon grid, notably those concerning MACSIMS (Multiple Alignment of Complete Sequence Information Management System) functional annotation, and new protocols will be developed, including PSI-BLAST searches to detect distantly related proteins, the implementation of recent multiple alignment algorithms, and phylogenetic tree algorithms. Protocols ensuring automated updating and storage in a relational database, hosted by the Decrypthon, will also be developed.

The results will be combined with the data from the transcriptome analysis performed in vivo. This complementary approach is expected to help us identify and characterise the transcriptional networks involved in muscle development, specification, regeneration and myogenic progression. In vivo functional validation will be done using mouse molecular genetics and the expertise in muscle biology of the laboratory of F. Relaix.
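
The following Python sketch illustrates the overall data flow of such an SSE pipeline (homology search, alignment, subfamily classification, phylogeny, relational storage); the stage functions are placeholders and the protein identifiers are examples, so this shows the control flow only, not the Decrypthon protocols themselves.

# Minimal sketch of an SSE analysis pipeline of the kind described above
# (homology search -> alignment -> subfamily classification -> phylogeny ->
# relational storage). The stage functions are placeholders: the real
# protocols (PSI-BLAST searches, MACSIMS annotation, Decrypthon grid jobs)
# are not reproduced here; only the overall data flow is illustrated.

import sqlite3

def find_homologs(protein_id):              # placeholder for a PSI-BLAST-style search
    return [f"{protein_id}_hom{i}" for i in range(3)]

def build_alignment(protein_id, homologs):  # placeholder for multiple alignment
    return f"alignment({protein_id}, {len(homologs)} homologs)"

def classify_subfamilies(alignment):        # placeholder for subfamily classification
    return ["subfamily_1", "subfamily_2"]

def infer_tree(alignment):                  # placeholder for phylogenetic reconstruction
    return f"tree_of[{alignment}]"

def run_pipeline(protein_ids, db_path="sse_results.sqlite"):
    con = sqlite3.connect(db_path)
    con.execute("""CREATE TABLE IF NOT EXISTS sse_result
                   (protein TEXT PRIMARY KEY, n_homologs INT,
                    subfamilies TEXT, tree TEXT)""")
    for pid in protein_ids:
        homologs = find_homologs(pid)
        alignment = build_alignment(pid, homologs)
        subfams = classify_subfamilies(alignment)
        tree = infer_tree(alignment)
        con.execute("INSERT OR REPLACE INTO sse_result VALUES (?, ?, ?, ?)",
                    (pid, len(homologs), ",".join(subfams), tree))
    con.commit()
    con.close()

if __name__ == "__main__":
    run_pipeline(["MyoD1", "Pax3", "Pax7"])   # example mouse myogenic factors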

Evolutionary Informatics

    Responsible : Julie Thompson
    Participants : Julie Thompson
    Description : The term "Evolutionary Informatics" emphasizes the informatics aspects of evolutionary analysis, which is a kind of comparative analysis. We are participating in the EvoInfo working group funded by NESCent (http://www.nescent.org/).

The growth of bioinformatics and genomics presents a wealth of opportunities for expanded application of evolutionary methods, in terms of both the amount and the variety of analyses. Powerful tools for evolutionary analysis already exist, but integrating evolutionary methodology into biological data analysis does not depend so much on the power of tools as on infrastructure. To address these infrastructural needs, the EvoInfo working group will develop community cohesion on issues of standards and interoperability, and will facilitate (directly and indirectly) the development of interoperable software and data standards.

In this context, we are participating in the development of a Comparative Data Analysis Ontology (CDAO, https://www.nescent.org/wg_evoinfo/CDAO).
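
As a starting point for working with CDAO programmatically, the sketch below loads the ontology with the third-party rdflib package and lists its labelled classes; the download URL is an assumption about where the OWL file is published and may need to be adjusted to the current release.

# Minimal sketch: load the CDAO OWL file with rdflib and list its classes,
# as a first step towards annotating comparative data with the ontology.
# The URL below is an assumption (CDAO is distributed as an OBO Foundry
# ontology); adjust it to wherever the current release actually lives.

import rdflib
from rdflib.namespace import OWL, RDF, RDFS

CDAO_OWL_URL = "http://purl.obolibrary.org/obo/cdao.owl"   # assumed location

def list_cdao_classes(source=CDAO_OWL_URL):
    graph = rdflib.Graph()
    graph.parse(source)                      # rdflib infers the RDF/XML format
    classes = []
    for cls in graph.subjects(RDF.type, OWL.Class):
        label = graph.value(cls, RDFS.label)
        if label is not None:
            classes.append(str(label))
    return sorted(classes)

if __name__ == "__main__":
    for name in list_cdao_classes():
        print(name)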

WorkGroup Inférence Phylogénétique

    Responsible : Julie Thompson
    Participants : Julie Thompson
    Description : The development of new strategies for automatic, reliable phylogenetic inference in large-scale projects.

The recent availability of the complete genome sequences of a large number of model organisms, together with the immense amount of data being produced by the new high-throughput technologies, means that we can now begin comparative analyses to understand the mechanisms involved in the evolution of the genome and their consequences in the study of biological systems. Phylogenetic approaches provide a unique conceptual framework for performing comparative analyses of all these data, for propagating information between different systems and for predicting or inferring new knowledge. As a result, phylogenetic inference systems are now playing an increasingly important role in most areas of high-throughput genomics, including studies of promoters (phylogenetic footprinting), interactomes (based on the presence and degree of conservation of interacting proteins), and in comparisons of transcriptomes or proteomes (phylogenetic proximity and co-regulation/co-expression).

We are building on our past experience in the construction and exploitation of Multiple Alignments of Complete Sequences (MACS). In the past, we have developed a number of tools aimed at constructing high quality MACS (DbClustal, RASCAL, LEON, NorMD). An important aspect is the objective evaluation of our tools based on the BAliBASE benchmark suite (a minimal illustration of this kind of scoring is sketched at the end of this description). This research axis is now continuing with the development of a new alignment expert system, AlexSys. More recently, we have also addressed the problems of the automatic integration of heterogeneous information in the context of a protein family alignment, with the development of a Multiple Alignment Ontology (MAO) and a new MACS-based Information Management System (MACSIMS).

All these tools are exploited in a number of different biological applications, including genome annotation and analysis (e.g. Mycobacterium smegmatis, Alvinella pompejana), structure/function/evolution analyses (e.g. Muscular Interactome, MyoNet) and the construction of evolutionary histories for the human proteome (EvolHHuPro).
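
As an illustration of the benchmark-based evaluation mentioned above, the following sketch computes a generic sum-of-pairs score, i.e. the fraction of residue pairs aligned in a reference alignment that are recovered by a test alignment; it illustrates the principle only and is not the official BAliBASE scoring program.

# Minimal sketch of benchmark-style evaluation against a reference alignment:
# the sum-of-pairs (SP) score is the fraction of residue pairs aligned in the
# reference that are also aligned in the test alignment. This is a generic
# illustration, not the official BAliBASE scoring program.

def aligned_pairs(alignment):
    """alignment: {seq_id: gapped_string}. Return the set of residue pairs
    (seq_a, pos_a, seq_b, pos_b) placed in the same column."""
    ids = sorted(alignment)
    counters = {s: -1 for s in ids}   # residue index per sequence, gaps excluded
    pairs = set()
    n_cols = len(alignment[ids[0]])
    for col in range(n_cols):
        residues = []
        for s in ids:
            if alignment[s][col] != "-":
                counters[s] += 1
                residues.append((s, counters[s]))
        for i in range(len(residues)):
            for j in range(i + 1, len(residues)):
                pairs.add(residues[i] + residues[j])
    return pairs

def sp_score(test, reference):
    ref_pairs = aligned_pairs(reference)
    return len(aligned_pairs(test) & ref_pairs) / len(ref_pairs) if ref_pairs else 0.0

if __name__ == "__main__":
    reference = {"seq1": "AC-GT", "seq2": "ACAGT"}
    test      = {"seq1": "ACG-T", "seq2": "ACAGT"}
    print(f"SP score: {sp_score(test, reference):.2f}")   # -> 0.75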

Instruct Bioinformatics

    Responsible : Olivier Poch, Raymond Ripp, Julie Thompson
    Participants : Olivier Poch, Raymond Ripp, Julie Thompson
    Description :

Participates in the following workpackages :

WP-LBGI : Laboratoire de BioInformatique et Génomique Intégratives

    Responsible : Olivier Poch
    Participants : Alexis Allot, Carlos Bermejo-Das-Neves, Kirsley Chennen, Arnaud Kress, Odile Lecompte, Luc Moulinier, Jean Muller, Yannis Nevers, Anne Ney, Olivier Poch, Laetitia Poidevin, Wolfgang Raffelsberger, Raymond Ripp, Raphaël Schneider, Julie Thompson, Renaud Vanhoutreve, Catherine Guth, Pierre Collet, Anne Jeannin, Pierre Parrend
    Description : This workpackage covers the whole LBGI (Laboratoire de BioInformatique et Génomique Intégratives).