Difference between revisions of "BIRD"
Revision as of 16:56, 12 February 2008
BIRD (Biological Integration and Retrieval Data) is developed by Hoan Nguyen at the LBGI laboratory (IGBMC, Strasbourg).
What is BIRD System
BIRD (Nguyen et al., CORIA 2008, Hermes Edition) is designed to manage collections of biological data. Its generic, configurable data model allows genomics, transcriptomics and ontology datasets to be integrated together using a limited number of product mapping rules provided by the user (operator or system administrator). These integration rules make it easy to build a database organised around semantic topics and actual requirements. BIRD is driven by a high-level query engine, based on SQL and a full-text engine, that lets biologists extract knowledge quickly without programming. With this engine, the system can also generate sub-banks of data tailored to a given requirement.
The hosted data can be accessed by the community in several ways: a Web interface, an HTTP service, a Java API, or the BIRD-QL query engine (via the HTTP service or the Java API).
BIRD is developed in Java. It uses IBM DB2 as its data server, WebSphere Federation Server for virtual databases and Intelligent Miner for KDD (knowledge discovery in databases). The web application is hosted on a Tomcat server or on a WebSphere Application Server.
Server at Decrypthon: [[1]]
Server at IGBMC: [[2]]
DATABASES List
GENBANK, REFSEQ, PDB, UNIPROT, UCSC, INTERPRO, GO, TAXONOMY, MACSIM, EVI-GENORET, STRING (local user), UMD Data (local user), ...
BIRDQL Biological Query Language
BIRDQL in a few words
The heterogeneous data integrated in the BIRD System are stored in several relational tables. Exploiting these data with SQL queries is difficult for anyone other than developers or computer scientists, because selecting data from multiple tables requires joins. HTML forms can hide this complexity, but many queries cannot be expressed through HTML forms. We therefore propose our own query language, BIRDQL, put forward as a new standard biological query language that allows biologists and clinicians to create data retrieval protocols without exhaustive knowledge of the data sources and their architecture. The BIRD System is driven by a high-level query engine, BIRD-QL, which lets biologists express queries easily and extract knowledge using classical constraints and scientific functions (StructuralDistance, SequencePattern, ...). BIRDQL is not a mathematically complete language but rather an idiom adapted to the GUI and human-readable enough to be modified by hand.
BIRDQL Grammar
ID <list of id/ac/query_id > DB <bank names>
WH Field Contains kw1 |& kw2 |& kw_n
WH PATTERN <function SequencePattern() >
WH PATTERN <function DiagonalMolecule()>
WH PATTERN <function InteractionProtein()>
WH PATTERN <function …. ()>
LD <Field out>
FM <n>
FM Fasta/Flat/Xml/CSV/Simple/Object
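To make the clause structure concrete, here is a minimal Python sketch that splits a multi-line BIRD-QL query into its clauses. It is not the BIRD parser: the function name `parse_birdql` and the dictionary layout are assumptions made for illustration, and the output-field keyword is accepted as either `LD` (as written in the grammar) or `FD` (as written in Example 1).

```python
def parse_birdql(text):
    """Split a multi-line BIRD-QL query into clauses (illustrative only)."""
    query = {"ids": [], "banks": [], "where": [], "fields": [], "format": []}
    for raw in text.strip().splitlines():
        line = raw.strip()
        if not line:
            continue
        keyword, _, rest = line.partition(" ")
        rest = rest.strip()
        if keyword == "ID":
            # "ID <ids> DB <bank names>" share one line in the grammar.
            ids, _, banks = rest.partition(" DB ")
            query["ids"] = [i.strip() for i in ids.split(",") if i.strip()]
            query["banks"] = [b.strip() for b in banks.split(",") if b.strip()]
        elif keyword == "WH":
            # Constraints are kept as raw text; they are not interpreted here.
            query["where"].append(rest)
        elif keyword in ("LD", "FD"):
            query["fields"] = [f.strip() for f in rest.split(",") if f.strip()]
        elif keyword == "FM":
            # The grammar lists two FM forms, so every FM value is collected.
            query["format"].append(rest)
        else:
            raise ValueError(f"unknown BIRD-QL keyword: {keyword!r}")
    return query


example = """
ID * DB UNIPROT
WH DE contains synthetase |and tyrosyl
WH OX contains 382
FD AC, ID,DE,OX,SQ
FM FASTA
"""
parsed = parse_birdql(example)
print(parsed["banks"], parsed["fields"], parsed["format"])
```

The sketch only separates clauses by their leading keyword; it does not evaluate constraints or PATTERN functions, which remain the job of the BIRD-QL engine.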
BIRDQL example
The examples below show how to use the BIRD-QL syntax.
Example 1: simple query, search and FASTA format generation
ID * DB UNIPROT
WH DE contains synthetase |and tyrosyl
WH OX contains 382
FD AC, ID,DE,OX,SQ
FM FASTA
Result
>Q92PK5 | SYY_RHIME | Tyrosyl-tRNA synthetase (EC 6.1.1.1) (Tyrosine--tRNA ligase) (TyrRS). | 382
MSEFKSDFLHTLSERGFIHQTSDDAGLDQLFRTETVTAYIGFDPTAASLHAGGLIQIMMLHWLQATGHRPISLMGGGTGMVGDPSFKDEARQLMTPETI...
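The header line of this FASTA result appears to carry the requested fields in the order given by the FD clause, separated by `|`, with the SQ field returned as the sequence itself. Assuming that layout holds, a short Python sketch can map each header value back to its field name. `parse_fasta_header` is a hypothetical helper, not part of BIRD.

```python
def parse_fasta_header(header, fields):
    """Map the '|'-separated values of a FASTA header to field names."""
    if not header.startswith(">"):
        raise ValueError("a FASTA header must start with '>'")
    values = [v.strip() for v in header[1:].split("|")]
    if len(values) != len(fields):
        raise ValueError("header does not match the requested fields")
    return dict(zip(fields, values))


header = (">Q92PK5 | SYY_RHIME | Tyrosyl-tRNA synthetase (EC 6.1.1.1) "
          "(Tyrosine--tRNA ligase) (TyrRS). | 382")
# SQ is omitted: the sequence follows on the next line(s), not in the header.
record = parse_fasta_header(header, ["AC", "ID", "DE", "OX"])
print(record["AC"], record["OX"])
```

A description that itself contains a `|` character would break this simple split, so the helper raises an error when the number of values does not match the number of fields.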
Example 2: complex query
ID * DB GENBANK, REFSEQ
WH OC Contains Eukaryote
WH DR Contains GO
WH GENE contains GF100027
FM OID
The query above searches GenBank and RefSeq for eukaryotic sequences that contain the GF100027 gene and have a cross-reference to Gene Ontology.
Example 3: complex query
ID * DB GENBANK, REFSEQ WH OC Contains Eukaryote WH DR Contains GO WH GENE contains GF100027 FM SIMPLE
As in Example 2, this query searches GenBank and RefSeq for eukaryotic sequences that contain the GF100027 gene and have a cross-reference to Gene Ontology; here it is written on a single line and returns the SIMPLE format.
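Examples 2 and 3 differ only in layout and in the FM clause. The following Python sketch illustrates this by breaking the single-line form back into one clause per line. The splitting rule is an assumption for illustration (it treats a bare `WH`, `FD`, `LD` or `FM` token as the start of a new clause) and is not taken from the BIRD-QL engine.

```python
import re

# Clause keywords that can follow the leading "ID ... DB ..." part.
# Assumption: these tokens never occur as bare words inside a constraint.
CLAUSE_START = re.compile(r"\s+(?=(?:WH|FD|LD|FM)\s)")


def split_clauses(query):
    """Break a BIRD-QL query into a list with one clause per entry."""
    return CLAUSE_START.split(query.strip())


one_line = ("ID * DB GENBANK, REFSEQ WH OC Contains Eukaryote "
            "WH DR Contains GO WH GENE contains GF100027 FM SIMPLE")
multi_line = """ID * DB GENBANK, REFSEQ
WH OC Contains Eukaryote
WH DR Contains GO
WH GENE contains GF100027
FM OID"""

clauses = split_clauses(one_line)
reference = multi_line.splitlines()
# Everything except the final FM clause is identical in the two examples.
print(clauses[:-1] == reference[:-1])
print(clauses[-1], "/", reference[-1])
```

Because newlines also count as whitespace, the same function accepts the multi-line form unchanged.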
DATA ACCESS
WEB Server
BIRD-QL Service
Java API & Native SQL
BIRD Miner Intelligent
BIRD Implementation
Architecture
Federation
Data Model
Query Engine
Data Integration
Key Technologies
BIRD System in Action
Decrypthon Data Center Implementation
http://decrypthon-1.ens-lyon.fr:9080/BirdSystem/HomePage.do
Macsim uses BIRD
Macsim can now connect directly to BIRD
GPS uses BIRD
Gscope uses BIRD
Gscope can now connect directly to BIRD
- proc BirdFromQueryText {Texte {OutFile ""} {BirdUrl ""}}
- proc BirdFromQueryFile {Fichier {OutFile ""} {BirdUrl ""}}
BIRD can integrate the info records of a Gscope project. They can then be queried directly over HTTP, through Gscope or, better, through display windows with the BirdGscopeSearch command.