BIRD

From Wikili
The BIRD System (Biological Integration and Retrieval Data) was designed by [[Hoan Nguyen]] at the LBGI laboratory (POCH Team) of the IGBMC [http://www-igbmc.u-strasbg.fr], Strasbourg.

==What is the BIRD System==

===BIRD System Overview===

The BIRD System (Nguyen et al., CORIA 2008, Hermes Edition) was designed to manage large collections of biological data ([[Bird_Databases_List]]) and to perform intensive computation and simulation. BIRD inherits some of the design philosophy of the Saada project [http://amwdb.u-strasbg.fr/saada/article.php3?id_article=32]. Its generic, configurable data model allows the simultaneous integration of genomics, transcriptomics and ontology datasets using a limited number of product mapping rules provided by the user (operator or system administrator). These integration rules make it easy to create a database organized by semantic topic and tailored to real requirements.

BIRD is driven by a high-level query engine (BIRD-QL), based on SQL and a full-text engine, that allows biologists to extract knowledge quickly without programming. With this engine, the system can generate sub-databases that match the actual requirements of a given project.

The hosted data can be accessed by the community in several ways: a Web interface, an HTTP service, a Java API, or BIRD-QL queries (submitted through the HTTP service or the Java API).

The BIRD System is developed in Java. It uses IBM DB2 as the data server and WebSphere Federation Server for virtual databases. The web application is hosted either by a Tomcat server or by a WebSphere Application Server.

The BIRD System is not only a data retrieval tool: it also provides a platform for Knowledge Discovery in Biological Databases, in the sense of an inductive database. IBM Intelligent Miner (association rules, classification, ...) is used to build the data mining models. Users can then use BIRD-QL to mine pertinent information or to analyze relational patterns based on the descriptive patterns available in the BIRD-QL engine.

The first goal of the BIRD System is the implementation of the Décrypthon Data Center within the Décrypthon Programme (AFM/CNRS/IBM) [http://www.decrypthon.fr].

Server at Decrypthon: [http://decrypthon-1.ens-lyon.fr:9080/BirdSystem/HomePage.do]

Server at IGBMC: [http://bird.u-strasbg.fr:9080/BirdSystem/HomePage.do]

==DATABASES List==

GENBANK, REFSEQ, PDB, UNIPROT, UCSC, INTERPRO, GO, TAXONOMY, MACSIM, EVI-GENORET, STRING (local user), UMD Data (local user), ...

==[[BIRDQL]] Biological Query Language==

===BIRDQL in a few words===

The heterogeneous data integrated in the BIRD System are stored in several relational tables. Exploiting these data with SQL queries is not straightforward and can only be done by expert developers or computer scientists.

In this context, building complex SQL queries requires joins to select data from multiple tables. This complexity can be hidden behind HTML forms, but many types of queries cannot be expressed with HTML forms.

We have therefore developed our own query language, [[BIRDQL]], a biological query language that allows biologists and clinicians to create data retrieval protocols without exhaustive knowledge of the data sources and their architecture. BIRDQL makes it easy for biologists to express queries and to extract knowledge using classical constraints and scientific functions (StructuralDistance, SequencePattern, AssociationRule, ...).

BIRDQL is not a mathematically complete language but rather an idiom adapted to the GUI, readable enough to be modified by hand. See [[BIRDQL]] for more details.

===BIRDQL Grammar===

<pre>
ID  <list of id/ac/query_id>  DB  <bank names>
WH  Field Contains kw1 |& kw2 |& kw_n
WH  PATTERN <function SequencePattern()>
WH  PATTERN <function DiagonalMolecule()>
WH  PATTERN <function InteractionProtein()>
WH  PATTERN <function …. ()>
LD  <Field out>
FM  <n>
FM  Fasta/Flat/Xml/CSV/Simple/Object
</pre>

[[Category:Bird_project]]
 
 
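Because each BIRD-QL clause is a single line that starts with a short keyword, a query can also be generated programmatically. The Python sketch below is purely illustrative: the helpers `wh_contains` and `build_birdql` are hypothetical and not part of BIRD. It follows the clause names used in the examples on this page (ID/DB, WH, FD, FM) rather than the `LD` keyword listed in the grammar, and it assumes that whitespace around commas in the FD clause is not significant.

```python
def wh_contains(field, keywords, joiner="and"):
    """Build a 'WH <field> contains kw1 |and kw2' clause (hypothetical helper).

    Example 1 on this page uses '|and'; other joiners are an assumption.
    """
    return f"WH {field} contains " + f" |{joiner} ".join(keywords)


def build_birdql(ids, databases, conditions=(), fields=(), fmt=None):
    """Assemble a BIRD-QL query text, one clause per line (hypothetical helper)."""
    lines = [f"ID {ids} DB {', '.join(databases)}"]  # e.g. 'ID * DB UNIPROT'
    lines.extend(conditions)                          # one WH clause per line
    if fields:
        lines.append("FD " + ",".join(fields))        # output fields
    if fmt:
        lines.append(f"FM {fmt}")                     # output format
    return "\n".join(lines)


# Rebuild the query of Example 1 (UniProt, FASTA output).
query = build_birdql(
    "*",
    ["UNIPROT"],
    [wh_contains("DE", ["synthetase", "tyrosyl"]), wh_contains("OX", ["382"])],
    ["AC", "ID", "DE", "OX", "SQ"],
    "FASTA",
)
print(query)
```

Calling `build_birdql("*", ["GENBANK", "REFSEQ"], ...)` produces the `ID * DB GENBANK, REFSEQ` header used in Examples 2 and 3.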
 
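===BIRDQL examples===

The examples below show how to use the BIRD-QL syntax.

Example 1: simple query, search and FASTA format generation

<pre>
ID * DB UNIPROT
WH DE contains synthetase |and tyrosyl
WH OX contains 382
FD AC, ID,DE,OX,SQ
FM FASTA
</pre>

This query searches UniProt for entries whose description (DE) contains both "synthetase" and "tyrosyl" and whose taxonomy identifier (OX) contains 382. It returns the AC, ID, DE, OX and SQ fields in FASTA format.

Result:

<pre>
>Q92PK5 | SYY_RHIME | Tyrosyl-tRNA synthetase (EC 6.1.1.1) (Tyrosine--tRNA ligase) (TyrRS). | 382
MSEFKSDFLHTLSERGFIHQTSDDAGLDQLFRTETVTAYIGFDPTAASLHAGGLIQIMMLHWLQATGHRPISLMGGGTGMVGDPSFKDEARQLMTPETI...
</pre>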
 
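In the FASTA result, the requested header fields are separated by `|`. Assuming the header lists the fields in the order given in the FD clause (AC, ID, DE, OX for Example 1) and that no field value itself contains ` | `, a result file could be parsed as follows. `parse_bird_fasta` is a hypothetical helper written for this illustration, not part of BIRD.

```python
def parse_bird_fasta(text, header_fields=("AC", "ID", "DE", "OX")):
    """Parse BIRD FASTA output into a list of dicts (sketch, assumed layout).

    Each '>' header is split on ' | ' and zipped with header_fields;
    the following non-empty lines are concatenated into the 'SQ' entry.
    """
    records = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            values = [v.strip() for v in line[1:].split(" | ")]
            current = dict(zip(header_fields, values))
            current["SQ"] = ""
            records.append(current)
        elif current is not None:
            current["SQ"] += line  # sequence may span several lines
    return records


sample = (
    ">Q92PK5 | SYY_RHIME | Tyrosyl-tRNA synthetase (EC 6.1.1.1) "
    "(Tyrosine--tRNA ligase) (TyrRS). | 382\n"
    "MSEFKSDFLH\n"
    "TLSERGFIHQ\n"
)
for rec in parse_bird_fasta(sample):
    print(rec["AC"], rec["ID"], rec["OX"], len(rec["SQ"]))
```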
 
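The exact matching rules of `contains` are not specified on this page. The result of Example 1 is consistent with case-insensitive substring matching: the description "Tyrosyl-tRNA synthetase ..." matches both "synthetase" and "tyrosyl". The sketch below emulates that interpretation locally on parsed records, for instance to re-check downloaded results. It is an assumption and not BIRD's full-text engine. Under plain substring semantics, `contains 382` would also match a value such as 1382. The `|or` joiner and the AND combination of successive WH lines are also assumptions.

```python
def contains(value, keywords, joiner="and"):
    """Emulate 'WH <field> contains kw1 |and kw2' (assumed semantics).

    Matching is case-insensitive substring search; 'and' requires every
    keyword, 'or' requires at least one.
    """
    text = value.lower()
    hits = [kw.lower() in text for kw in keywords]
    if joiner == "and":
        return all(hits)
    if joiner == "or":
        return any(hits)
    raise ValueError(f"unsupported joiner: {joiner}")


def matches(record, clauses):
    """Apply several WH clauses, given as (field, keywords, joiner) tuples.

    Assumption: successive WH lines are combined with AND.
    """
    return all(
        contains(record.get(field, ""), kws, joiner)
        for field, kws, joiner in clauses
    )


rec = {"DE": "Tyrosyl-tRNA synthetase (EC 6.1.1.1) (Tyrosine--tRNA ligase) (TyrRS).",
       "OX": "382"}
print(matches(rec, [("DE", ["synthetase", "tyrosyl"], "and"), ("OX", ["382"], "and")]))
```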
Example 2: complex query

<pre>
ID * DB GENBANK, REFSEQ
WH OC Contains Eukaryote
WH DR Contains GO
WH GENE contains GF100027
FM OID
</pre>

This query searches GenBank and RefSeq for eukaryotic sequences that contain the GF100027 gene and carry a cross-reference to Gene Ontology (GO).

Example 3: complex query

<pre>
ID * DB GENBANK, REFSEQ
WH OC Contains Eukaryote
WH DR Contains GO
WH GENE contains GF100027
FM SIMPLE
</pre>

This query is identical to Example 2 but returns the results in the SIMPLE output format instead of OID.

 
 
 
==DATA ACCESS (Decrypthon)==
 
===Data Browsing===
 
Database content can be browsed through the BIRD Web HTML interface.
 
 
 
===Data Selection by BIRD-QL Service===
 
Data can also be selected with BIRD-QL queries, which expert users can also write or modify by hand. Two query services are available:

# The BIRD-QL Editor, to run BIRD-QL queries.
# A command-line script (curl under Linux) to run BIRD-QL queries, which can be used in computation-intensive workflows (download birdql cmd).
 
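The command-line service submits a BIRD-QL query over HTTP. As a rough equivalent of the curl script, the Python sketch below uses only the standard library (`urllib`) instead of curl. The endpoint URL (`BIRDQL_URL`) and the form field name (`query`) are hypothetical placeholders, since the actual service address and parameters are not given on this page; adjust them to the real BIRD-QL service before use.

```python
import urllib.parse
import urllib.request

# Hypothetical endpoint: replace with the real BIRD-QL service URL.
BIRDQL_URL = "http://example.org/bird/birdql"


def make_birdql_request(query_text, url=BIRDQL_URL, param="query"):
    """Build (but do not send) a POST request carrying a BIRD-QL query.

    The form field name 'query' is an assumption.
    """
    body = urllib.parse.urlencode({param: query_text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def run_birdql(query_text, out_file=None, url=BIRDQL_URL, timeout=60):
    """Send the query, optionally save the response to out_file, return it."""
    request = make_birdql_request(query_text, url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        text = resp.read().decode("utf-8", errors="replace")
    if out_file:
        with open(out_file, "w", encoding="utf-8") as fh:
            fh.write(text)
    return text
```

Separating request construction from sending keeps the query encoding testable offline; only `run_birdql` needs network access.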
===Simple Service===
 
 
 
Syntax: http://d1.crihan.fr:8080/bird/bsearch?db=<database>&accession=<ac or id>&field=<DE,OS,...>&format=<fasta/flat>

Ex1, get EST info: http://d1.crihan.fr:8080/bird/bsearch?db=gbest&accession=Cj133605&field=DE,OS,OC,TISSUE_TYPE,DEV_STAGE

Ex2, get a protein: http://d1.crihan.fr:8080/bird/bsearch?db=uniprot&accession=Q23456

Ex3, get a PDB entry: http://d1.crihan.fr:8080/bird/bsearch?db=pdb&idcode=1XDS

Ex4, get FASTA: http://d1.crihan.fr:8080/bird/bsearch?db=pdb&idcode=1XDS&format=fasta

For PDB entries, the identifier is passed with the idcode parameter instead of accession.
 
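The Simple Service URLs above follow a fixed pattern, so they can be generated rather than typed by hand. The Python sketch below builds them with `urllib.parse.urlencode`, keeping commas unescaped so that the field list matches the examples. `bsearch_url` is a hypothetical helper, not part of BIRD. It only builds the URL; fetching it (for example with `urllib.request.urlopen`) requires the service to be reachable.

```python
from urllib.parse import urlencode

BSEARCH_URL = "http://d1.crihan.fr:8080/bird/bsearch"


def bsearch_url(db, accession=None, idcode=None, fields=None, fmt=None):
    """Build a Simple Service query URL (hypothetical helper).

    Parameters mirror the syntax documented above: db, accession
    (or idcode for PDB), field (comma-separated) and format.
    """
    params = {"db": db}
    if accession:
        params["accession"] = accession
    if idcode:
        params["idcode"] = idcode
    if fields:
        params["field"] = ",".join(fields)
    if fmt:
        params["format"] = fmt
    # safe="," keeps the comma-separated field list readable.
    return BSEARCH_URL + "?" + urlencode(params, safe=",")


print(bsearch_url("pdb", idcode="1XDS", fmt="fasta"))
```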
WEB Server
 
 
 
 
 
 
 
 
 
===Java API & Native SQL===
 
 
 
==BIRD Business Intelligence==



===Knowledge Discovery in Databases===

===DB2 Intelligent Miner (API)===
 
===Example in BIRD System===
 
 
 
==BIRD Implementation==
 
 
 
* Federation architecture
* Data model
* Query engine
* Data integration
* Key technologies
 
 
 
 
 
 
==BIRD System in Action ==
 
===Decrypthon Data Center Implementation===
 
http://decrypthon-1.ens-lyon.fr:9080/BirdSystem/HomePage.do
 
 
 
===Macsim uses BIRD===

Macsim can now connect directly to BIRD.
 
 
 
===GPS uses BIRD===
 
http://nucleic.fr
 
 
 
===Gscope uses BIRD===

Gscope can now connect directly to BIRD.
 
 
 
 
 
* proc '''BirdFromQueryText''' {Texte {OutFile ""} {BirdUrl ""}}
 
* proc '''BirdFromQueryFile''' {Fichier {OutFile ""} {BirdUrl ""}}
 
 
 
BIRD can integrate the info sheets (fiches infos) of a Gscope project. These can then be queried directly over HTTP or from Gscope or, better still, through displays produced with the '''BirdGscopeSearch''' command.
 

Latest revision as of 08:18, 1 October 2013
