Difference between revisions of "BIRDQL"

From Wikili
Revision as of 08:24, 17 June 2008

BIRDQL Biological Query Language

BIRDQL in a few words

The heterogeneous data integrated in the BIRD System are stored in several relational tables. Querying these data directly with SQL is not straightforward, except for developers and computer scientists.

Building SQL queries in this context is difficult because it requires joins (a technical term) to select data from multiple tables. This complexity can be hidden behind HTML forms, but many queries cannot be expressed with HTML forms.
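To make the join burden concrete, here is a minimal sketch using Python's built-in sqlite3 module. The two tables (entry and cross_reference), their columns, and the demo rows are assumptions made up for illustration; they are not the actual BIRD schema. Even this small case needs an explicit join to list the entries that carry a GO cross-reference.

```python
import sqlite3

# Hypothetical two-table layout, for illustration only (not the real BIRD schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entry (ac TEXT PRIMARY KEY, de TEXT);
    CREATE TABLE cross_reference (ac TEXT, db TEXT, ref_id TEXT);
    INSERT INTO entry VALUES ('AC0001', 'first demo entry');
    INSERT INTO entry VALUES ('AC0002', 'second demo entry');
    INSERT INTO cross_reference VALUES ('AC0002', 'GO', 'GO:0009507');
""")

# Selecting entries with a GO cross-reference requires joining both tables.
rows = conn.execute("""
    SELECT e.ac, e.de, x.ref_id
    FROM entry AS e
    JOIN cross_reference AS x ON x.ac = e.ac
    WHERE x.db = 'GO'
""").fetchall()
print(rows)
```

Each additional data source typically adds another table and another join condition. That is the complexity BIRDQL is meant to hide.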

We propose our own query language, BIRDQL, as a new standard biological query language that allows biologists and clinicians to create data retrieval protocols without exhaustive knowledge of the data sources and their architecture. The BIRD System is driven by a high-level query engine, BIRDQL, which lets biologists express queries easily and extract knowledge using classical constraints and scientific functions (StructuralDistance, SequencePattern, AssociationRule...).

BIRDQL is not a mathematically complete language but rather an idiom adapted to the GUI, human-readable enough to be modified by hand.

Data can be selected with the BIRD Data Access Protocol.

BIRDQL Grammar

ID <list of id/ac/query_id > DB <bank names>

WH <Field> Contains <(kw1 & kw2) | kw_n>

WH PATTERN <function SequencePattern() >

WH PATTERN <function DiagonalMolecule()>

WH PATTERN <function InteractionProtein()>

WH PATTERN <function AssociationRule()>

FD <<Field out>>

LM <n>

FM <Fasta/Flat/Xml/CSV/Simple/Object/OID>
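Since each clause starts with a two-letter keyword, a query can be split into clauses with very little code. The sketch below is not part of the BIRD System; the function name parse_birdql and the one-clause-per-line layout are assumptions based only on the grammar listed above. Keywords that are not in that grammar are rejected.

```python
from collections import defaultdict

# Two-letter clause keywords taken from the grammar above.
KEYWORDS = {"ID", "WH", "FD", "LM", "FM"}

def parse_birdql(text):
    """Group the lines of a BIRDQL query by their leading keyword."""
    clauses = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between clauses
        keyword, _, rest = line.partition(" ")
        keyword = keyword.upper()
        if keyword not in KEYWORDS:
            raise ValueError(f"unknown clause: {line!r}")
        clauses[keyword].append(rest.strip())
    return dict(clauses)

query = """
ID * DB UNIPROT
WH DE contains "synthetase" & "tyrosyl"
WH OX contains 382
FD AC, ID,DE,OX,SQ
FM FASTA
"""
parsed = parse_birdql(query)
print(parsed["WH"])
```

Several WH lines are collected into a list, which mirrors the way constraints are stacked in a query.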



BIRDQL example

Data can be selected with the BIRD Data Access Protocol.

The examples below show how to use the BIRDQL syntax.

Example 1: simple query, search and FASTA format generation


ID * DB UNIPROT

WH DE contains "synthetase" & "tyrosyl"

WH OX contains 382

FD AC, ID,DE,OX,SQ

FM FASTA


Result


>Q92PK5 | SYY_RHIME | Tyrosyl-tRNA synthetase (EC 6.1.1.1) (Tyrosine--tRNA ligase) (TyrRS). | 382

MSEFKSDFLHTLSERGFIHQTSDDAGLDQLFRTETVTAYIGFDPTAASLHAGGLIQIMMLHWLQATGHRPISLMGGGTGMVGDPSFKDEARQLMTPETI...
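In the FASTA result, the header carries the requested fields separated by " | ". The sketch below splits such a header back into named fields. It assumes that the header fields follow the order of the FD clause (AC, ID, DE, OX), with the sequence (SQ) on the following line; this is inferred from the example and not from a documented specification. The helper name parse_fasta_header is hypothetical.

```python
def parse_fasta_header(header, fields=("AC", "ID", "DE", "OX")):
    """Split a '>'-prefixed, ' | '-separated header into named fields."""
    if not header.startswith(">"):
        raise ValueError("not a FASTA header")
    values = [value.strip() for value in header[1:].split(" | ")]
    if len(values) != len(fields):
        raise ValueError("unexpected number of header fields")
    return dict(zip(fields, values))

header = (">Q92PK5 | SYY_RHIME | Tyrosyl-tRNA synthetase (EC 6.1.1.1) "
          "(Tyrosine--tRNA ligase) (TyrRS). | 382")
record = parse_fasta_header(header)
print(record["AC"], record["OX"])
```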


Example 2: complex query

ID * DB GENBANK, REFSEQ

WH OC Contains "Eukaryote"

WH DR Contains "GO"

WH GENE contains "GF100027"

FM FASTA

The query above searches GenBank and RefSeq for eukaryotic sequences that contain the GF100027 gene and have a cross-reference to Gene Ontology.


Example 3: mining in EST

ID * DB GBEST

WH TISSUE_TYPE contains "retina"

WH DEV_STAGE contains "adult"

LM 100

FD AC,DE,OX,OC,tissue_type,dev_stage,chr

FM FLAT

Example 4: Mining in EST

ID CJ133635,CJ133593,CJ133659 DB GBEST

WH DE contains "AMINOTRANSFERASE"

WH OC contains "Eukaryota" & not "Metazoa"

WH TISSUE_TYPE contains "retina"

FD AC,DE,OX,OC,tissue_type,dev_stage,chr

FM FLAT


Example 5: Mining in EST

ID * DB GBEST

WH TISSUE_TYPE contains "colon"

WH DEV_STAGE contains "adult"

LM 100

FD AC,DE,OX,OC,tissue_type,dev_stage,chr,os

FM FLAT


Example 6: Mining in PDB

ID * DB PDB

WH DE contains "ERYTHRINA CORALLODENDRON LECTIN IN COMPLEX"

WH OS contains "ERYTHRINA CORALLODENDRON"

WH RESO contains 1.90

LM 10

FM FASTA


//

ID * DB PDB

WH CL contains "METAL BINDING PROTEIN"

WH DE contains "LACTOFERRIN"

WH FUNCTION Diagnonal3D()>125

FUZZY 100

LM 100

FM FASTA

//

ID * DB PDB

WH CL contains "METAL BINDING PROTEIN"

WH DE contains "LACTOFERRIN"

WH FUNCTION Diagnonal3D()>125

FUZZY 100

LM 100

FM SIMPLE


//

ID * DB PDB

WH CL contains "METAL BINDING PROTEIN"

WH DE contains "LACTOFERRIN"

WH FUNCTION Diagnonal3D()>125


LM 10

FM FLAT


//

ID * DB PDB

WH CL contains "METAL BINDING PROTEIN"

WH DE contains "LACTOFERRIN"

WH FUNCTION Diagnonal3D()>125

FD GET_COUNT

FM FLAT


Example 7: mining in EVI Genoret Database

ID * DB EVImm

WH text CONTAINS "retina"

FD GET_COUNT

FM CSV

//

ID * DB EVImm

WH text CONTAINS "retina"

LM 10

FD NOM,VALIDEPROT

FM CSV

//

ID * DB EVImm

WH text CONTAINS "retina"

LM 10

FD NOM,VALIDEPROT

FM FLAT

//

ID * DB EVImm

WH text CONTAINS "retina" & "chr6:127091327-127116667"

LM 10

FM XML


Example 8: running SQL Native (authorized user)

ID * DB STRING

WH SQLNATIVE select * from items.proteins

Limit 100

FM CSV


Example 9: SAGE Data (or MYGEO [1]) (authorized user)


ID * DB MYGEO

WH SQLNATIVE select sample_id,tag,count from MYGEO.entry_sage where tag='GTGAAACCCC'

LM 500

FM CSV

==> results

GSM1|GTGAAACCCC|63

GSM2|GTGAAACCCC|76

GSM571|GTGAAACCCC|13

GSM572|GTGAAACCCC|395

GSM573|GTGAAACCCC|359

GSM574|GTGAAACCCC|286

GSM668|GTGAAACCCC|132

GSM669|GTGAAACCCC|129

GSM670|GTGAAACCCC|112
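The rows returned here are separated by "|" rather than by commas, so a client has to set the delimiter explicitly. A minimal sketch with Python's standard csv module, using the first three result rows above as input; the column names come from the select list of the query.

```python
import csv
import io

# First three rows of the result shown above.
raw = """GSM1|GTGAAACCCC|63
GSM2|GTGAAACCCC|76
GSM571|GTGAAACCCC|13
"""

reader = csv.reader(io.StringIO(raw), delimiter="|")
# Columns follow the select list: sample_id, tag, count.
rows = [(sample_id, tag, int(count)) for sample_id, tag, count in reader]
total = sum(count for _, _, count in rows)
print(rows[0], total)
```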


//

ID * DB MYGEO

WH SQLNATIVE select e.sample_id,e.tag,e.count from MYGEO.entry_sage as e, MYGEO.sample as h where (h.ac=e.sample_id and tag='GTGAAACCCC') and h.platform_id='GPL4' and count between 1 and 3

LM 1000

FM CSV


Results

GSM135388|GTGAAACCCC|1|GPL4


//

ID * DB MYGEO

WH SQLNATIVE select sample_id,tag,count from MYGEO.entry_sage where tag='GTGAAACCCC' and count between 1 and 3

LM 1000

FM CSV


Example 10: Association rules (authorized user)

ID * DB protein_interaction

WH PATTERN AssociationPattern(Right(protA,ProtB,ProtC),Left(ProtK),sup=30,conf=90)

FD ID,Rules

FM FLAT


Example 11: Get GENE ONTOLOGY

ID Q32437 DB UNIPROT

FD AC,DR(GO)

Result:

AC Q32437;

DR GO; GO:0009507; C:chloroplast; IEA:InterPro.

DR GO; GO:0016021; C:integral to membrane; IEA:UniProtKB-KW.

DR GO; GO:0009523; C:photosystem II; IEA:UniProtKB-KW.

DR GO; GO:0008137; F:NADH dehydrogenase (ubiquinone) activity; IEA:InterPro.

DR GO; GO:0048038; F:quinone binding; IEA:UniProtKB-KW.

DR GO; GO:0042773; P:ATP synthesis coupled electron transport; IEA:InterPro.
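The DR lines above follow the UniProt flat-file layout, in which the term is prefixed by its ontology aspect (C, F or P) and the last field gives the evidence code and its source. A minimal sketch for turning one such line into a structured record; the function name parse_go_line and the dictionary keys are hypothetical.

```python
# Aspect prefixes used in UniProt GO cross-references.
ASPECTS = {
    "C": "cellular component",
    "F": "molecular function",
    "P": "biological process",
}

def parse_go_line(line):
    """Parse one 'DR GO; ...' line into a dictionary."""
    tag, body = line.split(None, 1)
    if tag != "DR":
        raise ValueError("not a DR line")
    parts = [part.strip() for part in body.rstrip(".").split(";")]
    if len(parts) != 4 or parts[0] != "GO":
        raise ValueError("not a GO cross-reference")
    _, go_id, term, evidence = parts
    aspect, _, name = term.partition(":")
    code, _, source = evidence.partition(":")
    return {
        "id": go_id,
        "aspect": ASPECTS[aspect],
        "term": name,
        "evidence": code,
        "source": source,
    }

go = parse_go_line(
    "DR GO; GO:0008137; F:NADH dehydrogenase (ubiquinone) activity; IEA:InterPro."
)
print(go["id"], go["aspect"])
```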