Difference between revisions of "BIRDQL"
(→BIRDQL Grammar) |
(→BIRDQL example) |
||
Line 60: | Line 60: | ||
>Q92PK5 | SYY_RHIME | Tyrosyl-tRNA synthetase (EC 6.1.1.1) (Tyrosine--tRNA ligase) (TyrRS). | 382 | >Q92PK5 | SYY_RHIME | Tyrosyl-tRNA synthetase (EC 6.1.1.1) (Tyrosine--tRNA ligase) (TyrRS). | 382 | ||
MSEFKSDFLHTLSERGFIHQTSDDAGLDQLFRTETVTAYIGFDPTAASLHAGGLIQIMMLHWLQATGHRPISLMGGGTGMVGDPSFKDEARQLMTPETI... | MSEFKSDFLHTLSERGFIHQTSDDAGLDQLFRTETVTAYIGFDPTAASLHAGGLIQIMMLHWLQATGHRPISLMGGGTGMVGDPSFKDEARQLMTPETI... | ||
+ | |||
+ | |||
'''Example 2''': complex query | '''Example 2''': complex query | ||
Line 336: | Line 338: | ||
FM FLAT | FM FLAT | ||
+ | |||
+ | |||
+ | '''Example 11''': Get GENE ONTOLOGY | ||
+ | |||
+ | ID Q32437 DB UNIPROT | ||
+ | |||
+ | FD AC,DR(GO) | ||
+ | |||
+ | Result: | ||
+ | |||
+ | AC Q32437; | ||
+ | |||
+ | DR GO; GO:0009507; C:chloroplast; IEA:InterPro. | ||
+ | |||
+ | DR GO; GO:0016021; C:integral to membrane; IEA:UniProtKB-KW. | ||
+ | |||
+ | DR GO; GO:0009523; C:photosystem II; IEA:UniProtKB-KW. | ||
+ | |||
+ | DR GO; GO:0008137; F:NADH dehydrogenase (ubiquinone) activity; IEA:InterPro. | ||
+ | |||
+ | DR GO; GO:0048038; F:quinone binding; IEA:UniProtKB-KW. | ||
+ | |||
+ | DR GO; GO:0042773; P:ATP synthesis coupled electron transport; IEA:InterPro. |
Revision as of 08:24, 17 June 2008
Contents
BIRDQL Biological Query Language
BIRDQL in few words
The heterogeneous data integrated in BIRD System are represented by several relational tables. The exploitation of these data by SQL queries is not obvious except for developers or computer scientist experts.
Building queries with SQL in this context is not easy with because that requires to use joins (terme technique) to select data in multiple tables. This complexity must be hidden by HTML forms but a lot of queries can not be setup with HTML forms.
We proposes own query language (BIRDQL), there is new standard biological query language allowing the biologist or clinician to create data retrieval protocols without exhaustive knowledge of the data sources and their architecture. BIRD System is driven with a high level query engine: BIRDQL, which makes it possible for biologists to express easily queries and to extract knowledge by classical constraints and scientific functions (StructuralDistance,SequencePattern,AssociationRule...).
BIRDQL in not a mathematically complete language but indeed an idiom adpated to the GUI, human readable enough to be modified by hand.
Data can be selected with BIRD Data Access Protocol
BIRDQL Grammar
ID <list of id/ac/query_id > DB <bank names>
WH <Field> Contains <(kw1 & kw2) | kw_n>
WH PATTERN <function SequencePattern() >
WH PATTERN <function DiagonalMolecule()>
WH PATTERN <function InteractionProtein()>
WH PATTERN <function AssociationRule()>
FD <<Field out>>
LM <n>
FM <Fasta/Flat/Xml/CSV/Simple/Object/OID>
BIRDQL example
Data can be selected with BIRD Data Access Protocol
Two other examples below also show how to use the BIRD-QL syntax.
Example 1: simple query, search and fasta format generation
ID * DB UNIPROT
WH DE contains "synthetase" & "tyrosyl"
WH OX contains 382
FD AC, ID,DE,OX,SQ
FM FASTA
Result
>Q92PK5 | SYY_RHIME | Tyrosyl-tRNA synthetase (EC 6.1.1.1) (Tyrosine--tRNA ligase) (TyrRS). | 382
MSEFKSDFLHTLSERGFIHQTSDDAGLDQLFRTETVTAYIGFDPTAASLHAGGLIQIMMLHWLQATGHRPISLMGGGTGMVGDPSFKDEARQLMTPETI...
Example 2: complex query
ID * DB GENBANK, REFSEQ
WH OC Contains "Eukaryote"
WH DR Contains "GO"
WH GENE contains "GF100027"
FM FASTA
The query above allow to search in Genbank and RefSeq, the Eucaryotic sequences containing the GF100027 gene with a cross reference in GeneOntology.
Example 2: complex query
ID * DB GENBANK, REFSEQ
WH OC Contains "Eukaryote"
WH DR Contains "GO"
WH GENE contains "GF100027"
FM FASTA
The query above allow to search in Genbank and RefSeq, the Eucaryotic sequences containing the GF100027 gene with a cross reference in GeneOntology.
Example 3: mining in EST
ID * DB GBEST
WH TISSUE_TYPE contains "retina"
WH DEV_STAGE contains "adult"
LM 100
FD AC,DE,OX,OC,tissue_type,dev_stage,chr
FM FLAT
Example 4: Mining in EST
ID CJ133635,CJ133593,CJ133659 DB GBEST
WH DE contains "AMINOTRANSFERASE"
WH OC contains "Eukaryota" & not "Metazoa"
WH TISSUE_TYPE contains "retina"
FD AC,DE,OX,OC,tissue_type,dev_stage,chr
FM FLAT
Example 5: Mining in EST
ID * DB GBEST
WH TISSUE_TYPE contains "colon"
WH DEV_STAGE contains "adult"
LM 100
FD AC,DE,OX,OC,tissue_type,dev_stage,chr,os
FM FLAT
Example 6: Mining In PDB
ID * DB PDB
WH DE contains "ERYTHRINA CORALLODENDRON LECTIN IN COMPLEX"
WH OS contains "ERYTHRINA CORALLODENDRON"
WH RESO contains 1.90
LM 10
FM FASTA
//
ID * DB PDB
WH CL contains "METAL BINDING PROTEIN"
WH DE contains "LACTOFERRIN"
WH FUNCTION Diagnonal3D()>125
FUZZY 100
LM 100
FM FASTA
//
ID * DB PDB WH CL contains "METAL BINDING PROTEIN"
WH DE contains "LACTOFERRIN"
WH FUNCTION Diagnonal3D()>125
FUZZY 100
LM 100
FM SIMPLE
//
ID * DB PDB
WH CL contains "METAL BINDING PROTEIN"
WH DE contains "LACTOFERRIN"
WH FUNCTION Diagnonal3D()>125
LM 10
FM FLAT
//
ID * DB PDB
WH CL contains "METAL BINDING PROTEIN"
WH DE contains "LACTOFERRIN"
WH FUNCTION Diagnonal3D()>125
FD GET_COUNT
FM FLAT
Example 7: rmining in EVI Genoret Database
ID * DB EVImm
WH text CONTAINS "retina"
FD GET_COUNT
FM CSV
//
ID * DB EVImm
WH text CONTAINS "retina"
LM 10
FD NOM,VALIDEPROT
FM CSV
//
ID * DB EVImm
WH text CONTAINS "retina"
LM 10
FD NOM,VALIDEPROT
FM FLAT
//
ID * DB EVImm
WH text CONTAINS "retina" & "chr6:127091327-127116667"
LM 10
FM XML
Example 8: running SQL Native (authorized user)
ID * DB STRING
WH SQLNATIVE select * from items.proteins
Limit 100
FM CSV
Example 9: SAGE Data (or MYGEO [1]) (authorized user)
ID * DB MYGEO
WH SQLNATIVE select sample_id,tag,count from MYGEO.entry_sage where tag='GTGAAACCCC'
LM 500
FM CSV
==> results
GSM1|GTGAAACCCC|63
GSM2|GTGAAACCCC|76
GSM571|GTGAAACCCC|13
GSM572|GTGAAACCCC|395
GSM573|GTGAAACCCC|359
GSM574|GTGAAACCCC|286
GSM668|GTGAAACCCC|132
GSM669|GTGAAACCCC|129
GSM670|GTGAAACCCC|112
//
ID * DB MYGEO
WH SQLNATIVE select e.sample_id,e.tag,e.count from MYGEO.entry_sage as e, MYGEO.sample as h where (h.ac=e.sample_id and tag='GTGAAACCCC') and h.platform_id='GPL4' and count between 1 and 3
LM 1000
FM CSV
Results
GSM135388|GTGAAACCCC|1|GPL4
//
ID * DB MYGEO
WH SQLNATIVE select sample_id,tag,count from MYGEO.entry_sage where tag='GTGAAACCCC' and count between 1 and 3
LM 1000
FM CSV
Example 10: Association rules (authorized user)
ID * DB protein_interaction
WH PATTERN AssociationPattern(Right(protA,ProtB,ProtC),Left(ProtK),sup=30,conf=90))
FD ID,Rules
FM FLAT
Example 11: Get GENE ONTOLOGY
ID Q32437 DB UNIPROT
FD AC,DR(GO)
Result:
AC Q32437;
DR GO; GO:0009507; C:chloroplast; IEA:InterPro.
DR GO; GO:0016021; C:integral to membrane; IEA:UniProtKB-KW.
DR GO; GO:0009523; C:photosystem II; IEA:UniProtKB-KW.
DR GO; GO:0008137; F:NADH dehydrogenase (ubiquinone) activity; IEA:InterPro.
DR GO; GO:0048038; F:quinone binding; IEA:UniProtKB-KW.
DR GO; GO:0042773; P:ATP synthesis coupled electron transport; IEA:InterPro.