Latest revision as of 10:04, 10 March 2014
BIRDQL Biological Query Language
BIRDQL in a few words
This query language was designed by Hoan Nguyen [http://lbgi.igbmc.fr/~nguyen/].
The heterogeneous data integrated in an integrator system, or BIRD System, are represented by several relational tables. Querying these data with SQL is not straightforward for anyone other than developers or computer science experts.
Building SQL queries in this context is difficult because it requires joins (a technical term) to select data from multiple tables. This complexity must be hidden behind HTML forms, but many queries cannot be set up with HTML forms.
We propose our own query language, BIRDQL, a new standard biological query language that allows the biologist or clinician to create data retrieval protocols without exhaustive knowledge of the data sources and their architecture. The BIRD System is driven by a high-level query engine, BIRDQL, which lets biologists express queries easily and extract knowledge using classical constraints and scientific functions (StructuralDistance, SequencePattern, AssociationRule...).
BIRDQL is not a mathematically complete language but rather an idiom adapted to the GUI, human-readable enough to be modified by hand. The BIRDQL query engine borrows some of its main ideas from SaadaQL [http://amwdb.u-strasbg.fr/saada/spip.php?article52]. The SaadaQL query language was developed in the framework of my PhD (Astrophysics & Virtual Observatory, 2002-2005) at the University of Strasbourg.
Data can be selected with the BIRD Data Access Protocol.
BIRDQL Grammar
ID <list of id/ac/query_id > DB <bank names>
WH <Field> Contains <(kw1 & kw2) | kw_n>
WH PATTERN <function SequencePattern() >
WH PATTERN <function DiagonalMolecule()>
WH PATTERN <function InteractionProtein()>
WH PATTERN <function AssociationRule()>
WH SQLNative select from ...
FD <Field out1,Field out2,... / GET_COUNT/GET_DR(bankname)>
OF <OFFSET, Default OF=0>
LM <maximum number of results to display>
FM <Fasta/Flat/Xml/CSV/Simple/Object/OID>
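The clause order in the grammar above can be illustrated with a small helper that assembles a query as text. This is only a sketch: the function name `build_birdql` and its parameters are hypothetical and are not part of BIRD; it simply emits one clause per line in the order ID/DB, WH, FD, OF, LM, FM, and it does not validate field names or formats.

```python
from typing import Optional, Sequence


def build_birdql(
    db: str,
    ids: Sequence[str] = ("*",),
    where: Sequence[str] = (),
    fields: Sequence[str] = (),
    offset: Optional[int] = None,
    limit: Optional[int] = None,
    fmt: Optional[str] = None,
) -> str:
    """Assemble a BIRDQL query string, one clause per line (hypothetical helper)."""
    lines = [f"ID {','.join(ids)} DB {db}"]
    # Each constraint is written as its own WH clause.
    lines.extend(f"WH {cond}" for cond in where)
    if fields:
        lines.append(f"FD {','.join(fields)}")
    if offset is not None:
        lines.append(f"OF {offset}")
    if limit is not None:
        lines.append(f"LM {limit}")
    if fmt:
        lines.append(f"FM {fmt}")
    return "\n".join(lines)


query = build_birdql(
    db="GBEST",
    where=['TISSUE_TYPE contains "retina"', 'DEV_STAGE contains "adult"'],
    fields=["AC", "DE", "OX"],
    limit=100,
    fmt="FLAT",
)
print(query)
```

The resulting text can then be submitted through whatever access mechanism is in use; how the query is sent to the server is outside the scope of this sketch.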
BIRDQL example
Data can be selected with BIRD Data Access Protocol
The examples below show how to use the BIRDQL syntax.
Example : simple query, Full Text search
ID * DB MSV3d (Missense Variant Database)
WH TEXT contains "DMD"
FD ID
LM 100
FM JSON
Result
//
Example : simple query, search and fasta format generation
ID * DB UNIPROT
WH TEXT contains "synthetase" & "tyrosyl" & not ("homo sapiens" & "human")
FD AC, ID,DE,OX,SQ
LM 100
FM FASTA
Result
>Q92PK5 | SYY_RHIME | Tyrosyl-tRNA synthetase (EC 6.1.1.1) (Tyrosine--tRNA ligase) (TyrRS). | 382
MSEFKSDFLHTLSERGFIHQTSDDAGLDQLFRTETVTAYIGFDPTAASLHAGGLIQIMMLHWLQATGHRPISLMGGGTGMVGDPSFKDEARQLMTPETI...
//
Example : DBSNP
Get a DBSNP record in XML by ID:
ID 268 DB DBSNP
//
Find SNP by position:
ID * DB DBSNP
ID * DB DBSNP
WH SQLNative select id from dbsnp_ds_ch3.fulltext where XMLEXISTS('$i/Rs/Assembly/Component/MapLoc[@physMapInt=30466018] ' passing text as "i")
LM 1000
FM FLAT
Example : find snp by position
ID * DB DBSNP
WH SQLNative select id from dbsnp_ds_ch18.fulltext where XMLEXISTS('$i/Rs/Assembly/Component/MapLoc[@physMapInt>=30466000 and @physMapInt<=30466200 ] ' passing text as "i")
FM FLAT
//
Example : find snp by position and reference sequence (GRCh37.p5)
ID * DB DBSNP
WH SQLNative Select ID from dbsnp_ds_ch8.fulltext where XMLEXISTS('$i/Rs/Assembly/Component/MapLoc[@physMapInt=19817621 and ../../@groupLabel="GRCh37.p5"] ' passing text as "i")
FM FLAT
//
//
ID * DB UNIPROT
WH TEXT contains "histone" & not "homo sapiens"
FD AC,DE,OS
LM 3
FM FLAT
//
ID * DB UNIPROT
WH TEXT contains not "homo sapiens"
FD AC,DE,OS
LM 3
Example 2: complex query, GBFULL=EST+ WGS +Release +New
ID * DB GBFULL
WH OC Contains "Eukaryote"
WH DR Contains "GO"
WH GENE contains "GF100027"
FM FASTA
The query above searches the full GenBank for eukaryotic sequences that contain the GF100027 gene and have a cross-reference in Gene Ontology.
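To make the clause structure of such a query easier to inspect, the text can be split by its leading keyword. The following is a minimal sketch, assuming one clause per line and only the keywords listed in the grammar (ID, WH, FD, OF, LM, FM); the function name `split_clauses` is hypothetical and this is not the parser used by the BIRD engine.

```python
from collections import defaultdict

# Clause keywords taken from the BIRDQL grammar section.
KEYWORDS = {"ID", "WH", "FD", "OF", "LM", "FM"}


def split_clauses(query: str) -> dict:
    """Group the clauses of a BIRDQL query by keyword (hypothetical helper)."""
    clauses = defaultdict(list)
    for raw in query.splitlines():
        line = raw.strip()
        if not line or line == "//":
            continue  # skip blank lines and query separators
        keyword, _, rest = line.partition(" ")
        keyword = keyword.upper()
        if keyword not in KEYWORDS:
            raise ValueError(f"unknown clause keyword: {keyword!r}")
        clauses[keyword].append(rest.strip())
    return dict(clauses)


parsed = split_clauses(
    'ID * DB GBFULL\n'
    'WH OC Contains "Eukaryote"\n'
    'WH DR Contains "GO"\n'
    'WH GENE contains "GF100027"\n'
    'FM FASTA'
)
for condition in parsed["WH"]:
    print(condition)
```

Clauses that appear in some examples but not in the grammar (such as FUZZY) are rejected by this sketch; the keyword set would need extending to accept them.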
Example 3: mining in GENBANK EST
ID * DB GBEST
WH TISSUE_TYPE contains "retina"
WH DEV_STAGE contains "adult"
LM 100
FD AC,DE,OX,OC,tissue_type,dev_stage,chr
FM FLAT
Example 4: Mining in GENBANK EST
ID CJ133635,CJ133593,CJ133659 DB GBEST
WH DE contains "AMINOTRANSFERASE"
WH OC contains "Eukaryota" & not "Metazoa"
WH TISSUE_TYPE contains "retina"
FD AC,DE,OX,OC,tissue_type,dev_stage,chr
FM FLAT
Example 5: Mining in EST
ID * DB GBEST
WH TISSUE_TYPE contains "colon"
WH DEV_STAGE contains "adult"
LM 100
FD AC,DE,OX,OC,tissue_type,dev_stage,chr,os
FM FLAT
Example 6: Mining In PDB
ID * DB PDB
WH TEXT contains "DMD" & "ERYTHRINA CORALLODENDRON"
LM 10
FM FASTA
//
ID * DB PDB
WH TEXT contains "METAL BINDING PROTEIN" & "LACTOFERRIN"
WH FUNCTION Diagnonal3D()>125
FUZZY 100
LM 100
FM FASTA
//
ID * DB PDB
WH TEXT contains "METAL BINDING PROTEIN" & "LACTOFERRIN"
WH FUNCTION Diagnonal3D()>125
FUZZY 100
LM 100
FM SIMPLE
//
ID * DB PDB
WH CL contains "METAL BINDING PROTEIN"
WH DE contains "LACTOFERRIN"
WH FUNCTION Diagnonal3D()>125
LM 10
FM FLAT
//
ID * DB PDB
WH CL contains "METAL BINDING PROTEIN"
WH DE contains "LACTOFERRIN"
WH FUNCTION Diagnonal3D()>125
FD GET_COUNT
FM FLAT
Example 7: Get GENE ONTOLOGY or DBREF
ID Q32437 DB UNIPROT
FD AC,DR(GO)
//
ID Q34215 DB UNIPROT
FD AC,DR(InterPro)
>>Result:
AC Q32437;
DR GO; GO:0009507; C:chloroplast; IEA:InterPro.
DR GO; GO:0016021; C:integral to membrane; IEA:UniProtKB-KW.
......
//
AC Q34215;
DR Pfam; PF00033; Cytochrom_B_N; 1.
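A flat result like the one above can be post-processed on the client side. The sketch below assumes that each field sits on its own line, starts with a two-letter code (AC, DR, ...), and that records are separated by a line containing only `//`; the function name `parse_flat` is hypothetical, and the sample text is copied from the result shown here.

```python
def parse_flat(text: str) -> list:
    """Split a flat-format result into records mapping line code -> values."""
    records, current = [], {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or set(line) == {"."}:
            continue  # skip blank lines and "......" elision markers
        if line == "//":
            if current:
                records.append(current)
                current = {}
            continue
        code, _, value = line.partition(" ")
        current.setdefault(code, []).append(value.strip())
    if current:
        records.append(current)  # last record may lack a trailing "//"
    return records


sample = """AC Q32437;
DR GO; GO:0009507; C:chloroplast; IEA:InterPro.
DR GO; GO:0016021; C:integral to membrane; IEA:UniProtKB-KW.
......
//
AC Q34215;
DR Pfam; PF00033; Cytochrom_B_N; 1.
"""
records = parse_flat(sample)
for record in records:
    print(record["AC"][0], len(record["DR"]))
```

Values are kept as raw strings, including the trailing punctuation, so that no information from the flat file is lost.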