Difference between revisions of "Architecture of Gscope"
Revision as of 17:44, 8 January 2018
Architecture of Gscope
To understand how Gscope works today, we need a brief overview of its historical evolution (or evolutionary history).
Gscope from the beginning
Odile Lecompte, Olivier Poch and Raymond Ripp had to annotate the genome of Pyrococcus abyssi.
Starting with the DNA sequence of Pyrococcus abyssi (1765120 bases) we determined the genes and tried to find the function of each protein.
For that we needed an interactive visualization tool able to show the sequences, blast outputs, multiple alignments and many other things.
PAB
The Pabyssi gscope project handles DNA and protein sequences. Each one is represented as a rectangular box on the GscopeBoard.
We called it a PAB (from Pyrococcus AByssi) and were never able to find a more generic name (it could be Box or SeqEntity or ???).
Each one had an id: PAB0001, PAB0002, ... (the numbering may not be consecutive).
The procedure ListeDesPABs returns the list of all these ids. We very often use:
foreach Nom [ListeDesPABs] { DoSomething $Nom }
Since Pabyssi I haven't changed the name of this central procedure.
To name each 'PAB' of a project we use a prefix (e.g. PAB, BOX or EHomsa) followed by a number of 1, 2, 3, 4 or 5 digits: PAB0001, EHomsa12345.
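The naming scheme above can be sketched in Tcl as follows. This is only an illustration: DemoNom and DemoSplitNom are hypothetical helpers, not Gscope procedures, and the default padding width of 4 digits is an assumption taken from the PAB0001 example.

```tcl
# Hypothetical helper (not part of Gscope): build an id from a prefix and a number.
proc DemoNom {prefix number {width 4}} {
    # Zero-pad the number to the requested width, e.g. PAB + 1 -> PAB0001
    return [format "%s%0${width}d" $prefix $number]
}

# Hypothetical helper: split an id back into its prefix and its number.
proc DemoSplitNom {nom} {
    # A prefix made of letters followed by 1 to 5 digits
    if {![regexp {^([A-Za-z]+)(\d{1,5})$} $nom -> prefix digits]} {
        error "not a valid id: $nom"
    }
    # scan with %d reads the digits as decimal, so leading zeros are harmless
    scan $digits %d number
    return [list $prefix $number]
}

puts [DemoNom PAB 1]            ;# PAB0001
puts [DemoNom EHomsa 12345 5]   ;# EHomsa12345
puts [DemoSplitNom MP2345]      ;# MP 2345
```

Keeping the width as a parameter reflects the fact that different projects use a different number of digits.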
Gscope File Organisation
Each Gscope project (we call it MyProject) is located in a single directory tree, starting at RepertoireDuGenome (normally /genomics/link/MyProject).
Suppose the prefix is MP and the project concerns 2345 proteins, from MP0001 to MP2345.
In the directory /genomics/link/MyProject you'll find the following directories:
- nuctfa containing the fasta file for each nucleic sequence (from MP0001 to MP2345)
- nucembl containing the same nucleic sequences in embl format
- prottfa containing the fasta file for each protein sequence (from MP0001 to MP2345)
- protembl containing the same protein sequences in embl format
- blastp
- ballast
- msf
- msfleon
- macsimXml
- macsimcRsf
These subdirectories are the default directories containing the default corresponding information, BUT we could imagine creating different blast results for different databases. In that case we could have
- blastpProtall
- blastpUniref
and to keep the default directory name we use a symbolic link:
blastp -> blastpProtall
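The link arrangement above can be tried out with a small Tcl sketch. It makes several assumptions: it works in a scratch directory under /tmp rather than in the real RepertoireDuGenome, it assumes a Unix-like system where symbolic links are available, and it assumes that the result file of a sequence is simply named after its id (MP0001).

```tcl
# Scratch location for the demo; the real root would be RepertoireDuGenome.
set root [file join /tmp gscope_demo_[pid] MyProject]
file mkdir [file join $root blastpProtall] [file join $root blastpUniref]

# Write a placeholder result into the Protall variant (file name assumed to be the id).
set f [open [file join $root blastpProtall MP0001] w]
puts $f "placeholder blastp output"
close $f

# Create the link blastp -> blastpProtall with a relative target.
set here [pwd]
cd $root
file link -symbolic blastp blastpProtall
cd $here

# Code that only knows the default directory name still finds the file.
set f [open [file join $root blastp MP0001] r]
set content [string trim [read $f]]
close $f
set target [file readlink [file join $root blastp]]
puts "$content (blastp -> $target)"

# Remove the scratch tree.
file delete -force [file dirname $root]
```

Switching the default to another database then only means replacing the link, while every procedure that reads from blastp stays unchanged.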