AmdConsortium
From Wiki
(40 intermediate revisions not shown.) | |||
Line 1: | Line 1: | ||
- | The AMD Consortium within EVI-Genoret Database | + | The AMD Consortium within EVI-Genoret Database by Raymond Ripp |
The Genoret Database aims to host all data for the Amd Consortium. | The Genoret Database aims to host all data for the Amd Consortium. | ||
See the [http://www-genoret.u-strasbg.fr/genoret/AmdConsortium AMD Consortium Welcome Page] | See the [http://www-genoret.u-strasbg.fr/genoret/AmdConsortium AMD Consortium Welcome Page] | ||
- | ==Integration of the | + | |
- | * Baltimore January version | + | ==Introduction== |
+ | * I got a lot of different files from all centers, from 2006 up to october 2010. | ||
+ | * Files came from Baltimore, Bonn, Creteil, Jerusalem, London, Paris, Southampton and from the CNG (Evry) | ||
+ | * Very often people didn't use the same identities, column title and content ... | ||
+ | ** | ||
+ | * I tried to run as much verifications as possible to detect inconsistency, error, missings, etc. | ||
+ | |||
+ | ==Mise à jour pour CIDR juin 2012== | ||
+ | Le CIDR demandait un fichier avec tous les patients .. Branle-bas de combat, tout le monde a réagi et j'ai eu de nouvelles infos. | ||
+ | * de la part de Diana du CNG : AMD_142_Leveillard_120525.xlsx | ||
+ | ** 4897 lignes mais avec Cambridge (Yates) qui ne nous concerne pas et London (Webster) dont les échatillons sont retournés à Londres | ||
+ | ** j'ai changé les entêtes pour créer AMD_142_Leveillard_120525.csv | ||
+ | ** il est lu par la proc CngList qui normalise le sexe et normalise les Individual de Baltimore et Paris (et simule ceux de Paris20120604) | ||
+ | * de la part de Isabelle Audo : AMDCOHORT06062012AnonymeWithConcentration.xls | ||
+ | ** 1142 patients avec Bleeding Sex Birth CicId Family Concentration Diagnosis | ||
+ | ** il y avait 2 doublons | ||
+ | ** j'ai corrigé les date de bleeding car elles étaient sensés être par ordre chronologique | ||
+ | ** avec cette date et le birth j'ai pu calculer le AgeAtOnset | ||
+ | ** pour les diagnostics voir AmdParis dans genoret_amdconsortium.tcl. En gros il y a des diag pour les 2 yeux pour le OD et/ou le OS. J'en ai fait un OD et OS systématique. | ||
+ | ** J'ai créé un Paris20120604 en parallèle avec Paris qui sera à supprimer. | ||
+ | * Pour le CIDR il faut la liste de tout le monde. | ||
+ | ** Frédéric avait fait le fichier EU_JHU_For_CIDR.csv en récoltant tout ce qu'il pouvait dans ma base. En virant Londres et en ne mettant pas Baltimore. | ||
+ | ** Je suis parti de ce fichier et y ai rajouté Baltimore | ||
+ | ** Pour créé EU_JHU_for_CIDR_20120618.csv qu'on a envoyé au CIDR | ||
+ | ** J'ai eu beaucoup de mal avec les diagnostics et la méthode n'est pas très bonne.Il faudrait faire les OD et OS séparés et rajouter une priorité du genre | ||
+ | ga et cnv (= gacnv) > ga et non cnv = cnv et non ga > large drusen > intermediate > control ... Le unknown/missing plane sur tout ça ! | ||
+ | ** J'ai rajouté la Barcode mais il en manque pour beaucoup, mais c'est normal le CNG n'a pas tout fourni. | ||
+ | ==Additionnal information from Diana 20111004== | ||
+ | Diana sent us the Excell file Final_dat_AMD_P28_29_collections.xls containing additionnal and existing CngStatus. | ||
+ | |||
+ | I wrote the tcl program VerificationP28 : | ||
+ | * 1 new Creteil Nat2-285 | ||
+ | * 1 Baltimore has wrong sex dasn AMdCommon (or CgnStatus ...) 464001 | ||
+ | * 262 new London for AmdCommon | ||
+ | * 1718 new London for CngStatus (0 before) | ||
+ | * 18 new Southapton | ||
+ | |||
+ | Diana sent us also AMD_JYates_110529.xls containing Indexes, Barcode and LabCenter. | ||
+ | I wrote CngBarcode | ||
+ | |||
+ | ==Sql tables== | ||
+ | ===Original table=== | ||
+ | I created a Sql table for each center independently from others with the raw data. | ||
+ | |||
+ | So I create the tables Baltimore, BaltimoreSeptember, Bonn, Creteil, Jerusalem, London, Paris, Southampton. | ||
+ | |||
+ | ===AmdCommon=== | ||
+ | These tables where very different and I decided to create a common table '''AmdCommon''' with following columns | ||
+ | |||
+ | amdcentre amdid amdbirth amdsex amdsmoke amddiagod amddiagos amddiagodos amdwhicheye amdownpk pk | ||
+ | |||
+ | amdownpk is the link to the original data. | ||
+ | |||
+ | ===CngStatus=== | ||
+ | Only a subset of these patients were genotyped. So I created the '''CngStatus''' table with the information I got grom the CNG | ||
+ | |||
+ | lab_center pedigree individual father mother sex status coll_comment barcode pk diag basicdiag birth | ||
+ | |||
+ | The most important columns are | ||
+ | * barcode corresponding to the blood sample | ||
+ | * '''diag''' obtained by a mix of amddiagod amddiagos amddiagodos | ||
+ | * '''basicdiag''' obtained from diag indicating ga cnv control amd | ||
+ | |||
+ | ===Managing BasicDiag from Diag=== | ||
+ | My Tcl functions BasicDiag and SetBasicDiagForCng allow to manage it. | ||
+ | |||
+ | * BasicDiag uses the lookup table from the file CngStatusDiag.txt | ||
+ | * SetBasicDiagForCng calls BasicDiag for all '''diag''' and if it doesn't find it, it appends the missing '''diag''' to CngStatusDiagToDo.txt | ||
+ | * CngStatusDiagToDo.txt can be viewed from the AmdConsortium welcome page ... and I hope we will solve all remaining problems ... | ||
+ | |||
+ | ===NoFundus and ambigious patients from Bonn 2011/02/09=== | ||
+ | Jürgen Kaschkötö sent us two files | ||
+ | * DeBonn/Bonn-Regensburg-Phenotype.xls containing 84 lines as for example | ||
+ | |||
+ | H-028-GA;;;;Late AMD GA;Late AMD GA | ||
+ | H-030-GA;SO25121919;H-030-2-OS;Individual was SO25121919 , now H-030-2-OS;Late AMD GA;Late AMD GA | ||
+ | H-053-GA;NM27021930;H-053-2-MN;Individual was NM27021930 , now H-053-2-MN;Late AMD GA;Late AMD GA | ||
+ | etc. | ||
+ | |||
+ | these patients where all with NoFundus | ||
+ | |||
+ | * DeBonn/AmbigiousSolved.txt cointaining 20 lines as forexample | ||
+ | |||
+ | L-003-GA L-003-2-AK | ||
+ | L-007-GA L-007-2-GH | ||
+ | L-018-GA L-018-2-EN | ||
+ | etc. | ||
+ | |||
+ | the entries L-123-GA were ambigious because L-003-GA <==> ambigious L-003-1-HW L-003-2-AK | ||
+ | |||
+ | I decide NOT TO UPDATE the origial database which was obtained by a sql request to Tuebingen. | ||
+ | |||
+ | I updated AmdCommon and CngStatus on 2011/02/16 with the NoFundus patient. | ||
+ | I'll do it soon for the ambigious. | ||
+ | |||
+ | ===Integration of AMD_COLL_xxx files containing all genotypes 2010/06/18=== | ||
+ | On https://beaune.cng.fr/amd/ I found the files | ||
+ | - | ||
+ | AMD_COLL_AUDO.tar 17-Jun-2010 11:50 35M | ||
+ | AMD_COLL_LOTERY.tar 17-Jun-2010 11:49 128M | ||
+ | AMD_COLL_SCHOLL.tar 15-Jun-2010 15:29 70M | ||
+ | AMD_COLL_SOUIED.tar 17-Jun-2010 11:52 70M | ||
+ | AMD_COLL_ZACK.tar 17-Jun-2010 11:53 41M | ||
+ | results_AMD.txt.gz 17-Sep-2009 10:40 11M | ||
+ | |||
+ | * result_AMD.txt.gz was the same I already had. | ||
+ | * I did tar -xvf AMD_COLL_xxxx.tar creating the correspondig directory containing (ie Audo) | ||
+ | |||
+ | 64465487 2008-09-01 10:46 Final_dat_AMD_COLL_AUDO.txt | ||
+ | 717 2008-08-12 10:01 report.txt | ||
+ | 26731497 2008-08-12 09:52 success_mark.txt | ||
+ | 3139 2008-08-12 09:52 success_samp.txt | ||
+ | |||
+ | The Final_dat_AMD_COLL_xxx contain | ||
+ | |||
+ | Family Individual Father Mother Sex Status rs12354060 rs12184279 rs12564807 rs3115860 ... ... ... (with 332224 rs) | ||
+ | |||
+ | I created a small one called xxxx.txt (Audo, Lotery, Scholl, Souied, Zack) with the first 10 columns, run '''gscope CngGenoPheno''' and got the CngGenoPheno list available on the AmdConsortium welcome page 'Phenotypic Information. | ||
+ | |||
+ | * Only a few references from Bonn are still missing | ||
+ | but why are most of Individual like 'ZM23041920' (I found them because I had the coll_comment field containing 'Individual was ZM23041920 , now H-208-2-MZ' ) | ||
+ | * Paris, Creteil and Southampton are ok, Baltimore seems to be ok but there are still multiple diagnosis ... | ||
+ | |||
+ | ==Integration of the CNG Status 2008/10/01== | ||
+ | # Reading the file AMD_verif_statuts_Leveillard_envoye_1oct08.xls | ||
+ | ## I normalized the headers (without blank, no /) | ||
+ | ## I set sex to M or F instead of 1 or 2 | ||
+ | ## I replaced the centre with respectively Bonn, Creteil ... | ||
+ | ## I replaced NAT2-xyz by xyz for Creteil (some sex values don't correpond) | ||
+ | ## I remove TL to obtain the Paris ID | ||
+ | ## I took the beginning of the Bonn's Ids (L-060-GA => L-060- but some Ids are ambiguous L-060-1-BH or L-060-2-KS ? | ||
+ | # Reading the files snplist_NXNL2.xlt snplist_TXNL6.xlt genotypeNXNL2.xls genotypeTXNL6.xls | ||
+ | ## convert to .csv | ||
+ | ## remove the comment lines at beginning | ||
+ | |||
+ | ==Integration of the Quality Control by Raymond Ripp 2008/04== | ||
+ | 2010/10/06 I have to integrate QC_results_AMD_Don_Zack_complete_sent_7april08.xls in the AllTogether table | ||
+ | |||
+ | With this qc_alltogether table I could feed the AmdCommon in which we mist a lot Baltimore | ||
+ | |||
+ | Comparing diagnosis from sep 2007 with apr 2008 I did minor corrections in Apr 2008 (upper case lower cae etc,) and major differences are listed below | ||
+ | 2 004001 DiagSep=" " DiagApr2008="High Risk AMD" | ||
+ | 3 004001 DiagSep=" " DiagApr2008="High Risk AMD" | ||
+ | 31 034002 DiagSep=" " DiagApr2008="Not confirmed" | ||
+ | 32 034002 DiagSep=" " DiagApr2008="Not confirmed" | ||
+ | 42 043002 DiagSep=" " DiagApr2008="Unaffected/No Pathology" | ||
+ | 50 049001 DiagSep="Advanced GA" DiagApr2008=" " | ||
+ | 52 052004 DiagSep=" " DiagApr2008="Not confirmed" | ||
+ | 88 107010 DiagSep="Low risk AMD" DiagApr2008="Unaffected/No Pathology" | ||
+ | 129 139002 DiagSep="Exudative AMD" DiagApr2008="Advanced GA and Exudative AMD" | ||
+ | 206 185002 DiagSep=" " DiagApr2008="Exudative AMD" | ||
+ | 301 247020 DiagSep=" " DiagApr2008="Not confirmed" | ||
+ | 300 247020 DiagSep=" " DiagApr2008="Not confirmed" | ||
+ | 325 258001 DiagSep="Can't Confirm Dx" DiagApr2008="Can't Grade" | ||
+ | 457 316002 DiagSep=" " DiagApr2008="Exudative AMD" | ||
+ | 586 406002 DiagSep=" " DiagApr2008="Not confirmed" | ||
+ | 631 435006 DiagSep=" " DiagApr2008="Not confirmed" | ||
+ | 630 435006 DiagSep=" " DiagApr2008="Not confirmed" | ||
+ | 666 447002 DiagSep=" " DiagApr2008="High Risk AMD" | ||
+ | 678 452003 DiagSep=" " DiagApr2008="Not confirmed" | ||
+ | 677 452003 DiagSep=" " DiagApr2008="Not confirmed" | ||
+ | I set Advanced GA for 049001 because it was already for Pk 49 | ||
+ | |||
+ | Important changes are 107010 139002 258001 | ||
+ | (written 2010/09/20) | ||
+ | I got following files | ||
+ | |||
+ | 2007-12-12 10:08 QualityControlCreteil.xls | ||
+ | 2007-12-12 10:20 QualityControlCreteil.csv | ||
+ | 2007-12-12 10:36 QualityControlBaltimore.csv | ||
+ | 2007-12-12 10:40 QualityControlBonn.csv | ||
+ | 2007-12-12 10:46 QualityControl.csv | ||
+ | 2008-04-07 13:44 QC_results_AMD_Audo_sent_7april08.xls | ||
+ | 2008-04-07 14:52 QC_results_AMD_Don_Zack_complete_sent_7april08.xls (this was not integrated before 2010/10/06) | ||
+ | 2008-05-14 09:23 QualityControlBonn.xls | ||
+ | 2008-05-14 09:23 QualityControlBaltimore.xls | ||
+ | 2008-05-15 18:08 AMD_QC_results_sent_batch2_may08.zip | ||
+ | 2009-04-07 14:29 QualityControlParis.csv | ||
+ | |||
+ | and created the Quality Control tables | ||
+ | (see 'List all available Quality Controls' and per center on the AmdConsortium welcome page) | ||
+ | |||
+ | ==Integration of the Phenotyping Data by Raymond 2008/02/11== | ||
+ | |||
+ | * '''Baltimore''' January version | ||
# Betsy Campochiaro sent several Excel files corresponding to an Access database | # Betsy Campochiaro sent several Excel files corresponding to an Access database | ||
- | # Creation of a | + | # Creation of a .csv file containing nearby all fields pointed as secondary tables (Genoret Tcl program) |
# Integration of this file in the csvschema table csvt8 | # Integration of this file in the csvschema table csvt8 | ||
# Detection of errors with the Genoret Tcl program | # Detection of errors with the Genoret Tcl program | ||
# Corrections and updates of small errors | # Corrections and updates of small errors | ||
- | # Add ped to father_id and mother_id ( | + | # Add ped to father_id and mother_id (2008/01/11) |
- | * Bonn | + | # I replaced the numbers corresponding to the diagnosis with the text of the diagnosis (2008/01/10) |
+ | |||
+ | * '''Bonn''' | ||
# Direct connection to the Phenotyping Database in Tuebingen (2007-11) | # Direct connection to the Phenotyping Database in Tuebingen (2007-11) | ||
# Save the display of all patients as .csv file | # Save the display of all patients as .csv file | ||
Line 21: | Line 207: | ||
# Keep only the centre Bonn or the FamStudy (ok) | # Keep only the centre Bonn or the FamStudy (ok) | ||
# Some values were stored as boolean and couldn't be displayed correctly (a small square). I replaced nul by NO and not null by YES | # Some values were stored as boolean and couldn't be displayed correctly (a small square). I replaced nul by NO and not null by YES | ||
- | * Creteil | + | |
+ | * '''Creteil''' | ||
# Eric Souied sent an Excel file | # Eric Souied sent an Excel file | ||
# I removed birthdays and created a csvt table | # I removed birthdays and created a csvt table | ||
- | * Paris CIC | + | # As the file contains only CNV patients (without this diagnosis) I add CNV to the AmdDiagOdOs column in the common table. |
+ | |||
+ | * '''Paris CIC''' | ||
# Isabelle Audo sent an Excel file | # Isabelle Audo sent an Excel file | ||
- | * Jerusalem | + | |
+ | * '''Jerusalem''' | ||
# Itay Chowers sent a new file 2007/12/24 and a .doc file db_codes | # Itay Chowers sent a new file 2007/12/24 and a .doc file db_codes | ||
- | + | ## I deleted all empty rows especially at the end | |
- | + | ## and I added the missing empty columns in the rows at the end | |
- | ## | + | ## I got the db_codes and I replaced the initial_stage_fellow_eye_areds (2 with J=DRY-2, etc.) in the commonview (2008/02/05). |
- | + | # 2008/02/15 I merged different columns to get an integrated diagnosis. It depends now on the firsteyewithamd, etc. See the AmdConsortium Welcome page. | |
- | ## | + | * '''London''' |
- | ## | + | # I got the data from Andrew Webster as Excel file corresponding to his Access database. |
- | # | + | |
- | * London | + | |
- | # I got the data from Andrew Webster as | + | |
# I removed names and birthdates | # I removed names and birthdates | ||
# Guillaume created a local SQL database for these level2 data. | # Guillaume created a local SQL database for these level2 data. | ||
# He created also a level1 database (as defiened by Tuebingen) with the data from Montpellier and Tuebingen. | # He created also a level1 database (as defiened by Tuebingen) with the data from Montpellier and Tuebingen. | ||
# I'll extract the London data to create a csvt table (not yet done) | # I'll extract the London data to create a csvt table (not yet done) | ||
- | * Southampton | + | |
+ | * '''Southampton''' | ||
# Angela sent the .xls file | # Angela sent the .xls file | ||
# Raymond upload it in AmdConsortium Gallery | # Raymond upload it in AmdConsortium Gallery | ||
Line 50: | Line 238: | ||
# Integrate it as csvt27 | # Integrate it as csvt27 | ||
# Modify birthday to keep only the year. And some dates were written as dd.mm.yy | # Modify birthday to keep only the year. And some dates were written as dd.mm.yy | ||
+ | # I merged the cohort, diag_dry, diag_wet_amd, amd_, consolidated_areds in one value (2008/02/05) |
Current revision
The AMD Consortium within EVI-Genoret Database by Raymond Ripp
The Genoret Database aims to host all data for the Amd Consortium.
See the AMD Consortium Welcome Page
Contents |
Introduction
- I got a lot of different files from all centers, from 2006 up to october 2010.
- Files came from Baltimore, Bonn, Creteil, Jerusalem, London, Paris, Southampton and from the CNG (Evry)
- Very often people didn't use the same identities, column title and content ...
- I tried to run as much verifications as possible to detect inconsistency, error, missings, etc.
Mise à jour pour CIDR juin 2012
Le CIDR demandait un fichier avec tous les patients .. Branle-bas de combat, tout le monde a réagi et j'ai eu de nouvelles infos.
- de la part de Diana du CNG : AMD_142_Leveillard_120525.xlsx
- 4897 lignes mais avec Cambridge (Yates) qui ne nous concerne pas et London (Webster) dont les échatillons sont retournés à Londres
- j'ai changé les entêtes pour créer AMD_142_Leveillard_120525.csv
- il est lu par la proc CngList qui normalise le sexe et normalise les Individual de Baltimore et Paris (et simule ceux de Paris20120604)
- de la part de Isabelle Audo : AMDCOHORT06062012AnonymeWithConcentration.xls
- 1142 patients avec Bleeding Sex Birth CicId Family Concentration Diagnosis
- il y avait 2 doublons
- j'ai corrigé les date de bleeding car elles étaient sensés être par ordre chronologique
- avec cette date et le birth j'ai pu calculer le AgeAtOnset
- pour les diagnostics voir AmdParis dans genoret_amdconsortium.tcl. En gros il y a des diag pour les 2 yeux pour le OD et/ou le OS. J'en ai fait un OD et OS systématique.
- J'ai créé un Paris20120604 en parallèle avec Paris qui sera à supprimer.
- Pour le CIDR il faut la liste de tout le monde.
- Frédéric avait fait le fichier EU_JHU_For_CIDR.csv en récoltant tout ce qu'il pouvait dans ma base. En virant Londres et en ne mettant pas Baltimore.
- Je suis parti de ce fichier et y ai rajouté Baltimore
- Pour créé EU_JHU_for_CIDR_20120618.csv qu'on a envoyé au CIDR
- J'ai eu beaucoup de mal avec les diagnostics et la méthode n'est pas très bonne.Il faudrait faire les OD et OS séparés et rajouter une priorité du genre
ga et cnv (= gacnv) > ga et non cnv = cnv et non ga > large drusen > intermediate > control ... Le unknown/missing plane sur tout ça !
- J'ai rajouté la Barcode mais il en manque pour beaucoup, mais c'est normal le CNG n'a pas tout fourni.
Additionnal information from Diana 20111004
Diana sent us the Excell file Final_dat_AMD_P28_29_collections.xls containing additionnal and existing CngStatus.
I wrote the tcl program VerificationP28 :
- 1 new Creteil Nat2-285
- 1 Baltimore has wrong sex dasn AMdCommon (or CgnStatus ...) 464001
- 262 new London for AmdCommon
- 1718 new London for CngStatus (0 before)
- 18 new Southapton
Diana sent us also AMD_JYates_110529.xls containing Indexes, Barcode and LabCenter. I wrote CngBarcode
Sql tables
Original table
I created a Sql table for each center independently from others with the raw data.
So I create the tables Baltimore, BaltimoreSeptember, Bonn, Creteil, Jerusalem, London, Paris, Southampton.
AmdCommon
These tables where very different and I decided to create a common table AmdCommon with following columns
amdcentre amdid amdbirth amdsex amdsmoke amddiagod amddiagos amddiagodos amdwhicheye amdownpk pk
amdownpk is the link to the original data.
CngStatus
Only a subset of these patients were genotyped. So I created the CngStatus table with the information I got grom the CNG
lab_center pedigree individual father mother sex status coll_comment barcode pk diag basicdiag birth
The most important columns are
- barcode corresponding to the blood sample
- diag obtained by a mix of amddiagod amddiagos amddiagodos
- basicdiag obtained from diag indicating ga cnv control amd
Managing BasicDiag from Diag
My Tcl functions BasicDiag and SetBasicDiagForCng allow to manage it.
- BasicDiag uses the lookup table from the file CngStatusDiag.txt
- SetBasicDiagForCng calls BasicDiag for all diag and if it doesn't find it, it appends the missing diag to CngStatusDiagToDo.txt
- CngStatusDiagToDo.txt can be viewed from the AmdConsortium welcome page ... and I hope we will solve all remaining problems ...
NoFundus and ambigious patients from Bonn 2011/02/09
Jürgen Kaschkötö sent us two files
- DeBonn/Bonn-Regensburg-Phenotype.xls containing 84 lines as for example
H-028-GA;;;;Late AMD GA;Late AMD GA H-030-GA;SO25121919;H-030-2-OS;Individual was SO25121919 , now H-030-2-OS;Late AMD GA;Late AMD GA H-053-GA;NM27021930;H-053-2-MN;Individual was NM27021930 , now H-053-2-MN;Late AMD GA;Late AMD GA etc.
these patients where all with NoFundus
- DeBonn/AmbigiousSolved.txt cointaining 20 lines as forexample
L-003-GA L-003-2-AK L-007-GA L-007-2-GH L-018-GA L-018-2-EN etc.
the entries L-123-GA were ambigious because L-003-GA <==> ambigious L-003-1-HW L-003-2-AK
I decide NOT TO UPDATE the origial database which was obtained by a sql request to Tuebingen.
I updated AmdCommon and CngStatus on 2011/02/16 with the NoFundus patient. I'll do it soon for the ambigious.
Integration of AMD_COLL_xxx files containing all genotypes 2010/06/18
On https://beaune.cng.fr/amd/ I found the files
- AMD_COLL_AUDO.tar 17-Jun-2010 11:50 35M AMD_COLL_LOTERY.tar 17-Jun-2010 11:49 128M AMD_COLL_SCHOLL.tar 15-Jun-2010 15:29 70M AMD_COLL_SOUIED.tar 17-Jun-2010 11:52 70M AMD_COLL_ZACK.tar 17-Jun-2010 11:53 41M results_AMD.txt.gz 17-Sep-2009 10:40 11M
- result_AMD.txt.gz was the same I already had.
- I did tar -xvf AMD_COLL_xxxx.tar creating the correspondig directory containing (ie Audo)
64465487 2008-09-01 10:46 Final_dat_AMD_COLL_AUDO.txt 717 2008-08-12 10:01 report.txt 26731497 2008-08-12 09:52 success_mark.txt 3139 2008-08-12 09:52 success_samp.txt
The Final_dat_AMD_COLL_xxx contain
Family Individual Father Mother Sex Status rs12354060 rs12184279 rs12564807 rs3115860 ... ... ... (with 332224 rs)
I created a small one called xxxx.txt (Audo, Lotery, Scholl, Souied, Zack) with the first 10 columns, run gscope CngGenoPheno and got the CngGenoPheno list available on the AmdConsortium welcome page 'Phenotypic Information.
- Only a few references from Bonn are still missing
but why are most of Individual like 'ZM23041920' (I found them because I had the coll_comment field containing 'Individual was ZM23041920 , now H-208-2-MZ' )
- Paris, Creteil and Southampton are ok, Baltimore seems to be ok but there are still multiple diagnosis ...
Integration of the CNG Status 2008/10/01
- Reading the file AMD_verif_statuts_Leveillard_envoye_1oct08.xls
- I normalized the headers (without blank, no /)
- I set sex to M or F instead of 1 or 2
- I replaced the centre with respectively Bonn, Creteil ...
- I replaced NAT2-xyz by xyz for Creteil (some sex values don't correpond)
- I remove TL to obtain the Paris ID
- I took the beginning of the Bonn's Ids (L-060-GA => L-060- but some Ids are ambiguous L-060-1-BH or L-060-2-KS ?
- Reading the files snplist_NXNL2.xlt snplist_TXNL6.xlt genotypeNXNL2.xls genotypeTXNL6.xls
- convert to .csv
- remove the comment lines at beginning
Integration of the Quality Control by Raymond Ripp 2008/04
2010/10/06 I have to integrate QC_results_AMD_Don_Zack_complete_sent_7april08.xls in the AllTogether table
With this qc_alltogether table I could feed the AmdCommon in which we mist a lot Baltimore
Comparing diagnosis from sep 2007 with apr 2008 I did minor corrections in Apr 2008 (upper case lower cae etc,) and major differences are listed below
2 004001 DiagSep=" " DiagApr2008="High Risk AMD" 3 004001 DiagSep=" " DiagApr2008="High Risk AMD" 31 034002 DiagSep=" " DiagApr2008="Not confirmed" 32 034002 DiagSep=" " DiagApr2008="Not confirmed" 42 043002 DiagSep=" " DiagApr2008="Unaffected/No Pathology" 50 049001 DiagSep="Advanced GA" DiagApr2008=" " 52 052004 DiagSep=" " DiagApr2008="Not confirmed" 88 107010 DiagSep="Low risk AMD" DiagApr2008="Unaffected/No Pathology" 129 139002 DiagSep="Exudative AMD" DiagApr2008="Advanced GA and Exudative AMD" 206 185002 DiagSep=" " DiagApr2008="Exudative AMD" 301 247020 DiagSep=" " DiagApr2008="Not confirmed" 300 247020 DiagSep=" " DiagApr2008="Not confirmed" 325 258001 DiagSep="Can't Confirm Dx" DiagApr2008="Can't Grade" 457 316002 DiagSep=" " DiagApr2008="Exudative AMD" 586 406002 DiagSep=" " DiagApr2008="Not confirmed" 631 435006 DiagSep=" " DiagApr2008="Not confirmed" 630 435006 DiagSep=" " DiagApr2008="Not confirmed" 666 447002 DiagSep=" " DiagApr2008="High Risk AMD" 678 452003 DiagSep=" " DiagApr2008="Not confirmed" 677 452003 DiagSep=" " DiagApr2008="Not confirmed"
I set Advanced GA for 049001 because it was already for Pk 49
Important changes are 107010 139002 258001 (written 2010/09/20) I got following files
2007-12-12 10:08 QualityControlCreteil.xls 2007-12-12 10:20 QualityControlCreteil.csv 2007-12-12 10:36 QualityControlBaltimore.csv 2007-12-12 10:40 QualityControlBonn.csv 2007-12-12 10:46 QualityControl.csv 2008-04-07 13:44 QC_results_AMD_Audo_sent_7april08.xls 2008-04-07 14:52 QC_results_AMD_Don_Zack_complete_sent_7april08.xls (this was not integrated before 2010/10/06) 2008-05-14 09:23 QualityControlBonn.xls 2008-05-14 09:23 QualityControlBaltimore.xls 2008-05-15 18:08 AMD_QC_results_sent_batch2_may08.zip 2009-04-07 14:29 QualityControlParis.csv
and created the Quality Control tables (see 'List all available Quality Controls' and per center on the AmdConsortium welcome page)
Integration of the Phenotyping Data by Raymond 2008/02/11
- Baltimore January version
- Betsy Campochiaro sent several Excel files corresponding to an Access database
- Creation of a .csv file containing nearby all fields pointed as secondary tables (Genoret Tcl program)
- Integration of this file in the csvschema table csvt8
- Detection of errors with the Genoret Tcl program
- Corrections and updates of small errors
- Add ped to father_id and mother_id (2008/01/11)
- I replaced the numbers corresponding to the diagnosis with the text of the diagnosis (2008/01/10)
- Bonn
- Direct connection to the Phenotyping Database in Tuebingen (2007-11)
- Save the display of all patients as .csv file
- Integration of this file in the csvschema table csvt16
- They are 3 sub_table in our table ... the 3rd oe should be ok
- It seems that now a connection to Tuebingen gives only the 3rd table ...
- Keep only the year of birth (Genoret Tcl program)
- Keep only the centre Bonn or the FamStudy (ok)
- Some values were stored as boolean and couldn't be displayed correctly (a small square). I replaced nul by NO and not null by YES
- Creteil
- Eric Souied sent an Excel file
- I removed birthdays and created a csvt table
- As the file contains only CNV patients (without this diagnosis) I add CNV to the AmdDiagOdOs column in the common table.
- Paris CIC
- Isabelle Audo sent an Excel file
- Jerusalem
- Itay Chowers sent a new file 2007/12/24 and a .doc file db_codes
- I deleted all empty rows especially at the end
- and I added the missing empty columns in the rows at the end
- I got the db_codes and I replaced the initial_stage_fellow_eye_areds (2 with J=DRY-2, etc.) in the commonview (2008/02/05).
- 2008/02/15 I merged different columns to get an integrated diagnosis. It depends now on the firsteyewithamd, etc. See the AmdConsortium Welcome page.
- London
- I got the data from Andrew Webster as Excel file corresponding to his Access database.
- I removed names and birthdates
- Guillaume created a local SQL database for these level2 data.
- He created also a level1 database (as defiened by Tuebingen) with the data from Montpellier and Tuebingen.
- I'll extract the London data to create a csvt table (not yet done)
- Southampton
- Angela sent the .xls file
- Raymond upload it in AmdConsortium Gallery
- Open Excel
- Delete the 'nearly empty' columns on the right
- Rename the duplicated column Project no.2
- Save as .csv
- Integrate it as csvt27
- Modify birthday to keep only the year. And some dates were written as dd.mm.yy
- I merged the cohort, diag_dry, diag_wet_amd, amd_, consolidated_areds in one value (2008/02/05)