AmdConsortium
From Wiki
The AMD Consortium within EVI-Genoret Database by Raymond Ripp
The Genoret Database aims to host all data for the Amd Consortium.
See the AMD Consortium Welcome Page
Contents |
Introduction
- I got a lot of different files from all centers, from 2006 up to october 2010.
- Files came from Baltimore, Bonn, Creteil, Jerusalem, London, Paris, Southampton and from the CNG (Evry)
- Very often people didn't use the same identities, column title and content ...
- I tried to run as much verifications as possible to detect inconsistency, error, missings, etc.
Mise à jour pour CIDR juin 2012
Le CIDR demandait un fichier avec tous les patients .. Branle-bas de combat, tout le monde a réagi et j'ai eu de nouvelles infos.
- de la part de Diana du CNG : AMD_142_Leveillard_120525.xlsx
- 4897 lignes mais avec Cambridge (Yates) qui ne nous concerne pas et London (Webster) dont les échatillons sont retournés à Londres
- j'ai changé les entêtes pour créer AMD_142_Leveillard_120525.csv
- il est lu par la proc CngList qui normalise le sexe et normalise les Individual de Baltimore et Paris (et simule ceux de Paris20120604)
- de la part de Isabelle Audo : AMDCOHORT06062012AnonymeWithConcentration.xls
- 1142 patients avec Bleeding Sex Birth CicId Family Concentration Diagnosis
- il y avait 2 doublons
- j'ai corrigé les date de bleeding car elles étaient sensés être par ordre chronologique
- avec cette date et le birth j'ai pu calculer le AgeAtOnset
- pour les diagnostics voir AmdParis dans genoret_amdconsortium.tcl. En gros il y a des diag pour les 2 yeux pour le OD et/ou le OS. J'en ai fait un OD et OS systématique.
- J'ai créé un Paris20120604 en parallèle avec Paris qui sera à supprimer.
- Pour le CIDR il faut la liste de tout le monde.
- Frédéric avait fait le fichier EU_JHU_For_CIDR.csv en récoltant tout ce qu'il pouvait dans ma base. En virant Londres et en ne mettant pas Baltimore.
- Je suis parti de ce fichier et y ai rajouté Baltimore
- Pour créé EU_JHU_for_CIDR_20120618.csv qu'on a envoyé au CIDR
- J'ai eu beaucoup de mal avec les diagnostics et la méthode n'est pas très bonne.Il faudrait faire les OD et OS séparés et rajouter une priorité du genre
ga et cnv (= gacnv) > ga et non cnv = cnv et non ga > large drusen > intermediate > control ... Le unknown/missing plane sur tout ça !
- J'ai rajouté la Barcode mais il en manque pour beaucoup, mais c'est normal le CNG n'a pas tout fourni.
Additionnal information from Diana 20111004
Diana sent us the Excell file Final_dat_AMD_P28_29_collections.xls containing additionnal and existing CngStatus.
I wrote the tcl program VerificationP28 :
- 1 new Creteil Nat2-285
- 1 Baltimore has wrong sex dasn AMdCommon (or CgnStatus ...) 464001
- 262 new London for AmdCommon
- 1718 new London for CngStatus (0 before)
- 18 new Southapton
Diana sent us also AMD_JYates_110529.xls containing Indexes, Barcode and LabCenter. I wrote CngBarcode
Sql tables
Original table
I created a Sql table for each center independently from others with the raw data.
So I create the tables Baltimore, BaltimoreSeptember, Bonn, Creteil, Jerusalem, London, Paris, Southampton.
AmdCommon
These tables where very different and I decided to create a common table AmdCommon with following columns
amdcentre amdid amdbirth amdsex amdsmoke amddiagod amddiagos amddiagodos amdwhicheye amdownpk pk
amdownpk is the link to the original data.
CngStatus
Only a subset of these patients were genotyped. So I created the CngStatus table with the information I got grom the CNG
lab_center pedigree individual father mother sex status coll_comment barcode pk diag basicdiag birth
The most important columns are
- barcode corresponding to the blood sample
- diag obtained by a mix of amddiagod amddiagos amddiagodos
- basicdiag obtained from diag indicating ga cnv control amd
Managing BasicDiag from Diag
My Tcl functions BasicDiag and SetBasicDiagForCng allow to manage it.
- BasicDiag uses the lookup table from the file CngStatusDiag.txt
- SetBasicDiagForCng calls BasicDiag for all diag and if it doesn't find it, it appends the missing diag to CngStatusDiagToDo.txt
- CngStatusDiagToDo.txt can be viewed from the AmdConsortium welcome page ... and I hope we will solve all remaining problems ...
NoFundus and ambigious patients from Bonn 2011/02/09
Jürgen Kaschkötö sent us two files
- DeBonn/Bonn-Regensburg-Phenotype.xls containing 84 lines as for example
H-028-GA;;;;Late AMD GA;Late AMD GA H-030-GA;SO25121919;H-030-2-OS;Individual was SO25121919 , now H-030-2-OS;Late AMD GA;Late AMD GA H-053-GA;NM27021930;H-053-2-MN;Individual was NM27021930 , now H-053-2-MN;Late AMD GA;Late AMD GA etc.
these patients where all with NoFundus
- DeBonn/AmbigiousSolved.txt cointaining 20 lines as forexample
L-003-GA L-003-2-AK L-007-GA L-007-2-GH L-018-GA L-018-2-EN etc.
the entries L-123-GA were ambigious because L-003-GA <==> ambigious L-003-1-HW L-003-2-AK
I decide NOT TO UPDATE the origial database which was obtained by a sql request to Tuebingen.
I updated AmdCommon and CngStatus on 2011/02/16 with the NoFundus patient. I'll do it soon for the ambigious.
Integration of AMD_COLL_xxx files containing all genotypes 2010/06/18
On https://beaune.cng.fr/amd/ I found the files
- AMD_COLL_AUDO.tar 17-Jun-2010 11:50 35M AMD_COLL_LOTERY.tar 17-Jun-2010 11:49 128M AMD_COLL_SCHOLL.tar 15-Jun-2010 15:29 70M AMD_COLL_SOUIED.tar 17-Jun-2010 11:52 70M AMD_COLL_ZACK.tar 17-Jun-2010 11:53 41M results_AMD.txt.gz 17-Sep-2009 10:40 11M
- result_AMD.txt.gz was the same I already had.
- I did tar -xvf AMD_COLL_xxxx.tar creating the correspondig directory containing (ie Audo)
64465487 2008-09-01 10:46 Final_dat_AMD_COLL_AUDO.txt 717 2008-08-12 10:01 report.txt 26731497 2008-08-12 09:52 success_mark.txt 3139 2008-08-12 09:52 success_samp.txt
The Final_dat_AMD_COLL_xxx contain
Family Individual Father Mother Sex Status rs12354060 rs12184279 rs12564807 rs3115860 ... ... ... (with 332224 rs)
I created a small one called xxxx.txt (Audo, Lotery, Scholl, Souied, Zack) with the first 10 columns, run gscope CngGenoPheno and got the CngGenoPheno list available on the AmdConsortium welcome page 'Phenotypic Information.
- Only a few references from Bonn are still missing
but why are most of Individual like 'ZM23041920' (I found them because I had the coll_comment field containing 'Individual was ZM23041920 , now H-208-2-MZ' )
- Paris, Creteil and Southampton are ok, Baltimore seems to be ok but there are still multiple diagnosis ...
Integration of the CNG Status 2008/10/01
- Reading the file AMD_verif_statuts_Leveillard_envoye_1oct08.xls
- I normalized the headers (without blank, no /)
- I set sex to M or F instead of 1 or 2
- I replaced the centre with respectively Bonn, Creteil ...
- I replaced NAT2-xyz by xyz for Creteil (some sex values don't correpond)
- I remove TL to obtain the Paris ID
- I took the beginning of the Bonn's Ids (L-060-GA => L-060- but some Ids are ambiguous L-060-1-BH or L-060-2-KS ?
- Reading the files snplist_NXNL2.xlt snplist_TXNL6.xlt genotypeNXNL2.xls genotypeTXNL6.xls
- convert to .csv
- remove the comment lines at beginning
Integration of the Quality Control by Raymond Ripp 2008/04
2010/10/06 I have to integrate QC_results_AMD_Don_Zack_complete_sent_7april08.xls in the AllTogether table
With this qc_alltogether table I could feed the AmdCommon in which we mist a lot Baltimore
Comparing diagnosis from sep 2007 with apr 2008 I did minor corrections in Apr 2008 (upper case lower cae etc,) and major differences are listed below
2 004001 DiagSep=" " DiagApr2008="High Risk AMD" 3 004001 DiagSep=" " DiagApr2008="High Risk AMD" 31 034002 DiagSep=" " DiagApr2008="Not confirmed" 32 034002 DiagSep=" " DiagApr2008="Not confirmed" 42 043002 DiagSep=" " DiagApr2008="Unaffected/No Pathology" 50 049001 DiagSep="Advanced GA" DiagApr2008=" " 52 052004 DiagSep=" " DiagApr2008="Not confirmed" 88 107010 DiagSep="Low risk AMD" DiagApr2008="Unaffected/No Pathology" 129 139002 DiagSep="Exudative AMD" DiagApr2008="Advanced GA and Exudative AMD" 206 185002 DiagSep=" " DiagApr2008="Exudative AMD" 301 247020 DiagSep=" " DiagApr2008="Not confirmed" 300 247020 DiagSep=" " DiagApr2008="Not confirmed" 325 258001 DiagSep="Can't Confirm Dx" DiagApr2008="Can't Grade" 457 316002 DiagSep=" " DiagApr2008="Exudative AMD" 586 406002 DiagSep=" " DiagApr2008="Not confirmed" 631 435006 DiagSep=" " DiagApr2008="Not confirmed" 630 435006 DiagSep=" " DiagApr2008="Not confirmed" 666 447002 DiagSep=" " DiagApr2008="High Risk AMD" 678 452003 DiagSep=" " DiagApr2008="Not confirmed" 677 452003 DiagSep=" " DiagApr2008="Not confirmed"
I set Advanced GA for 049001 because it was already for Pk 49
Important changes are 107010 139002 258001 (written 2010/09/20) I got following files
2007-12-12 10:08 QualityControlCreteil.xls 2007-12-12 10:20 QualityControlCreteil.csv 2007-12-12 10:36 QualityControlBaltimore.csv 2007-12-12 10:40 QualityControlBonn.csv 2007-12-12 10:46 QualityControl.csv 2008-04-07 13:44 QC_results_AMD_Audo_sent_7april08.xls 2008-04-07 14:52 QC_results_AMD_Don_Zack_complete_sent_7april08.xls (this was not integrated before 2010/10/06) 2008-05-14 09:23 QualityControlBonn.xls 2008-05-14 09:23 QualityControlBaltimore.xls 2008-05-15 18:08 AMD_QC_results_sent_batch2_may08.zip 2009-04-07 14:29 QualityControlParis.csv
and created the Quality Control tables (see 'List all available Quality Controls' and per center on the AmdConsortium welcome page)
Integration of the Phenotyping Data by Raymond 2008/02/11
- Baltimore January version
- Betsy Campochiaro sent several Excel files corresponding to an Access database
- Creation of a .csv file containing nearby all fields pointed as secondary tables (Genoret Tcl program)
- Integration of this file in the csvschema table csvt8
- Detection of errors with the Genoret Tcl program
- Corrections and updates of small errors
- Add ped to father_id and mother_id (2008/01/11)
- I replaced the numbers corresponding to the diagnosis with the text of the diagnosis (2008/01/10)
- Bonn
- Direct connection to the Phenotyping Database in Tuebingen (2007-11)
- Save the display of all patients as .csv file
- Integration of this file in the csvschema table csvt16
- They are 3 sub_table in our table ... the 3rd oe should be ok
- It seems that now a connection to Tuebingen gives only the 3rd table ...
- Keep only the year of birth (Genoret Tcl program)
- Keep only the centre Bonn or the FamStudy (ok)
- Some values were stored as boolean and couldn't be displayed correctly (a small square). I replaced nul by NO and not null by YES
- Creteil
- Eric Souied sent an Excel file
- I removed birthdays and created a csvt table
- As the file contains only CNV patients (without this diagnosis) I add CNV to the AmdDiagOdOs column in the common table.
- Paris CIC
- Isabelle Audo sent an Excel file
- Jerusalem
- Itay Chowers sent a new file 2007/12/24 and a .doc file db_codes
- I deleted all empty rows especially at the end
- and I added the missing empty columns in the rows at the end
- I got the db_codes and I replaced the initial_stage_fellow_eye_areds (2 with J=DRY-2, etc.) in the commonview (2008/02/05).
- 2008/02/15 I merged different columns to get an integrated diagnosis. It depends now on the firsteyewithamd, etc. See the AmdConsortium Welcome page.
- London
- I got the data from Andrew Webster as Excel file corresponding to his Access database.
- I removed names and birthdates
- Guillaume created a local SQL database for these level2 data.
- He created also a level1 database (as defiened by Tuebingen) with the data from Montpellier and Tuebingen.
- I'll extract the London data to create a csvt table (not yet done)
- Southampton
- Angela sent the .xls file
- Raymond upload it in AmdConsortium Gallery
- Open Excel
- Delete the 'nearly empty' columns on the right
- Rename the duplicated column Project no.2
- Save as .csv
- Integrate it as csvt27
- Modify birthday to keep only the year. And some dates were written as dd.mm.yy
- I merged the cohort, diag_dry, diag_wet_amd, amd_, consolidated_areds in one value (2008/02/05)