Vep
Date : 2013/10/14 Author : kchennen
Variant Effect Predictor
- Installation on studio with Raymond
- Installation in /biolo/vep
- Download the latest archive (v73)
> curl "http://cvs.sanger.ac.uk/cgi-bin/viewvc.cgi/ensembl-tools/scripts/variant_effect_predictor.tar.gz?view=tar&root=ensembl&pathrev=branch-ensembl-73" | tar xz
> cd variant_effect_predictor
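- Install the API with a local cache in /biolo/vep/cache (note: despite the -c option, the installer log below reports that cache and FASTA files were stored in /my/home/kchennen/.vep)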
> perl INSTALL.pl -c /biolo/vep/cache
  Hello! This installer is configured to install v73 of the Ensembl API for use by the VEP.
  It will not affect any existing installations of the Ensembl API that you may have.
  It will also download and install cache files from Ensembl's FTP server.
  Checking for installed versions of the Ensembl API...done
  It looks like you already have v73 of the API installed.
  You shouldn't need to install the API
  Skip to the next step (n) to install cache files
  Do you want to continue installing the API (y/n)?y
  Setting up directories
  Downloading required files
   - fetching ensembl
   - unpacking ./Bio/tmp/ensembl.tar.gz
   - moving files
   - fetching ensembl-variation
  ** GET http://cvs.sanger.ac.uk/cgi-bin/viewvc.cgi/ensembl-variation.tar.gz?root=ensembl&view=tar&only_with_tag=branch-ensembl-73 ==> 301 Moved
  ** GET http://cvs.sanger.ac.uk/cgi-bin/viewvc.cgi/ensembl-variation.tar.gz?pathrev=branch-ensembl-73&root=ensembl&view=tar ==> 200 OK (8s)
   - unpacking ./Bio/tmp/ensembl-variation.tar.gz
   - moving files
   - fetching ensembl-functgenomics
  ** GET http://cvs.sanger.ac.uk/cgi-bin/viewvc.cgi/ensembl-functgenomics.tar.gz?root=ensembl&view=tar&only_with_tag=branch-ensembl-73 ==> 301 Moved
  ** GET http://cvs.sanger.ac.uk/cgi-bin/viewvc.cgi/ensembl-functgenomics.tar.gz?pathrev=branch-ensembl-73&root=ensembl&view=tar ==> 200 OK (5s)
   - unpacking ./Bio/tmp/ensembl-functgenomics.tar.gz
   - moving files
   - fetching BioPerl
  ** GET http://bioperl.org/DIST/BioPerl-1.6.1.tar.gz ==> 200 OK (15s)
   - unpacking ./Bio/tmp/BioPerl-1.6.1.tar.gz
   - moving files
  Testing VEP script - OK!
* Install a local cache for homo sapiens (used instead of database connections); answer "25 26" to get both the RefSeq and the Ensembl caches
  The VEP can either connect to remote or local databases, or use local cache files.
  Using local cache files is the fastest and most efficient way to run the VEP
  Cache files will be stored in /my/home/kchennen/.vep
  Do you want to install any cache files (y/n)? y
  Cache directory /my/home/kchennen/.vep does not exists - do you want to create it (y/n)? y
  Downloading list of available cache files
  The following species/files are available; which do you want (can specify multiple separated by spaces):
  1 : ailuropoda_melanoleuca_vep_73.tar.gz
  2 : anas_platyrhynchos_vep_73.tar.gz
  3 : anolis_carolinensis_vep_73.tar.gz
  ...
  25 : homo_sapiens_refseq_vep_73.tar.gz
  26 : homo_sapiens_vep_73.tar.gz
  ...
  ? 25 26
   - downloading ftp://ftp.ensembl.org/pub/release-73/variation/VEP/homo_sapiens_refseq_vep_73.tar.gz
  ** GET ftp://ftp.ensembl.org:21/pub/release-73/variation/VEP/homo_sapiens_refseq_vep_73.tar.gz ==> 200 OK (253s)
   - unpacking homo_sapiens_refseq_vep_73.tar.gz
   - downloading ftp://ftp.ensembl.org/pub/release-73/variation/VEP/homo_sapiens_vep_73.tar.gz
  ** GET ftp://ftp.ensembl.org:21/pub/release-73/variation/VEP/homo_sapiens_vep_73.tar.gz ==> 200 OK (305s)
   - unpacking homo_sapiens_vep_73.tar.gz
* Download FASTA files for homo sapiens
  The VEP can use FASTA files to retrieve sequence data for HGVS notations and reference sequence checks.
  FASTA files will be stored in /my/home/kchennen/.vep
  Do you want to install any FASTA files (y/n)? y
  FASTA files for the following species are available; which do you want (can specify multiple separated by spaces, "0" to install for species specified for cache download):
  1 : ailuropoda_melanoleuca
  2 : anas_platyrhynchos
  3 : ancestral_alleles
  ...
  26 : homo_sapiens
  ...
  ? 26
  Downloading Homo_sapiens.GRCh37.73.dna.primary_assembly.fa.gz
  ** GET ftp://ftp.ensembl.org:21/pub/release-73/fasta//homo_sapiens/dna/Homo_sapiens.GRCh37.73.dna.primary_assembly.fa.gz ==> 200 OK (99s)
  Extracting data
  The FASTA file should be automatically detected by the VEP when using --cache or --offline. If it is not, use "--fasta /my/home/kchennen/.vep/homo_sapiens/73/Homo_sapiens.GRCh37.73.dna.primary_assembly.fa"
  Success
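With the cache and FASTA installed, a run can point at them explicitly rather than relying on auto-detection. The helper below is a minimal sketch and is not part of the VEP distribution. The following are all hypothetical names: run_vep, VEP_SCRIPT, VEP_CACHE, VEP_FASTA and PERL. The script path assumes the archive was unpacked under /biolo/vep. The cache location follows the installer log (/my/home/kchennen/.vep, i.e. $HOME/.vep). The options --cache, --offline and --fasta appear in the installer message above; --dir, -i and -o are assumed from the v73 option set.

```shell
#!/bin/sh
# Sketch: annotate a file with VEP v73 using only the local cache and FASTA.
# Assumed locations (override by exporting the variables before use):
#   script unpacked under /biolo/vep, cache/FASTA under $HOME/.vep (per installer log).
VEP_SCRIPT="${VEP_SCRIPT:-/biolo/vep/variant_effect_predictor/variant_effect_predictor.pl}"
VEP_CACHE="${VEP_CACHE:-$HOME/.vep}"
VEP_FASTA="${VEP_FASTA:-$VEP_CACHE/homo_sapiens/73/Homo_sapiens.GRCh37.73.dna.primary_assembly.fa}"

# run_vep INPUT OUTPUT (hypothetical helper)
# --offline avoids database connections; --dir points at the cache base directory.
# Setting PERL=echo prints the command line instead of running it (dry run).
run_vep() {
  "${PERL:-perl}" "$VEP_SCRIPT" \
    --cache --offline \
    --dir "$VEP_CACHE" \
    --fasta "$VEP_FASTA" \
    -i "$1" -o "$2"
}
```

For example, `run_vep sample.vcf sample.vep.txt` (hypothetical file names) would annotate a VCF. Running `PERL=echo run_vep sample.vcf sample.vep.txt` only prints the command, which is a cheap way to check the paths before a long run.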
- Configure
* Create a configuration file in /my/home/kchennen/.vep:
##########################
## general features flags
##########################
force_overwrite 1
verbose 1
species homo_sapiens
fork 4
###########################
## output annotation flags
###########################
sift b # the SIFT prediction and score, with both given as prediction(score)
polyphen b # the PolyPhen prediction and score
regulatory 1 # Look for overlaps with regulatory regions. The script can also call if a variant falls in a high information position within a transcription factor binding site.
numbers 1 # Adds affected exon and intron numbering to the output.
domains 1 # Adds names of overlapping protein domains to the output.
terms so
################################
## output identification flags
################################
hgvs 1 # Add HGVS nomenclature based on Ensembl stable identifiers to the output.
symbol 1 # Adds the gene symbol (e.g. HGNC) (where available) to the output.
ccds 1 # Adds the CCDS transcript identifier (where available) to the output.
protein 1 # Add the Ensembl protein identifier to the output where appropriate.
canonical 1 # Adds a flag indicating if the transcript is the canonical transcript for the gene.
biotype 1 # Adds the biotype of the transcript. Not output by default.
xref_refseq 1 # Output aligned RefSeq mRNA identifier for transcript.
#############################
## Co-located variants flags
#############################
gmaf 1 # Add the global minor allele frequency (MAF) from 1000 Genomes Phase 1 data for any existing variant to the output.
#maf_1kg 1 # Add MAF from continental populations (AFR,AMR,ASN,EUR) of 1000 Genomes Phase 1 to the output.
maf_esp 1 # Include MAF from NHLBI-ESP populations.
pubmed 1 # Report PubMed IDs for publications that cite the existing variant.
check_alleles 1 # When checking for existing variants, only report a co-located variant if none of the alleles supplied are novel.
check_svs 1 # Checks for the existence of structural variants that overlap your input.
##failed 1 # When checking for co-located variants, by default the script will exclude variants that have been flagged as failed.
#############################
## Filtering and QC options
#############################
#check_ref 1 # Force the script to check the supplied reference allele against the sequence stored in the Ensembl Core database.
#coding_only 1 # Only return consequences that fall in the coding regions of transcripts.
no_intergenic 1 # Do not include intergenic consequences in the output.
#most_severe 1 # Output only the most severe consequence per variation.
#summary 1 # Output only a comma-separated list of all observed consequences per variation.
#per_gene 1 # Output only the most severe consequence per gene.
filter_common 1 # Shortcut for the freq_* filters - this will exclude variants that have a co-located existing variant with global MAF > 0.01 (1%). May be modified using the freq_* filters.
* Add plugins in /my/home/kchennen/.vep/Plugins
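Commented-out options such as #maf_1kg or #check_ref are easy to lose track of, so a small shell helper can list the options the configuration actually switches on. This is a sketch under assumptions. The name active_vep_flags is hypothetical. The helper assumes that VEP treats lines starting with # as comments and reads only the first word after each option name, so trailing "# ..." notes are ignored; the file above is written that way.

```shell
#!/bin/sh
# Sketch: print the options a VEP config file enables, one "name value" per line.
# Assumption: lines starting with '#' are comments (as in the file above), and
# only the first word after the option name matters, so trailing notes are dropped.
active_vep_flags() {
  # Drop comment lines, then keep the first two fields of non-empty lines.
  grep -v '^#' "$1" | awk 'NF { print $1, $2 }'
}
```

Take the configuration above, assuming it was saved under VEP's default name vep.ini. Running `active_vep_flags /my/home/kchennen/.vep/vep.ini` prints lines such as `fork 4` and `filter_common 1`. The options maf_1kg and failed, and the commented-out filtering options, do not appear.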