Vep

Date : 2013/10/14 Author : kchennen

Variant Effect Predictor

Installation

  • Installation on studio with Raymond
    • installation in /biolo/vep
 > curl "http://cvs.sanger.ac.uk/cgi-bin/viewvc.cgi/ensembl-tools/scripts/variant_effect_predictor.tar.gz?view=tar&root=ensembl&pathrev=branch-ensembl-73" | tar xz
 > cd variant_effect_predictor  
  • Install the API with a local cache in /biolo/vep/cache
 > perl INSTALL.pl -c /biolo/vep/cache
   Hello! This installer is configured to install v73 of the Ensembl API for use by the VEP.
   It will not affect any existing installations of the Ensembl API that you may have.
   
   It will also download and install cache files from Ensembl's FTP server.
   Checking for installed versions of the Ensembl API...done
   It looks like you already have v73 of the API installed.
   You shouldn't need to install the API
   
   Skip to the next step (n) to install cache files
   
   Do you want to continue installing the API (y/n)?y
   Setting up directories
       
   Downloading required files
    - fetching ensembl
    - unpacking ./Bio/tmp/ensembl.tar.gz
    - moving files
    - fetching ensembl-variation
    ** GET http://cvs.sanger.ac.uk/cgi-bin/viewvc.cgi/ensembl-variation.tar.gz?root=ensembl&view=tar&only_with_tag=branch-ensembl-73 ==> 301 Moved
    ** GET http://cvs.sanger.ac.uk/cgi-bin/viewvc.cgi/ensembl-variation.tar.gz?pathrev=branch-ensembl-73&root=ensembl&view=tar ==> 200 OK (8s)
    - unpacking ./Bio/tmp/ensembl-variation.tar.gz
    - moving files
    - fetching ensembl-functgenomics
    ** GET http://cvs.sanger.ac.uk/cgi-bin/viewvc.cgi/ensembl-functgenomics.tar.gz?root=ensembl&view=tar&only_with_tag=branch-ensembl-73 ==> 301 Moved
    ** GET http://cvs.sanger.ac.uk/cgi-bin/viewvc.cgi/ensembl-functgenomics.tar.gz?pathrev=branch-ensembl-73&root=ensembl&view=tar ==> 200 OK (5s)
    - unpacking ./Bio/tmp/ensembl-functgenomics.tar.gz
    - moving files
    - fetching BioPerl
    ** GET http://bioperl.org/DIST/BioPerl-1.6.1.tar.gz ==> 200 OK (15s)
    - unpacking ./Bio/tmp/BioPerl-1.6.1.tar.gz
    - moving files
       
    Testing VEP script
    - OK!
  • Install the local cache for Homo sapiens (used in place of remote database connections)
       The VEP can either connect to remote or local databases, or use local cache files. Using local cache files is the fastest and most efficient way to run the VEP
       Cache files will be stored in /biolo/vep/cache
       Do you want to install any cache files (y/n)? y
       Cache directory /biolo/vep/cache does not exists - do you want to create it (y/n)? y
       
       Downloading list of available cache files
       The following species/files are available; which do you want (can specify multiple separated by spaces): 
       1 : ailuropoda_melanoleuca_vep_73.tar.gz
       2 : anas_platyrhynchos_vep_73.tar.gz
       3 : anolis_carolinensis_vep_73.tar.gz
       ...
       25 : homo_sapiens_refseq_vep_73.tar.gz
       26 : homo_sapiens_vep_73.tar.gz
       ...
       
       ? 26
        - downloading ftp://ftp.ensembl.org/pub/release-73/variation/VEP/homo_sapiens_vep_73.tar.gz
       ** GET ftp://ftp.ensembl.org:21/pub/release-73/variation/VEP/homo_sapiens_vep_73.tar.gz ==> 200 OK (305s)
        - unpacking homo_sapiens_vep_73.tar.gz
       
  • Download FASTA files for Homo sapiens
      
       The VEP can use FASTA files to retrieve sequence data for HGVS notations and reference sequence checks.
       FASTA files will be stored in /biolo/vep/cache
       Do you want to install any FASTA files (y/n)? y
       FASTA files for the following species are available; which do you want (can specify multiple separated by spaces, "0" to install for species specified for cache download): 
       1 : ailuropoda_melanoleuca
       2 : anas_platyrhynchos
       3 : ancestral_alleles
       ...
       26 : homo_sapiens
       ...
       
       ? 26
       Downloading Homo_sapiens.GRCh37.73.dna.primary_assembly.fa.gz
       ** GET ftp://ftp.ensembl.org:21/pub/release-73/fasta//homo_sapiens/dna/Homo_sapiens.GRCh37.73.dna.primary_assembly.fa.gz ==> 200 OK (99s)
       Extracting data
       The FASTA file should be automatically detected by the VEP when using --cache or --offline. If it is not, use "--fasta /biolo/vep/cache/homo_sapiens/73/Homo_sapiens.GRCh37.73.dna.primary_assembly.fa"        
       Success
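
Once the installer reports success, the cache layout can be checked by hand. The following is a minimal sketch, assuming the cache and FASTA file sit under <cache dir>/<species>/<release> as in the log above; check_vep_cache is a hypothetical helper, not part of the VEP distribution.

```shell
# Hypothetical helper (not shipped with VEP): check that the cache directory
# and a FASTA file exist where the installer log above says they should be.
check_vep_cache() {
    cache_dir=${1:-/biolo/vep/cache}
    species=${2:-homo_sapiens}
    release=${3:-73}
    target="$cache_dir/$species/$release"

    if [ ! -d "$target" ]; then
        echo "missing cache directory: $target" >&2
        return 1
    fi
    # Match any *.fa file rather than hard-coding the assembly name.
    for fasta in "$target"/*.fa; do
        if [ -f "$fasta" ]; then
            echo "cache OK: $target"
            echo "FASTA: $fasta"
            return 0
        fi
    done
    echo "no FASTA (*.fa) file found in $target" >&2
    return 1
}
```

Calling check_vep_cache with no arguments uses the paths from this page (/biolo/vep/cache, homo_sapiens, 73); a non-zero return status means something is missing.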
 
  • Configure
    • Add plugins
    • Create the configuration file vep.ini in /biolo/vep/cache
  ##########################
  ## general features flags 
  ##########################
  force_overwrite    1
  verbose            1
  species            homo_sapiens
  fork               4
  
  ###########################
  ## output annotation flags 
  ###########################
  sift                 b # the SIFT prediction and score, with both given as prediction(score)
  polyphen             b # the PolyPhen prediction and score
  regulatory           1 # Look for overlaps with regulatory regions. The script can also call if a variant falls in a high information position within a transcription factor binding site.
  numbers              1 # Adds affected exon and intron numbering to the output.
  domains              1 # Adds names of overlapping protein domains to output. 
              
  terms                so
  
  
  ################################
  ## output identification flags 
  ################################
  hgvs               1 # Add HGVS nomenclature based on Ensembl stable identifiers to the output.
  symbol             1 # Adds the gene symbol (e.g. HGNC) (where available) to the output.
  ccds               1 # Adds the CCDS transcript identifier (where available) to the output.
  protein            1 # Add the Ensembl protein identifier to the output where appropriate.
  canonical          1 # Adds a flag indicating if the transcript is the canonical transcript for the gene.
  biotype            1 # Adds the biotype of the transcript. Not used by default
  xref_refseq        1 # Output aligned RefSeq mRNA identifier for transcript.
  
  
  
  #############################
  ## Co-located variants flags 
  #############################
  gmaf                 1 # Add the global minor allele frequency (MAF) from 1000 Genomes Phase 1 data for any existing variant to the output.
  #maf_1kg             1 # Add MAF from continental populations (AFR,AMR,ASN,EUR) of 1000 Genomes Phase 1 to the output.
  maf_esp              1 # Include MAF from NHLBI-ESP populations.
  pubmed               1 # Report Pubmed IDs for publications that cite existing variant. 
  check_alleles        1 # When checking for existing variants, only report a co-located variant if none of the alleles supplied are novel.
  check_svs            1 # Checks for the existence of structural variants that overlap your input. 
  ##failed             1 # When checking for co-located variants, by default the script will exclude variants that have been flagged as failed.
  
  
  #############################
  ##  Filtering and QC options 
  #############################
  #check_ref          1 # Force the script to check the supplied reference allele against the sequence stored in the Ensembl Core database.
  #coding_only        1 # Only return consequences that fall in the coding regions of transcripts.
  no_intergenic       1 # Do not include intergenic consequences in the output.
  #most_severe        1 # Output only the most severe consequence per variation. 
  #summary            1 # Output only a comma-separated list of all observed consequences per variation.
  #per_gene           1 # Output only the most severe consequence per gene.  
  filter_common       1 # Shortcut flag for the filters below - this will exclude variants that have a co-located existing variant with global MAF > 0.01 (1%). May be modified using any of the following freq_* filters.
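
With this many flags, some of them commented out, it helps to see at a glance which ones are actually active. The following is a minimal sketch, assuming the file follows the "flag value # comment" layout shown above; list_vep_options is a hypothetical helper and only reads the file, it does not run VEP.

```shell
# Hypothetical helper: print "flag value" for every active line of a VEP
# configuration file laid out like the vep.ini above.
list_vep_options() {
    config=${1:-/biolo/vep/cache/vep.ini}
    # Skip blank lines and lines whose first field starts with '#';
    # keep only the flag name and its value, dropping trailing comments.
    # Lines with a flag but no value (fewer than 2 fields) are ignored.
    awk '$1 !~ /^#/ && NF >= 2 { print $1, $2 }' "$config"
}
```

For example, list_vep_options /biolo/vep/cache/vep.ini would list no_intergenic and filter_common from the last section but not the commented-out check_ref or most_severe.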
  • Creation of an alias
 vep: 	 aliased to /biolo/vep/variant_effect_predictor.pl --force_overwrite --cache --dir /biolo/vep/cache
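
The "aliased to" line above is what a csh-style shell prints for an alias. For sh or bash users, a shell function can play the same role. This is a sketch only: VEP_SCRIPT and VEP_CACHE_DIR are hypothetical override variables introduced here for illustration (VEP itself does not read them), with defaults copied from the alias.

```shell
# Defaults copied from the alias above. VEP_SCRIPT and VEP_CACHE_DIR are
# hypothetical variables used only by this wrapper.
VEP_SCRIPT=${VEP_SCRIPT:-/biolo/vep/variant_effect_predictor.pl}
VEP_CACHE_DIR=${VEP_CACHE_DIR:-/biolo/vep/cache}

vep() {
    # "$@" forwards any extra arguments (for example -i input.vcf) unchanged.
    "$VEP_SCRIPT" --force_overwrite --cache --dir "$VEP_CACHE_DIR" "$@"
}
```

Placed in a shell startup file, this makes vep -i myfile.vcf behave like the alias.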

Usage

  • Set environment
 > setvep
   vep -i myfile.vcf
  • Run VEP on a VCF file, redirecting its messages to a log file
 > vep -i input.vcf -o output.vcf > output.log
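
To annotate several files in one go, the command above can be wrapped in a loop. This is a sketch under stated assumptions: run_vep_on_dir is a hypothetical helper, the third argument names whatever command runs VEP on the system (a csh alias is not visible inside an sh script, so a wrapper function or the full script path may be needed), and the .vep.txt and .log naming is just one possible convention.

```shell
# Hypothetical helper: run VEP on every *.vcf file in a directory, writing one
# result file and one log file per input.
run_vep_on_dir() {
    in_dir=$1
    out_dir=$2
    vep_cmd=${3:-vep}   # command that launches VEP; assumed to be callable
    status=0

    mkdir -p "$out_dir"
    for vcf in "$in_dir"/*.vcf; do
        [ -e "$vcf" ] || continue          # no matching files: nothing to do
        name=$(basename "$vcf" .vcf)
        # Results are named .vep.txt because VEP writes its own tab-delimited
        # format unless asked for VCF output (--vcf).
        "$vep_cmd" -i "$vcf" -o "$out_dir/$name.vep.txt" \
            > "$out_dir/$name.log" 2>&1 \
            || { echo "VEP failed for $vcf" >&2; status=1; }
    done
    return $status
}
```

A call such as run_vep_on_dir ./vcfs ./annotated processes each file independently, so one failing input does not stop the others; the return status is non-zero if any run failed.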