Building a snpEff Database
11.7 years ago
bioinfo ▴ 830

I was trying to create a snpEff database from my reference genome (in GenBank format). I followed http://snpeff.sourceforge.net/supportNewGenome.html#genbank, but while editing the configuration file I messed up, and in the end I couldn't add the genome to the config file.

My commands:

vi snpEffect.config
# Sodalis genome, version NC_007712.1   GI:85057978 
NC_007712.1   GI:85057978 : Sodalis

I don't know how to save the above entry in the configuration file, and I'm not sure whether the information I entered is correct for my genome. I copied it from the VERSION section of my bacterium's GenBank file from NCBI.

Any help will be highly appreciated.

6.3 years ago
rleach ▴ 180

I just went through figuring this out, and I thought I would add my process, including the FASTA component. I use Vibrio phage VP882 as my example and follow the GFF strategy you mentioned in a comment on the other answer. Here is everything I did, using an existing snpEff installation. The database worked when I ran my analysis with it, so this strategy is confirmed at least in my case:

 #How to create a snpEff database using a gff3 and genomic DNA fasta file... (note, the chromosome names must match in the 2 files)
 #NOTE: This uses /bin/tcsh...

 setenv DBNAME Vibrio_phage_VP882
 setenv GFF3 ~/Downloads/VibriophageVP882.gff3
 setenv FASTA ~/Downloads/VibriophageVP882.fasta

 #Go into the snpEff directory and create a directory for your files
 #(-p also creates data/ itself if it does not exist yet)
 cd /usr/local/snpEff
 mkdir -p data/$DBNAME

 #Copy the files into snpEff's directory structure
 cp $GFF3 data/$DBNAME/genes.gff
 cp $FASTA data/$DBNAME/sequences.fa

 #Edit snpEff.config and insert your specific database information:
 echo "$DBNAME.genome : $DBNAME" >> snpEff.config

 #Build the database (-Xmx4g gives Java 4 GB of memory; see the replies below)
 java -Xmx4g -jar snpEff.jar build -gff3 -v $DBNAME

I did not get any errors or warnings, so if you see anything unexpected, you'll have to troubleshoot it yourself.
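
The script's header comment says the chromosome names must match in the 2 files, so a quick check before building can save a confusing failure. Here is a minimal sketch; the function name check_seqids is hypothetical, and it assumes FASTA IDs are the first word after ">" and GFF3 seqids are in column 1. It only reports GFF3 seqids that have no FASTA record.

```shell
# Hypothetical helper: report GFF3 seqids (column 1) with no FASTA record.
# Usage: check_seqids GENES_GFF SEQUENCES_FA
# Returns 0 when every seqid is found, 1 otherwise.
check_seqids() {
    awk '
        # First file (the FASTA): remember each record ID, i.e. the
        # first word of a header line without the leading ">".
        FILENAME == ARGV[1] {
            if (/^>/) fasta_id[substr($1, 2)] = 1
            next
        }
        # Second file (the GFF3): stop at an embedded ##FASTA section,
        # skip comments/directives and blank lines.
        /^##FASTA/ { exit }
        /^#/ || NF == 0 { next }
        # Report each unmatched seqid only once.
        !($1 in fasta_id) && !($1 in reported) {
            reported[$1] = 1
            print "seqid not found in FASTA: " $1
            missing = 1
        }
        END { exit missing }
    ' "$2" "$1"
}
```

For example, check_seqids data/$DBNAME/genes.gff data/$DBNAME/sequences.fa should print nothing and succeed when the names agree.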

You can set the three variables at the top of this script and run the rest unchanged (unless your snpEff installation is somewhere other than /usr/local/snpEff).
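
The script above uses tcsh's setenv, which bash and sh do not understand. For those shells, here is a minimal sketch of the same setup steps as a POSIX sh function; the name prepare_gff3_db and its argument order are hypothetical. It uses mkdir -p so a missing data/ directory is created too, and it appends the config line only if that exact line is not already present, so rerunning it does not duplicate the entry. It deliberately stops before the build itself.

```shell
# Hypothetical helper: stage a GFF3 + FASTA pair for `snpEff build -gff3`.
# Usage: prepare_gff3_db SNPEFF_DIR DBNAME GFF3_FILE FASTA_FILE
prepare_gff3_db() {
    snpeff_dir=$1
    dbname=$2
    gff3=$3
    fasta=$4

    # Fail early if either input file is missing.
    for f in "$gff3" "$fasta"; do
        if [ ! -f "$f" ]; then
            echo "input file not found: $f" >&2
            return 1
        fi
    done

    # -p also creates data/ itself if this snpEff install lacks it.
    mkdir -p "$snpeff_dir/data/$dbname" || return 1

    # Same file names the tcsh script uses inside data/<dbname>/.
    cp "$gff3" "$snpeff_dir/data/$dbname/genes.gff" || return 1
    cp "$fasta" "$snpeff_dir/data/$dbname/sequences.fa" || return 1

    # Register the genome once; grep -qxF matches the whole line literally.
    entry="$dbname.genome : $dbname"
    config="$snpeff_dir/snpEff.config"
    if ! grep -qxF "$entry" "$config" 2>/dev/null; then
        printf '%s\n' "$entry" >> "$config"
    fi
}
```

For example, run prepare_gff3_db /usr/local/snpEff Vibrio_phage_VP882 ~/Downloads/VibriophageVP882.gff3 ~/Downloads/VibriophageVP882.fasta, then run the same java -jar snpEff.jar build -gff3 -v Vibrio_phage_VP882 from /usr/local/snpEff.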

And just for completeness, I downloaded the gff3 and fasta files directly from this page:

https://www.ncbi.nlm.nih.gov/nuccore/NC_009016

I used the complete-record file download option at the top right of the record, selecting GFF3 and FASTA in two separate downloads.

Rob


Thank you so much! The documentation for snpEff is rather poor, and this was the first source I (finally) found that worked!


I ran the same commands, but when I try to annotate my VCF, snpEff tries to download the database from SourceForge and fails with an error. Shouldn't the build step create some database files in the directories? The directories and their contents remain unchanged, which is weird.


The problem was the following: the memory argument -Xmx4G had to be added to the java command for the build step (java -Xmx4G -jar snpEff.jar build ...).


If the folder /usr/local/snpEff/data does not exist, you need to create it first; otherwise the mkdir step fails.
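
Regarding the reply above where snpEff tried to download the genome instead of using the local build: one way to see whether a build actually finished is to look for its output file. As far as I know, a successful build writes snpEffectPredictor.bin into data/<name>/. Here is a minimal sketch; the function name check_snpeff_db is hypothetical, and it assumes the default layout where data.dir in snpEff.config still points at the data/ folder inside the snpEff directory.

```shell
# Hypothetical helper: sanity-check a locally built snpEff database.
# Assumes the default layout SNPEFF_DIR/data/DBNAME/.
# Usage: check_snpeff_db SNPEFF_DIR DBNAME
check_snpeff_db() {
    status=0
    # The genome is expected to be registered in snpEff.config.
    if ! grep -q "^$2\.genome *:" "$1/snpEff.config" 2>/dev/null; then
        echo "no '$2.genome' entry in $1/snpEff.config" >&2
        status=1
    fi
    # The build step writes its output here; if the file is missing or
    # empty, the build did not complete (for example, it failed partway).
    if [ ! -s "$1/data/$2/snpEffectPredictor.bin" ]; then
        echo "missing $1/data/$2/snpEffectPredictor.bin" >&2
        status=1
    fi
    return "$status"
}
```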

11.7 years ago

Try this:

  1. Create a directory named "Sodalis" in the snpEff data directory:

    mkdir /path/to/snpEff/data/Sodalis
    
  2. Download the GenBank file and save it in the Sodalis directory (note: the file name must be genes.gbk):

    /path/to/snpEff/data/Sodalis/genes.gbk
    
  3. Edit snpEff.config and insert your specific database information:

    # Sodalis genome, version NC_007712.1 GI:85057978
    Sodalis.genome : Sodalis
    
  4. Create database (note the "-genbank" flag):

    cd /path/to/snpEff
    java -jar snpEff.jar build -genbank -v Sodalis
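
Steps 1 to 3 can also be scripted. Here is a minimal sketch as a POSIX sh function (the name prepare_genbank_db is hypothetical), assuming the genes.gbk file name that a reply below reports working. It refuses files that do not start with LOCUS or that lack an ORIGIN line, on the assumption that the build takes the chromosome sequence from the ORIGIN section of the record. Because it derives the config key from the same name as the directory, the two cannot drift apart through a typo.

```shell
# Hypothetical helper: stage a GenBank file for `snpEff build -genbank`.
# Usage: prepare_genbank_db SNPEFF_DIR DBNAME GENBANK_FILE
prepare_genbank_db() {
    gb=$3
    # A GenBank flat file starts with a LOCUS line; also require an
    # ORIGIN line (assumption: the build reads the sequence from it).
    if ! head -n 1 "$gb" 2>/dev/null | grep -q '^LOCUS'; then
        echo "$gb does not look like a GenBank flat file" >&2
        return 1
    fi
    if ! grep -q '^ORIGIN' "$gb"; then
        echo "$gb has no ORIGIN (sequence) section" >&2
        return 1
    fi

    dbdir="$1/data/$2"
    mkdir -p "$dbdir" || return 1
    cp "$gb" "$dbdir/genes.gbk" || return 1

    # The key before ".genome" is spelled exactly like the directory.
    entry="$2.genome : $2"
    if ! grep -qxF "$entry" "$1/snpEff.config" 2>/dev/null; then
        printf '%s\n' "$entry" >> "$1/snpEff.config"
    fi
}
```

For example, prepare_genbank_db /path/to/snpEff Sodalis ~/Downloads/sodalis.gb (the download path is only illustrative), followed by the build command from step 4.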
    

I hope this helps.

Fred


Very useful starting point, thanks! In my case the GenBank data seemed to be faulty, and I had to use GFF3 + FASTA to get the best result:

  • download the GFF3 and FASTA files from NCBI
  • create the folder and register the genome in snpEff.config as described above
  • place the two files, renamed to genes.gff and sequences.fa, in the new database folder
  • build with: java -jar $SNPEFF/snpEff.jar build -gff3 -v <name>

Could you please tell me what the fault was with the GenBank file?


There is no big problem: you just have to rename the file to genes.gbk, run the command above, and it will work.
