Error In Bedtools Getfasta: Chromosome Not Found
11.2 years ago
Pat Baldrich ▴ 10

I am trying to use BEDtools to extract sequences from genomic coordinates, but for every line in my BED file I get this message: "WARNING. chromosome (chr12) was not found in the FASTA file. Skipping."

Here are some details about what I am doing.

I just downloaded what I think is the latest version of BEDtools, bedtools-2.17.0.

I have two files (both much longer than the excerpts shown here):

A FASTA file with the sequences of all chromosomes:

>chr01
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN

A BED file with my genomic coordinates (already sorted):

chr01 187814 190840
chr01 307073 310104
chr01 701047 704068
chr01 702941 705962
chr01 702952 705972
chr01 867716 870740
chr01 914064 917087
chr01 991080 994104
chr01 1039795 1042815
chr01 1058713 1061736

Then I run this command:

bedtools getfasta -fi all.con -bed 1-13_sorted2.bed -fo NewCandidates/Genomic_coordinates/1-13_1500.fa

The only thing I get is "WARNING. chromosome (chr01) was not found in the FASTA file. Skipping.", repeated thousands of times.

If someone can tell me what I am doing wrong, I would be very grateful.

Well, if you just need FASTA files for each chromosome (hg19), you can download them from the UCSC Genome Browser: http://hgdownload.soe.ucsc.edu/goldenPath/hg19/chromosomes/

Your paste above is not in FASTA format. Did the editor eat your ">"? It should look like:

>chr01
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNn
ACTGCGCACTGA

etc.

Yes, this happens because ">" is formatted as a blockquote UNLESS you indent lines with 4 spaces. Please note that questions are auto-previewed as you type, to help avoid this kind of problem. Fixed it for you.

I don't get it: is it chr01, chr12, or all of them?
Instead of using all.con, what about trying a FASTA file with only one chromosome in it and a BED file with only that chromosome's coordinates?

Thanks for your comment, but then I would need to split all my files, and I have 24 different libraries and 12 chromosomes. If this is the only solution, I don't think it is the best one for me.

Just to test if it's working:
-fi chr01.fa -bed chr01.bed

Hi, I tried and I get exactly the same error: "WARNING. chromosome (chr01) was not found in the FASTA file. Skipping." Thanks again!

My command line, in case it helps: bedtools getfasta -fi Chr1.con -bed NewCandidates/Genomic_coordinates/1-13_chr01.bed -fo NewCandidates/Genomic_coordinates/1-13_1500.fa

I had the same problem. The issue was the .fai index file generated by bedtools. The solution was to remove the bedtools-generated .fai file, run samtools faidx on the input FASTA first, and then run bedtools getfasta.

11.2 years ago

Your chromosome names do not match. Make sure the BED file and the FASTA file use identically named chromosomes. Yours seem to be zero-padded; I bet it is yeast.

For some recent ideas on the subject, read this post from the author of BEDTools: "What is in a (chromosome) name"
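
A quick way to test the name-mismatch idea is to compare the names in the two files directly. The following is only a minimal Python sketch, not part of bedtools: the function names and the file names `genome.fa` and `regions.bed` are hypothetical, and it assumes a sequence name is the text after ">" up to the first whitespace (which is how samtools faidx names sequences; older bedtools versions may treat the header differently).

```python
import re


def fasta_names(fasta_path):
    """Collect sequence names from FASTA header lines.

    Assumption: the name is the text right after '>' up to the first
    whitespace character, so a header written as '> chr01' gives an
    empty name rather than 'chr01'.
    """
    names = set()
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                names.add(re.split(r"\s", line[1:], maxsplit=1)[0])
    return names


def bed_names(bed_path):
    """Collect chromosome names from the first tab-separated BED column."""
    names = set()
    with open(bed_path) as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            # Skip blank lines, comments, and track/browser header lines.
            if not line or line.startswith(("#", "track", "browser")):
                continue
            names.add(line.split("\t")[0])
    return names


def missing_from_fasta(fasta_path, bed_path):
    """Return the BED chromosome names that have no FASTA header, sorted."""
    return sorted(bed_names(bed_path) - fasta_names(fasta_path))


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python check_names.py genome.fa regions.bed
    if len(sys.argv) == 3:
        for name in missing_from_fasta(sys.argv[1], sys.argv[2]):
            print(repr(name))
```

Printing with `repr` makes stray spaces or other invisible characters in a name visible. If the BED file is separated by spaces instead of tabs, the whole line shows up as the "name", which is itself a useful hint.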

It is worth noting that if the chromosome names in the FASTA and BED files don't match, and getfasta reports that there is no index file and creates one, you have to delete that index before rerunning the procedure on the corrected files. Otherwise the problem will persist.
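
To illustrate the stale-index point, here is a small Python sketch that removes a `.fai` file when it no longer seems to describe the FASTA next to it. It is a heuristic of my own, not something bedtools or samtools provides: the function names are hypothetical, it assumes the index sits at `<fasta>.fai` with the sequence name in the first tab-separated column, and it treats the index as stale if it is older than the FASTA or lists a different set of names.

```python
import os
import re


def fasta_header_names(fasta_path):
    """Names after '>' up to the first whitespace (an assumption, see above)."""
    names = set()
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                names.add(re.split(r"\s", line[1:], maxsplit=1)[0])
    return names


def fai_names(fai_path):
    """Sequence names from the first tab-separated column of a .fai index."""
    with open(fai_path) as handle:
        return {line.rstrip("\n").split("\t")[0] for line in handle if line.strip()}


def index_is_stale(fasta_path):
    """Heuristic: the index is older than the FASTA, or its names differ."""
    fai_path = fasta_path + ".fai"
    if not os.path.exists(fai_path):
        return False  # no index yet, so nothing can be stale
    if os.path.getmtime(fai_path) < os.path.getmtime(fasta_path):
        return True
    return fai_names(fai_path) != fasta_header_names(fasta_path)


def remove_stale_index(fasta_path):
    """Delete <fasta>.fai if it looks stale; return True if it was removed."""
    if index_is_stale(fasta_path):
        os.remove(fasta_path + ".fai")
        return True
    return False
```

After the index is removed, the next getfasta run (or samtools faidx) should build a fresh one from the corrected FASTA.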

Istvan, that link is very nice. Thanks.

Hi,

Thank you all for your answers. I am not working with the human genome but with the rice genome, which makes things more complicated.

Istvan, what do you mean when you say the chromosome names do not match? It is chr01 in both files.

Thank you again!

7.6 years ago
Prakki Rama ★ 2.7k

I encountered the same error today. Deleting the old index file and rerunning the bedtools command, which automatically regenerated the index for the FASTA file, resolved the issue.

18 months ago
Jeroen • 0

I ran into the same problem described here, but the solutions above did not work for me. I was 100% sure that the chromosome names in the FASTA and BED files matched. It turned out that a simple "dos2unix" conversion of the FASTA did the trick, as I had created the input FASTA with a quick copy-paste in Notepad on Windows.
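
For anyone without dos2unix at hand, the check and the conversion can be sketched in a few lines of Python. This is not dos2unix itself, only an approximation of its core effect, and the function names are hypothetical. The assumption is that Windows line endings leave a carriage return at the end of each header line, so the name the tool sees is presumably "chr01" plus an invisible character.

```python
def has_crlf(path):
    """Return True if any line in the file ends with a Windows CRLF."""
    with open(path, "rb") as handle:
        return any(line.endswith(b"\r\n") for line in handle)


def convert_to_unix(src_path, dst_path):
    """Copy src_path to dst_path, rewriting CRLF line endings as LF."""
    with open(src_path, "rb") as fin, open(dst_path, "wb") as fout:
        for line in fin:
            if line.endswith(b"\r\n"):
                line = line[:-2] + b"\n"
            fout.write(line)
```

The sketch writes to a new file rather than editing in place, so the original stays untouched. Given the earlier comments about stale indexes, it is probably also worth removing any existing .fai file before running getfasta on the converted FASTA.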
