vcfutils.pl issue, all nnnnn in fastq file
8.5 years ago
duoduoo • 0

Hi,

I'm using samtools/1.2 and bcftools/1.2

I'm having an issue similar to https://github.com/samtools/bcftools/issues/50 (none of the replies there solves my problem):

samtools mpileup -uf ref.fa my.bam | bcftools call -c - | vcfutils.pl vcf2fq > my.fq

I'm getting nothing but nnnnnnnnn and !!!!!!!!!!!!!!!!!! in the final fq file.

Is something wrong with vcfutils.pl itself? I googled around, and it seems other people have the same question, but I found no solution.

How can I get a correct FASTQ file now?

P.S. Besides vcfutils.pl, I also tried bcftools consensus, and it ran fine. My problem is that my BAM file is supposed to contain some missing data. Since the consensus is built against the human reference genome, I guess all missing or low-quality sites are simply filled in with the reference base? (Even if that works, it is a dead end for me. Also, I already have the VCF file I want, so I don't need to generate it from the BAM file myself.)

Thanks a lot!
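
On the P.S. about missing sites ending up as reference bases in the consensus: one possible workaround is to mask uncovered positions explicitly. The following is only a minimal sketch, assuming you have a per-base depth table (chrom, 1-based position, depth) that lists every position, for example from `samtools depth -a` in newer samtools releases (that flag may not exist in 1.2, so treat it as an assumption). The function name `low_depth_intervals` and the `min_depth` threshold are made up for illustration. It collapses low-depth positions into 0-based, half-open BED intervals, which newer bcftools versions can reportedly take via `bcftools consensus --mask` to write N instead of the reference base.

```python
def low_depth_intervals(rows, min_depth=1):
    """Collapse (chrom, pos, depth) rows into BED-style intervals.

    rows      : iterable of (chrom, 1-based position, depth), sorted by
                chrom and position, with every position present (assumption).
    min_depth : positions with depth below this value are reported.
    Returns a list of (chrom, start, end) tuples, 0-based and half-open.
    """
    intervals = []
    current = None  # [chrom, start, end] of the interval being extended
    for chrom, pos, depth in rows:
        if depth < min_depth:
            if current and current[0] == chrom and current[2] == pos - 1:
                current[2] = pos  # adjacent low-depth position: extend
            else:
                if current:
                    intervals.append(tuple(current))
                current = [chrom, pos - 1, pos]  # start a new interval
        elif current:
            intervals.append(tuple(current))  # covered position ends the run
            current = None
    if current:
        intervals.append(tuple(current))
    return intervals
```

Each returned tuple can be written as one tab-separated line of a BED file. Whether masking at depth 0 or at some higher threshold is appropriate depends on your data.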

samtools genome bcftools vcfutils • 4.2k views

Did you check the output of bcftools call -c? (Something like samtools mpileup -uf ref.fa my.bam | bcftools call -c - -o output.vcf -O v)
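
One way to make "check the output" concrete is to tally which INFO keys the VCF records actually carry. This is a rough sketch, not a diagnosis: as far as I can tell from reading vcfutils.pl, vcf2fq looks at INFO fields such as FQ, MQ and DP, but that reading is an assumption you should verify against your own copy of the script. The function name `info_key_counts` is hypothetical.

```python
from collections import Counter


def info_key_counts(vcf_lines):
    """Return (number of records, Counter of INFO keys) for VCF text lines."""
    n_records = 0
    counts = Counter()
    for line in vcf_lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        n_records += 1
        # INFO is the 8th column: semicolon-separated KEY=VALUE pairs or flags.
        for item in fields[7].split(";"):
            key = item.split("=", 1)[0]
            if key and key != ".":
                counts[key] += 1
    return n_records, counts
```

For example, `with open("output.vcf") as fh: print(info_key_counts(fh))` would show whether a key is present on every record, on some, or on none.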


Hi, I checked the output VCF from bcftools and it looks fine. But indeed, it doesn't distinguish missing data from other sites. (Or is that simply how it works?) All sites without an alternative allele are shown as reference alleles. I was wondering whether I should add -g INT, but then it outputs only variable sites, which still doesn't solve the problem.


Well, I guess you need to look at your file again. You should see sequences interspersed among the Ns. The ?! characters at the end are quality scores.


No, there are no sequences between the Ns and the ?! characters. I checked what a normal FASTQ file should look like, and mine is not like that. The generated FASTQ file looks like this:

@1
nnnnnnnnnnnn
nnnnnnnnnnnn
nnnnnnnnnnnn
nnnnnnnnnnnn

and

!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!

Only Ns in the entire file? What I got were Ns and contiguous sequences, with quality scores in between and !! ?? at the end. Because this FASTQ is built from a VCF, I expected it to contain Ns and low scores in addition to the bases in the VCF. Here is the command I ran, and it seems to work for me:

samtools mpileup -uf rnaseq/reference/chr12.fa rnaseq/MeOH_REP1_picard/q20.cutadapt.sorted.dedup.rg.bam  | bcftools call -c - | vcfutils.pl vcf2fq > meoh.rep1.fq

Let me update this: FASTQ validation is failing. I guess the Perl script is writing the entire sequence and statistics into two lines instead of four.
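
If the validator expects strict four-line records, a wrapped FASTQ (sequence and quality each spread over several lines, as in the excerpt earlier in the thread) will fail it. Here is a small sketch for reading such a file; the function names `read_multiline_fastq` and `n_fraction` are made up for illustration. It collects sequence lines up to the "+" separator, then reads quality lines until their total length matches the sequence, so quality lines that happen to start with "@" or "+" are not mistaken for record markers.

```python
def read_multiline_fastq(lines):
    """Yield (name, seq, qual) from FASTQ text whose sequence and quality
    strings may be wrapped across several lines."""
    it = (ln.rstrip("\n") for ln in lines)
    name = None
    seq_parts = []
    for line in it:
        if name is None:
            # Looking for the start of the next record.
            if line.startswith("@"):
                name = line[1:]
                seq_parts = []
            continue
        if line.startswith("+"):
            seq = "".join(seq_parts)
            qual_parts = []
            qual_len = 0
            # Consume quality lines until they cover the whole sequence.
            while qual_len < len(seq):
                q = next(it, None)
                if q is None:
                    raise ValueError("truncated quality for record " + name)
                qual_parts.append(q)
                qual_len += len(q)
            if qual_len != len(seq):
                raise ValueError("quality longer than sequence for " + name)
            yield name, seq, "".join(qual_parts)
            name = None
        else:
            seq_parts.append(line)


def n_fraction(seq):
    """Fraction of the sequence that is n or N (0.0 for an empty sequence)."""
    return seq.lower().count("n") / len(seq) if seq else 0.0
```

Writing each record back as `@name`, `seq`, `+`, `qual` gives a four-line FASTQ, and `n_fraction` gives a quick number for how much of each sequence is actually called rather than eyeballing the first screen of n's.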


Yes, I'm getting all N, all "!" and all "~". Something must be wrong with vcfutils.pl itself, with my input BAM file, or with the BCF generated by mpileup.

Your command is the same as the one I ran. Would you mind telling me which version of samtools you are using? Thanks!
