Biostar Beta. Not for public use.
Generating consensus sequence from bam file
0
Entering edit mode
20 months ago
chparada • 0
@chparada53146

Hi,

I am trying to generate consensus sequence from a bam file obtained after mapping SRA reads to a reference genome.

I used the following commands:

bwa mem ref.fasta SRR_1.fastq SRR_2.fastq > bwa.sam
samtools view -b -F 4 bwa.sam > bwa_aligned.bam
samtools index bwa_aligned.bam

I am not sure how to generate the consensus sequence that I have in mind. In case I don't explain this well. I made a diagram:

===========================================================>ref.fasta
- -- ---- ----      ----- --- --- -      --    ------ ----- 
------ --- ---     -------- --- --       ---- -      --- -->SRR_reads_mapping



==============  +  ================  +   ==================> consensus_sequence.fasta

Please let me know if you have any advice on this.

Cheers!!!

genome samtools bwa fasta • 502 views
ADD COMMENTlink
0
Entering edit mode

What do you mean by consensus sequence? The most frequent nucleotide in each position? The variants? There are plenty of tools that can give you the reading in each position, try bcftools mpileup for instance.

ADD REPLYlink
0
Entering edit mode

Thank you for your answers.

I meant obtaining the most frequent nucleotide in each position. From the mapped reads, I want to been able to obtain a consensus sequence in single fasta file. I am going to try bcftools mpileup.

ADD REPLYlink
2
Entering edit mode
22 months ago
nsmi8446 • 120
@nsmi844630691

You could call variants (using whatever variant calling software you like, GATK, freebayes etc.) from your .bam file and then use vcf-consensus (http://vcftools.sourceforge.net/perl_module.html#vcf-consensus) to build your consensus sequence. The code below should work:

cat ref.fa | vcf-consensus file.vcf.gz > out.fa

ADD COMMENTlink
1
Entering edit mode

I agree with nsmi8446. This is a nice concise way to solve your problem.

ADD REPLYlink
0
Entering edit mode
20 months ago
waldeyr • 0
@waldeyr35135
samtools mpileup -uf my_reference.fna my_file.bam | bcftools view -cg - | vcfutils.pl vcf2fq > my_consensus.fq
ADD COMMENTlink
0
Entering edit mode
20 months ago
trausch ♦ 1.3k
@trausch18874

Alfred has a consensus mode that extracts all reads at a given alignment position and then runs a multiple sequence alignment computation with consensus generation. It's primarily for long reads but I think it also works for short reads.

alfred consensus -t ill -f bam -p chr4:500500 input.bam
ADD COMMENTlink
0
Entering edit mode
20 months ago
chen ♦ 1.9k
@chen19798

try gencore: https://github.com/OpenGene/gencore

Generate consensus reads to reduce sequencing noises and remove duplications

ADD COMMENTlink
0
Entering edit mode
20 months ago
guillaume.rbt • 590
@guillaume.rbt14460

you can use GATK FastaAlternateReferenceMaker to generate the consensus sequence based on a SNP calling : https://software.broadinstitute.org/gatk/documentation/tooldocs/3.8-0/org_broadinstitute_gatk_tools_walkers_fasta_FastaAlternateReferenceMaker.php

ADD COMMENTlink

Login before adding your answer.

Similar Posts
Loading Similar Posts
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.3