Biostar Beta. Not for public use.
Generating consensus sequence from bam file
0
Entering edit mode
15 months ago
chparada • 0

Hi,

I am trying to generate consensus sequence from a bam file obtained after mapping SRA reads to a reference genome.

I used the following commands:

bwa mem ref.fasta SRR_1.fastq SRR_2.fastq > bwa.sam
samtools view -b -F 4 bwa.sam > bwa_aligned.bam
samtools index bwa_aligned.bam

I am not sure how to generate the consensus sequence that I have in mind. In case I don't explain this well. I made a diagram:

===========================================================>ref.fasta
- -- ---- ----      ----- --- --- -      --    ------ ----- 
------ --- ---     -------- --- --       ---- -      --- -->SRR_reads_mapping



==============  +  ================  +   ==================> consensus_sequence.fasta

Please let me know if you have any advice on this.

Cheers!!!

ADD COMMENTlink
0
Entering edit mode

What do you mean by consensus sequence? The most frequent nucleotide in each position? The variants? There are plenty of tools that can give you the reading in each position, try bcftools mpileup for instance.

ADD REPLYlink
0
Entering edit mode

Thank you for your answers.

I meant obtaining the most frequent nucleotide in each position. From the mapped reads, I want to been able to obtain a consensus sequence in single fasta file. I am going to try bcftools mpileup.

ADD REPLYlink
2
Entering edit mode
13 months ago
nsmi8446 • 120

You could call variants (using whatever variant calling software you like, GATK, freebayes etc.) from your .bam file and then use vcf-consensus (http://vcftools.sourceforge.net/perl_module.html#vcf-consensus) to build your consensus sequence. The code below should work:

cat ref.fa | vcf-consensus file.vcf.gz > out.fa

ADD COMMENTlink
1
Entering edit mode

I agree with nsmi8446. This is a nice concise way to solve your problem.

ADD REPLYlink
0
Entering edit mode
13 months ago
waldeyr • 0
samtools mpileup -uf my_reference.fna my_file.bam | bcftools view -cg - | vcfutils.pl vcf2fq > my_consensus.fq
ADD COMMENTlink
0
Entering edit mode
11 months ago
trausch ♦ 1.3k
Germany

Alfred has a consensus mode that extracts all reads at a given alignment position and then runs a multiple sequence alignment computation with consensus generation. It's primarily for long reads but I think it also works for short reads.

alfred consensus -t ill -f bam -p chr4:500500 input.bam
ADD COMMENTlink
0
Entering edit mode
12 months ago
chen ♦ 1.9k
OpenGene

try gencore: https://github.com/OpenGene/gencore

Generate consensus reads to reduce sequencing noises and remove duplications

ADD COMMENTlink
0
Entering edit mode
12 months ago
guillaume.rbt • 590
France

you can use GATK FastaAlternateReferenceMaker to generate the consensus sequence based on a SNP calling : https://software.broadinstitute.org/gatk/documentation/tooldocs/3.8-0/org_broadinstitute_gatk_tools_walkers_fasta_FastaAlternateReferenceMaker.php

ADD COMMENTlink

Login before adding your answer.

Similar Posts
Loading Similar Posts
Powered by the version 2.1