Biostar Beta. Not for public use.
getting a consense one header .fasta file from a genic region, from .bam template
0
Entering edit mode
18 months ago
ricfoz • 30
National School of Antropology and Hist…

I started with a 9GB file of a human Xth chromosome with a .bam format, and i am trying to get a specific genic region in .fasta clasic format (you know, a one header file, starting with ">", with description on header, and a single line, of continuous tandem single nucleotides).

I have been able to retrieve the reads of the genic region i want in bam format. I share the path i have followed:

samtools sort OriginalFile.bam -o OriginalFile.sort.bam samtools inex OriginalFile.sort.bam samtools view -b OriginalFile.sort.bam RegName:XX-XX+1 > GeneName.bam

after this i got the GeneName.bam file which contains all the reads on which i am interested in, but i can´t run the phylogenetic tool i am trying to pipe the GeneName.bam sequence to ... in order to follow my workpath, i need to transform this GeneName.bam file, into this classic .fasta format i described lines ago.

I tried a couple of pipelines in order to achieve this, but what i have got as a retrieved .fasta file, is a man readable file, but with all the single fastq reads, with headers and everything, stacked on top of each other. I need to get the consensus of this, with one header and only contiguous nucleotides.

I got GeneName.fasta file running:

samtools bam2fq GeneName.bam > GeneName.fasta

or

samtools bam2fq GeneName.bam > GeneName.fastq samtools seqtk seq -A GeneName.fastq > GeneName.fasta # (i installed bowtie2, boost, tophat2, seqtk, and bcf tools, as they seem to complement each other's works at some times)

and i got the same result on both, analogous file.

With this line of thinking, i thought i may have to run mpileup to the GeneName.bam file, before transforming to fasta, but mpileup gives as a result .bcf format... still, this last assumption is just a form of discussing my problem.

Have anyone out there found the solution for this, can a .bam file converted to classic .fasta format, or can help in any way?

Greetings

ADD COMMENTlink
0
Entering edit mode

Yes, that post is mine... i have posted a couple of times on the matter as i have been working through the problem ... now i am at the point at which i can retrieve the genicregion.fasta file, but it gives only single fastq reads piled up upon each other... i want a .fasta format, which is completely different from .fastq single read.

ADD REPLYlink
0
Entering edit mode

now that you got fastq file, try converting to fasta file using tools such as: https://bioconda.github.io/recipes/ucsc-fastqtofa/README.html. This would give you each read as single fasta record. You can download the binaries from http://hgdownload.cse.ucsc.edu/admin/exe/ and select as per your platform.

ADD REPLYlink
0
Entering edit mode

That's not what OP needs, rather a de novo assembly. OP wants a "reference" fasta for that region if I understood correctly, or create a consensus sequence of the bam file.

ADD REPLYlink
1
Entering edit mode
18 months ago
ricfoz • 30
National School of Antropology and Hist…

Alright, i solved my problem, so i share what i found to the community so you can solve similar problems

after retrieving a .bam file with the desired length, through the command "samtools view -b ChrName:XX-XX+1 > Outputname.bam"

you need to pipe this output into samtools mpileup command, in order to get a .bcf file

"samtools mpileup -g -f refgenome.fa filename.bam > filename.bcf" (check samtools tutorials to change or check/confirm desired flag options)

then you need to transform .bcf to .vcf callig: "bcftools call -m filename.bcf > filename .vcf"

this last .vcf file needs to be compressed and indexed like following:

bgzip -c filename.vcf > filename.vcf.gz

tabix filename.vcf.gz

after having succesfully run these commands, you can get to the final step, using samtools again:

samtools faidx refgenome.fa ChrName:XX-XX+1 | bcftools consensus filename.vcf.gz > filename.fa

I hope these pipeline is useful for you people, and i thank all the community, and people who aided on the go. Greetings!

ADD COMMENTlink
1
Entering edit mode

Good job and thanks for sharing, I have moved this to an answer because it solves your question, making it also easier to find for others.

ADD REPLYlink
0
Entering edit mode

No problem, i guess the aim of the community is to get aid on problems with actually solved answers.

Cheers.

ADD REPLYlink

Login before adding your answer.

Similar Posts
Loading Similar Posts
Powered by the version 2.3