multiple FASTA sequnces in to one full sequnce according to their chromosome number
0
0
Entering edit mode
5.1 years ago
Anu • 0

Hello,

I have NGS data of a sample ( .bam files) and I want to get a full FASTA seqeunces for my analysis. I have converted .bam files in to fasta using samtools and the result is like this.

>FCC332NACXX:8:1202:5499:10405#/1
TCACCCTAACCCCTAACCCCTAACCCCTAACCCTAACCCTACCCTAACCCTAACCCTAACCCTTACCCGAACCCGAACCCGAACCCGAACCCTAGCCCTA
>FCC332NACXX:8:2209:15007:71606#/2
TACACTCTTTCCCCAACCCCAACCCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAAACCCTAAACCCTAAACCCTAACACTAACC
>FCC332NACXX:8:1116:16641:80184#/2
GAACCCGAACCCGAACCCGAACCCGAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCGAACCCGAACCCGAACCCGAACCCTAACCCGAAC
>FCC332NACXX:8:1305:3227:74171#/1
CGAACCCGAACCCGAACCCTAACCCGAACCCGAACCCTAACCCGAACCCGAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTCACCCGAA
>FCC332NACXX:8:2210:8091:41229#/2
ACCCTCACCCTCACCCTCACCCTCACCCTAACCCCTAACCCCTAACCCCTAACCCTAACCCTACCCTAACCCTAACCCTAACCCTTACCCGAACCCGAAC

I want to get the full sequences similar to the one below.

>chr1:43585222-43586222
GGAACTGCGTTCCTTTGGAGGAGGAGAGGCGCTCTGCGTTTTAGAGTTTC
CAGTTTTTCTGTTCTGTTTTTTCCCCATCTTTGTGGTTTTATCAACTTTT
GGTCTTTGATGATGGTGATGTACAGATGGGTTTTTGGTGTGGATGTCCTT
>chr7:47247166-47248166
AAATAGCCGTGTTCAGGCCACATGGGATAGCAGGAAATGACGCCTTTTTT
TTTTTTTTTTTTTTTTTTTTTTTGAGACGGTGTCTCGCTCTGTTGCCCAG
GCTGGAGTGCAGTGGTACAATCTCAGCTCACGGCAAGCTCCACCTCCCGG

How can I get that? I am using linux/ubuntu. Thank you!

NGS FASTA bam • 1.6k views
ADD COMMENT
1
Entering edit mode

This looks like you want to create a consensus sequence.

For this you first have to call the variants. Afterwards you can include this variants into the reference sequence.

Have a look at the examples in the manual by bcftools consensus for further instructions.

ADD REPLY
0
Entering edit mode

It is not clear if OP wants a consensus or just retrieval of original segment of sequence from the reference which contains inverted repeats.

ADD REPLY
0
Entering edit mode

I have converted .bam files in to fasta using samtools

What was the point of doing this? And how is this related to the example sequences you show in part 2? Is that the genomic region where each of your read aligns?

ADD REPLY
0
Entering edit mode

i wanted to search for a specific sequence in the data. Ex: lets say i want to search for inverted repeats in the sequnce, for that kind of things i think we need the fasta format

ADD REPLY
0
Entering edit mode

Can you comment on how the two examples (part I and II) are related logically? We know you want to get from part I to part II but it is unclear what relationship they have.

ADD REPLY
0
Entering edit mode

part 1 is the fasta format i get after i convert the .bam files to fasta.. when i run my code to find the " inverted repeats" in that format i don't get the desired output. Thats i want to get the my part1 as part 2. My codes work for part 2 format. Therefore, i want to convert my part 1 similar to part 2 ( like removing all the headers in part 1 and organize the sequences under their chromosome) .

ADD REPLY
0
Entering edit mode

part 1 is the fasta format i get after i convert the .bam files to fasta.

By doing that you are just converting original reads to fasta format. You are losing alignment information (if it is present in your BAM file) in the process. My assumption is that >chr1:43585222-43586222 can only be gleaned from the actual alignments, which would be required to get to ⬇

organize the sequences under their chromosome

ADD REPLY
0
Entering edit mode

Can i keep the alignment information and still get the fasta format? i have original fastq and .bam files. Is there a way to get the fasta format like in part 2?

ADD REPLY
1
Entering edit mode

What you did by "converting" the bam to fasta is just getting the original reads back. A bam file just contains the reads mapped to their location. Your conversion was merely an extraction of those reads. You did not get any new information.

In contrast, what you would have to do is a (de novo) assembly of those reads. You don't want to compare them to a reference genome read per read; you want to stitch them together to form a larger contig: the longer fasta records that you are looking for. This is however not a trivial procedure.

ADD REPLY
0
Entering edit mode
ADD REPLY

Login before adding your answer.

Traffic: 1552 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6