About The 10Th Column Of The Output Sequence From Samtools View Command
1
0
Entering edit mode
12.5 years ago
Zhshqzyc ▴ 520

Hello, I used samtools to get the data for chromosome 22. The command is:

samtools view input.bam 'chr22'  >raw_chr22.txt

Then I extracted the 3,4 and 10th columns from it. The partial data for demo purpose like

chr22 14430092 NNNNCNAGCNGAGCGNNTCTGGGNACCTCGAAGGCAGACATG
chr22 14430092 NNNNNNNNNNNTNNNNNNNNANNGNNTNNNNNNNNNNNNNNN
chr22 14430092 CCTCGCGGGACTGGTATGGGGACGGTCATGCAATCTGGACAA

The data has three columns, chromosome, position and sequence. My question is that they all have the same position but different sequence. So what is the real sequence in this specific position.

Thanks

samtools sequence • 5.1k views
ADD COMMENT
1
Entering edit mode
12.5 years ago
brentp 24k

It might help if you post the output of all the columns. What aligner did you use. You can check the actual sequence within something like:

samtools faidx $REF chr22:14430092-14430142

replacing $REF with your reference fasta file. When I try this with hg19, it gives:

NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN

So maybe whatever aligner you're using is mapping to a region of N's? Again, it would be useful if you showed all the columns, then we could see the flag.

Update:

Based on the file you posted, those all have mapping quality 0, so you should disregard those alignments unless you have reason not to. Also, one of them has 1113 in the bit flag. If you go here, you can see that it's an optical duplicate with an unpaired mate--further reason not to consider it a valid alignment.

ADD COMMENT
0
Entering edit mode

The file (test_chr22.txt) is here.

Thanks.

ADD REPLY
0
Entering edit mode

out of curiosity, if you down-vote, could you also let me know why. I'm happy to be enlightened.

ADD REPLY
0
Entering edit mode

Sorry, it was a mistakenly hit. I supposed to hit up-vote. But my brain is not working well today. I will correct it. So can you get the real sequence within the region? Any language is fine.

ADD REPLY
0
Entering edit mode

zhs... my answer already shows how to get the real sequence with samtools faidx.

ADD REPLY
0
Entering edit mode

But it is from gh18/hg19 reference, my bam file is from a patient. Can I get the sequence from the patient based on my output file? Maybe it is a wrong question because my ignorance of genome area?

ADD REPLY
0
Entering edit mode

In that case, ask the patient if they have the reference file that they used to do the alignment.

ADD REPLY
0
Entering edit mode

I still don't understand it. I want to write C# code to extract the sequence by "FLAG" and start position from the output file. Suppose a region is given, can I use substring method in c#/java from

1113 14430092 CCTCGCGGGACTGGTATGGGGACGGTCATGCAATCTGGACAA

Such as

string s = CCTCGCGGGACTGGTATGGGGACGGTCATGCAATCTGGACAA;
if FLAG ==some condition
        Then
          sequence = s.Substring(start,end);
ADD REPLY

Login before adding your answer.

Traffic: 1445 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6