Strange Pattern in Bam File
5
4
7.9 years ago
Can Holyavkin ▴ 240

We have encountered a strange pattern in a BAM file generated from amplicon sequencing (Nextera XT, Illumina MiSeq).

As you can see, a rectangular block of coverage forms in the middle of the BAM file. Half of the reads end at the right edge of the rectangle, while the other half end at the left edge.

What can cause such an abnormal coverage distribution? Could it be a structural variant?
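One way to put numbers on "reads ending at the same edge" is to tally alignment start and end coordinates and look for positions where many reads stop at once. The following is only a minimal sketch: the `(pos, cigar)` records are made-up toy values, and the helper names `aligned_span` and `boundary_counts` are hypothetical, not part of any tool mentioned in this thread.

```python
import re
from collections import Counter

# CIGAR operations as defined in the SAM specification.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that consume reference bases.
REF_CONSUMING = set("MDN=X")


def aligned_span(pos, cigar):
    """Return (start, end), 1-based inclusive reference coordinates of an alignment."""
    ref_len = sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in REF_CONSUMING)
    return pos, pos + ref_len - 1


def boundary_counts(records):
    """Count how many alignments start and end at each reference position."""
    starts, ends = Counter(), Counter()
    for pos, cigar in records:
        s, e = aligned_span(pos, cigar)
        starts[s] += 1
        ends[e] += 1
    return starts, ends


# Toy (POS, CIGAR) pairs, invented for illustration only.
records = [
    (100, "50M"),
    (120, "30M20S"),
    (130, "20M30S"),
    (150, "20S30M"),
    (150, "10S40M"),
]
starts, ends = boundary_counts(records)
print(ends.most_common(1))    # [(149, 3)]
print(starts.most_common(1))  # [(150, 2)]
```

A sharp peak in either counter, especially one made up of soft-clipped reads, marks a candidate breakpoint worth inspecting.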


bam alignment structural variant • 2.6k views
ADD COMMENT
7
7.9 years ago
Can Holyavkin ▴ 240

We found the real cause of this pattern. It is a duplication of a 200 bp region within that rectangular area. We BLASTed the unmapped parts of the reads at the ends of the rectangular area and found that they match perfectly to a region inside it.
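For anyone who wants to repeat this check without BLAST, here is a rough sketch of the same idea: pull the soft-clipped ends out of a read using its CIGAR string, then look for an exact match inside the reference region. The sequences, coordinates, and the helper names `soft_clips` and `find_exact` are all assumptions made for illustration. This is not the procedure we actually ran, and an exact string search is much less tolerant than BLAST.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clips(seq, cigar):
    """Return (left_clip, right_clip) sequences; '' where there is no soft clip."""
    # Hard-clipped bases (H) are absent from SEQ, so they are ignored here.
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar) if op != "H"]
    left = seq[:ops[0][0]] if ops and ops[0][1] == "S" else ""
    right = seq[-ops[-1][0]:] if len(ops) > 1 and ops[-1][1] == "S" else ""
    return left, right


def find_exact(clip, region_seq, region_start=1, min_len=15):
    """Return 1-based reference positions where clip matches region_seq exactly."""
    if len(clip) < min_len:
        return []  # very short clips match by chance too easily
    hits = []
    i = region_seq.find(clip)
    while i != -1:
        hits.append(region_start + i)
        i = region_seq.find(clip, i + 1)
    return hits


# Toy 60 bp reference region, assumed to start at position 1000.
region = "TTGACCATGCAAGCTTGGCACTGGCCGTCGTTTTACAACGTCGTGACTGGGAAAACCCTG"
region_start = 1000

# A read crossing a tandem-duplication junction: the end of the duplicated
# unit (aligned, 20M) followed by the start of the same unit (clipped, 20S).
read_seq = region[30:50] + region[10:30]
left, right = soft_clips(read_seq, "20M20S")

hits = find_exact(right, region, region_start)
print(hits)  # [1010]
```

If the clipped end of a read maps back upstream of where the read itself aligned, as in this toy case, that is consistent with a tandem duplication.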


ADD COMMENT
1

Thanks for following up, interesting.

ADD REPLY
4
7.9 years ago
John 13k

You have an indel. Realign yo' reads with IndelRealigner.

ADD COMMENT
2
7.9 years ago
igor 13k

If it's amplicon sequencing, wouldn't you expect uneven coverage that corresponds to your amplicons?

Also, these are not randomly sheared libraries. The Nextera transposase cuts preferentially at certain sites, so you should expect to see more fragments starting at specific sequences.

ADD COMMENT
1

Dear igor, you are right about the uneven coverage of Nextera kits. We see such changes especially at GC-rich sites. However, we have not come across this pattern in exome sequencing, and as you know, the exome kit (Nextera Rapid Capture Exome) and the Nextera kits use the same transposase.

ADD REPLY
2
7.9 years ago
Can Holyavkin ▴ 240

I repeated the alignment and realignment steps with BWA and GATK. The results are quite different now.

The upper image was taken after alignment and realignment in CLC Genomics Workbench. The middle image was taken after alignment with BWA and realignment with GATK. The bottom image was taken after alignment with BWA only.

Could it be due to a partial duplication of this segment somewhere else?


ADD COMMENT
1

Hmm, well that certainly did something, but your pileup still looks weird.

Now I'm thinking perhaps it wasn't an indel, but some contamination. I would definitely start by taking the sequence of the DNA that mapped there and BLASTing it. I would also consider loading a mappability track for your reference genome to see whether mappability in that region is lower than usual.

ADD REPLY
0

Thank you, John. I will now check the rest of these trimmed reads to see whether they align somewhere else. However, I don't understand what kind of contamination could cause this pattern.

ADD REPLY
1
7.9 years ago

Could it be cDNA contamination, coupled with an isoform that isn't in your gene track? Check Ensembl for known isoforms, and look for soft-clipping that matches up with the previous or next exons.

Otherwise, yeah, a repeat pileup is a good guess.

ADD COMMENT
1

Dear Chris, thank you for your reply. We are not expecting cDNA in our sample, though. It contains only PCR-amplified products (long-range primer set) from genomic DNA. I also tried removing repeats with the built-in module of CLC Genomics Workbench. Unfortunately, it didn't change the coverage pattern at all.

ADD REPLY
