Aligning MiSeq targeted amplicons (ultra-deep sequencing)
9.6 years ago
Nikleotide ▴ 130

This might sound like a very basic question to some of you, but for me it has turned into a dilemma.

I am trying to align paired-end MiSeq deep-sequencing targeted amplicons to the human genome.

This is the way I do this:

  1. I check read quality and trim low-quality bases and adapter sequences
  2. I align the reads using the bwa aln algorithm
  3. Convert the SAI files to SAM (using bwa sampe)
  4. Convert SAM to BAM (using samtools)
  5. Sort and index the BAM files (using samtools)
  6. Run GATK's indel realigner
  7. Update the BAM file headers
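
For readers following along, steps 2 to 5 can be sketched as a small Python helper that only builds the command lines. This is a minimal sketch, not the poster's actual code: the file names and the `build_pipeline` helper are hypothetical, and the `samtools sort -o` form assumes samtools 1.x (older releases took an output prefix instead).

```python
def build_pipeline(ref, fq1, fq2, prefix):
    """Return the commands for steps 2-5 as argument lists.

    Assumptions (not from the original post): `ref` is a FASTA file
    already indexed with `bwa index`, and samtools 1.x is installed.
    """
    sai1 = f"{prefix}_1.sai"
    sai2 = f"{prefix}_2.sai"
    sam = f"{prefix}.sam"
    bam = f"{prefix}.bam"
    sorted_bam = f"{prefix}.sorted.bam"
    return [
        # Step 2: align each mate separately with bwa aln
        ["bwa", "aln", "-f", sai1, ref, fq1],
        ["bwa", "aln", "-f", sai2, ref, fq2],
        # Step 3: pair the two SAI files into one SAM file
        ["bwa", "sampe", "-f", sam, ref, sai1, sai2, fq1, fq2],
        # Step 4: convert SAM to BAM
        ["samtools", "view", "-b", "-o", bam, sam],
        # Step 5: coordinate-sort, then index the sorted BAM
        ["samtools", "sort", "-o", sorted_bam, bam],
        ["samtools", "index", sorted_bam],
    ]


if __name__ == "__main__":
    # Print the commands; to execute them, pass each list to
    # subprocess.run(cmd, check=True) instead.
    for cmd in build_pipeline("ref.fa", "sample_R1.fastq", "sample_R2.fastq", "sample"):
        print(" ".join(cmd))
```

Building the commands as lists keeps the sketch easy to inspect before anything is run, and avoids shell quoting problems if it is later wired to `subprocess.run`.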

I used to stop at step 5. I recently added steps 6 and 7, but the results still look the same.

What I get looks somewhat weird in terms of noise. As you can see in the attached picture, at coverages around 1000x I get noise scattered all across my reads.

Am I doing something wrong, or is this what it is supposed to look like when we do ultra-deep sequencing?

Any thoughts will be appreciated.

UPDATE:

Apparently steps 6 and 7 don't do any good, and they actually mess up some of the aligned reads! I noticed this when I compared samples that had gone through steps 1-5 with samples that had gone through steps 1-7.

Targeted-Sequencing MiSeq ultra-deep BWA Alignment • 7.4k views

Hi Nikleotide, can you share your code? Thanks

9.6 years ago
dfornika ★ 1.1k
We've done a lot of amplicon sequencing on the MiSeq. Your analysis pipeline is reasonable and those results look fairly typical. The error rate does look a bit high, but an IGV snapshot can be deceiving, depending on how the reads are sorted vertically. Have you looked at the per-base sequence quality using a tool like FastQC or the qrqc R package?

Thank you very much for the input.

Yes, we do base quality control beforehand (FastQC).

9.6 years ago
Burnedthumb ▴ 90

I have seen this before, and I don't think it is something to worry about. Your data analysis method looks fine. Check the base quality of those 'weird' bases; it is probably very low. If you run a SNP caller on your BAM file, those artifacts should not interfere, as most SNP callers take base quality into account.
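
To make the base-quality check concrete, here is a minimal sketch that decodes a FASTQ quality string and lists the low-quality positions. It assumes Phred+33 encoding (what current Illumina software writes); the function names and the threshold of 20 are illustrative choices, not part of any tool mentioned in this thread.

```python
def phred_scores(quality_string, offset=33):
    """Decode an ASCII quality string into integer Phred scores."""
    return [ord(ch) - offset for ch in quality_string]


def error_probability(q):
    """Convert a Phred score to the estimated probability of a wrong base call."""
    return 10 ** (-q / 10)


def low_quality_positions(quality_string, threshold=20, offset=33):
    """Return 0-based positions whose Phred score is below `threshold`."""
    scores = phred_scores(quality_string, offset)
    return [i for i, q in enumerate(scores) if q < threshold]


if __name__ == "__main__":
    qual = "IIII#II5"  # made-up quality string for illustration
    for pos in low_quality_positions(qual):
        q = phred_scores(qual)[pos]
        print(pos, q, error_probability(q))
```

If most of the mismatching bases in the IGV view turn out to sit at positions flagged this way, that would support the low-quality explanation.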


Thank you very much for your input. The problem is that even if this noise looks like nothing to worry about, we do deep sequencing to hunt for variants at very low allele frequencies (aiming for 1%). I have checked the frequency and base quality of the noise at those positions, but what worries me most is how much I can trust a variant called at 1% allele frequency when I know this background noise exists. We are using an in-house method for screening BAM files for variants.
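
One rough way to put numbers on this worry is a binomial tail calculation: at 1000x, a 1% variant corresponds to about 10 supporting reads, so the question is how often noise alone would produce 10 or more. The sketch below is not the in-house method mentioned above. It assumes errors are independent between reads and that `p` is the per-read rate of one specific wrong base at that position, which PCR or systematic errors in amplicon data can violate.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed as 1 - P(X <= k - 1)."""
    below = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1.0 - below


if __name__ == "__main__":
    depth = 1000
    alt_reads = 10  # 1% of 1000x
    # Illustrative background error rates, not measured values
    for rate in (0.001, 0.005):
        print(rate, prob_at_least(alt_reads, depth, rate))
```

Under these assumptions, a background rate of 0.001 (about Q30) makes 10 noise reads at one position very unlikely, while a rate of 0.005 makes it happen at a few percent of positions. That is why the per-position background rate, ideally measured from control samples, matters so much for calls near 1%.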

9.6 years ago
Rad ▴ 810

GATK's indel realigner can sometimes be useless or have only minor effects; at this stage everything depends on the alignments you have. You will first need to check your FASTQ files. I had similar problems in the past, and the cause was contamination. If you want to save the run, clean the data first and then do the alignment.

Also, use bowtie2 instead of bwa, filter reads on length (>80bp) and quality (>30), and take it from there. I am pretty sure this is a quality issue.
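
A minimal sketch of such a filter is shown below. It assumes that "quality (>30)" means mean Phred quality per read and that the data are Phred+33 encoded; both are interpretations, and the function names are hypothetical. A pair is kept only if both mates pass, so the two FASTQ files stay in sync for paired-end alignment.

```python
def mean_quality(quality_string, offset=33):
    """Mean Phred score of a read, assuming Phred+33 encoding."""
    if not quality_string:
        return 0.0
    return sum(ord(ch) - offset for ch in quality_string) / len(quality_string)


def passes_filter(sequence, quality_string, min_length=80, min_mean_quality=30):
    """True if the read is longer than min_length and its mean quality exceeds the cutoff."""
    return len(sequence) > min_length and mean_quality(quality_string) > min_mean_quality


def filter_pairs(pairs, min_length=80, min_mean_quality=30):
    """Keep a read pair only when both mates pass the filter.

    `pairs` is an iterable of ((seq1, qual1), (seq2, qual2)) tuples.
    """
    kept = []
    for (seq1, qual1), (seq2, qual2) in pairs:
        if passes_filter(seq1, qual1, min_length, min_mean_quality) and passes_filter(
            seq2, qual2, min_length, min_mean_quality
        ):
            kept.append(((seq1, qual1), (seq2, qual2)))
    return kept
```

In practice a dedicated trimming tool would usually do this step; the sketch is only meant to show what the two thresholds act on.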


Thanks @Rad. I will try bowtie2 and the trimming strategy you suggested and will let you know.

For now, may I ask what makes bowtie2 a better choice in this situation?
