Validation in paired end data alignment
6.9 years ago
micro32uvas ▴ 10

Hello Everyone!

I am working with paired-end whole-genome resequencing data from the Illumina platform. Here is the workflow I have been following:

  1. FASTQ --> SAM (BWA mem with the -M switch)
  2. SAM --> BAM (samtools view -b -S with the -F 2308 switch, which did not work for mate pairs, so I switched to -F 2304)
  3. Sorted the BAM by coordinate
  4. Added read groups with Picard's AddOrReplaceReadGroups
  5. Validated with Picard's ValidateSamFile (after replacing -F 2308 with -F 2304, the BAM file gave no mate-pair errors and validated successfully)
  6. Removed duplicates with Picard's MarkDuplicates
  7. Validated again; now it shows:

Error Type              Count
ERROR:MATE_NOT_FOUND    180154

I can't figure out what is going wrong here. Any help is appreciated.
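The difference between the two filter values mentioned in step 2 can be checked by decoding their bits. The following is a minimal sketch in Python; the `SAM_FLAGS` table follows the bit definitions in the SAM format specification, and the helper name `decode_flag` is made up for illustration.

```python
# SAM flag bits as defined in the SAM format specification.
SAM_FLAGS = {
    1: "paired",
    2: "proper_pair",
    4: "unmapped",
    8: "mate_unmapped",
    16: "reverse",
    32: "mate_reverse",
    64: "read1",
    128: "read2",
    256: "secondary",
    512: "qc_fail",
    1024: "duplicate",
    2048: "supplementary",
}


def decode_flag(value):
    """Return the names of all flag bits set in `value` (hypothetical helper)."""
    return [name for bit, name in SAM_FLAGS.items() if value & bit]


# Bits excluded by each -F value used in the workflow.
print(decode_flag(2308))  # ['unmapped', 'secondary', 'supplementary']
print(decode_flag(2304))  # ['secondary', 'supplementary']
```

The only difference is bit 4 (unmapped). With -F 2308, an unmapped read is dropped even when its mate is mapped, which leaves that mate without a partner in the file. This is consistent with the mate-pair errors going away after switching to -F 2304, though it does not by itself explain the errors seen after duplicate removal.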

Samtools view -F2304 picard alignment • 1.9k views

Is there any chance the fastq files you started off with were not in sync (i.e. they may have been trimmed separately getting the order of reads in the file out of sync)?
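One way to check this is to walk both FASTQ files in parallel and compare read names. Below is a rough sketch in Python, assuming plain (uncompressed) FASTQ with exactly four lines per record and read names that are identical apart from an optional `/1` or `/2` suffix. The function names `read_names` and `first_mismatch` are invented for this example.

```python
import io
from itertools import zip_longest


def read_names(handle):
    """Yield read names from a FASTQ handle, assuming 4 lines per record."""
    for i, line in enumerate(handle):
        if i % 4 != 0:
            continue  # only header lines carry the read name
        fields = line.split()
        if not fields:
            continue  # tolerate a trailing blank line
        name = fields[0].lstrip("@")
        # Drop a trailing /1 or /2 so both mates compare equal.
        if name.endswith(("/1", "/2")):
            name = name[:-2]
        yield name


def first_mismatch(handle1, handle2):
    """Return (record_index, name1, name2) for the first out-of-sync
    record, or None if both files list the same names in the same order."""
    pairs = zip_longest(read_names(handle1), read_names(handle2))
    for idx, (a, b) in enumerate(pairs):
        if a != b:
            return idx, a, b
    return None


# Tiny in-memory demonstration with made-up reads.
r1 = io.StringIO("@a/1\nACGT\n+\nIIII\n@b/1\nACGT\n+\nIIII\n")
r2 = io.StringIO("@a/2\nACGT\n+\nIIII\n@c/2\nACGT\n+\nIIII\n")
print(first_mismatch(r1, r2))  # (1, 'b', 'c')
```

For real data, the two handles would come from `open()` on the R1 and R2 files (or `gzip.open(..., "rt")` for compressed input). A result of `None` means the files are in sync under the assumptions above.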


I doubt that. I have paired-end data and mapped both reads, with 99.14% of reads mapped. Here is the flagstat of the data:

 253528473 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
251345729 + 0 mapped (99.14%:-nan%)
253528473 + 0 paired in sequencing
126740748 + 0 read1
126787725 + 0 read2
243932175 + 0 properly paired (96.21%:-nan%)
250098310 + 0 with itself and mate mapped
1247419 + 0 singletons (0.49%:-nan%)
5322999 + 0 with mate mapped to a different chr
2474188 + 0 with mate mapped to a different chr (mapQ>=5)
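A quick arithmetic check on the read1 and read2 counts in the flagstat output above may be informative. This is only a sketch: the function name `pair_imbalance` is made up, and the interpretation assumes the file holds primary alignments only (as it would after -F 2304), since secondary or supplementary records can also make the two counts differ.

```python
def pair_imbalance(read1, read2):
    """Return how many more records carry one mate flag than the other."""
    return abs(read1 - read2)


# Counts copied from the flagstat output above.
total = 253528473
read1 = 126740748
read2 = 126787725

# read1 and read2 together account for every record in the file.
assert read1 + read2 == total

print(pair_imbalance(read1, read2))  # 46977
```

If every read had its mate present, the two counts would be equal. Under the assumption above, a nonzero difference means at least that many reads are in the file without their partner.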
