I want to call variants on several hundred original WGS BAM files. Three Picard steps run before variant calling with GATK:
1. Change the BAM header (AddOrReplaceReadGroups)
2. Sort the BAM (SortSam)
3. Reorder the BAM (ReorderSam)
In step 2, for just one BAM file, the following error occurred:
Exception in thread "main" htsjdk.samtools.SAMFormatException: SAM validation error: ERROR: Read name 34222060, No real operator (M|I|D|N) in CIGAR.
I then checked record 34222060 in the BAM file (shown below) and googled the error, which led me to the post Why "No real operator (M|I|D|N)" in picard?. I think I have the same problem as in that post. I have tried the following solutions, but both failed:
1. Cleaning the BAM file with Picard CleanSam.
2. Filtering the bad CIGARs with GATK CountReads. This failed because the BAM index was missing. I then tried to create the index with Picard BuildBamIndex, which failed again for the same reason as above.
Any advice or suggestions would be appreciated.
…
34222060 163 1 220186167 70 101S = 220186233 167 GGAAGGACCAGAGGGCCTCCAGATCCCCTTCACATACTTCAACCAGAACAGCTATGTTTCTGTTTTATTTATTGGGGTTTAATTCTGGTAGCACTAAGTGG CCCFFFFFHHHHHJJJJJJJJJJJIJJJIJJIJJJJJJJJJJIJJJIIJIFGIJJJGIIJJJDHIJHHHHHHHFFFDAABDDDEDEEDCCCDCDCDDDDCD RG:Z:sample NH:i:1 NM:i:0
34222060 83 1 220186233 60 101M = 220186167 -167 TTTTATTTATTGGGGTTTAATTTTCAAGAAAACTTTCACTGGAAGGAAGTCTCCTGATTTGTGGAGTGGGGAGAGAAGTCTCTACATACTTTATTAGCTGA DDCDDDDDDBCADFFFFFDHHHHEHGGIIIG@HGJJJIIGHF<JJIEIIHJJJJJJJJIJIJJJJJIJHJJJJJJJJJHEJIJIJIHGHHHHHFFFFFCCC RG:Z:sample NH:i:1 NM:i:0
…
Thanks for the quick reply. The GATK version is GenomeAnalysisTK-3.7 and the Picard version is picard-tools-2.4.1. I did not do the alignment myself, so I do not know which read mapper or version was used.
My current goal is to remove or ignore the reads with bad CIGARs.
By 'read-mapper' I meant something like 'bwa', 'bowtie', etc.
How?
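One possible way to remove the offending reads, as a rough sketch rather than a tested pipeline: work on the SAM text and drop every record whose CIGAR has no "real" operator, such as the 101S record above. The function names `has_real_operator` and `filter_sam` are made up for illustration. I am assuming that the validator counts M, I, D, N (named in the error message) plus = and X as real operators, and that a `*` CIGAR is not flagged. The demo records are toy data, not reads from the file in question.

```python
import re

# Assumption: operators counted as "real" by the validator.
# M, I, D, N come from the error message; = and X are assumed to count too.
REAL_OPS = set("MIDN=X")

# One CIGAR element: a length followed by a single operator character.
CIGAR_OP = re.compile(r"\d+([MIDNSHP=X])")


def has_real_operator(cigar):
    """Return True if the CIGAR is '*' or contains at least one real operator."""
    if cigar == "*":  # CIGAR unavailable; assumed not to trigger the error
        return True
    return any(op in REAL_OPS for op in CIGAR_OP.findall(cigar))


def filter_sam(lines):
    """Yield header lines and every record whose CIGAR passes the check."""
    for line in lines:
        if line.startswith("@"):  # header lines pass through unchanged
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # Column 6 (index 5) of a SAM record is the CIGAR string.
        if len(fields) >= 6 and not has_real_operator(fields[5]):
            continue  # drop reads such as 101S (soft clip only)
        yield line


if __name__ == "__main__":
    # Toy records (hypothetical, shortened) mimicking the pair in the question.
    demo = [
        "@HD\tVN:1.5\n",
        "r1\t163\t1\t100\t70\t5S\t=\t166\t71\tACGTA\tIIIII\n",
        "r1\t83\t1\t166\t60\t5M\t=\t100\t-71\tACGTA\tIIIII\n",
    ]
    for kept in filter_sam(demo):
        print(kept, end="")
```

For a real file, the demo list could be replaced with `sys.stdin` and the script fed from `samtools view -h in.bam`, with the output converted back using `samtools view -b -o out.bam -` (the file names and script name are placeholders). Note that this leaves the mate of a dropped read in the file still flagged as paired, which later validation may also complain about. If ignoring is enough, running the Picard tools with `VALIDATION_STRINGENCY=LENIENT` may be worth trying instead of removing reads.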