5.6 years ago
bruce.moran
I am calling variants in RNA-seq and exome data, aligned with STAR and BWA-MEM respectively. When I get to MarkDuplicates, the STAR BAMs give:
Ignoring SAM validation error: ERROR: Record 1, Read name HWI-ST968:151:C1MMAACXX:1:1112:9825:81904, RG ID on SAMRecord not found in header: XYZ PL:ILLUMINA SM:PQR DS:123 CN:NOP LB:LANE_X DT:2018-10-01T20:28:04
But looking at this read, there is a valid RG:Z field:
HWI-ST968:151:C1MMAACXX:1:1112:9825:81904 163 1 27074 60 23S68M2S = 27089 68 CCAGCATTCCCGCCCGGAAAACTCCACGGACAGAAGAGCCCGGCCGGCCACAGTCCATGGGGTCTCAAAGAGTGGGACATGACTGAGTGACCA CCCFFFFFHHHHHJJJJJJJJJGIGJIJIIJIJIJHIJJJJIEHHFDDDD@AADCCADDDDDDDDDEDDDDD>CBDDDDDDCDDCCD@CCCCD NH:i:1 HI:i:1 AS:i:103 nM:i:8 NM:i:5 MD:Z:3T7G5T1A30C17 jM:B:c,-1 jI:B:i,-1 rB:B:i,24,91,27074,27141 MC:Z:53M40S RG:Z:XYZ PL:ILLUMINA SM:PQR DS:33_L2 CN:NOP LB:LANE_X DT:2018-10-01T20:28:04
And the header of the BAM, specified by --outSAMattrRGline "$RGLINE", includes:
@RG ID:XYZ PL:ILLUMINA SM:PQR DS:123 CN:NOP LB:LANE_X DT:2018-10-01T20:28:04
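Thanks for the input. As soon as I posted I found the issue, as usual. I did have an @RG header, and it <looked> correct, but I had built the variable as a tab-delimited string:
RGLINE=$(echo -e "ID:XYZ\tPL:ILLUMINA")
STAR wants --outSAMattrRGline as a space-separated list instead. With tabs, the whole string ends up in the read's RG:Z tag, which is why MarkDuplicates reports an RG ID of "XYZ PL:ILLUMINA ..." that is not in the header. ValidateSamFile is happy with it anyway.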
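For anyone hitting the same error, here is a minimal sketch of the fix, assuming the read group string is built in a shell variable as above. The helper name normalize_rgline is hypothetical, and the field values are the placeholders from this post. It only swaps tabs for spaces before the variable is handed to STAR.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of STAR): replace every tab with a single space,
# so the read group fields form a space-separated list.
normalize_rgline() {
  printf '%s' "$1" | tr '\t' ' '
}

# Tab-delimited string, built the same way as in the post (placeholder values).
RGLINE_TAB=$(printf 'ID:XYZ\tPL:ILLUMINA\tSM:PQR')

# Space-separated version to pass on to STAR.
RGLINE=$(normalize_rgline "$RGLINE_TAB")
echo "$RGLINE"

# Then, as in the original command:
#   STAR ... --outSAMattrRGline "$RGLINE"
```

Writing the variable with spaces from the start, e.g. RGLINE="ID:XYZ PL:ILLUMINA SM:PQR", avoids the conversion step entirely.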