Unusual samtools mpileup output?
0
0
Entering edit mode
8.8 years ago
quachtina96 ▴ 40

Hi,

I am using samtools mpileup to generate a pileup file, and my output doesn't match the example pileup files I've seen (e.g. the example on the pileup format article on Wikipedia). Do you agree that there seems to be something wrong with my file? If so, any suggestions on how to fix it?

The command I used:

samtools mpileup -B -f chrRCRS.fa input-sorted.bam > input.pileup

First part of the pileup file.

chrRCRS    1    N    22    ^]G^]G^]G^]G^]G^]G^]G^]G^]G^]G^]G^]G^]G^]G^]G^]g^]g^]g^]g^]g^]g^]g    IIBFFFIIIIIFFFIBIIIFII
chrRCRS    2    N    22    AAAAAAAAAAAAAAAaaaaaaa    FIBFFFIIIIIFFFIBIIIFII
chrRCRS    3    N    22    TTTTTTTTTTTTTTTttttttt    IIFFFFIIIIIFFFIFIIIFII
chrRCRS    4    N    22    CCCCCCCCCCCCCCCccccccc    IIFBFFIIIIIFFFIFFIIFII
chrRCRS    5    N    23    AAAAAAAAAAAAAAAaaaaaaa^]A    IIFFFIIIIIIFFFIBIIIFFIB
chrRCRS    6    N    24    CCCCCCCCCCCCCCCcccccccC^]c    IIFBFIIIIIIFBFIFIIIFIIBB
chrRCRS    7    N    25    AAAAAAAAAAAAAAAaaaaaaaAa^]a    FIF7FIIIIIIFFFIFIIIFIIBFF
chrRCRS    8    N    25    GGGGGGGGGGGGGGGgggggggGgg    FIF0FIIIFIIIFFIFIIIFIIFBB
chrRCRS    9    N    25    GGGGGGGGGGGGGGGgggggggGgg    FIFBFIIIIIIIFFIFIIIFIIFBB
chrRCRS    10    N    26    TTTTTTTTTTTTTTTtttttttTtt^]T    FFF0FFFFFFFIBFFFIIIFIIFBBB
chrRCRS    11    N    26    CCCCCCCCCCCCCCCcccccccCccC    FIFBIIFFIIFIFIIFIIIFIIFBBB
chrRCRS    12    N    27    TTTTTTTTTTTTTTTtttttttTttT^]T    IIF<IIIIIIIIFIIFFIIFIIFBFBB
chrRCRS    13    N    27    AAAAAAAAAAAAAAAaaaaaaaAaaAA    FIIBIIIIFIIIIIIBIIFFIIFBBBB
chrRCRS    14    N    27    TTTTTTTTTTTTTTTtttttttTttTT    IIIBIIIIIIIIFII7FIFBIIFB<FB
chrRCRS    15    N    27    CCCCCCCCCCCCCCCcccccccCccCC    IIIBIIIIIIIIFIIFFFFBFFFB<FF
chrRCRS    16    N    27    AAAAAAAAAAAAAAAaaaaaaaAaaAA    IIIBIIIIIIIIFII<BBBBBFF<7FF
chrRCRS    17    N    27    CCCCCCCCCCCCCCCcccccccCccCC    IIIBIIIIIIIIFIIFIIIFBIF7FFF
chrRCRS    18    N    28    CCCCCCCCCCCCCCCcccccccCccCC^]C    IIIBIIIIIIIIFIIFIIIF0IIFFBFB
mtDNA samtools variant-calling mpileup • 3.5k views
ADD COMMENT
0
Entering edit mode

I am not sure if it is a problem or unknown bases in the reference genome. The reference nucleotide at all positions (third column) that you have listed above is "N". As a result, bases at those positions from aligned reads show mismatch. For example, for position 2, aligned reads have "A" nucleotide. "A" represents mismatch on the forward strand and "a" represents mismatch on the reverse strand. If you keep scrolling down in the mpileup file, you may find non N's in the reference genome and lots of "." and "," for those positions provided that the non-reference sample has no variants for those positions. In case, all you see in the third column is "N"s then there may be some compatibility issue between fasta file and bam file. Try:

  1. First checking the size of the .fai file. Do you have properly indexed fasta file along with the fasta file in the same folder. What is the size of the .fai file?
  2. Check the chromosome names in the sam header and the reference fasta. Are they same?
ADD REPLY
0
Entering edit mode

1) The fai file is one kb and it only contains

chrRCRS    16569    9    70    71

2) If I'm interpreting this correctly, the chromosome name is the same in the header (See below)

@HD    VN:1.4    GO:none    SO:coordinate
@SQ    SN:chrRCRS    LN:16569
@RG    ID:ID18_Father    PL:ILLUMINA    PU:L00    LB:ID18_Father    SM:ID18_Father_L00
@PG    ID:bwa    PN:bwa    VN:0.7.10-r789    CL:bwa mem chrRCRS.fa ID18_Father_exome_mtExtract_1.fq ID18_Father_exome_mtExtract_2.fq
@PG    ID:GATK IndelRealigner    CL:knownAlleles=[(RodBinding name=knownAlleles source=/gpfs/home/quacht/MToolBox/data/MITOMAP_HMTDB_known_indels_RCRS.vcf)] targetIntervals=/gpfs/home/quacht/MToolBox/data/intervals_file_RCRS.list LODThresholdForCleaning=5.0 consensusDeterminationModel=USE_READS entropyThreshold=0.15 maxReadsInMemory=150000 maxIsizeForMovement=3000 maxPositionalMoveAllowed=200 maxConsensuses=30 maxReadsForConsensuses=120 maxReadsForRealignment=20000 noOriginalAlignmentTags=false nWayOut=null generate_nWayOut_md5s=false check_early=false noPGTag=false keepPGTags=false indelsFileForDebugging=null statisticsFileForDebugging=null SNPsFileForDebugging=null
@PG    ID:MarkDuplicates    PN:MarkDuplicates    VN:1.92(1464)    CL:net.sf.picard.sam.MarkDuplicates INPUT=[ID18_Father.rg.ra.bam] OUTPUT=ID18_Father.rg.ra.marked.bam METRICS_FILE=ID18_Father-metrics.txt REMOVE_DUPLICATES=true ASSUME_SORTED=true TMP_DIR=[/gpfs/home/quacht/test_myMtoolbox_ID18Father/tmp]    PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 READ_NAME_REGEX=[a-zA-Z0-9]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).* OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
ADD REPLY

Login before adding your answer.

Traffic: 2787 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6