Pysam prints wrong outputs
0
0
Entering edit mode
5.0 years ago
vaslanzadeh ▴ 20

Hi, pysam prints me reads that has different chromosome (ref_ID) than the one that I specify with samfile.fetch("chromosome", 100, 20000). The following is my example code:

import pysam
samfile = pysam.AlignmentFile("ex1.bam", "rb")
reads = samfile.fetch("chromosome", 100, 20000)
for item in reads:
    print (str(x))

When I use samfile.fetch("X", 100, 20000) it prints reads that their chromosome number is 10, this is one of the reads for example:

K00114:267:H3YL7BBXX:4:2122:14874:29659 163 10  130 255 14M83903N136M   10  84334   150 GAAAATGAAATAAGCAAAGTTAGGACTCGTAAAAATGGCAATCAAACCAAGAACGAAGGGCAAAACGTACTCCTCAAGATCGGTGGGTTCGCAGTGGTTCAACAGGCTTGGTTTCAAGCAGAACAAGTACGGAACTTGTAAATTTTTGTC

But when I use samfile.fetch("IX", 100, 20000) it prints reads on chromosome 4!, one example:

K00114:267:H3YL7BBXX:4:1124:28980:37712 419 4   17  3   22S12M86722N116M    4   86873   128  TGCAACGTCCACAAGCAGTTCTCACCACACCACATCAACAAGTTCTTACACATCTTCTACTTACACTGCACAAATATCTTCTACCTCCGCTGCTGCTACTTCTTCTGCTCCAGCAGCCCTGCCAGCAGCCCATAAAACTTCATCTCACAA

or samfile.fetch("V", 100, 20000) print reads for chromosome 6! one example:

K00114:267:H3YL7BBXX:4:1201:16346:16805 419 6   137 0   148M2S  6   217 148 CAAACACTAAATCAAAACAGTGAAATACTACTACATCAAAACGCATATTCCCTAGAAAAAAAAATTTCTTACAATATACTATACTACACAATACATAATCACTGACTTTCGTAACAACAATTTCCTTCACTCTCCAACTTCTCTGCTCTA

There is something strange here, does anybody knows what is the possible source of this error?

alignment RNA-Seq • 1.3k views
ADD COMMENT
0
Entering edit mode

What are the reference sequences at the bam header?

ADD REPLY
0
Entering edit mode

The bam file contains mapped RNAseq data from S. cervesiae

This is the heather:

HD     VN:1.4  SO:coordinate
@SQ     SN:I    LN:230218
@SQ     SN:II   LN:813184
@SQ     SN:III  LN:316620
@SQ     SN:IV   LN:1531933
@SQ     SN:IX   LN:439888
@SQ     SN:Mito LN:85779
@SQ     SN:V    LN:576874
@SQ     SN:VI   LN:270161
@SQ     SN:VII  LN:1090940
@SQ     SN:VIII LN:562643
@SQ     SN:X    LN:745751
@SQ     SN:XI   LN:666816
@SQ     SN:XII  LN:1078177
@SQ     SN:XIII LN:924431
@SQ     SN:XIV  LN:784333
@SQ     SN:XV   LN:1091291
@SQ     SN:XVI  LN:948066
@PG     ID:STAR PN:STAR VN:STAR_2.5.3a  CL:STAR   --runThreadN 16   --genomeDir ../indexes/star/   --readFilesIn ../vaslanzadeh/FCH3YL7BBXX_L4_HKRDYEAwzxNAAARACPEI-201_1.fq   ../user/FCH3YL7BBXX_L4_HKRDYEAwzxNAAARACPEI-201_2.fq      --outFileNamePrefix wt_1_2pas_   --alignIntronMax 100000   --sjdbFileChrStartEnd wt_1_SJ.out.tab   wt_2_SJ.out.tab   slow_1_SJ.out.tab   slow_2_SJ.out.tab   fast_1_SJ.out.tab   fast_2_SJ.out.tab   
@CO     user command line: STAR --runThreadN 16 --genomeDir ../indexes/star/ --sjdbFileChrStartEnd wt_1_SJ.out.tab wt_2_SJ.out.tab slow_1_SJ.out.tab slow_2_SJ.out.tab fast_1_SJ.out.tab fast_2_SJ.out.tab --alignIntronMax 100000 --outFileNamePrefix wt_1_2pas_ --readFilesIn ../user/FCH3YL7BBXX_L4_HKRDYEAwzxNAAARACPEI-201_1.fq ../user/FCH3YL7BBXX_L4_HKRDYEAwzxNAAARACPEI-201_2.fq
ADD REPLY
0
Entering edit mode

The numbers match the order of the chromosomes in the header. I would make sure it's okay by looking at the reads you got using samtools view |grep [read_name]

ADD REPLY
0
Entering edit mode

This is the output for: samtools view | grep "K00114:267:H3YL7BBXX:4:1124:28980:37712"

K00114:267:H3YL7BBXX:4:1124:28980:37712 419 IX  18  3   22S12M86722N116M    =   86874     86966 TGCAACGTCCACAAGCAGTTCTCACCACACCACATCAACAAGTTCTTACACATCTTCTACTTACACTGCACAAATATCTTCTACCTCCGCTGCTGCTACTTCTTCTGCTCCAGCAGCCCTGCCAGCAGCCCATAAAACTTCATCTCACAA  AA<AFJF-F<JJA7--FF<F7FF-F-<A-7A7F-7-7FJ---<AFF<<F<F<7A--<<FF--<J<F--AJ-F--A--77AAJ7-7AJJ7A--7--777A77AAJJAAJ-AF<-F<-F7--A<JA-A))7)-)A--7-77AJA-<-<-AA-  NH:i:2  HI:i:2  AS:i:206    nM:i:10

The chromosome name is correct this time, it is IX not 4. Should I update my bam file's heather?

ADD REPLY
0
Entering edit mode

So it's indeed mapped to IX chromosome

ADD REPLY

Login before adding your answer.

Traffic: 1884 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6