Question

How to Improve Bowtie2 Mapping Quality for Paired-End Illumina Reads [Hi-C Library, 100+100]



10.9 years ago

Raghav ▴ 100

Hello everyone, I ran:

./bowtie2 --fr -x ATREFERENCE -q -1 R1.fastq -2 R2.fastq -S T2.sam

4088543 reads; of these:
  4088543 (100.00%) were paired; of these:
    1903298 (46.55%) aligned concordantly 0 times
    1545760 (37.81%) aligned concordantly exactly 1 time
    639485 (15.64%) aligned concordantly >1 times
    ----
    1903298 pairs aligned concordantly 0 times; of these:
      267481 (14.05%) aligned discordantly 1 time
    ----
    1635817 pairs aligned 0 times concordantly or discordantly; of these:
      3271634 mates make up the pairs; of these:
        2469813 (75.49%) aligned 0 times
        463625 (14.17%) aligned exactly 1 time
        338196 (10.34%) aligned >1 times
69.80% overall alignment rate

I did not use the -I and -X parameters because I could not work out what they do; I read the manual but could not relate -I and -X to improving mapping quality. Can anyone help me interpret -I and -X and explain how they relate to mapping quality? I am mainly interested in the chimeric regions of chromosomes, because my library is Hi-C. Suggestions are always welcome.


Updated 10.8 years ago by lh3 • written 10.9 years ago by Raghav

zx8754 · Answer 1 · 2013-06-11



10.9 years ago

Istvan Albert 100k

Those parameters are used to determine whether a mate pair is concordant or not.

They will not affect the total count of mapped reads, only how many are reported as aligning as a proper pair.

Increasing the maximum insert size (-X) will increase the concordant mapping count and reduce the discordant mapping count.
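To make the concordance idea concrete, here is a minimal Python sketch (an illustration, not bowtie2's actual code) of the test that -I and -X feed into under the default --fr orientation. `MateHit` and `is_concordant` are hypothetical names, and the orientation check is simplified: bowtie2's --no-contain, --no-overlap and --dovetail rules are only roughly modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MateHit:
    """One mate's alignment, simplified (hypothetical structure, not a bowtie2 API)."""
    chrom: str
    start: int     # 0-based leftmost reference position of the alignment
    end: int       # 0-based exclusive end of the alignment
    reverse: bool  # True if the mate aligned to the reverse strand


def is_concordant(m1: Optional[MateHit], m2: Optional[MateHit],
                  minins: int = 0, maxins: int = 500) -> bool:
    """Simplified --fr concordance test; minins/maxins play the roles of -I/-X."""
    if m1 is None or m2 is None:
        return False  # a pair with an unaligned mate cannot be concordant
    if m1.chrom != m2.chrom:
        return False  # mates on different chromosomes are always discordant
    if m1.reverse == m2.reverse:
        return False  # --fr expects one forward and one reverse-strand mate
    fwd, rev = (m2, m1) if m1.reverse else (m1, m2)
    if fwd.start > rev.start:
        return False  # forward mate starting downstream of its partner (rough dovetail rule)
    fragment = max(m1.end, m2.end) - min(m1.start, m2.start)
    return minins <= fragment <= maxins
```

In this model a 700 bp pair counts as discordant under bowtie2's default -X 500 and as concordant with -X 1000, while the alignments of the individual mates stay the same.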




If we run Bowtie2 with default parameters, how do we know which parameter values it actually used, so that we can interpret our mapping results? One more thing: suppose 20% of reads aligned exactly one time and 16% aligned more than one time. Does that mean each read in the 20% aligned UNIQUELY and COMPLETELY? [By "uniquely" I mean that each read maps to exactly one coordinate on a particular chromosome, and by "completely" I mean that if the read length is 100 nt, all 100 nt align contiguously.] Can this be explained further?

As for the 16% of reads that aligned to more than one position, there may be several possibilities: 1. some of these reads may overlap with the 20% of reads that mapped exactly once; 2. fragments of a read may map to different locations with an unknown insert size. Are there any other possibilities?
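One way to check "uniquely" and "completely" for each read is to look at the SAM records themselves. bowtie2 writes the alignment score in the AS:i tag and, only when it found another alignment for the read, the best other score in XS:i; in the default end-to-end mode every aligned read is aligned over its full length (no soft clipping), although it may contain gaps. The Python sketch below is a hypothetical helper (`describe_alignment` is not part of any tool) that reads those fields from one SAM line, assuming a standard 11-column record followed by optional tags.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_tags(fields):
    """Turn optional SAM fields such as 'AS:i:-3' into {'AS': -3}."""
    tags = {}
    for field in fields:
        tag, typ, value = field.split(":", 2)
        tags[tag] = int(value) if typ == "i" else value
    return tags


def describe_alignment(sam_line):
    """Summarise one SAM record: multi-mapping (via XS:i) and full-length/gapless (via CIGAR)."""
    cols = sam_line.rstrip("\n").split("\t")
    flag, mapq, cigar = int(cols[1]), int(cols[4]), cols[5]
    if flag & 0x4:
        return {"aligned": False}
    tags = parse_tags(cols[11:])
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    read_len = sum(n for n, op in ops if op in "MIS=X")     # operations that consume the read
    aligned_len = sum(n for n, op in ops if op in "MI=X")   # read bases that were not soft-clipped
    multi = "XS" in tags  # bowtie2 adds XS:i only when it found another alignment
    return {
        "aligned": True,
        "mapq": mapq,
        "multi_hit": multi,
        "tied_with_second_best": multi and tags.get("AS") == tags["XS"],
        "full_length": aligned_len == read_len,              # no soft clipping
        "gapless": all(op in "M=X" for _, op in ops),        # no insertions or deletions
    }
```

A read with `multi_hit` false and `gapless` true corresponds to the "uniquely and completely" case described above; a tie between AS and XS means bowtie2 could not tell the best locations apart.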

How can I get the depth of reads that fall at a particular location on a chromosome? What is the maximum number of reads stacked at the same position at a particular chromosomal location? For the next step, apart from SAMtools, is there any other way to do downstream analysis?

Updated 4.4 years ago by zx8754 • written 10.8 years ago by Raghav
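For depth at a particular location, `samtools depth` is the usual tool. To show what such a count involves, here is a small pure-Python sketch (hypothetical helper `depth_from_sam`) that walks each record's CIGAR string and counts how many primary, mapped reads cover every 1-based position of one chromosome. Deletions and skipped regions advance the reference position without being counted as coverage, which is a choice made for this sketch.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def depth_from_sam(sam_lines, chrom):
    """Per-base read depth on one chromosome (1-based positions) from SAM text lines."""
    depth = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        cols = line.split("\t")
        flag, rname, pos, cigar = int(cols[1]), cols[2], int(cols[3]), cols[5]
        if flag & (0x4 | 0x100 | 0x800) or rname != chrom:
            continue  # unmapped, secondary, supplementary, or another chromosome
        ref = pos
        for length, op in CIGAR_OP.findall(cigar):
            length = int(length)
            if op in "M=X":
                for p in range(ref, ref + length):
                    depth[p] += 1  # an aligned base covers this reference position
                ref += length
            elif op in "DN":
                ref += length  # deletion/skip: reference advances, not counted as coverage
    return depth
```

`depth[pos]` then gives the depth at one position and `max(depth.values())` the largest pile-up on that chromosome; for whole genomes the pure-Python loop will be far slower than samtools.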

score 2 · Answer 2 · 2013-06-11

In a Hi-C library you will be expecting lots of discordant read pairs, I think, so I wouldn't worry about trying to reduce the number of discordant mappings. I'd set -I and -X to reflect the actual distribution of insert sizes in your library, if you know it, or you could estimate it from the set of read pairs that mapped concordantly in this output.
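A minimal Python sketch of that estimate, assuming a SAM file from a paired-end bowtie2 run: it collects |TLEN| from first mates of properly paired (flag 0x2) records, then takes low and high percentiles as candidate -I and -X values. The 1st/99th percentile choice and the helper names are assumptions. Because concordant pairs were already limited by the -X used in that run (500 by default), the upper tail will be cut off; a rerun with a generous -X gives a less biased estimate.

```python
import statistics


def concordant_insert_sizes(sam_lines):
    """|TLEN| of properly paired (concordant) pairs, counting each pair once via mate 1."""
    sizes = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        cols = line.split("\t")
        flag, tlen = int(cols[1]), int(cols[8])
        # 0x2: proper (concordant) pair; 0x40: first mate; 0x100: secondary alignment
        if flag & 0x2 and flag & 0x40 and not flag & 0x100 and tlen != 0:
            sizes.append(abs(tlen))
    return sizes


def suggest_min_max(sizes, low_pct=1, high_pct=99):
    """Hypothetical rule of thumb: low/high percentiles as candidate -I/-X values."""
    cuts = statistics.quantiles(sizes, n=100, method="inclusive")  # 99 cut points
    return round(cuts[low_pct - 1]), round(cuts[high_pct - 1])
```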

zx8754 · Answer 3 · 2013-06-13



10.8 years ago

lh3 33k

I guess bowtie2 prefers a PE alignment where the insert size is within [-I,-X]. It may give the read pair a higher mapping quality, for example, when one end is repetitive but the pair as a whole is "unique". However, as you are using Hi-C, you should specify -I and -X based on the true insert size distribution, as Chris suggested above. Nonetheless, if it were me, I would treat the reads as single-end to avoid artifacts introduced by the mapper. You will lose sensitivity, but to me reducing artifacts is more important for your application.
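The single-end approach can be sketched as follows: map R1 and R2 separately, keep confidently placed reads, then re-pair them by read name and classify each pair. This is a hypothetical Python helper; the MAPQ threshold of 30 and the 20 kb cis cutoff are illustrative assumptions, not values from this thread, and stripping /1 and /2 suffixes assumes that is how the mate names differ in your files.

```python
from collections import Counter


def load_confident_hits(sam_lines, min_mapq=30):
    """Map read name -> (chrom, pos, strand) for aligned reads with MAPQ >= min_mapq."""
    hits = {}
    for line in sam_lines:
        if line.startswith("@"):
            continue
        cols = line.split("\t")
        flag, mapq = int(cols[1]), int(cols[4])
        if flag & 0x4 or flag & 0x100 or mapq < min_mapq:
            continue  # unmapped, secondary, or low-confidence alignment
        name = cols[0]
        if name.endswith(("/1", "/2")):
            name = name[:-2]  # make mate names match between the R1 and R2 files
        hits[name] = (cols[2], int(cols[3]), "-" if flag & 0x10 else "+")
    return hits


def classify_pairs(r1_hits, r2_hits, cis_cutoff=20_000):
    """Count read pairs as trans, long-range cis, or short-range cis."""
    counts = Counter()
    for name in r1_hits.keys() & r2_hits.keys():
        (c1, p1, _), (c2, p2, _) = r1_hits[name], r2_hits[name]
        if c1 != c2:
            counts["trans"] += 1      # ends on different chromosomes
        elif abs(p1 - p2) >= cis_cutoff:
            counts["cis_long"] += 1   # same chromosome, far apart
        else:
            counts["cis_short"] += 1  # same chromosome, close together
    return counts
```

Pairs where only one end was placed confidently are dropped, which is the sensitivity loss mentioned above.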




Guessing that the insert size is between 0.1 and 2 kb, I mapped with -I 100 -X 2000 and got 13.61% of reads aligned concordantly exactly 1 time, 13.61% aligned concordantly >1 times, and 26.02% aligned discordantly 1 time. But this seems a very crude way to do it. I agree with your suggestion and have done all the mapping as single-end with default parameters, but I have no idea how to customize the output files; I would like to keep both the paired-end and the single-end outputs. Now I am looking for a good tool to help me with downstream analysis, and I think SAMtools is a good one for the next step. Honestly, I am completely unfamiliar with downstream analysis, so it would be helpful if you could shed some light on the further steps.

Updated 4.4 years ago by zx8754 • written 10.8 years ago by Raghav
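One common downstream step for Hi-C data is to bin the coordinates of each read pair into a contact matrix. The sketch below is hypothetical (`bin_contacts` and the 1 Mb bin size are assumptions); it takes (chrom1, pos1, chrom2, pos2) tuples for pairs whose two ends were both placed confidently and counts each contact once by ordering the two bins, giving a sparse upper-triangular table that covers both cis and trans contacts.

```python
from collections import Counter


def bin_contacts(pairs, bin_size=1_000_000):
    """Count contacts between genomic bins from (chrom1, pos1, chrom2, pos2) tuples."""
    matrix = Counter()
    for chrom1, pos1, chrom2, pos2 in pairs:
        a = (chrom1, pos1 // bin_size)
        b = (chrom2, pos2 // bin_size)
        matrix[tuple(sorted((a, b)))] += 1  # order the bins so each contact is stored once
    return matrix
```

Smaller bins give finer resolution but need many more read pairs per bin to be informative.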

zx8754 · Answer 4 · 2013-06-12



10.8 years ago

Raghav ▴ 100

Today I spent the whole day running bowtie2 with different parameters; I tried at least 30 combinations. I finally got some interesting results, although the overall alignment rate did not increase [as Istvan Albert said]; I checked and confirmed this for my own satisfaction. Thank you :)

I tried to share my Excel sheet here but unfortunately could not paste it, so here are a few of my bowtie2 results with their parameters:

Parameters                         Concordant, exactly 1 time   Concordant, >1 times   Discordant, 1 time*
./bowtie2 -M 1 -t -I 0 -X 250      1441475 (35.26%)             598561 (14.64%)        367853 (17.96%)
./bowtie2 -M 1 -t -I 0 -X 1000     1542039 (37.72%)             662416 (16.20%)        267002 (14.17%)
./bowtie2 -M 1 -t -I 00 -X 1500    1194619 (29.22%)             551351 (13.49%)        608430 (25.97%)
./bowtie2 -t -I 100 -X 2000        556355 (13.61%)              556355 (13.61%)        608236 (26.02%)

* As in bowtie2's own summary, the discordant percentage is relative to the pairs that aligned concordantly 0 times, not to all pairs.

I got 26.02% discordant reads with ./bowtie2 -t -I 100 -X 2000, but I still cannot say which parameters are optimal :( Any suggestions?
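Comparing parameter sets is easier if the bowtie2 summary (printed to stderr) is parsed rather than copied by hand. A minimal sketch, assuming the paired-end summary layout shown in the question; `parse_bowtie2_summary` and `overall_rate` are hypothetical helpers. The overall rate counts aligned mates: two for every concordant or discordant pair, plus the individually aligned mates from the remaining pairs.

```python
import re

# Regexes for the lines of a bowtie2 paired-end summary (layout as in the question)
PATTERNS = {
    "pairs": r"^\s*(\d+) reads; of these:",
    "conc_1": r"^\s*(\d+) \([\d.]+%\) aligned concordantly exactly 1 time",
    "conc_multi": r"^\s*(\d+) \([\d.]+%\) aligned concordantly >1 times",
    "disc_1": r"^\s*(\d+) \([\d.]+%\) aligned discordantly 1 time",
    "mates_1": r"^\s*(\d+) \([\d.]+%\) aligned exactly 1 time",
    "mates_multi": r"^\s*(\d+) \([\d.]+%\) aligned >1 times",
}


def parse_bowtie2_summary(text):
    """Return the headline counts; a line that is missing counts as 0."""
    counts = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, text, flags=re.MULTILINE)
        counts[key] = int(match.group(1)) if match else 0
    return counts


def overall_rate(c):
    """Fraction of mates aligned: 2 per concordant/discordant pair plus singly aligned mates."""
    aligned_mates = 2 * (c["conc_1"] + c["conc_multi"] + c["disc_1"]) + c["mates_1"] + c["mates_multi"]
    return aligned_mates / (2 * c["pairs"])
```

For the summary in the question this gives 69.80%, matching bowtie2's own figure, so one row per run can be built from the parsed counts instead of from hand-copied numbers.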

Updated 4.4 years ago by zx8754 • written 10.8 years ago by Raghav



Remember that the words *concordant/discordant* don't mean that the data is good or bad. They simply mean that the distance between the mates differs from the expectation. If there is a translocation in the genome, or a large-scale insertion or deletion, the pairs mapping to these locations will always come out as discordant no matter how many times the experiment is repeated.

In addition, a read that maps to multiple locations does not mean that the data is wrong; it just means that the genome has repetitive structure. You may wish to focus on unique regions, but there is no guarantee that the effect you are studying is correlated with unique regions.

10.8 years ago by Istvan Albert



I agree: concordant or discordant does not reveal the quality of a data set. In Hi-C, molecular biologists try to capture chromatin-chromatin interactions, mostly enhancer-promoter regions that interact through transcription factors and associated proteins. These captured chromosomal regions then go through NGS and come back to us as reads, which we simply map to our reference genome. Suppose the output has 20% of reads aligned exactly one time, 16% aligned more than one time and 64% unmapped. Here we do not know where our chimeric reads are [reads whose parts align to different locations on the same chromosome or on different chromosomes]: which part of the mapping (which reads) will cover the chimeric regions? One thing is certain, the chimeric reads must be present within the 36% of the data that mapped, but how to capture them is a big challenge for me.

Updated 4.4 years ago by zx8754 • written 10.8 years ago by Raghav
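A strategy often used for Hi-C (named plainly here as a different technique, not something proposed in this thread) is to cut each read at the ligation junction before mapping, because a read that runs across the junction contains sequence from two distant loci and may fail to align end-to-end. The sketch assumes a HindIII library with fill-in, where the junction reads AAGCTAGCTT; for another enzyme the junction sequence must be changed, and the 20 nt minimum length is an arbitrary assumption. It keeps the read up to the end of the first half-junction (AAGCT), which still matches the reference.

```python
def truncate_at_junction(seq, qual, junction="AAGCTAGCTT", min_len=20):
    """Trim a read at the first ligation junction (HindIII fill-in junction assumed).

    Returns (seq, qual) unchanged if there is no junction, the trimmed pair if the
    kept part is at least min_len long, or None if too little sequence is left.
    """
    idx = seq.find(junction)
    if idx == -1:
        return seq, qual
    keep = idx + len(junction) // 2  # keep the first half-junction (AAGCT), which matches the genome
    if keep < min_len:
        return None
    return seq[:keep], qual[:keep]
```

Each truncated R1 and R2 file would then be mapped single-end, as lh3 suggests, and the two ends re-paired by read name.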