Question

Normalising ChIP-seq data to RPM using a 1 or 2 count for paired-end data?

1

Entering edit mode

9.2 years ago

James Ashmore ★ 3.4k

I want to normalise my paired-end ChIP-seq data (4 samples treated, 4 samples control) to reads per million (RPM). So far my calculation takes the number of reads mapped to a position, divides it by the total number of mapped reads in the sample, and then multiples this by one million. My question is, should the number of mapped paired-end reads be counted as 1 or 2 in the calculation? I believe Samtools reports the total number of mapped reads as half the actual number of paired-end reads which map?

My current pseudo script for normalising to RPM:

mappedReads = samtools view -c -F 4 input.bam
scalingFactor = 1000000 / mappedReads
bedtools genomecov -ibam input.bam -bg - scale scalingFactor -g chrom.sizes > output.bedGraph
bedGraphToBigWig input.bedGraph chrom.sizes output.bigWig

ChIP-Seq normalisation paired-end • 10k views

ADD COMMENT • link updated 2.0 years ago by Ram 43k • written 9.2 years ago by James Ashmore ★ 3.4k

0

Entering edit mode

For paired end data, rpm=fpm (fragments per million). So, it should be counted as one.

ADD REPLY • link updated 2.0 years ago by Ram 43k • written 9.2 years ago by poisonAlien ★ 3.2k

0

Entering edit mode

Just to clarify, you are saying that each pair of reads should be counted as just one? So in a perfect world where 100% of my reads align, the number of total alignments would be half the total number of reads in the data?

ADD REPLY • link updated 2.0 years ago by Ram 43k • written 9.2 years ago by James Ashmore ★ 3.4k

0

Entering edit mode

Yup. Count a pair of reads as one. For single end reads each read represents a fragment it is sequenced from whereas for paired-end, a pair of sequence represents the fragment it is sequenced from. As simple as that.

ADD REPLY • link updated 2.0 years ago by Ram 43k • written 9.2 years ago by poisonAlien ★ 3.2k

0

Entering edit mode

9.2 years ago

Manvendra Singh ★ 2.2k

I think each read should be counted as 1

or

you can try MAnorm for normalization if it suits for your analysis or comparisons

ADD COMMENT • link updated 2.0 years ago by Ram 43k • written 9.2 years ago by Manvendra Singh ★ 2.2k

0

Entering edit mode

Initially I just want to create a coverage track for the normalised data.

ADD REPLY • link 9.2 years ago by James Ashmore ★ 3.4k

0

Entering edit mode

Okay, I agree with Fidel

ADD REPLY • link 9.2 years ago by Manvendra Singh ★ 2.2k

0

Entering edit mode

9.2 years ago

Ian 6.0k

In case you do not already know MACS2 will generate coverage plots for the non-redundant fragments used in the analysis (by default 1 unique fragment per strand) normalised to millions of reads using the --SPMR flag.

ADD COMMENT • link updated 2.0 years ago by Ram 43k • written 9.2 years ago by Ian 6.0k

0

Entering edit mode

I think MACS does not use the paired-end data to extend the reads. Does it extend the reads?

ADD REPLY • link 9.2 years ago by Fidel ★ 2.0k

0

Entering edit mode

MACS2 does not need to extend reads as it knows the start and end of the fragment by using the paired end reads information in the BAM file.

ADD REPLY • link 9.2 years ago by Ian 6.0k

0

Entering edit mode

For this you need to use the BAMPE mode of MACS2 which I am not sure it properly works. We tried it and didn't work, also I see numerous complains in the mailing list. Moreover, the bam file has to be sorted by name. I always use MACS2 for peak calling but not coverage tracks.

ADD REPLY • link 9.2 years ago by Fidel ★ 2.0k

Ram · Accepted Answer · 2015-02-06

8

Entering edit mode

9.2 years ago

Fidel ★ 2.0k

You can use deepTools to do the normalisation that you want. The output is a bigwig file. It is quite fast if you have multiple cores.

bamCoverage -b file.bam --normalizeUsingRPKM -o file.bw

The program first extends the read to match the paired-end length before computing the coverage. I opted to count paired end reads as 2 and not as 1 to avoid a bias when a read is not properly paired which could be a significant fraction.

ADD COMMENT • link updated 2.0 years ago by Ram 43k • written 9.2 years ago by Fidel ★ 2.0k

0

Entering edit mode

Thank you for the suggestion, I used Bowtie2 to map my reads but indicated to remove discordant and mixed alignments, in this case counting each pair of reads as 1 would be preferred

ADD REPLY • link updated 2.0 years ago by Ram 43k • written 9.2 years ago by James Ashmore ★ 3.4k

0

Entering edit mode

One option is to filter your bam file to include only the first mate, then use it with bamCompare. If all your reads are properly paired this should work.

ADD REPLY • link updated 2.0 years ago by Ram 43k • written 9.2 years ago by Fidel ★ 2.0k

0

Entering edit mode

I just pushed a commit to github enabling bamCoverage to select reads based on the SAM flag. Now you can do something like this to get what you want:

bamCoverage -b file.bam --normalizeUsingRPKM -o file.bw --samFlag 64

--samFlag 64 means to select only the first mate.

ADD REPLY • link updated 2.0 years ago by Ram 43k • written 9.2 years ago by Fidel ★ 2.0k