CLIP-seq peak analysis
1
1
Entering edit mode
5.1 years ago
mb314 ▴ 20

I have a list of ~1500 (over a few samples) chromosome:peakstart:peakend from my recent CLIP-seq experiment. I want to know: 1) What genes each peak maps to 2) What sequences are included in those peaks. Is there a quick way to do this? I only could figure out how to the full CDS, not just the peak.

I was able to find all the genes by inputting my peaks as a csv into UCSC Table Browser, but when I get an output it puts every transcript possible on different lines (ex if my peak mapped to gene X, I would get gene X transcript 1, gene X transcript 2, etc). I would like to copy back this list to my original data in excel, but now the lines don't match up. Does anyone have a recommendation?

Thank you!

CLIP-seq ChIP-Seq Peak Calling • 2.2k views
ADD COMMENT
2
Entering edit mode
5.1 years ago
shawn.w.foley ★ 1.3k

I'd recommend converting your file into bed format then using bedtools to intersect with annotated genes (this intersection will give you the full gene length but it will include gene names). You can also run the bedtools getfasta command to convert the bed file into a FASTA file.

If this is CLIP-seq then you should also have strand information (if you performed a strand specific library prep, if you DO have strand information add a -s flag to the two bedtools commands), otherwise there will be ambiguity as to which gene your peak intersects with.

Convert from Chr:start:end to a bed format:

sed 'y/\:/\t/' peakFile.txt > peakFile.bed

Intersect with some annotated gene file (you should be able to get these from UCSC:

bedtools intersect -a annotatedGeneFile.bed -b peakFile.bed -wa -u > annotatedGenesThatOverlapPeaks.bed

And using the fasta file used for mapping:

bedtools getfasta -fi genomeFasta.fa -bed peakFile.bed -fo peakFile.fa
ADD COMMENT
0
Entering edit mode

Thank you for your help! Very useful.

I got bedtools to work with your suggestions and ran the following command sucessfully:

bedtools intersect -wa -wb -a hg19.bed -b Peaks18.3.bed

However, I tried to force strandedness with "-s" and got nothing.

Do you know why adding -s would prevent it from working? I'm using bedtools/2.26.0.

ADD REPLY
0
Entering edit mode

Do you have strand information (either a + or a -) in the 6th column for both bed files?

ADD REPLY
0
Entering edit mode

Yes, the 6th column has + and - for both files. However I converted my hg19 file from a gtf to a bed to use for this. the hg19 bed file has 9 column. The first 6 looks as they should, but the last three saw things like "unknown exon" and "Parent=NR_204540" rather than "thickStart" , "thickEnd" and "itemRgb." DO you think that makes a difference.

ADD REPLY
0
Entering edit mode

It could, I would cut the file so that it's six columns. If that doesn't work then paste a line from each input and I'll have a look.

ADD REPLY
0
Entering edit mode

I changed hg19.bed to a six column file, but still couldn't get the -s option to work. Here's a line from each of my inputs:

hg19.bed: chr1 11873 14409 eID=DDX11L1 . +

Peaks18.3.bed: chr9 75785164 75785290 18.3 1 +

ADD REPLY
0
Entering edit mode

The format looks right. Did you perform a strand-specific library prep and strand specific mapping of the libraries? It's possible that you mapped the libraries to the incorrect strand, try running with the -S flag (capital s), this will find peaks that map to the opposite strand.

If you do a line count of the intersectBed -s and intersectBed -S they should sum to intersectBed with no strand flag. if the -S flag works I'd wonder if there was an error in your original mapping step, if you still get no lines then there must be a formatting problem.

ADD REPLY

Login before adding your answer.

Traffic: 2419 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6