Determine whether ChIP-seq peaks are located in exons, introns, promoters, or 3' UTRs
11.9 years ago
e.karasmani

Dear All,

I have a file that looks like the following:

  chromosome  start      end        peak.location  chip.value  target.gene.name  distance.to.gene
  chr1        162990333  162990703  162990519      33          RP11-331H2.3.1    136
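Most of the tools suggested in this thread (intersectBed, SnpEff in BED mode) expect a BED file, so the table first has to be converted. Below is a minimal Python sketch, assuming the columns are whitespace-separated in the order shown, that the first line is a header, and that start/end are 1-based inclusive (check your peak caller's documentation; if they are already 0-based, pass one_based=False). Using target.gene.name as the BED name and chip.value as the score is just one choice, and the function names are hypothetical.

```python
def peak_line_to_bed(line: str, one_based: bool = True) -> str:
    """Convert one whitespace-delimited peak row to a BED line.

    Assumed column order (from the example file): chromosome, start, end,
    peak.location, chip.value, target.gene.name, distance.to.gene
    """
    fields = line.split()
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    chip_value, gene = fields[4], fields[5]
    # BED is 0-based, half-open: shift the start only if the input is 1-based inclusive.
    bed_start = start - 1 if one_based else start
    # Keep the target gene as the BED name and the ChIP value as the score.
    return "\t".join([chrom, str(bed_start), str(end), gene, chip_value])


def convert(lines):
    """Drop blank lines, skip the header, and convert the remaining rows."""
    rows = [line for line in lines if line.strip()]
    return [peak_line_to_bed(line) for line in rows[1:]]


example = [
    "chromosome  start  end  peak.location  chip.value  target.gene.name  distance.to.gene",
    "chr1  162990333  162990703  162990519  33  RP11-331H2.3.1  136",
]
for bed_line in convert(example):
    print(bed_line)
```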

Is there a way to determine whether this peak lies in an exon, an intron, a promoter, or a 3' UTR?

Do you know how to do that? Could you please give me some guidelines?

Thank you in advance.

Best regards, Lena

Tags: chip-seq, exon, intron, peak-calling

Do you have a gene model file (GFF) for the species/organism you're working on? If so, it's straightforward to extract the features overlapping your peak from the GFF and obtain their type.
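With a GFF file, that lookup boils down to an interval-overlap test. Here is a minimal Python sketch, assuming a tab-separated GFF with nine columns (column 3 = feature type, columns 4 and 5 = 1-based inclusive start/end) and a peak given in the same 1-based coordinates; the function name and the toy GFF records are made up for illustration. Keep in mind that many GFF files have no explicit intron or promoter features: a peak inside a gene but outside all of its exons is intronic, and promoters have to be derived from the gene start and strand.

```python
from typing import Iterable, List, Tuple


def overlapping_feature_types(
    gff_lines: Iterable[str], chrom: str, start: int, end: int
) -> List[Tuple[str, int, int]]:
    """Return (type, start, end) for GFF features overlapping chrom:start-end.

    Both the GFF features and the query use 1-based, inclusive coordinates.
    """
    hits = []
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and ## directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a 9-column feature line
        seqid, feature_type = cols[0], cols[2]
        f_start, f_end = int(cols[3]), int(cols[4])
        # Closed intervals overlap if each starts no later than the other ends.
        if seqid == chrom and f_start <= end and start <= f_end:
            hits.append((feature_type, f_start, f_end))
    return hits


toy_gff = [  # made-up records, not a real annotation
    "##gff-version 3",
    "chr1\ttoy\tgene\t1000\t5000\t.\t+\t.\tID=geneA",
    "chr1\ttoy\texon\t1000\t1500\t.\t+\t.\tParent=txA",
    "chr1\ttoy\texon\t4000\t5000\t.\t+\t.\tParent=txA",
]
print(overlapping_feature_types(toy_gff, "chr1", 1400, 1600))
```

A peak at chr1:2000-2100 in this toy file hits only the gene record, which is how an intronic peak shows up when introns are not listed.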

No, I don't have anything. How can I do that?

Moreover, isn't there any other way to solve my problem? Is there a package in R?

GFF files are usually available from the same website where you downloaded your reference. What organism are you working on? You can read more about the GFF format here: http://www.sanger.ac.uk/resources/software/gff/spec.html

I haven't worked with ChIP data, so I can't tell whether the format is supposed to include the feature type. Pablo's solution might be straightforward if you aren't working on plants, I suppose.

11.9 years ago

If your species is in the UCSC Genome Browser, you can download BED files of your regions (promoters as x nt upstream of each gene, introns, exons, UTRs) and then use BEDTools intersectBed to annotate your ChIP-seq peaks.
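Once the peaks and the UCSC region files are BED intervals, a single peak often overlaps several categories at once (e.g. a promoter and a 5' UTR exon), so you need a rule for picking one label. A minimal Python sketch of that step follows; the priority order promoter > 5' UTR > 3' UTR > exon > intron is an assumption (choose whatever suits your question), the function names are hypothetical, and the toy coordinates are invented. intersectBed performs the same overlap test much more efficiently on genome-wide files; the nested loop here is only for illustration.

```python
# Hypothetical priority: earlier categories win when a peak overlaps several.
PRIORITY = ["promoter", "5' UTR", "3' UTR", "exon", "intron"]


def overlaps(a_start, a_end, b_start, b_end):
    """Half-open BED intervals overlap if each one starts before the other ends."""
    return a_start < b_end and b_start < a_end


def classify_peak(peak, annotation):
    """Return one label for peak = (chrom, start, end).

    annotation maps a category name to a list of (chrom, start, end) regions.
    """
    chrom, start, end = peak
    hits = {
        category
        for category, regions in annotation.items()
        for (r_chrom, r_start, r_end) in regions
        if r_chrom == chrom and overlaps(start, end, r_start, r_end)
    }
    for category in PRIORITY:
        if category in hits:
            return category
    return "intergenic"  # nothing annotated overlaps the peak


# Toy plus-strand gene at chr1:1000-5000 (coordinates made up).
annotation = {
    "promoter": [("chr1", 500, 1000)],
    "5' UTR": [("chr1", 1000, 1200)],
    "exon": [("chr1", 1000, 1500), ("chr1", 4000, 5000)],
    "intron": [("chr1", 1500, 4000)],
    "3' UTR": [("chr1", 4600, 5000)],
}

for peak in [("chr1", 900, 1100), ("chr1", 2000, 2300), ("chr1", 4700, 4800), ("chr1", 8000, 8100)]:
    print(peak, classify_peak(peak, annotation))
```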

My data are from mouse (mm9) and human (hg18). Is there an option in UCSC to download the regions (promoters, introns, exons) for the WHOLE genome? I am looking genome-wide, not at a specific region. If this is possible, could you please be so kind as to give me some guidelines? I am a rookie in bioinformatics. Thanks!

Go to 'Tables' in the UCSC Genome Browser. Select your species and the correct assembly. Under 'group', choose 'Genes and ...', and under 'track' choose e.g. RefSeq. Then set 'output format' to 'BED'.

When you click 'get output', you come to a new page where you select which regions you want. For promoters, you can set 'upstream' to 5000, or whatever value you want. If you enter a name for the output file on the first page, the data will be downloaded automatically under that filename. I also recommend setting the 'gzip' flag to reduce the download size.
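If you would rather derive promoters yourself from a gene BED file (e.g. the 'Whole Gene' output), remember that "upstream" depends on the strand. Here is a minimal Python sketch with hypothetical function names, assuming 0-based half-open BED coordinates and a 5000-bp window: for a plus-strand gene the window ends at the gene start, and for a minus-strand gene it begins at the gene end. It does not clip at the chromosome end, which would require a chromosome sizes file.

```python
def tss(start, end, strand):
    """0-based coordinate of the first transcribed base of a BED gene record."""
    # Plus strand: the 5' end is the BED start (column 2).
    # Minus strand: the 5' end is the last base before the BED end (column 3).
    return start if strand == "+" else end - 1


def promoter(chrom, start, end, strand, upstream=5000):
    """Half-open BED interval covering `upstream` bases 5' of the gene."""
    if strand == "+":
        return (chrom, max(0, start - upstream), start)
    # On the minus strand, "upstream" means higher coordinates.
    return (chrom, end, end + upstream)


print(promoter("chr1", 10000, 20000, "+"))  # ('chr1', 5000, 10000)
print(promoter("chr1", 10000, 20000, "-"))  # ('chr1', 20000, 25000)
```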

Thank you very much! But then how can I compare my list of peaks to these lists to identify where my peaks are? What should I do?

If you overlap your list of peaks with your list of introns, you get back the peaks that overlap introns.

With something like 'intersectBed -a peaks.bed -b introns.bed -wa -wb', you can also see which intron each peak overlaps: -wa reports the original peak entry, and -wb appends the overlapping intron entry.
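To see what -wa and -wb change, here is a minimal Python sketch that mimics the shape of intersectBed's output on tiny hypothetical inputs. Without -wa, only the overlapping part of each peak is reported (keeping the peak's extra columns); -wa keeps the full peak record, and -wb appends the full intron record. It is a naive nested loop rather than how bedtools is implemented, and it ignores other options such as -u, -v, and -f.

```python
def intersect(peaks, introns, wa=False, wb=False):
    """Naive imitation of 'intersectBed -a peaks -b introns' output.

    Records are tuples (chrom, start, end, *extra) in half-open BED coordinates.
    """
    out = []
    for p in peaks:
        for i in introns:
            if p[0] != i[0] or not (p[1] < i[2] and i[1] < p[2]):
                continue  # different chromosome or no overlap
            if wa:
                a_part = tuple(p)  # -wa: the original peak record
            else:
                # default: only the overlapping part, keeping the peak's extra columns
                a_part = (p[0], max(p[1], i[1]), min(p[2], i[2])) + tuple(p[3:])
            out.append(a_part + tuple(i) if wb else a_part)
    return out


peaks = [("chr1", 1400, 1700, "peak1"), ("chr1", 8000, 8100, "peak2")]
introns = [("chr1", 1500, 4000, "intron1")]
print(intersect(peaks, introns))
print(intersect(peaks, introns, wa=True, wb=True))
```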

Hi, I would like to ask you something. How can I get the promoters or TSSs with the UCSC Table Browser method you describe? There is no option for promoters; it only has the following options:

Whole Gene
Exons
Introns
5' UTR Exons
Coding Exons
3' UTR Exons

What is the difference between 'Coding Exons' and 'Exons'?

Since there is no exact definition of where a promoter starts, I recommend using something like 5000 bases upstream of each gene ('Upstream by XXX bases'), but feel free to use any other number you like! The TSS is always the 5' end of your gene (gene on the positive strand -> 2nd column in BED, gene on the negative strand -> 3rd column in BED). The coding exons are the parts of the exons that lie within the coding region (CDS). Because UTRs can be spliced too, and a single exon can be partly UTR and partly coding sequence, UCSC distinguishes between complete exons ('Exons') and 5' UTR, coding, and 3' UTR exons. It looks like this (= CDS, - UTR, . intron):

1---1/....../2--=====2/.........../3=============3/......./4======---4/..../5-------5

Look at the second exon: part of it belongs to the 5' UTR. Selecting 'Exons' gives you the whole exon, 'Coding Exons' gives you the '=' part, and '5' UTR Exons' gives you the '-' part. I hope that was not too confusing! ;)
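The exon diagram above can be reproduced from a BED12 or refGene record, where thickStart/thickEnd (cdsStart/cdsEnd in refGene) mark the CDS. Here is a minimal Python sketch, with a hypothetical function name and made-up coordinates, that splits one exon into its UTR and CDS parts; on the minus strand the low-coordinate side is the 3' UTR. Treating cds_start >= cds_end as a non-coding transcript is an assumption about how your annotation encodes transcripts without a CDS.

```python
def split_exon(exon_start, exon_end, cds_start, cds_end, strand):
    """Split one exon into (label, start, end) pieces, half-open coordinates."""
    if cds_start >= cds_end:
        # Assumed encoding of a non-coding transcript (no CDS at all).
        return [("non-coding exon", exon_start, exon_end)]
    # Which UTR lies at lower coordinates depends on the strand.
    low_utr, high_utr = ("5' UTR", "3' UTR") if strand == "+" else ("3' UTR", "5' UTR")
    pieces = []
    if exon_start < min(exon_end, cds_start):  # exon part before the CDS
        pieces.append((low_utr, exon_start, min(exon_end, cds_start)))
    s, e = max(exon_start, cds_start), min(exon_end, cds_end)
    if s < e:  # exon part inside the CDS ('Coding Exons')
        pieces.append(("CDS", s, e))
    if max(exon_start, cds_end) < exon_end:  # exon part after the CDS
        pieces.append((high_utr, max(exon_start, cds_end), exon_end))
    return pieces


# Like exon 2 in the diagram: starts in the 5' UTR, ends in the CDS (made-up numbers).
print(split_exon(2000, 2700, 2200, 9000, "+"))
```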

11.9 years ago
Pablo

You can use SnpEff (http://snpeff.sourceforge.net/) in BED mode. For instance, if your sample is human (hg19):

# Download the database:
java -jar snpEff.jar download -v hg19

# Annotate your file 'chip.bed'
java -jar snpEff.jar eff -v -i bed -o bed hg19 chip.bed > chip.eff.bed

Isn't that for SNPs (presumably of width 1)?

Sorry if I totally misunderstood your point, but the OP asked where the peak is located, didn't they?

Yes, but peaks can have varying widths (370 bp in the example above) and might span more than one feature. I'm not sure SnpEff expects this.
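Whichever tool you use, it is worth checking how a wide peak splits across features. Here is a minimal Python sketch that reports, for the 370-bp example peak, how many bases fall into each overlapping feature and what fraction of the peak that is. The exon/intron boundary at 162990500 is invented for illustration (it is not real annotation for RP11-331H2.3.1), the function names are hypothetical, and the peak is treated as the half-open interval [162990333, 162990703).

```python
def overlap_bp(a_start, a_end, b_start, b_end):
    """Number of bases shared by two half-open intervals (0 if they are disjoint)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def feature_breakdown(peak, features):
    """For each overlapping feature, return (name, bases in feature, fraction of peak)."""
    chrom, start, end = peak
    width = end - start
    result = []
    for name, f_chrom, f_start, f_end in features:
        bp = overlap_bp(start, end, f_start, f_end) if f_chrom == chrom else 0
        if bp > 0:
            result.append((name, bp, round(bp / width, 3)))
    return result


peak = ("chr1", 162990333, 162990703)  # the 370-bp example peak
features = [  # hypothetical boundaries, for illustration only
    ("exon_toy", "chr1", 162990000, 162990500),
    ("intron_toy", "chr1", 162990500, 162991000),
]
print(feature_breakdown(peak, features))
```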

Yes, it works for this. I created the BED feature specifically for ChIP-seq analysis.

For reference, it seems like @Pablo is the author of snpEff.

11.9 years ago

If you'd like to use R/Bioconductor, you might try these packages: ChIPpeakAnno and VariantAnnotation.

Thanks, I will try both packages, and if I have any questions I will ask you! Best regards, Eleni. PS: Are you Greek?

Could you please help me with how to use VariantAnnotation to determine, from my data.frame, where the peaks are located?

You'll want to convert the peaks you have stored in the data.frame to a GRanges object. The vignette I linked to for the package has an example of what to do from there.

Can you please help me with ChIPpeakAnno? I would be grateful if you could check this post: http://www.biostars.org/post/show/45636/question-about-chippeakanno-and-iranges/#45638 Thank you very much!
