Identify upstream and downstream gene sequences from a large dataset and graph them
5.0 years ago
mb314 ▴ 20

I have a large data set from an eCLIP experiment where the 5' end of my reads represents the nucleotide immediately after the crosslink site between my protein of interest and the RNA it bound. I've seen that proteins usually crosslink to RNA at U's, and I want to see if that's true in my data.

I would like to generate a heat map (or a similar plot) that shows the frequency of each nucleotide at the 10-20 positions upstream and downstream of the 5' end of each read in my BAM files. Is there a program or script that can do this, or something similar?

alignment heatmap crosslink CLIP-seq • 907 views
5.0 years ago

You could create a FASTA file of sequences up- and downstream of your 5'-read ends and bring that FASTA file into WebLogo, changing the plot units from bits to probability.
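As a rough sketch of the FASTA step, assuming you already have the chromosome sequence as a string plus the 0-based genomic position and strand of each read's 5' end (for example from a BAM reader such as pysam; that part is not shown here). The function names `flank_window` and `to_fasta` and the `site_N` headers are made up for illustration. Minus-strand windows are reverse-complemented so that every sequence is in read orientation with the 5' end in the middle. The genome is DNA, so a T in the output stands for a U in the RNA.

```python
# Hypothetical helpers: the names and the input format are assumptions.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def flank_window(chrom_seq, five_prime_pos, strand, flank=10):
    """Return `flank` bases on each side of a read's 5' end, in read orientation.

    five_prime_pos is the 0-based genomic coordinate of the 5' end.
    Returns None if the window would run off either end of the sequence.
    """
    start = five_prime_pos - flank
    end = five_prime_pos + flank + 1
    if start < 0 or end > len(chrom_seq):
        return None
    window = chrom_seq[start:end]
    if strand == "-":
        # Flip minus-strand windows so upstream is always on the left.
        window = reverse_complement(window)
    return window.upper()


def to_fasta(windows):
    """Format equal-length windows as FASTA text with placeholder headers."""
    lines = []
    for i, window in enumerate(windows):
        lines.append(f">site_{i}")
        lines.append(window)
    return "\n".join(lines) + "\n"
```

All windows come out the same length (2 * flank + 1), which is what a logo tool needs. Skip any `None` results before writing the file.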

This change in units modifies the plot so that instead of showing the information content (in bits) at each position, it shows the frequency of each residue as a fraction of the total stack height. If a U shows up a lot at a position, its letter will be taller than the other residues at that position.
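If you would rather have the numbers themselves, for instance to draw a heat map, the same per-position frequencies can be computed directly. This is only a minimal sketch, assuming equal-length sequences that are already aligned on the 5' end; `position_frequencies` is a made-up name, not part of WebLogo.

```python
from collections import Counter


def position_frequencies(seqs, alphabet="ACGT"):
    """Return one dict per position mapping each base to its frequency.

    Bases outside `alphabet` (such as N) are ignored when computing totals.
    """
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    freqs = []
    for i in range(length):
        counts = Counter(s[i] for s in seqs)
        total = sum(counts[b] for b in alphabet)
        freqs.append(
            {b: (counts[b] / total if total else 0.0) for b in alphabet}
        )
    return freqs


if __name__ == "__main__":
    example = ["ACGT", "ACGA", "TCGA", "ACTA"]
    for pos, row in enumerate(position_frequencies(example)):
        print(pos, row)
```

Each row is one position and each key is one base, so the result maps directly onto a positions-by-bases heat map in whatever plotting tool you prefer.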

To see an example of this, pick any example from here. Render it normally; the default plot units are bits. Then change bits to probability and re-render.
