Rscript To Heatmap The Overall GC/AT Content Of Several Sequences
12.0 years ago

The situation is the following:

I have several DNA FASTA sequences and want to know their GC/AT content using, for example, a sliding window of 10 bp. The resulting heatmap has to represent one overall sequence, showing the GC/AT content of every 10 bp window across all sequences (like an average).

For example, the first row of the heatmap could be the GC content and the second row the AT content.

Is this possible in R? I would also appreciate suggestions in BioPerl.

Thanks for any help you could provide.

gc dna heatmap fasta • 4.3k views
12.0 years ago
Vikas Bansal ★ 2.4k

To start, compute the length of each DNA sequence in the FASTA file with Perl, Python, or whatever tool you prefer. Then use bedtools makewindows to generate the sliding windows, which gives you a BED file, and run bedtools nuc on that file to compute the GC/AT content of each window. A very good answer by Aaron (he designed bedtools) is here. From the output, take the GC content column and calculate the average (sum divided by the number of rows); the AT content is simply 100 - GC (or 1 - GC if the values are fractions rather than percentages). After that, though, I do not understand why you want a heatmap, as it does not make much sense for this kind of data (unless I misunderstood your question, sorry if so). Just in case: you can import the data into R and use the heatmap function, or use the ggplot2 package for heatmaps.
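If it helps to see the windowing and averaging spelled out, here is a minimal sketch in plain Python rather than bedtools or R. It is not the bedtools approach itself: it averages each window position across all sequences, which is my reading of the question. The function names (`read_fasta`, `window_gc`, `mean_profile`) and the two toy sequences are made up for illustration, and it assumes the sequences contain only A, C, G and T, so that AT = 1 - GC holds.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA-formatted file handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def window_gc(seq, size=10, step=10):
    """Return the GC fraction of every full window of `size` bases."""
    fractions = []
    for start in range(0, len(seq) - size + 1, step):
        window = seq[start:start + size]
        fractions.append((window.count("G") + window.count("C")) / size)
    return fractions


def mean_profile(seqs, size=10, step=10):
    """Average the per-window GC fraction across sequences.

    Assumption: sequences are compared window by window from their start,
    and the profile is truncated to the shortest sequence.
    """
    per_seq = [window_gc(s, size, step) for s in seqs]
    n_windows = min(len(p) for p in per_seq)
    gc = [sum(p[i] for p in per_seq) / len(per_seq) for i in range(n_windows)]
    at = [1.0 - g for g in gc]  # valid only if there are no N or ambiguity codes
    return gc, at


# Toy input (hypothetical sequences, 20 bp each)
example = StringIO(">s1\nGGGGGCCCCCAAAAATTTTT\n>s2\nGCGCGATATAATATATATAT\n")
seqs = [seq for _, seq in read_fasta(example)]
gc_row, at_row = mean_profile(seqs)
print(gc_row)  # [0.75, 0.0]
print(at_row)  # [0.25, 1.0]
```

Setting `step` smaller than `size` gives overlapping (truly sliding) windows; with `step == size` the windows are adjacent 10 bp bins.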

12.0 years ago

Not sure how long your sequence is, but I think a %GC line plot might be easier to read in this case. And if you do want to make a heatmap (without clustering) in R, try image().
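To show what a two-row GC/AT grid without clustering looks like, here is a rough plain-text stand-in in Python. It is not R's image(); with a plotting library the natural equivalent would be something like matplotlib's `imshow`. The `shade` and `text_heatmap` helpers and the values in `gc_row` are invented for the example.

```python
SHADES = " .:-=+*#%@"  # ten levels, from low (blank) to high (dense)


def shade(value):
    """Map a fraction in [0, 1] to one shading character."""
    index = min(int(value * len(SHADES)), len(SHADES) - 1)
    return SHADES[index]


def text_heatmap(rows, labels):
    """Render each row of fractions as one line of shading characters."""
    lines = []
    for label, row in zip(labels, rows):
        lines.append(f"{label:>2} |" + "".join(shade(v) for v in row) + "|")
    return "\n".join(lines)


# Made-up per-window GC fractions; AT is taken as the complement
gc_row = [0.75, 0.0, 0.5, 1.0]
at_row = [1.0 - g for g in gc_row]
print(text_heatmap([gc_row, at_row], ["GC", "AT"]))
```

Each column is one window, with the GC row on top and the AT row below, as the question describes. Since the two rows are exact complements here, the second row adds no information, which is part of why a single %GC line plot may be easier to read.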
