How to get coverage for specific base(s), and for specific mismatch(es) per genomic range
3.2 years ago
lechu ▴ 20

Hi,

I would like to get coverage for a set of genomic ranges, with the complication that I need the coverage over T, G, A, and C reported separately. My idea is to first add feature names to the mpileup file (with bedtools intersect) and then do something akin to R's dplyr::summarize(). But maybe there is a bash alternative?
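To make the intended summary concrete, here is a minimal sketch in Python (not bash). It assumes that "coverage over T, G, A, and C" means the depth (mpileup column 4) summed over positions grouped by their reference base (column 3), that ranges are BED-style (0-based start, exclusive end), and that mpileup positions are 1-based. The function name and the toy rows are hypothetical; for real data an interval join such as bedtools intersect would replace the nested loop.

```python
from collections import defaultdict

def coverage_by_ref_base(pileup_rows, intervals):
    """Sum depth per feature, split by reference base.

    pileup_rows: iterable of (chrom, pos, ref_base, depth), pos is 1-based.
    intervals:   iterable of (chrom, start, end, name), BED-style
                 (0-based start, exclusive end).
    """
    totals = defaultdict(lambda: defaultdict(int))
    for chrom, pos, ref, depth in pileup_rows:
        for ichrom, start, end, name in intervals:
            # A 1-based position p lies in a BED interval when start < p <= end.
            if chrom == ichrom and start < pos <= end:
                totals[name][ref.upper()] += depth
    return {name: dict(per_base) for name, per_base in totals.items()}

# Toy data, made up for illustration only.
rows = [
    ("chr1", 5, "A", 10),
    ("chr1", 6, "T", 12),
    ("chr1", 7, "T", 8),
    ("chr1", 20, "G", 3),
]
features = [("chr1", 4, 7, "featA"), ("chr1", 15, 25, "featB")]

summary = coverage_by_ref_base(rows, features)
print(summary)
```

The nested loop is quadratic and only meant to show the grouping logic; it plays the role of the summarize() step after features have been attached to each pileup row.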

The second step is to count specific mismatches from the mpileup read-bases column. I planned to do this in R, since I know how, but maybe someone could help me get started with awk. A one-liner that counts the occurrences of "g" in column 5 (see below) and prints that number in place of the column would get me started.

Of course, if there is a more efficient way to accomplish the task - let me know (I am sure there is)!

slam_500_spike  390 A   46  .,..,.,.g..,,g,.,,,,.g.,,,,,,,g,,....,,,,,,,.   FFFF:FFFFFFFFFJJFFFJFJJJJFJJJJFFJJJJJJJFFJJJJJ
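For a line like the one above, the counting step could be sketched as follows, in Python rather than awk. A plain character count of "g" can overcount on real data, because column 5 may also contain read-start markers ("^" followed by a mapping-quality character) and indel strings such as "+2ag", so this sketch skips those before counting. The function names are hypothetical, and the code assumes whitespace-separated columns in the standard mpileup layout.

```python
import re
from collections import Counter

def count_bases(pileup_bases):
    """Count the symbols in an mpileup read-bases string (column 5)."""
    # Drop read-start markers: '^' plus the mapping-quality character after it.
    s = re.sub(r"\^.", "", pileup_bases)
    # Drop read-end markers.
    s = s.replace("$", "")
    counts = Counter()
    i = 0
    while i < len(s):
        ch = s[i]
        if ch in "+-":
            # Indel: sign, length digits, then that many inserted/deleted bases.
            j = i + 1
            while j < len(s) and s[j].isdigit():
                j += 1
            length = int(s[i + 1:j]) if j > i + 1 else 0
            i = j + length
            continue
        counts[ch] += 1
        i += 1
    return counts

def replace_col5_with_count(line, symbol="g"):
    """Return the line with column 5 replaced by the count of `symbol`."""
    fields = line.split()
    fields[4] = str(count_bases(fields[4])[symbol])
    return "\t".join(fields)

example = (
    "slam_500_spike  390 A   46  "
    ".,..,.,.g..,,g,.,,,,.g.,,,,,,,g,,....,,,,,,,.   "
    "FFFF:FFFFFFFFFJJFFFJFJJJJFJJJJFFJJJJJJJFFJJJJJ"
)
# Column 5 becomes 4 for the example line.
print(replace_col5_with_count(example))
```

Lowercase letters in column 5 are mismatches on the reverse strand, so counting "g" alone ignores forward-strand "G" calls; passing a different `symbol`, or summing both counts, covers that case.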
RNA-Seq • 436 views