I am working on a genome assembly and am currently trying to annotate and visualize its gaps, i.e. the regions of consecutive Ns (runs such as NNNNN) in my newly assembled genome.
I tried to do this with the letterFrequencyInSlidingView function from Biostrings, but it takes forever (more than half an hour) for a single scaffold, and I have many of them. Afterwards, I make a data frame out of the output matrix and try to plot it in order to see the regions where the Ns are consecutive. The plotting is just as slow.
The command I use is:
Freq_N_758 <- sapply(chromium.assembly["758"], letterFrequencyInSlidingView, 1, "N")
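To make clearer what kind of output I am after, here is a toy sketch in base R on a made-up sequence. The names toy_seq and gaps are only for illustration and are not part of my real data, and I do not assume this approach would scale to whole scaffolds. It simply shows the table of gap coordinates I would like to end up with:

```r
# Made-up toy sequence standing in for one scaffold (illustration only)
toy_seq <- paste0("ACGT", strrep("N", 5), "ACGTT", strrep("N", 2),
                  "ACG", strrep("N", 10), "AC")

# Flag every position that is an N (upper or lower case)
is_n <- strsplit(toy_seq, "", fixed = TRUE)[[1]] %in% c("N", "n")

# Run-length encode the flags so each run of Ns becomes one entry
runs <- rle(is_n)

# Turn run lengths into 1-based, inclusive start/end coordinates
run_end   <- cumsum(runs$lengths)
run_start <- run_end - runs$lengths + 1

# Keep only the runs that consist of Ns
gaps <- data.frame(
  start = run_start[runs$values],
  end   = run_end[runs$values],
  width = runs$lengths[runs$values]
)
print(gaps)
```

Each row of gaps is one stretch of consecutive Ns, which is what I would then like to plot along the scaffold.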
I am sure I am missing an important point here and doing this the wrong way. What is the correct way to do it in R?
Many thanks, Alex