Hello,
Sorry in advance for my English.
I want to extract the positions of all non-sequenced nucleotides (N) for each chromosome from a genome sequence file (FASTA).
I have a FASTA file with the genome sequences, like this:
>chr1
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNtaaattgttt
taaattgtttctgtttgcagttgacatgatcttatatatagaaaacacca
ataactctgccaaaaaatttagaattcataaatgaatttagtaaagttgc
I want to find all runs of N and get their positions in BED format, e.g.:
chr1 1 200 ...
I couldn't find a way to do that with Bedtools.
If you have any ideas, could you help me?
Thank you
Maybe you're looking for the UCSC gap table for hg19? http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/gap.txt.gz
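If your genome is not hg19, or you want the N runs computed directly from your own FASTA file, here is a minimal Python sketch. It is one possible approach, not a Bedtools feature. The function names `n_runs_to_bed` and `fasta_to_bed` are made up for this example, and it assumes a plain-text (uncompressed) FASTA where both `N` and `n` mean "not sequenced". Coordinates are written the BED way (0-based start, end not included), so a chromosome starting with 200 N would give `chr1 0 200` rather than `chr1 1 200`.

```python
import re
import sys

# A run of one or more unsequenced bases (upper or lower case).
N_RUN = re.compile(r"[Nn]+")


def n_runs_to_bed(lines):
    """Yield (chrom, start, end) for each run of N, as 0-based half-open intervals.

    `lines` is any iterable of FASTA lines (an open file works).
    Runs that continue across line breaks are merged into one interval.
    """
    chrom = None
    offset = 0       # bases already read for the current chromosome
    pending = None   # [start, end] of the latest N run, not emitted yet
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # New record: flush the open run of the previous chromosome.
            if pending is not None:
                yield (chrom, pending[0], pending[1])
            fields = line[1:].split()
            chrom = fields[0] if fields else ""
            offset = 0
            pending = None
            continue
        for m in N_RUN.finditer(line):
            start, end = offset + m.start(), offset + m.end()
            if pending is not None and pending[1] == start:
                # This run continues the one that ended the previous line.
                pending[1] = end
            else:
                if pending is not None:
                    yield (chrom, pending[0], pending[1])
                pending = [start, end]
        offset += len(line)
    if pending is not None:
        yield (chrom, pending[0], pending[1])


def fasta_to_bed(fasta_path, out=sys.stdout):
    """Write the N runs of a FASTA file as tab-separated BED lines."""
    with open(fasta_path) as handle:
        for chrom, start, end in n_runs_to_bed(handle):
            out.write(f"{chrom}\t{start}\t{end}\n")
```

To use it, call something like `fasta_to_bed("genome.fa")`, where `genome.fa` is a placeholder for your own file, and redirect the output to a `.bed` file.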
Thank you, I think that solves my problem.