Question: How to extract variable nucleotide regions from a list of contigs
23 months ago
kayrouz.1 • 0

I have a list of about 6000 NCBI contig accession numbers, and I'd like to extract a specific 30 kb region from each contig. In a separate file, I have a list of "begin" and "end" indices that define the region of interest for each contig. Is there a way to retrieve a FASTA file of these trimmed contigs using an Entrez query? Since I'm not a very skilled programmer, I would have just put the sequences in an Excel spreadsheet and trimmed them accordingly, but the sequence strings are too long to fit in an Excel cell. Is there a simple way to do this via E-Utilities?
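For the E-Utilities route, one possible approach is sketched below. EFetch accepts `seq_start` and `seq_stop` parameters (1-based, inclusive) for the `nuccore` database, so the trimming can happen on NCBI's side. The function names, the three-column whitespace-separated table layout (accession, begin, end), and the 0.4 s pause between requests are assumptions made for illustration; if the indices are 0-based, they would need to be shifted first.

```python
import time
import urllib.parse
import urllib.request

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession, begin, end):
    """Build an EFetch URL for one sub-region of a nucleotide record.

    Assumes begin/end are 1-based, inclusive coordinates, which is what
    EFetch expects for seq_start/seq_stop.
    """
    params = {
        "db": "nuccore",
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
        "seq_start": begin,
        "seq_stop": end,
    }
    return EFETCH_URL + "?" + urllib.parse.urlencode(params)


def fetch_regions(table_path, out_path, delay=0.4):
    """Read 'accession begin end' lines and append each trimmed FASTA record.

    The table layout is hypothetical. The delay keeps the request rate
    below NCBI's limit of three requests per second without an API key.
    """
    with open(table_path) as table, open(out_path, "w") as out:
        for line in table:
            fields = line.split()
            if len(fields) < 3:
                continue  # skip blank or malformed lines
            accession, begin, end = fields[0], int(fields[1]), int(fields[2])
            url = build_efetch_url(accession, begin, end)
            with urllib.request.urlopen(url) as response:
                out.write(response.read().decode())
            time.sleep(delay)
```

Calling `fetch_regions("regions.txt", "trimmed.fasta")` (both file names hypothetical) would write one FASTA record per contig. With about 6000 contigs at this rate, the run would take roughly 40 minutes plus download time.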


Do you have a reference genome? If you do, you could use bedtools getfasta.

Sinji ♦ 2.8k • 23 months ago
23 months ago
Vijay Lakhujani 4.1k
India

A to-the-point explanation of bedtools getfasta is here
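If the contig sequences are already available locally as a FASTA file, the kind of extraction bedtools getfasta performs can be approximated in a few lines of Python. This is only a rough sketch, not the tool itself: it assumes BED-style coordinates (0-based start, end exclusive), loads every sequence into memory, and uses hypothetical function names.

```python
def read_fasta(path):
    """Load a FASTA file into a dict of {sequence name: sequence string}.

    The name is the first whitespace-delimited token after '>'.
    """
    records = {}
    name = None
    chunks = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    records[name] = "".join(chunks)
                name = line[1:].split()[0]
                chunks = []
            elif line and name is not None:
                chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def extract_regions(records, regions):
    """Yield (header, subsequence) for each (name, start, end) region.

    Coordinates are assumed to be BED-style: 0-based start, exclusive end,
    which maps directly onto Python slicing.
    """
    for name, start, end in regions:
        yield f">{name}:{start}-{end}", records[name][start:end]
```

For example, with `records = read_fasta("contigs.fasta")` (file name hypothetical), iterating over `extract_regions(records, [("contig1", 1000, 31000)])` would yield a 30 kb slice of `contig1` with a `>contig1:1000-31000` header.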
