Retrieving sequences from Ensembl Archive
2.0 years ago
biostR • 0

Hi,

I am looking for a way to retrieve DNA sequences from the Ensembl May 2017 archive based on genomic coordinates. I thought the biomaRt package would be suitable for this, but it did not work. Apparently, a sequence type (seqType, type) is required to obtain a sequence with the getSequence function.

For example:

    library(biomaRt)

    ensembl <- useMart(host = "may2017.archive.ensembl.org",
                       biomart = "ENSEMBL_MART_ENSEMBL",
                       dataset = "hsapiens_gene_ensembl")
    seq <- biomaRt::getSequence(chromosome = "X", start = 100639991, end = 100644991, mart = ensembl)

This gives the following error:

    Error in biomaRt::getSequence(chromosome = "X", start = 100639991, end = 100644991,  :
      Please specify the type of sequence that needs to be retrieved when using biomaRt in web service mode.  Choose either gene_exon, transcript_exon,transcript_exon_intron, gene_exon_intron, cdna, coding,coding_transcript_flank,coding_gene_flank,transcript_flank,gene_flank,peptide, 3utr or 5utr

Is there a good way to get the DNA sequences for a large list of genomic coordinates?

Thank you very much.


Did you check the documentation? Sequence type genomic is one of the allowed options.


I have biomaRt v2.32.1 installed, which does not accept "genomic" as the seqType. When I try it, I get the following:

    Error in biomaRt::getSequence(chromosome = "X", start = 100639991, end = 100644991,  :
      Please specify the type of sequence that needs to be retrieved when using biomaRt in web service mode.  Choose either gene_exon, transcript_exon,transcript_exon_intron, gene_exon_intron, cdna, coding,coding_transcript_flank,coding_gene_flank,transcript_flank,gene_flank,peptide, 3utr or 5utr

Tagging: Mike Smith to see if he can help.

14 months ago
EMBL-EBI

BioMart is gene-centric, so it cannot retrieve sequences for arbitrary genomic regions. The easiest way to get what you need is to use the archive REST API with the POST sequence/region endpoint. This lets you retrieve multiple sequences per request, and you can write a client around it in any language.
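As a rough illustration of the POST sequence/region approach, here is a minimal R sketch. Several things in it are assumptions rather than anything stated above: the helper names (make_regions, regions_to_json, split_batches, fetch_sequences) are made up for this example, the default server is the current-release REST host rather than the May 2017 archive (substitute the archive REST host you need), the batch size of 50 is what I believe the per-request limit to be (check the endpoint documentation), and the network call relies on the httr and jsonlite packages being installed.

```r
# Build region strings in the "chromosome:start..end:strand" form that the
# sequence/region endpoint expects. as.integer() avoids "1e+08"-style output.
make_regions <- function(chromosome, start, end, strand = 1) {
  sprintf("%s:%d..%d:%d", chromosome,
          as.integer(start), as.integer(end), as.integer(strand))
}

# Serialise a character vector of regions into the JSON request body,
# e.g. {"regions": ["X:1..10:1", "2:5..9:-1"]}.
regions_to_json <- function(regions) {
  paste0('{"regions": [', paste0('"', regions, '"', collapse = ", "), "]}")
}

# Split a vector into chunks of at most batch_size elements.
split_batches <- function(x, batch_size = 50) {
  split(x, ceiling(seq_along(x) / batch_size))
}

# POST each batch to <server>/sequence/region/<species> and combine results.
# ASSUMPTION: `server` defaults to the current-release host; replace it with
# the archive REST host for the release you want.
# ASSUMPTION: batch_size = 50 reflects the endpoint's POST limit as I recall it.
fetch_sequences <- function(regions,
                            server = "https://rest.ensembl.org",
                            species = "human",
                            batch_size = 50) {
  results <- lapply(split_batches(regions, batch_size), function(batch) {
    resp <- httr::POST(
      url = paste0(server, "/sequence/region/", species),
      httr::content_type("application/json"),
      httr::accept("application/json"),
      body = regions_to_json(batch)
    )
    httr::stop_for_status(resp)  # fail loudly on HTTP errors
    # The response should parse to one row per region; the exact columns
    # depend on the server and are not guaranteed by this sketch.
    jsonlite::fromJSON(httr::content(resp, as = "text", encoding = "UTF-8"))
  })
  do.call(rbind, results)
}

# Example usage (requires network access, so it is left commented out):
# regions <- make_regions("X", 100639991, 100644991)
# seqs <- fetch_sequences(regions)
```

Keeping the string-building helpers separate from the HTTP call means you can check the region strings and JSON body for a large coordinate list before sending anything to the server.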
