Different filters for wheat in the biomaRt package and the BioMart website
18 months ago

Hello,

I'm looking to download the wheat TILLING and SNP data from Ensembl using the Bioconductor biomaRt package. However, I've noticed that the BioMart website offers more filters than the biomaRt package does. For example:

library(biomaRt)

mainOrganism <- 'Triticum aestivum'
# Connect to the Ensembl Plants variation mart
plantsDatabase <- useMart(biomart = 'plants_variations', host = 'plants.ensembl.org')
plantsDatasets <- listDatasets(plantsDatabase)
# GetDatasetIndex() is a custom helper (described below), not a biomaRt function
mainOrgIndex <- GetDatasetIndex(organism = mainOrganism, plantsDatasets$description)
mainOrgDataset <- useDataset(mart = plantsDatabase, dataset = plantsDatasets$dataset[[mainOrgIndex]])
mainOrgFilters <- listFilters(mainOrgDataset)
mainOrgAttributes <- listAttributes(mainOrgDataset)
attributes <- mainOrgAttributes$name[c(1:2, 4:6, 20:21, 24, 34:35)]
filter <- c('variation_source', 'variation_set_name', 'chr_name')
values <- list('EMS-induced mutation', 'EMS (Cadenza)', '1A')

GetDatasetIndex is a nifty helper that fetches the index of the dataset for the organism you are querying in BioMart, in this case 'Triticum aestivum'.
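GetDatasetIndex() itself wasn't posted, so here is a minimal sketch of what such a helper might look like. It is hypothetical, not the actual code: it assumes the helper simply returns the first position whose dataset description contains the organism name as a literal substring.

```r
# Hypothetical sketch of a GetDatasetIndex()-style helper; the real
# definition was not posted. Returns the first position in `descriptions`
# whose text contains `organism` as a literal substring.
GetDatasetIndex <- function(organism, descriptions) {
  hits <- grep(organism, descriptions, fixed = TRUE)
  if (length(hits) == 0) {
    stop("No dataset description mentions '", organism, "'")
  }
  hits[1]
}

# Illustrative input: the first entry is a made-up placeholder, the second is
# the wheat description reported by searchDatasets() later in this thread.
descs <- c(
  "placeholder description for another organism",
  "Triticum aestivum Short Variants (SNPs and indels excluding flagged variants) (IWGSC)"
)
GetDatasetIndex(organism = "Triticum aestivum", descs)
# [1] 2
```

With a live mart you would pass plantsDatasets$description instead of the toy vector, as in the code above.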

I want to filter the data by 'Variant consequence', a filter that is available on the BioMart web portal but seemingly not in the biomaRt package (listFilters() for this mart doesn't show it). Any pointers?

Best, Sandeep


Tagging: Mike Smith

14 months ago
EMBL-EBI

The filter is "so_mini_parent_name". No, it's not obvious.

A cheat you can use: on the web-based BioMart results page, click the XML button. The internal (coded) names of the filters will appear in the XML, e.g.:

<Filter name = "so_mini_parent_name" value = "feature_ablation"/>
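If you want to reuse the names found this way, a small sketch like the one below can pull every filter name/value pair out of XML text copied from the web interface. It is only a convenience under stated assumptions: it uses base R regular expressions rather than a real XML parser, it only matches filters written as name = "..." value = "..." (as in the example above) and skips anything else, and the function name is made up for illustration.

```r
# Hypothetical helper (not part of biomaRt): extract filter names and values
# from BioMart query XML copied via the web interface's XML button.
# Only elements of the form <Filter name = "..." value = "..."/> are matched.
extract_biomart_filters <- function(xml_text) {
  xml_text <- paste(xml_text, collapse = "\n")
  pattern <- '<Filter\\s+name\\s*=\\s*"([^"]*)"\\s+value\\s*=\\s*"([^"]*)"'
  hits <- regmatches(xml_text, gregexpr(pattern, xml_text, perl = TRUE))[[1]]
  filter_names <- sub(pattern, "\\1", hits, perl = TRUE)
  filter_values <- sub(pattern, "\\2", hits, perl = TRUE)
  # Named list, one element per filter
  setNames(as.list(filter_values), filter_names)
}

xml <- '<Filter name = "so_mini_parent_name" value = "feature_ablation"/>'
str(extract_biomart_filters(xml))
```

The names could then go into the filters argument of getBM() and the values into its values argument (for example getBM(attributes = attributes, filters = names(f), values = unname(f), mart = mainOrgDataset), where f is the extracted list). A filter with several selected values may come out as one comma-separated string, which you would need to split first.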

Perfect example of why you are vital to Biostars. No chance anyone would have figured that one out.

Is the functionality available via the web BioMart completely equivalent to biomaRt?


Yes, it should be the same.


Thanks a lot Emily, I'll try it out!

13 months ago
Mike Smith ♦ 1.2k
EMBL Heidelberg / de.NBI

Emily's answer is exactly how I go about diagnosing problems with the biomaRt package. Checking the XML via the web interface is always my first port of call for something like this.

I thought I'd advertise the recently added searchDatasets(), searchFilters() and searchAttributes() functions, which try to make finding these a little easier. Rather than simply listing all the available properties for a mart, you provide a search term and they return the relevant results. For example, to find the name of the dataset you want, you could do something like:

> searchDatasets(mart = plantsDatabase, 'aestivum')
            dataset                                                                           description version
12 taestivum_eg_snp Triticum aestivum Short Variants (SNPs and indels excluding flagged variants) (IWGSC)   IWGSC

However, they're useless in this instance: none of the information behind the scenes for this filter mentions 'Variant' or 'consequence', so you wouldn't know what to search for!


It's also worth pointing out that the filter you're using isn't a free-text filter but takes a specific set of values (the web interface presents them as a list). You can see the possible values in R with the function filterOptions(), e.g.

> filterOptions('so_mini_parent_name', mart = mainOrgDataset)
[1] "[3_prime_UTR_variant,5_prime_UTR_variant,coding_sequence_variant,coding_transcript_variant,downstream_gene_variant,exon_variant,feature_ablation,feature_amplification,feature_elongation,feature_truncation,feature_variant,frameshift_variant,gene_variant,incomplete_terminal_codon_variant,inframe_deletion,inframe_indel,inframe_insertion,inframe_variant,intergenic_variant,internal_feature_elongation,intron_variant,mature_miRNA_variant,missense_variant,NMD_transcript_variant,nonsynonymous_variant,non_coding_transcript_exon_variant,non_coding_transcript_variant,protein_altering_variant,sequence_comparison,sequence_variant,splice_acceptor_variant,splice_donor_variant,splice_region_variant,splice_site_variant,splicing_variant,start_lost,stop_gained,stop_lost,stop_retained_variant,structural_variant,synonymous_variant,terminator_codon_variant,transcript_ablation,transcript_amplification,transcript_variant,upstream_gene_variant,UTR_variant]"

I might need to improve the formatting here!
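Because filterOptions() returns everything as one bracketed, comma-separated string, a small sketch like the following could split it into a character vector and check that the consequence terms you plan to request are valid before querying. Both helper names are hypothetical, and the input is a shortened copy of the output above rather than a live query.

```r
# Hypothetical helpers (not part of biomaRt).
# Split the "[a,b,c]" string returned by filterOptions() into a character vector.
parse_filter_options <- function(opt_string) {
  inner <- gsub("^\\[|\\]$", "", opt_string, perl = TRUE)
  trimws(strsplit(inner, ",", fixed = TRUE)[[1]])
}

# Stop with an informative error if any requested value is not an allowed option.
check_filter_values <- function(requested, allowed) {
  bad <- setdiff(requested, allowed)
  if (length(bad) > 0) {
    stop("Not valid options for this filter: ", paste(bad, collapse = ", "))
  }
  invisible(requested)
}

# Shortened copy of the filterOptions() output shown above
opts <- "[3_prime_UTR_variant,missense_variant,stop_gained,synonymous_variant]"
allowed <- parse_filter_options(opts)
check_filter_values(c("missense_variant", "stop_gained"), allowed)
```

The checked terms could then be added to the query from the question by appending 'so_mini_parent_name' to filter and the vector of terms to values.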
