Trouble with BioMart Perl API installation
8.4 years ago
tlorin ▴ 360

Dear all, I am trying to install the BioMart Perl API (in order to automate ortholog retrieval in other species for a given gene), and I have followed the instructions here. I'm almost done, but I get the following error when I run my example script:

perl apiExample17112015.pl

junk after document element at line 18, column 0, byte 465 at /home/tlorin/perl5/perlbrew/perls/perl-5.20.2/lib/site_perl/5.20.2/x86_64-linux-thread-multi/XML/Parser.pm line 187.

Here is my example script:

# An example script demonstrating the use of BioMart API.
# This perl API representation is only available for configuration versions >=  0.5
use strict;
use lib '/home/tlorin/biomart-perl/lib/';
use BioMart::Initializer;
use BioMart::Query;
use BioMart::QueryRunner;

my $confFile = "../conf/martURLLocation.xml";
#
# NB: change action to 'clean' if you wish to start a fresh configuration
# and to 'cached' if you want to skip configuration step on subsequent runs from the same registry
#

my $action='clean';
my $initializer = BioMart::Initializer->new('registryFile'=>$confFile, 'action'=>$action);
my $registry = $initializer->getRegistry;

my $query = BioMart::Query->new('registry'=>$registry,'virtualSchemaName'=>'default');


$query->setDataset("drerio_gene_ensembl");
$query->addFilter("ensembl_gene_id", ["ENSDARG00000003732"]);
$query->addAttribute("ensembl_gene_id");
$query->addAttribute("ensembl_transcript_id");
$query->addAttribute("pformosa_homolog_ensembl_gene");
$query->addAttribute("gmorhua_homolog_ensembl_gene");
$query->addAttribute("amexicanus_homolog_ensembl_gene");
$query->addAttribute("cintestinalis_homolog_ensembl_gene");
$query->addAttribute("lchalumnae_homolog_ensembl_gene");
$query->addAttribute("trubripes_homolog_ensembl_gene");

$query->formatter("TSV");

my $query_runner = BioMart::QueryRunner->new();
############################## GET COUNT ############################
# $query->count(1);
# $query_runner->execute($query);
# print $query_runner->getCount();
#####################################################################


############################## GET RESULTS ##########################
# to obtain unique rows only
# $query_runner->uniqueRowsOnly(1);

$query_runner->execute($query);
$query_runner->printHeader();
$query_runner->printResults();
$query_runner->printFooter();

I should mention that I am not familiar with Perl and have no idea what could be going wrong here. Any ideas? Has anyone run into this before?

Thanks a lot for your time!

perl biomart ensembl api • 2.6k views
8.4 years ago
tlorin ▴ 360

I got it, thanks to @Jean-Karim Heriche:

My XML file was wrong because I had pasted what the Perl page gives (the one you get after doing a BioMart search) at the end of the original martURLLocation.xml file, which initially contained only:


<!DOCTYPE MartRegistry>
<MartRegistry>
<MartURLLocation
                    name         = "msd"
                    displayName  = "My BioMart Database"
                    host         = "www.biomart.org"
                    port         = "80"
                    visible      = "1"
                    default      = ""
                    includeDatasets = ""
                    martUser     = ""
/>
</MartRegistry>

So I just deleted this first part and it's now working!


<!DOCTYPE MartRegistry>
<MartRegistry>
  <MartURLLocation database="ensembl_mart_82" default="1" displayName="Ensembl Genes 82" host="www.ensembl.org" includeDatasets="" martUser="" name="ENSEMBL_MART_ENSEMBL" path="/biomart/martservice" port="80" serverVirtualSchema="default" visible="1" />
  <MartURLLocation database="sequence_mart_82" default="" displayName="Sequence" host="www.ensembl.org" includeDatasets="" martUser="" name="ENSEMBL_MART_SEQUENCE" path="/biomart/martservice" port="80" serverVirtualSchema="default" visible="" />
  <MartURLLocation database="ontology_mart_82" default="" displayName="Ontology" host="www.ensembl.org" includeDatasets="" martUser="" name="ENSEMBL_MART_ONTOLOGY" path="/biomart/martservice" port="80" serverVirtualSchema="default" visible="" />
  <MartURLLocation database="genomic_features_mart_82" default="" displayName="Genomic features 82" host="www.ensembl.org" includeDatasets="" martUser="" name="ENSEMBL_MART_GENOMIC" path="/biomart/martservice" port="80" serverVirtualSchema="default" visible="" />
  <MartURLLocation database="snp_mart_82" default="" displayName="Ensembl Variation 82" host="www.ensembl.org" includeDatasets="" martUser="" name="ENSEMBL_MART_SNP" path="/biomart/martservice" port="80" serverVirtualSchema="default" visible="1" />
  <MartURLLocation database="regulation_mart_82" default="" displayName="Ensembl Regulation 82" host="www.ensembl.org" includeDatasets="" martUser="" name="ENSEMBL_MART_FUNCGEN" path="/biomart/martservice" port="80" serverVirtualSchema="default" visible="1" />
  <MartURLLocation database="vega_mart_82" default="" displayName="Vega 62" host="www.ensembl.org" includeDatasets="" martUser="" name="ENSEMBL_MART_VEGA" path="/biomart/martservice" port="80" serverVirtualSchema="default" visible="1" />
  <MartURLLocation database="pride_mart_1" default="1" displayName="PRIDE (EBI UK)" host="www.ebi.ac.uk" includeDatasets="" martUser="" name="pride" path="/pride/biomart/martservice" port="80" redirect="1" serverVirtualSchema="default" visible="1" />
</MartRegistry>
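
For anyone hitting the same message: an XML document can have only one root element, so a second `<MartRegistry>` block pasted after the first is what the parser reports as "junk after document element". Here is a minimal sketch that reproduces this in memory with XML::Parser (the module named in my error message). The two registry strings are trimmed-down placeholders, not the real entries, and `parse_error` is just a helper name I made up.

```perl
use strict;
use warnings;
use XML::Parser;

# Trimmed placeholder registries, for illustration only.
my $first  = qq{<MartRegistry>\n  <MartURLLocation name="msd" />\n</MartRegistry>\n};
my $second = qq{<MartRegistry>\n  <MartURLLocation name="ENSEMBL_MART_ENSEMBL" />\n</MartRegistry>\n};

# Hypothetical helper: returns the parser's error message for an XML
# string, or an empty string if the string is well-formed.
sub parse_error {
    my ($xml) = @_;
    # parse() dies on the first well-formedness error, so trap it.
    my $ok = eval { XML::Parser->new->parse($xml); 1 };
    return $ok ? '' : $@;
}

# A single registry parses; two concatenated registries do not.
print "one registry:   ", (parse_error($second) || "ok\n");
print "two registries: ", (parse_error($first . $second) || "ok\n");
```

The second line should show the same kind of "junk after document element" message, pointing at the line where the second root element starts.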

Thanks again :D

8.4 years ago

As the error message suggests, you have a malformed XML document. My guess is that the culprit is martURLLocation.xml. Check that it is well-formed XML.

Edit: I just checked what EnsEMBL gives as martURLLocation.xml. There is a blank line at the top, which usually causes XML parsers to throw an error.
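
One quick way to run that check, as a rough sketch: parse the file on its own with XML::Parser, the same module that raised the error in the question. The helper name `check_xml` is made up for illustration and is not part of the BioMart API; the default path is the one used in the question's script, so adjust it to your own setup.

```perl
#!/usr/bin/env perl
# Sketch: report whether an XML file is well-formed, using XML::Parser.
use strict;
use warnings;
use XML::Parser;

# Hypothetical helper (not part of BioMart): returns an empty string if
# the file parses, otherwise the parser's error message.
sub check_xml {
    my ($file) = @_;
    # ErrorContext => 2 asks the parser to show two lines of context
    # around the offending position.
    my $parser = XML::Parser->new(ErrorContext => 2);
    # parsefile() dies on the first error (or if the file cannot be
    # opened), so trap it with eval.
    my $ok = eval { $parser->parsefile($file); 1 };
    return $ok ? '' : $@;
}

# Default path taken from the question; pass another file as an argument.
my $file  = shift(@ARGV) // '../conf/martURLLocation.xml';
my $error = check_xml($file);
if ($error eq '') {
    print "$file is well-formed\n";
}
else {
    print "$file is NOT well-formed:\n$error\n";
}
```

If the file is fine, the problem is elsewhere; if not, the message gives the line and column to look at.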


See my answer below; it was indeed a problem with this file, but not because of the initial blank line. Thanks :D

5.1 years ago
mcclintock ▴ 10

Processing Cached Registry: /share/home/software/biomart-perl/conf/cachedRegistries/martURLLocation.xml.cached

Problems with the web server: 302 Found

Gene stable ID Transcript stable ID HGNC symbol UniProtKB/Swiss-Prot ID

Why do I get this with nearly the same code?
