How to further investigate a genomic island containing only hypothetical proteins
7.3 years ago
mschmid ▴ 180

We sequenced and assembled a Pseudomonas species with PacBio and Illumina and got one nice, circularized, high-quality contig. We then submitted it to NCBI and received the NCBI annotation. So far, so good.

To learn more about the properties of the bacterial genome, I also ran an analysis for genomic islands (GIs). I found one very large GI (about 50 kb) whose GC content is extremely different from the rest of the genome. Many genes are called in the GI, but all of them are hypothetical proteins (except one, which is a transposase).

I BLASTed the region (blastn and blastx) and got NO hits (except for the transposase). So this piece of DNA seems to be pretty interesting.

What follow-up strategies would you suggest to get to know more about this genomic island?

EDIT: The GI might be related to the following properties of the bacterium: 1) it is some kind of phytopathogen (or at least kills some parts of plants); 2) it is antagonistic to some types of bacteria.

annotation genomic island bacteria • 1.6k views

You must have a specific interest in this Pseudomonas. Is the genomic island contributing to that interest (by any chance, or do you already know otherwise)? For one, you could see what happens when you delete the island.

But before you get deep into investigating this, make sure that it is actually present in the strain (i.e. that you can PCR it, or parts of it) and that it is not a contaminant introduced at some step of the sequencing process.


It's there for sure :-)

I am doing the bioinformatics analysis for some lab people, so we first want more hints about what it could be. Based on that, there might be follow-up experiments.


Have you (or have you not) excluded the possibility that it is related to the lab's original interest in this strain? If it is not related to that interest, then you may be embarking on a wild goose chase :)


Check out the EDIT in the original question :-)

7.3 years ago

As I am sure you have already realized, this genomic island is likely one that has been subject to lateral gene transfer (also known as horizontal gene transfer). It is in effect foreign DNA incorporated into the genome, which is why the GC content differs from the rest of the genome. The fact that the region contains a transposase makes this even more obvious.
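To see how sharply the island stands out, a sliding-window GC profile is a quick check. The following is only a minimal sketch: the function names, window size, step, and `min_delta` cutoff are my own arbitrary assumptions, not values from any genomic island tool.

```python
def gc_fraction(seq):
    """GC fraction of a DNA string, ignoring N and other ambiguity codes."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def sliding_gc(seq, window=5000, step=1000):
    """Return (start, gc_fraction) for each window along the sequence."""
    profile = []
    last_start = max(len(seq) - window, 0)
    for start in range(0, last_start + 1, step):
        profile.append((start, gc_fraction(seq[start:start + window])))
    return profile


def flag_deviant_windows(profile, genome_gc, min_delta=0.08):
    """Keep windows whose GC differs from the genome average by min_delta or more.

    min_delta is an arbitrary illustrative cutoff (assumption).
    """
    return [(start, gc) for start, gc in profile if abs(gc - genome_gc) >= min_delta]


if __name__ == "__main__":
    # Toy sequence: a GC-rich "genome" followed by an AT-rich "island".
    toy = "GC" * 150 + "AT" * 50
    profile = sliding_gc(toy, window=100, step=100)
    print(flag_deviant_windows(profile, gc_fraction(toy), min_delta=0.3))
```

On a real genome you would read the contig from the FASTA file and plot the profile; the flagged windows should roughly coincide with the predicted island boundaries.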

By running BLASTN and BLASTX, you have already done the most obvious analysis. I would probably also run a BLASTP with the translation products of the called genes just to be sure, although I do not expect that to reveal anything new. Also, just to be sure, I would run them all through Pfam to search for domains in case the NCBI annotation pipeline missed something, but again I do not have high expectations.

Except for the transposase, you are thus likely stuck with a bunch of hypothetical protein-coding genes with no homology to anything sequenced before. So what can you do with them?

The first thing I would do is make a BLAST database of the proteins encoded within the region and use BLASTP to do an all-vs-all search of them. You may find that although they are not similar to anything sequenced elsewhere, some of them are duplicates of each other.
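Once the all-vs-all search has run, the hits can be grouped into putative duplicate families. Here is a rough sketch that assumes BLAST tabular output (`-outfmt 6`); the file names in the comments, the function name, and both thresholds are hypothetical choices for illustration only.

```python
from collections import defaultdict

# Assumed upstream commands (file names are hypothetical):
#   makeblastdb -in gi_proteins.faa -dbtype prot
#   blastp -query gi_proteins.faa -db gi_proteins.faa -outfmt 6 -out all_vs_all.tsv
# outfmt 6 columns: qseqid sseqid pident length mismatch gapopen
#                   qstart qend sstart send evalue bitscore


def paralog_groups(blast_lines, max_evalue=1e-5, min_identity=30.0):
    """Group proteins linked by non-self hits passing both thresholds.

    Single-linkage grouping with a small union-find; thresholds are
    illustrative assumptions, not recommended values.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        identity, evalue = float(fields[2]), float(fields[10])
        if query == subject:
            continue  # skip self-hits
        if evalue <= max_evalue and identity >= min_identity:
            parent[find(query)] = find(subject)

    groups = defaultdict(set)
    for protein in list(parent):
        groups[find(protein)].add(protein)
    return sorted(sorted(members) for members in groups.values())


if __name__ == "__main__":
    with open("all_vs_all.tsv") as handle:  # hypothetical file name
        for group in paralog_groups(handle):
            print("\t".join(group))
```

Single linkage is deliberately permissive: one weak hit can merge two families, so it is worth checking alignment coverage by eye before calling anything a duplicate.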

You can also run a number of sequence-based prediction tools over the sequences to characterize them. Do they look like they have a signal peptide and may be secreted proteins (e.g. SignalP)? Do they look like they might be transmembrane proteins (e.g. TMHMM)? Might they be intrinsically disordered proteins (e.g. IUPred)? Do they contain low-complexity regions (which might get masked in BLAST searches, which could explain why you get no matches)?
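For the low-complexity question, a crude first look is the Shannon entropy of the residues in a sliding window. This is a simplified stand-in of my own, not the SEG algorithm that BLAST uses for masking; the window length and entropy cutoff are arbitrary assumptions.

```python
import math
from collections import Counter


def shannon_entropy(segment):
    """Shannon entropy (bits) of the residue distribution in a segment."""
    counts = Counter(segment)
    n = len(segment)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def low_complexity_windows(protein, window=12, max_entropy=2.2):
    """Return (start, entropy) for windows at or below max_entropy.

    window and max_entropy are illustrative values (assumption).
    """
    protein = protein.upper()
    hits = []
    for start in range(len(protein) - window + 1):
        entropy = shannon_entropy(protein[start:start + window])
        if entropy <= max_entropy:
            hits.append((start, round(entropy, 3)))
    return hits


if __name__ == "__main__":
    # Toy protein: a poly-Q stretch followed by a diverse stretch.
    toy = "QQQQQQQQQQQQ" + "ACDEFGHIKLMN"
    print(low_complexity_windows(toy, window=12, max_entropy=1.0))
```

If many of the island proteins light up here, rerunning the BLAST searches with low-complexity filtering turned off would be a reasonable next step.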

Beyond that, you can look at simple statistical characterization of them: protein length distribution, amino acid composition etc.
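These statistics need nothing beyond the standard library. A minimal sketch, assuming the island proteins are in a FASTA file (the file name and all function names here are hypothetical):

```python
from collections import Counter
from statistics import median


def read_fasta(lines):
    """Parse FASTA lines into {id: sequence}; trailing '*' stop symbols are dropped."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            name = header[0] if header else "unnamed"
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts).rstrip("*").upper() for n, parts in records.items()}


def length_summary(proteins):
    """Basic length statistics over a non-empty {id: sequence} dict."""
    lengths = sorted(len(seq) for seq in proteins.values())
    return {
        "n": len(lengths),
        "min": lengths[0],
        "median": median(lengths),
        "max": lengths[-1],
    }


def aa_composition(proteins):
    """Pooled amino acid frequencies across all proteins."""
    counts = Counter()
    for seq in proteins.values():
        counts.update(seq)
    total = sum(counts.values())
    return {aa: count / total for aa, count in sorted(counts.items())}


if __name__ == "__main__":
    with open("gi_proteins.faa") as handle:  # hypothetical file name
        island = read_fasta(handle)
    print(length_summary(island))
    for aa, freq in aa_composition(island).items():
        print(f"{aa}\t{freq:.3f}")
```

Running the same functions on all proteins of the genome gives the baseline to compare the island against, for example whether island proteins are unusually short or skewed toward particular residues.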

For inspiration, you may want to have a look at this old paper of mine, where we did similar analyses: Analysis of two large functionally uncharacterized regions in the Methanopyrus kandleri AV19 genome


Many thanks! I will try the strategies you suggested and give feedback.
