How to find genes that differ between two genomes
3
1
Entering edit mode
7.6 years ago
efreed ▴ 10

Hi,

I am fairly new to bioinformatics. I have the genome sequences of two closely related bacteria (or, more accurately, both genome sequences are publicly available). One bacterium grows much faster under a particular set of growth conditions than the other. I'm trying to determine what genes may be responsible for this improved growth. How do I compare these two genomes to find out what genes are different between the two? And is there a way to find out what the functions (e.g. GO terms) for the different genes are or will I just get a list of gene names as output?

Thanks for any help you can offer.

genome gene alignment blast • 6.8k views

It may depend on how "closely related" the genomes are. There are tools like Mauve that can give you a bird's-eye view of the genome organization.

If the two genomes you are interested in are in MBGD then that would be one place to start.


Thanks! I tried using Mauve, but got an error when trying to align my two sequences. I'm trying to troubleshoot the error.

Both genomes are in MBGD, so I'll check that out too.


How would you define "different"? Are you looking for point mutations, mutations in functional domains, copy number variants, regulatory variation, or something else? All are possible sources of difference between the two species. There is definitely a bioinformatics answer to your question, but you need to make sure you ask a "biologically sound" question ;-)


That's a good question. The overall goal of this project is to engineer the strain that grows less well to see if we can get it to grow better (the strain that grows less well has some other properties that we want). I am trying to put together a list of genes to target for engineering. The method we are using for strain engineering will allow me to make tens of thousands of targeted mutations - point mutations, insertions, and/or deletions - in our strain and then screen for improved growth. I'm basically looking for changes in coding regions, functional domains or regulatory regions that might cause the one strain to grow better. I would also be interested in knowing if there are genes present in the organism that grows better that are not present in the one that grows worse or vice versa.

I am most interested in genes that are involved in membrane structure/composition, redox balance, photosystems, and the carbon monoxide oxidation pathway (which is present in both organisms), since there is some literature to support these genes/pathways playing a significant role in growth in the conditions I'm looking at.


It has been shown that growth rates in bacteria largely depend on codon usage and not much on replication-related issues; see http://www.ncbi.nlm.nih.gov/pubmed/20090831 . There is also a program, growthpred, that predicts growth rates. I would suggest looking at codon usage differences in highly expressed genes as a start. Genes involved in the conditions that you found associated with growth rate could then be tested for differing codon usage.
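As a rough illustration of what "looking at codon usage differences" could mean in practice, here is a minimal Python sketch. It is not how growthpred works; it only counts raw codon frequencies in two sets of coding sequences and ranks the codons by how much their frequency differs. The function names and the toy sequences are made up for the example, and raw frequencies also reflect amino acid composition, so a per-amino-acid measure would be a sensible next step.

```python
from collections import Counter


def codon_counts(cds_sequences):
    """Count in-frame codons across an iterable of coding sequences."""
    counts = Counter()
    for seq in cds_sequences:
        seq = seq.upper().replace("U", "T")
        # Walk the sequence in steps of three; ignore a trailing partial codon.
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            if set(codon) <= set("ACGT"):  # skip codons with ambiguous bases
                counts[codon] += 1
    return counts


def codon_frequencies(cds_sequences):
    """Convert codon counts into fractions of all counted codons."""
    counts = codon_counts(cds_sequences)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()} if total else {}


def usage_differences(freqs_a, freqs_b):
    """Return (codon, freq_a - freq_b) pairs, largest absolute difference first."""
    codons = set(freqs_a) | set(freqs_b)
    diffs = [(c, freqs_a.get(c, 0.0) - freqs_b.get(c, 0.0)) for c in codons]
    return sorted(diffs, key=lambda item: (-abs(item[1]), item[0]))


# Hypothetical toy input: one short CDS per strain, differing only in leucine codons.
fast_cds = ["ATGCTGCTGAAATAA"]  # ATG CTG CTG AAA TAA
slow_cds = ["ATGTTATTAAAATAA"]  # ATG TTA TTA AAA TAA

diffs = usage_differences(codon_frequencies(fast_cds), codon_frequencies(slow_cds))
for codon, delta in diffs[:2]:
    print(codon, round(delta, 2))
```

In a real analysis the input lists would hold the CDS of the highly expressed genes (for example ribosomal proteins) from each genome rather than one toy sequence.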


Thanks for the suggestion!

7.6 years ago
dago ★ 2.8k

A fast option is to look up a database that stores information about microbial genomes, e.g. IMG.

A more detailed analysis would require a bit more time and some command-line work. If your strains are related (e.g. same genus or species), I would say that the first thing to do is to determine which genes the two bacteria share and which are unique to each. This means identifying orthologous genes. There are plenty of programs out there to do that. You can take a look at these tools: https://omictools.com/pangenomics-category. They are intended for studying pan-genomes, but most of them will suit your use perfectly. I personally use GET_HOMOLOGUES, but I have read that OrthoFinder is really good and reliable.

After this analysis, you will have a list of genes that are shared between the two bacteria and a list of genes that are unique. Also, you will see which genes are present in duplicate (paralogues) in your genomes.
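To make those lists concrete, here is a small Python sketch of the bookkeeping once you have orthologous groups. It assumes the clustering output has already been loaded into a dictionary that maps a group ID to the gene IDs found in each genome; that layout, the genome labels and the gene IDs are all invented for the example, and each real tool has its own output format that would need a parser first.

```python
def classify_orthogroups(orthogroups, genome_a, genome_b):
    """Split orthologous groups into shared, genome-specific and duplicated sets.

    orthogroups: dict of group_id -> {genome_name: [gene_id, ...]}
    (a hypothetical in-memory layout, not any tool's native format).
    """
    result = {"shared": [], "only_a": [], "only_b": [], "duplicated": []}
    for group_id, members in orthogroups.items():
        genes_a = members.get(genome_a, [])
        genes_b = members.get(genome_b, [])
        if genes_a and genes_b:
            result["shared"].append(group_id)
        elif genes_a:
            result["only_a"].append(group_id)
        elif genes_b:
            result["only_b"].append(group_id)
        # More than one gene from the same genome in a group suggests paralogues.
        if len(genes_a) > 1 or len(genes_b) > 1:
            result["duplicated"].append(group_id)
    return result


# Hypothetical toy input for a fast-growing and a slow-growing strain.
orthogroups = {
    "OG1": {"fast": ["f_0001"], "slow": ["s_0001"]},
    "OG2": {"fast": ["f_0002", "f_0003"], "slow": ["s_0002"]},
    "OG3": {"fast": ["f_0004"]},
    "OG4": {"slow": ["s_0003"]},
}

summary = classify_orthogroups(orthogroups, "fast", "slow")
for category, groups in summary.items():
    print(category, groups)
```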

However, I am not sure that this analysis will answer your question. Especially if the strains are extremely closely related, you might need to do a more detailed analysis that considers gene variants, e.g. SNPs. You might then see that some shared genes differ between the two strains. You might also find that these differences change the protein sequences (i.e. they are non-synonymous mutations), indicating that the two strains have different versions of shared proteins.
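For one pair of orthologous genes, the synonymous versus non-synonymous distinction can be sketched in a few lines of Python. This is a toy under strong assumptions: both coding sequences are in frame, of equal length and contain no indels, so codons can be compared position by position. Real data would first need a codon-aware alignment or a proper variant caller. The function name and the two sequences are made up.

```python
from itertools import product

# Standard genetic code in NCBI order (T, C, A, G at each codon position).
# Bacterial translation table 11 assigns the same amino acids to each codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_differences(cds_a, cds_b):
    """Compare two in-frame, equal-length, gap-free coding sequences codon by codon.

    Returns a list of (codon_number, codon_a, codon_b, aa_a, aa_b, kind) tuples
    for every codon that differs between the two sequences.
    """
    if len(cds_a) != len(cds_b) or len(cds_a) % 3:
        raise ValueError("sequences must have equal length, a multiple of three")
    changes = []
    for i in range(0, len(cds_a), 3):
        codon_a = cds_a[i:i + 3].upper()
        codon_b = cds_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue
        aa_a = CODON_TABLE.get(codon_a, "X")  # "X" marks an untranslatable codon
        aa_b = CODON_TABLE.get(codon_b, "X")
        if "X" in (aa_a, aa_b):
            kind = "unknown"
        elif aa_a == aa_b:
            kind = "synonymous"
        else:
            kind = "non-synonymous"
        changes.append((i // 3 + 1, codon_a, codon_b, aa_a, aa_b, kind))
    return changes


# Hypothetical toy orthologues: M L K G * versus M L K D *.
strain_a = "ATGCTGAAAGGCTAA"
strain_b = "ATGCTAAAAGACTAA"

changes = classify_codon_differences(strain_a, strain_b)
for change in changes:
    print(change)
```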

Hope this helps.


IMG has been helpful for me. Thank you! Still working on getting GET_HOMOLOGUES working.

7.6 years ago
bioinfo17 ▴ 30

Use EDGAR (https://edgar.computational.bio.uni-giessen.de/cgi-bin/edgar_login.cgi) to compare two or more genomes; you will get a set of core genes and also strain-specific genes, which may be of interest. If the genomes are annotated with an appropriate annotation tool, the GO terms should be available during comparative analysis in EDGAR. Hope this helps!


EDGAR looks like it would be really useful! Unfortunately, one of the genomes I'm interested in is not in the EDGAR database, so I can't use it.


You can create a private project in EDGAR by emailing them your genomes of interest (it's free) :) They are quick and awesome.

7.5 years ago

If you're reasonably bioinformatics savvy you could annotate both genomes using the excellent Prokka, and then do comparative blasting of all identified genes using Proteinortho to get a nice output CSV of orthologs and singletons.
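Once you have the singleton gene IDs, the annotation table can tell you what those genes are predicted to do. The Python sketch below assumes a tab-separated annotation file with `locus_tag`, `ftype` and `product` columns, which is my understanding of Prokka's `.tsv` output, so check the header of your own file. The locus tags, products and keywords are invented for the example.

```python
import csv
import io


def load_products(tsv_handle):
    """Map locus_tag -> product for CDS rows of a tab-separated annotation table.

    Assumes columns named locus_tag, ftype and product (an assumption about
    the annotation file; adjust the names to match your header).
    """
    reader = csv.DictReader(tsv_handle, delimiter="\t")
    return {
        row["locus_tag"]: row["product"]
        for row in reader
        if row.get("ftype") == "CDS"
    }


def annotate(gene_ids, products, keywords=()):
    """Return (gene_id, product) pairs, optionally keeping only keyword matches."""
    hits = []
    for gene_id in gene_ids:
        product = products.get(gene_id, "unannotated")
        if not keywords or any(k.lower() in product.lower() for k in keywords):
            hits.append((gene_id, product))
    return hits


# Hypothetical toy annotation table standing in for a real file on disk.
toy_tsv = io.StringIO(
    "locus_tag\tftype\tlength_bp\tgene\tEC_number\tCOG\tproduct\n"
    "FAST_00010\tCDS\t2400\t\t\t\tCarbon monoxide dehydrogenase large chain\n"
    "FAST_00020\tCDS\t300\t\t\t\thypothetical protein\n"
    "FAST_00030\tCDS\t1050\t\t\t\tOuter membrane protein A\n"
    "FAST_00040\ttRNA\t76\t\t\t\ttRNA-Ala\n"
)

products = load_products(toy_tsv)
singletons = ["FAST_00010", "FAST_00020", "FAST_00030"]  # e.g. strain-specific genes
hits = annotate(singletons, products, keywords=("membrane", "carbon monoxide"))
for gene_id, product in hits:
    print(gene_id, product)
```

With a real file you would pass `open("annotation.tsv", newline="")` (file name hypothetical) instead of the `io.StringIO` object.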

I am not sure if Prokka is available on a Galaxy machine somewhere.
