How to find the number of TEs in a set of genomes?


5.1 years ago

little_more ▴ 70

I have a list of GenBank assembly IDs and a list of the corresponding chromosome and plasmid accessions. A toy example:

a = ["GCA_000005845.2", "GCA_000006925.2", "GCA_000007405.1", "GCA_000007445.1", ...]
b = ["CP024720.1", "CP024722.1", "CP024721.1", "LT601384.1", "LT838196.1", ...]

I'd like to find the number of IS in each genome. The only approach that has come to mind is to parse all CDS features in each genome with Biopython and count the CDS whose "product" qualifier matches "IS ... transposase". Is there a better way to do this? Could I use GO terms somehow? Note that the lists are quite large, so I need an automated approach.
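A minimal sketch of the CDS-counting idea described in the question, assuming Biopython's `Bio.Entrez` and `Bio.SeqIO` modules and that the replicon accessions (list `b`) are fetched from the NCBI nucleotide database. The regex is a hypothetical stand-in for the "IS ... transposase" heuristic, and the function names, batch size of 20, and e-mail address are illustrative assumptions. Because it only counts annotated CDS, the result depends entirely on how the records were annotated; it is not a dedicated IS detection method.

```python
import re
from typing import Dict, Iterable, Iterator, List

# Hypothetical heuristic from the question: an "IS..." token followed later
# by "transposase" in the product text (e.g. "IS3 family transposase").
IS_TRANSPOSASE = re.compile(r"\bIS\S*\s.*\b[Tt]ransposase\b")


def is_transposase_product(product: str) -> bool:
    """Return True if a /product string looks like an IS transposase."""
    return IS_TRANSPOSASE.search(product) is not None


def count_is_cds(records: Iterable) -> Dict[str, int]:
    """Count IS-transposase CDS per record (e.g. Biopython SeqRecord objects)."""
    counts: Dict[str, int] = {}
    for record in records:
        n = 0
        for feature in record.features:
            if feature.type != "CDS":
                continue
            # Biopython stores each qualifier as a list of strings.
            products = feature.qualifiers.get("product", [])
            if any(is_transposase_product(p) for p in products):
                n += 1
        counts[record.id] = n
    return counts


def batched(items: List[str], size: int) -> Iterator[List[str]]:
    """Split a list of accessions into chunks of at most `size`."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_and_count(accessions: List[str], email: str,
                    batch_size: int = 20) -> Dict[str, int]:
    """Download GenBank records from NCBI and count IS-transposase CDS."""
    # Imported here so the helpers above can be used without Biopython.
    from Bio import Entrez, SeqIO

    Entrez.email = email  # NCBI asks for a contact address
    counts: Dict[str, int] = {}
    for batch in batched(accessions, batch_size):
        handle = Entrez.efetch(db="nucleotide", id=",".join(batch),
                               rettype="gbwithparts", retmode="text")
        try:
            counts.update(count_is_cds(SeqIO.parse(handle, "genbank")))
        finally:
            handle.close()
    return counts


def sum_by_assembly(replicon_counts: Dict[str, int],
                    replicon_to_assembly: Dict[str, str]) -> Dict[str, int]:
    """Add up per-replicon counts (chromosome + plasmids) per assembly ID."""
    totals: Dict[str, int] = {}
    for replicon, n in replicon_counts.items():
        assembly = replicon_to_assembly[replicon]
        totals[assembly] = totals.get(assembly, 0) + n
    return totals
```

Usage would look like `fetch_and_count(b, email="you@example.org")`, followed by `sum_by_assembly(...)` with a replicon-to-assembly dictionary that you build from the two lists (the toy lists above are not aligned one-to-one, so that mapping is an assumption). Keeping the parsing separate from the download makes it easy to run the same counting on local GenBank files via `SeqIO.parse(path, "genbank")`.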

Tags: genome, biopython

I had to google this.

"TE" : Transposable Element

Reply by Pierre Lindenbaum 161k, 5.1 years ago

What do you mean by "IS"? Before performing any bioinformatic analysis, I would recommend answering these questions:

> What am I looking for?
> Is it possible to get this by in silico analysis?
> Am I able to perform this in silico analysis, or do I need professional help?

Reply by Buffo ★ 2.4k, 5.1 years ago

'IS' is a pretty common abbreviation for "insertion sequence", a type of mobile element in prokaryotic genomes. I don't understand the purpose of your comment: I am asking exactly "is there a way...", and I have already tried one of the ways.

Reply by little_more ▴ 70, 5.1 years ago