I have a genome and patterns of lengths 2, 4, 8 and 16. I want to calculate the entropy of each pattern in the genome. How do I calculate this? If you have any advice, please post it here. Also, is this the right approach to calculating entropy for each pattern?
You need to provide us with more detail. An example of your input would help. I have no idea what you mean by “patterns with 2, 4, 8, 16 lengths”.
In the meantime, you could try https://github.com/jrjhealey/bioinfo-tools/blob/master/Shannon.py
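If what you want is simply the Shannon entropy of the base composition of a single sequence, here is a minimal sketch. It assumes the entropy is taken over the symbol frequencies within each string; the function name `shannon_entropy` is mine and is not taken from the linked script.

```python
import math
from collections import Counter

def shannon_entropy(seq):
    """Shannon entropy (bits per symbol) of the symbol frequencies in seq."""
    if not seq:
        return 0.0
    counts = Counter(seq.upper())
    n = len(seq)
    # H = -sum(p * log2(p)) over the symbols that actually occur
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())

# Example patterns of length 2, 4, 8 and 16
patterns = ["AC", "GGCC", "GAAAGGCG", "GGACTAAATCCAGTTT"]
for p in patterns:
    print(p, round(shannon_entropy(p), 3))
```

With a four-letter DNA alphabet the value ranges from 0 bits (one base only) to 2 bits (all four bases equally frequent). Note that this depends only on the pattern itself, not on the genome.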
My patterns are, for example: AC, GGCC, GAAAGGCG, GGACTAAATCCAGTTT or some random ... I have 10 patterns of each length.
I have the E. coli genome in simple text format (not FASTA): AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATATCA ....
Now I want to find the entropy of each pattern in this genome.
My second question is how to interpret those values.
For Shannon entropy, see the first answer to both of these threads:
Shannon entropy of a DNA motif?
Presumably, you have a motif (pattern) in a position weight matrix format? The motif has its own entropy, regardless of any genome; it's just a measure of how much information is encoded in a motif.
Note the "bits" on the left of the logo; that's your information.
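As a rough sketch of where those bits come from: for each position of the motif, take the entropy of the base frequencies in that column and subtract it from the 2-bit maximum for DNA. The code below assumes a set of equal-length aligned sites, a uniform background, and no small-sample correction; `column_information` and the example sites are made up for illustration.

```python
import math
from collections import Counter

def column_information(sites, alphabet="ACGT"):
    """Per-position information content (bits) for equal-length aligned sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    max_bits = math.log2(len(alphabet))  # 2 bits for DNA
    info = []
    for i in range(length):
        counts = Counter(s[i] for s in sites)
        n = sum(counts.values())
        # Column entropy over the observed bases
        entropy = sum(-(c / n) * math.log2(c / n) for c in counts.values())
        info.append(max_bits - entropy)
    return info

# Hypothetical aligned instances of one motif
sites = ["GGCC", "GGCC", "GACC", "GGTC"]
per_position = column_information(sites)
print(per_position, sum(per_position))
```

A fully conserved column gives 2 bits; a column where all four bases are equally likely gives 0 bits. A single fixed string such as GGCC, treated as a motif, is therefore 2 bits at every position.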
Or were you thinking of something different relative to a genome? E.g. enrichment of the motif in a genome.
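If it is enrichment you are after, one simple way to start is to count how often each pattern occurs in the genome and compare that with a naive expectation. This is only a sketch: it assumes the genome is a plain uppercase string, scans one strand only, and models the background as independent bases with the genome's own base frequencies. The function names are hypothetical.

```python
import math
from collections import Counter

def count_overlapping(genome, pattern):
    """Count occurrences of pattern in genome, allowing overlaps."""
    count = 0
    start = genome.find(pattern)
    while start != -1:
        count += 1
        start = genome.find(pattern, start + 1)
    return count

def expected_count(genome, pattern):
    """Expected occurrences if bases were independent with genome frequencies."""
    n = len(genome)
    freqs = {b: c / n for b, c in Counter(genome).items()}
    prob = math.prod(freqs.get(b, 0.0) for b in pattern)
    return prob * (n - len(pattern) + 1)

# Short stretch of sequence from the question, standing in for the full genome
genome = "AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATATCA"
for p in ["AC", "GGCC"]:
    print(p, count_overlapping(genome, p), round(expected_count(genome, p), 3))
```

An observed count well above the expected count suggests the pattern is over-represented, although a proper test would need a better background model and both strands.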