Average intron length in Drosophila melanogaster
14 months ago
EVR • 530
Earth

Hi All,

What is the average or median intron length in the Drosophila melanogaster genome? Is it 65 nt? Kindly guide me.

Thanks in advance.

15 months ago
Malcolm.Cook ♦ 1.0k
kansas, usa

A little R/Bioconductor code will tell you the median is 102 and the mean is 1609:

> library(TxDb.Dmelanogaster.UCSC.dm6.ensGene)
> library(GenomicRanges)
> i<-intronsByTranscript(TxDb.Dmelanogaster.UCSC.dm6.ensGene)
> i<-unlist(i)
> summary(width(i))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      2      63     102    1609     742  257022

Thanks for the code. Do you know how to get these statistics per transcript?

For instance, the average intron length per transcript?


I'm not sure of what interest these summaries are to you, but just call summary on the widths of each element of i (the per-transcript list returned by intronsByTranscript, before unlist is applied), like this:

t(do.call(cbind,lapply(i,function(i) summary(width(i)))))

        Min.   1st Qu.   Median         Mean   3rd Qu.   Max.
1         76     76.00     76.0     76.00000     76.00     76
2         76     76.50     77.0     77.00000     77.50     78
3        112    112.00    112.0    112.00000    112.00    112
4         56     56.00     56.0     56.00000     56.00     56
...
16 months ago
Seattle, WA USA

Untested, but I think it should work, in theory. Update: retested with the updated link from genomax:

$ wget -qO- ftp://ftp.flybase.net/genomes/Drosophila_melanogaster/current/fasta/dmel-all-intron-r6.16.fasta.gz | gunzip -c - | awk '{ />/ && ++a || b += length(); } END { print b/a; }'
1639.2

The awk one-liner is pulled from: https://www.biostars.org/p/1758/
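For readers who find the one-liner terse, here is a minimal sketch of the same idea (total sequence length divided by the number of FASTA records) written out as a shell function. The function name fasta_mean_length and the two toy records are made up for illustration; they are not FlyBase data.

```shell
# Mean sequence length of a FASTA stream read from stdin:
# total residues divided by the number of records.
fasta_mean_length() {
  awk '
    /^>/ { n++; next }              # header line: count one record
         { total += length($0) }    # sequence line (may be wrapped): add its length
    END  { if (n) printf "%.1f\n", total / n }
  '
}

# Hypothetical two-record input: lengths 8 and 4, so the mean is 6.0
printf '>intron1\nACGTAC\nGT\n>intron2\nACGT\n' | fasta_mean_length
```

With the real file, this would be called as gunzip -c dmel-all-intron-r6.16.fasta.gz | fasta_mean_length.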


Works when ftp is added before flybase.net in the URL above.

$ wget -qO- ftp://ftp.flybase.net/genomes/Drosophila_melanogaster/current/fasta/dmel-all-intron-r6.16.fasta.gz | gunzip -c - | awk '{ />/ && ++a || b += length(); } END { print b/a; }'
1639.2
2.6 years ago
aka001 • 190
Sweden

You can start by looking at the UCSC table browser and choose your gene model. Then just take all the intron lengths from the results and get the median.
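Once the intron lengths are in hand, taking the median needs nothing more than sort and awk. The sketch below assumes the lengths have already been written one per line; how they are exported from the table browser is not covered here, and the function name median and the four sample values are illustrative only.

```shell
# Median of a one-number-per-line stream read from stdin.
median() {
  sort -n | awk '
    { v[NR] = $1 }                                      # keep the sorted values
    END {
      if (NR == 0) exit                                 # no input, no output
      if (NR % 2) print v[(NR + 1) / 2]                 # odd count: middle value
      else        print (v[NR / 2] + v[NR / 2 + 1]) / 2 # even count: mean of the two middle values
    }
  '
}

# Hypothetical intron lengths, one per line
printf '76\n112\n56\n78\n' | median
```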

14 months ago
India

Download the intron FASTA file from ftp://ftp.flybase.net/genomes/Drosophila_melanogaster/current/fasta/dmel-all-intron-r6.16.fasta.gz, as posted by Alex Reynolds, and run the following command:

zgrep \> dmel-all-intron-r6.16.fasta.gz | awk -F '[;=]' '{print $14}'  | datamash mean 1 median 1

Output:

1639.2038965026 98

1639.2038965026 is the mean and 98 is the median. Datamash is a GNU command-line tool available in most distribution repositories.
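The command above picks the length by field position ($14), which breaks if the header layout changes. A hedged alternative is to pull the value of length= out by name and compute both statistics with awk, so datamash is not required. This sketch assumes every header ends its length entry as "length=<digits>;", which is inferred from the commands in this thread rather than from FlyBase documentation. The function name and the sample headers are made up.

```shell
# Mean and median of the "length=" values found in FASTA headers on stdin.
header_length_stats() {
  grep '^>' |
    sed -n 's/.*length=\([0-9][0-9]*\);.*/\1/p' |   # keep only the number after length=
    sort -n |
    awk '
      { v[NR] = $1; sum += $1 }
      END {
        if (NR == 0) exit
        med = (NR % 2) ? v[(NR + 1) / 2] : (v[NR / 2] + v[NR / 2 + 1]) / 2
        printf "mean=%s median=%s\n", sum / NR, med
      }
    '
}

# Hypothetical headers (lengths 50, 100 and 60) with dummy sequence lines
printf '%s\n' \
  '>i1 type=intron; loc=2L:1..50; parent=FBgn0000001,FBtr0000001; MD5=a; release=r6.16; species=Dmel; length=50;' \
  'ACGT' \
  '>i2 type=intron; loc=2L:100..199; parent=FBgn0000001,FBtr0000001; MD5=b; release=r6.16; species=Dmel; length=100;' \
  'ACGT' \
  '>i3 type=intron; loc=2L:300..359; parent=FBgn0000002,FBtr0000003; MD5=c; release=r6.16; species=Dmel; length=60;' \
  'ACGT' | header_length_stats
```

On the downloaded file the call would be gunzip -c dmel-all-intron-r6.16.fasta.gz | header_length_stats.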


Without awk:

 zgrep  \> dmel-all-intron-r6.16.fasta.gz | cut -d '=' -f8 | tr -d ' ;' | datamash -s mean 1 median 1

With seqkit:

seqkit stats dmel-all-intron-r6.16.fasta.gz

Output:

file                                            format  type  num_seqs      sum_len  min_len  avg_len  max_len
../example_data/dmel-all-intron-r6.16.fasta.gz  FASTA   DNA     71,654  117,455,516        2  1,639.2  268,107
14 months ago
India

Download the source file dmel-all-intron-r6.18.fasta.gz from ftp://ftp.flybase.net/genomes/Drosophila_melanogaster/current/fasta

Per-gene intron median and mean length:

$ zgrep \> dmel-all-intron-r6.18.fasta.gz | awk -F '[;=]' '{print $6,$14}' | awk -F '[, ]' -v OFS="\t" '{print $1,$NF}' | datamash -s -g1 median 2 mean 2

Output (tail; columns are gene, median, mean):

FBgn0285950    155     155
FBgn0285952    151     263
FBgn0285954    177     611.2
FBgn0285955    3692    6778.4117647059
FBgn0285958    71      68
FBgn0285962    69      1625.1111111111
FBgn0285963    880     3596.4705882353
FBgn0285970    563     563
FBgn0285971    58      58
FBgn0285991    59.5    147.5

Per-transcript intron mean and median length:

$ zgrep \> dmel-all-intron-r6.18.fasta.gz | awk -F '[;=]' '{print $6,$14}' | cut -f2- -d"," | awk -F " "  '{split($1,a,",");for(i in a)print a[i]"\t"$2}'| datamash -si -g1 mean 2 median 2

Output (tail; columns are transcript, mean, median):

FBtr0472917    177.6              60
FBtr0472918    216.33333333333    206
FBtr0472919    55                 55
FBtr0472920    129.5              129.5
FBtr0472921    227.33333333333    202
FBtr0472922    387.5              387.5
FBtr0472923    863                863
FBtr0472955    209.33333333333    73
FBtr0472956    87.666666666667    73
FBtr0472957    65                 63
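If datamash is not installed, the per-transcript mean can also be computed with awk alone. The sketch below looks up parent= and length= by name instead of by position. It assumes each header holds semicolon-separated key=value entries with parent listing the gene first and then one or more FBtr transcript IDs, which is inferred from the commands in this thread. The function name and sample headers are hypothetical.

```shell
# Mean intron length per transcript, reading FASTA on stdin.
# Output: transcript ID, tab, mean length, sorted by ID.
per_transcript_mean() {
  grep '^>' | awk '
    {
      len = ""; parents = ""
      n = split($0, kv, /; */)                  # one key=value entry per element
      for (i = 1; i <= n; i++) {
        if (kv[i] ~ /^length=/) len     = substr(kv[i], 8)
        if (kv[i] ~ /^parent=/) parents = substr(kv[i], 8)
      }
      if (len == "" || parents == "") next      # skip headers missing either key
      m = split(parents, id, ",")
      for (j = 1; j <= m; j++)                  # an intron counts once for every transcript it belongs to
        if (id[j] ~ /^FBtr/) { sum[id[j]] += len; cnt[id[j]]++ }
    }
    END { for (t in sum) printf "%s\t%s\n", t, sum[t] / cnt[t] }
  ' | sort
}

# Hypothetical headers: FBtr0000001 has introns of 50 and 150, FBtr0000002 has one of 50
printf '%s\n' \
  '>i1 type=intron; loc=2L:1..50; parent=FBgn0000001,FBtr0000001,FBtr0000002; MD5=a; release=r6.18; species=Dmel; length=50;' \
  'ACGT' \
  '>i2 type=intron; loc=2L:100..249; parent=FBgn0000001,FBtr0000001; MD5=b; release=r6.18; species=Dmel; length=150;' \
  'ACGT' | per_transcript_mean
```

Changing the /^FBtr/ test to /^FBgn/ would give the per-gene mean under the same assumptions.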

More information: datastat is available in most Linux repositories. The output here has the mean (average), standard deviation, 1st quartile, median, minimum, maximum and number of introns per transcript (in that order). Unfortunately the output order is hard-coded.

$ zgrep \> dmel-all-intron-r6.18.fasta.gz | awk -F '[;=]' '{print $6,$14}' | cut -f2- -d"," | awk -F " "  '{split($1,a,",");for(i in a)print a[i]"\t"$2}'| datastat --cnt --dev --1qt --med --min --max  -k 1  | head
# avg dev 1qt 2qt min max cnt
FBtr0005088 272.333 297.499 61 64 59 701 6
FBtr0006151 254.2 246.737 65 70 51 659 5
FBtr0070000 406.75 676.554 63 69 60 2109 8
FBtr0070003 103 50 53 103 53 153 2
FBtr0070006 1144.8 1180.26 284 361 108 3171 5
FBtr0070007 92.3333 33.7672 68.5 71 66 140 3
FBtr0070008 70.3333 11.8977 62 64 60 87 3
FBtr0070025 275.333 137.679 217 352 82 392 3
FBtr0070026 57 0 57 57 57 57 1