Is the union of cDNA sequences the exome?
2.3 years ago
@Bioaln10595
Hello all, I've recently started with DNA-related analysis and was wondering: if I take the cDNA file from
https://www.ensembl.org/info/data/ftp/index.html (cDNA)
does this represent, for example, the human exome? If not, what are the differences, and how can one obtain the missing information then?
Thank you!
DNA-seq
cDNA
exome
• 350 views
2.3 years ago
@lieven.sterck23882
The cDNA file will contain all mRNAs of the human genome. Each sequence is CDS + UTR (if available), so it represents the spliced, transcribed part of the genome, of which the CDS portion is eventually translated into protein.
It is not entirely clear to me what you want to do with it and why, but on the same page (https://www.ensembl.org/info/data/ftp/index.html), in the "gene sets" column, you will find GTF and GFF3 annotation files with the coordinates of all exons.
Just to show the difference between exons, mRNA, and CDS, here is the info from such an annotation file for the mouse genome. Let's have a look at the transcript ENSMUST00000130201:
grep "ENSMUST00000130201" ensGene.gff3
chr1 ensGene mRNA 4773206 4785710 . - . Name=ENSMUST00000130201;Parent=ENSMUSG00000033845;ID=ENSMUST00000130201;Alias=ENSMUSG00000033845
chr1 ensGene exon 4773206 4774516 . - . Name=ENSMUST00000130201.exon4;Parent=ENSMUST00000130201;ID=ENSMUST00000130201.exon4
chr1 ensGene exon 4777525 4777648 . - . Name=ENSMUST00000130201.exon3;Parent=ENSMUST00000130201;ID=ENSMUST00000130201.exon3
chr1 ensGene exon 4782568 4782733 . - . Name=ENSMUST00000130201.exon2;Parent=ENSMUST00000130201;ID=ENSMUST00000130201.exon2
chr1 ensGene exon 4783951 4784105 . - . Name=ENSMUST00000130201.exon1;Parent=ENSMUST00000130201;ID=ENSMUST00000130201.exon1
chr1 ensGene exon 4785573 4785710 . - . Name=ENSMUST00000130201.exon0;Parent=ENSMUST00000130201;ID=ENSMUST00000130201.exon0
chr1 ensGene three_prime_UTR 4773206 4774451 . - . Name=ENSMUST00000130201.utr4;Parent=ENSMUST00000130201;ID=ENSMUST00000130201.utr4
chr1 ensGene five_prime_UTR 4785678 4785710 . - . Name=ENSMUST00000130201.utr0;Parent=ENSMUST00000130201;ID=ENSMUST00000130201.utr0
chr1 ensGene CDS 4785573 4785677 . - 0 Name=ENSMUST00000130201.cds0;Parent=ENSMUST00000130201;ID=ENSMUST00000130201.cds0
chr1 ensGene CDS 4783951 4784105 . - 0 Name=ENSMUST00000130201.cds1;Parent=ENSMUST00000130201;ID=ENSMUST00000130201.cds1
chr1 ensGene CDS 4782568 4782733 . - 1 Name=ENSMUST00000130201.cds2;Parent=ENSMUST00000130201;ID=ENSMUST00000130201.cds2
chr1 ensGene CDS 4777525 4777648 . - 0 Name=ENSMUST00000130201.cds3;Parent=ENSMUST00000130201;ID=ENSMUST00000130201.cds3
chr1 ensGene CDS 4774452 4774516 . - 2 Name=ENSMUST00000130201.cds4;Parent=ENSMUST00000130201;ID=ENSMUST00000130201.cds4
You'll see that the exons together cover the complete mRNA region, whereas the CDS covers only part of it: the first and last exons also contain the UTRs.
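To make that concrete, here is a minimal Python sketch that sums the feature lengths from the grep output above. The coordinates are copied from those rows; the function name `total_length_by_type` is just a name chosen for this illustration.

```python
from collections import defaultdict

# (type, start, end) copied from the grep output above. GFF3 coordinates are
# 1-based and inclusive, so a feature's length is end - start + 1.
ROWS = [
    ("exon", 4773206, 4774516),
    ("exon", 4777525, 4777648),
    ("exon", 4782568, 4782733),
    ("exon", 4783951, 4784105),
    ("exon", 4785573, 4785710),
    ("three_prime_UTR", 4773206, 4774451),
    ("five_prime_UTR", 4785678, 4785710),
    ("CDS", 4785573, 4785677),
    ("CDS", 4783951, 4784105),
    ("CDS", 4782568, 4782733),
    ("CDS", 4777525, 4777648),
    ("CDS", 4774452, 4774516),
]


def total_length_by_type(rows):
    """Sum the lengths of all features, grouped by feature type."""
    totals = defaultdict(int)
    for feature_type, start, end in rows:
        totals[feature_type] += end - start + 1
    return dict(totals)


totals = total_length_by_type(ROWS)
utr = totals["three_prime_UTR"] + totals["five_prime_UTR"]
print("exon bases:", totals["exon"])
print("CDS bases: ", totals["CDS"])
print("UTR bases: ", utr)
```

For this transcript the exons add up to 1894 bases, of which 615 are CDS and 1279 are UTR, so the cDNA sequence (exons joined together) is about three times longer than the coding part alone.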
To be more precise, cDNA consists of all transcribed RNAs: mRNAs, as you say, but also ncRNAs, pseudogenes, rRNAs, etc.
I was thinking that as well, but since they also offer an ncRNA FASTA file, I would assume the cDNA one focuses on protein-coding transcripts. It is indeed possible, though, that it contains everything that is transcribed.
is what's written on their site but does not give much additional info
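One way to check what is actually in the cDNA file is to tally the biotypes listed in its FASTA headers. Below is a minimal Python sketch, assuming each header carries a `transcript_biotype:<value>` field; that layout is an assumption to verify against the downloaded file. The example header lines and IDs (`TX1.1` etc.) are made up for illustration and are not real records.

```python
import re
from collections import Counter


def count_transcript_biotypes(fasta_lines):
    """Count FASTA records per transcript_biotype value found in the headers."""
    counts = Counter()
    for line in fasta_lines:
        if line.startswith(">"):
            # Assumed header field: transcript_biotype:<value>
            match = re.search(r"transcript_biotype:(\S+)", line)
            counts[match.group(1) if match else "unknown"] += 1
    return counts


# Made-up header/sequence lines in the assumed layout (not real records).
example = [
    ">TX1.1 cdna gene:G1.1 gene_biotype:protein_coding transcript_biotype:protein_coding",
    "ACGTACGT",
    ">TX2.1 cdna gene:G2.1 gene_biotype:processed_pseudogene transcript_biotype:processed_pseudogene",
    "ACGT",
    ">TX3.1 cdna gene:G1.1 gene_biotype:protein_coding transcript_biotype:protein_coding",
    "ACGTAC",
]

print(count_transcript_biotypes(example))
```

On a real download you would pass an open file handle instead of the list, e.g. one from `gzip.open(path, "rt")` for a gzipped FASTA, where `path` is wherever you saved the file.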
So, technically this is the exome, i.e., the set of all (known) exons?
I would say yes indeed.
It depends, however, on how you define 'exons': it may be mainly (or only) the ones that are part of an mRNA, and thus not include the non-translated ones (not sure if you're interested in those as well).
Currently I am not, so this seems to hold! Thanks.
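One caveat on "union": exons shared by several transcripts of a gene appear once per transcript in the cDNA file, so to get the exome as a set of genomic regions you would merge the exon coordinates from the GFF3/GTF rather than concatenate the sequences. A minimal Python sketch of that merge follows. The first two exons are taken from the grep output in the answer; the third is a hypothetical exon from an imagined second transcript, added only to show an overlap. `merge_exons` and `exome_size` are names chosen for this illustration.

```python
from collections import defaultdict


def merge_exons(exons):
    """Merge overlapping or adjacent (chrom, start, end) intervals.

    Coordinates are 1-based and inclusive, as in GFF3.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in exons:
        by_chrom[chrom].append((start, end))

    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= out[-1][1] + 1:
                # Overlaps or touches the previous interval: extend it.
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = [tuple(interval) for interval in out]
    return merged


def exome_size(merged):
    """Total number of bases covered by the merged intervals."""
    return sum(end - start + 1
               for intervals in merged.values()
               for start, end in intervals)


exons = [
    ("chr1", 4773206, 4774516),  # exon4 from the grep output
    ("chr1", 4777525, 4777648),  # exon3 from the grep output
    ("chr1", 4774186, 4775000),  # hypothetical exon of a second transcript
]

merged = merge_exons(exons)
raw_total = sum(end - start + 1 for _, start, end in exons)
print(merged)
print("summed per transcript:", raw_total, "merged:", exome_size(merged))
```

Here the three exons sum to 2250 bases when counted per transcript, but cover only 1919 distinct genomic bases once merged. That is the difference between the concatenated cDNA sequences and the exome as a set of regions.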