Hi Community,
I have completed my transcriptome assembly, my annotation and also my differential gene expression analysis (kallisto to get pseudocounts, tximport to import them into DESeq2, and DESeq2 to test for differential expression).
Now that I'm working with some of my most strongly up-regulated genes, I have found that their sequences match crustacean genes. The problem is that I'm working with a mollusc, and for those specific crustacean hits I know there are molluscan genes identified in C. gigas, M. galloprovincialis or O. bimaculoides. My study organism is a cephalopod, and the animals were fed crustacean larvae. I think I'm getting hits from the digestive contents.
I have several doubts. Do I have to remove the reads coming from contamination and then repeat the assembly? Or should I remove only the reads that match the contaminant contigs, without reassembling the transcriptome? Or do I just need to remove the contigs that match crustaceans and then repeat the pseudocount step against the same transcriptome without them? My organism is an octopus, so could I map against O. bimaculoides and remove all the reads that don't map? I don't know whether I would lose a lot of information by doing this...
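To make the "just remove the contigs" option concrete, here is a minimal Python sketch that drops flagged contigs from a FASTA file before re-running the pseudocount step. The contig names and the `flagged` set are hypothetical toy values, and this is only one possible way to do it, not a recommendation of that option over the others.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        else:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def drop_contigs(records, flagged_ids):
    """Keep only records whose ID (first word of the header) is not flagged."""
    for header, seq in records:
        contig_id = header.split()[0]
        if contig_id not in flagged_ids:
            yield header, seq


# Toy assembly; real input would be the assembler's FASTA output.
toy_fasta = """>contig1 len=12
ACGTACGTACGT
>contig2 len=8
GGGGCCCC
>contig3 len=6
TTTAAA
""".splitlines()

flagged = {"contig2"}  # hypothetical set of suspected contaminant contigs
kept = list(drop_contigs(read_fasta(toy_fasta), flagged))
for header, seq in kept:
    print(f">{header}\n{seq}")
```

The filtered FASTA would then be re-indexed and the reads re-quantified, so that reads from the removed contigs are no longer assigned to them.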
What could be the best strategy to follow?
Thank you for your time
Pablo
Hi Pablo,
Were the biopsies from the digestive tract of your species of interest? Is that why you believe that there may be contamination?
Also, how did you determine that these sequences were aligning with other species? BLAST? Is it not possible that they are just homologues of a gene from an early common ancestor?
Kevin
Oh sorry, I forgot to explain the sampling properly. We are working with a very early stage, which we can call larvae. As a consequence, we are working with whole individuals.
So my mRNA comes from a whole individual (a larva of one cephalopod species). The result of sequencing was a set of reads, which I have assembled.
When I pick a particular contig that is detected as differentially expressed when I compare my "control" vs my "larvae fed with crustacean larvae" and BLAST it, I find more than 50 hits against crustaceans before the first hit against a mollusc. The hits against crustaceans have better % identity and coverage.
Let me know if I should explain anything in more detail. Thank you,
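The manual check described above (crustacean hits outranking the first mollusc hit) could be applied to every contig with a short script. The sketch below assumes a tab-separated hit table with a `group` column that has already been added from the taxonomy of each subject sequence. That column, the `min_ratio` threshold and the toy values are all hypothetical, and a conserved gene can still score well against crustaceans, so the result is a list of candidates to inspect rather than a verdict.

```python
import csv
import io
from collections import defaultdict


def best_bitscore_by_group(rows):
    """Return {contig: {group: best bitscore}} from an iterable of hit rows."""
    best = defaultdict(dict)
    for row in rows:
        contig, group = row["qseqid"], row["group"]
        score = float(row["bitscore"])
        if score > best[contig].get(group, 0.0):
            best[contig][group] = score
    return best


def flag_suspects(best, suspect_group="Crustacea", host_group="Mollusca",
                  min_ratio=1.2):
    """Flag contigs whose best suspect-group hit clearly beats the best host hit.

    min_ratio is an arbitrary illustrative threshold, not an established cutoff.
    """
    suspects = set()
    for contig, scores in best.items():
        suspect = scores.get(suspect_group, 0.0)
        host = scores.get(host_group, 0.0)
        if suspect > 0 and suspect >= min_ratio * host:
            suspects.add(contig)
    return suspects


# Toy hit table; the 'group' column is assumed to be precomputed.
toy_hits = (
    "qseqid\tgroup\tbitscore\n"
    "contig1\tMollusca\t850\n"
    "contig1\tCrustacea\t300\n"
    "contig2\tCrustacea\t920\n"
    "contig2\tMollusca\t410\n"
    "contig3\tCrustacea\t500\n"
)
rows = csv.DictReader(io.StringIO(toy_hits), delimiter="\t")
suspects = flag_suspects(best_bitscore_by_group(rows))
print(sorted(suspects))
```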
Pablo
I see. I hate to ask, but is doing RNA-seq on an entire organism's RNA content going to show much? What would the interpretation of that be? I understand that it may be difficult to extract certain tissues from larvae.
Which program have you used for assembly? If there is a pre-existing reference genome that you could use as a guide sequence, then that may help to alleviate the 'problem' (we cannot yet confirm if this is a genuine problem). A reference-guided workflow, for example aligning the reads with HISAT2 and then assembling transcripts from the alignments, allows for the use of a guide reference FASTA.
Let me know your thoughts.
I used to work with C. gigas, by the way, but we were searching for viral content in their stomachs as part of government regulation for water monitoring.
You have found my first issue. I'm in the middle of my PhD; I started without much idea about this technology and I trusted my supervisor. As you said, I can now see the problem with working with a whole organism...
The only explanation I received for that design is just this: the difficulty of extracting specific tissues. For me that is not an excuse; it is an argument for a different experimental design.
I have used Trinity (right now I'm trying to repeat the assembly with other assemblers that allow a flexible k-mer length). Yes, I have some cephalopod genomes to use as a reference. I used HISAT2 to map my reads and obtained less than 70-75% mapping. My first thought was "oh, it seems like a not very good reference", but now I suspect that maybe 5% or 10% of my reads are coming from the digestive contents.
Thank you for your time. I'll try to give you as many details as you need to give me some advice ;)
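One rough way to check the 5-10% suspicion is to sum the kallisto estimated counts assigned to contigs flagged as crustacean and divide by the total. The sketch below reads the standard columns of a kallisto `abundance.tsv` (`target_id`, `est_counts`). The flagged set and the numbers are toy values, and the fraction only covers reads that were pseudoaligned to the assembly, so it is an approximation at best.

```python
import csv
import io


def flagged_count_fraction(abundance_rows, flagged_ids):
    """Fraction of estimated counts assigned to flagged contigs."""
    total = 0.0
    flagged = 0.0
    for row in abundance_rows:
        counts = float(row["est_counts"])
        total += counts
        if row["target_id"] in flagged_ids:
            flagged += counts
    return flagged / total if total else 0.0


# Toy table in the kallisto abundance.tsv layout.
toy_abundance = (
    "target_id\tlength\teff_length\test_counts\ttpm\n"
    "contig1\t1200\t1000\t900\t794117.6\n"
    "contig2\t800\t600\t60\t88235.3\n"
    "contig3\t500\t300\t40\t117647.1\n"
)
flagged = {"contig2", "contig3"}  # hypothetical suspected contaminant contigs
rows = csv.DictReader(io.StringIO(toy_abundance), delimiter="\t")
fraction = flagged_count_fraction(rows, flagged)
print(f"{fraction:.1%} of estimated counts fall on flagged contigs")
```

Running this per sample would also show whether the flagged fraction is higher in the fed larvae than in the controls, which is what one would expect if the signal comes from digestive contents.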
Yes, you should search for your species at https://www.ncbi.nlm.nih.gov/genome/, and then choose the best genome.
Buena suerte / good luck!