RNA-seq differential gene expression analysis yields results that come from digestive content
6.4 years ago
pablo61991 ▴ 90

Hi Community,

I have done my transcriptome assembly, my annotation and also my differential gene expression analysis (kallisto to get pseudocounts, tximport to give them to DESeq2, and DESeq2 to test differential expression).

Now that I'm trying to work with some of my most strongly up-regulated genes, I have found that their sequences match crustacean genes. The problem is that I'm working with a mollusc, and for those specific crustacean hits I know there are molluscan genes identified in C. gigas, M. galloprovincialis or O. bimaculoides. My study organism is a cephalopod, and the animals were fed with crustacean larvae. I think I'm getting hits from the digestive content.

I have several doubts. Do I have to remove the reads coming from contamination and then repeat the assembly? Should I remove only the reads that match the contaminant contigs (without reassembling the transcriptome)? Or do I just need to remove the contigs that match crustaceans and then repeat the pseudocount step against the same transcriptome? My organism is an octopus, so could I map against O. bimaculoides and remove all the reads that don't map? I don't know whether I could lose a lot of information by doing this...
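To make one of these options concrete, here is a minimal Python sketch of the "remove the contigs that match crustaceans" route. It assumes the flagged contig IDs are already known (the `flagged_ids` set and the toy sequences are hypothetical) and only rewrites the FASTA text; it illustrates that one option and is not a recommendation of it over the others.

```python
from typing import Iterable, Iterator, Set, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def drop_contigs(lines: Iterable[str], flagged_ids: Set[str]) -> str:
    """Return FASTA text without contigs whose ID (first word of the header) is flagged."""
    kept = []
    for header, seq in read_fasta(lines):
        contig_id = header.split()[0]
        if contig_id not in flagged_ids:
            kept.append(f">{header}\n{seq}")
    return "\n".join(kept) + ("\n" if kept else "")


# Toy transcriptome; contig2 stands in for a suspected crustacean contig (hypothetical).
example = """>contig1 len=8
ACGTACGT
>contig2 len=4
GGCC
>contig3 len=6
TTAACC
""".splitlines()

cleaned = drop_contigs(example, {"contig2"})
print(cleaned)
```

The cleaned FASTA would then be used to rebuild the kallisto index before repeating the quantification.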

What could be the best strategy to follow?

Thank you for your time

Pablo

differential-gene-expression RNA-Seq • 1.2k views
ADD COMMENT

Hi Pablo,

Were the biopsies from the digestive tract of your species of interest? Is that why you believe that there may be contamination?

Also, how did you determine that these sequences were aligning with other species? - BLAST? Is it not possible that they are just homologues from an early gene in a common ancestor?

Kevin

ADD REPLY

Oh sorry, I forgot to explain the sampling properly. We are working with a very early stage, which we can call larvae. As a consequence, we are working with complete individuals.

So my mRNA comes from whole individuals (larvae of one cephalopod species). The result of sequencing was a bunch of reads, which I have assembled.

When I pick a particular contig that is detected as a differentially expressed gene in the comparison of my "control" vs my "larvae fed with crustacean larvae" and BLAST it, I find >50 hits against crustaceans before the first hit against a mollusc. The hits against crustaceans have better % identity and coverage.
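This kind of check can be done for every contig at once instead of one by one. The sketch below is only an illustration: it assumes BLAST tabular output (the 12 standard `-outfmt 6` columns) plus a 13th column holding a taxonomic group label, which is a hypothetical column that would have to be added separately. It keeps the highest-bitscore hit per contig. A crustacean best hit marks a contig for closer inspection; it does not prove contamination, since a conserved homologue could give the same pattern.

```python
import csv
import io
from typing import Dict


def best_hit_group(blast_tsv: str) -> Dict[str, str]:
    """Map each query contig to the group label of its highest-bitscore hit.

    Expects tab-separated rows: the 12 standard BLAST outfmt 6 columns
    followed by a group label in column 13 (an assumed, user-added column).
    """
    best = {}
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if not row:
            continue
        qseqid, bitscore, group = row[0], float(row[11]), row[12]
        if qseqid not in best or bitscore > best[qseqid][0]:
            best[qseqid] = (bitscore, group)
    return {qseqid: group for qseqid, (_, group) in best.items()}


# Toy hits (all values made up for illustration).
# qseqid, sseqid, pident, length, mismatch, gapopen,
# qstart, qend, sstart, send, evalue, bitscore, group
rows = [
    ("contigA", "subj1", "92.0", "300", "24", "0", "1", "300", "1", "300", "1e-100", "450", "crustacean"),
    ("contigA", "subj2", "71.0", "280", "81", "2", "5", "284", "3", "282", "1e-40", "180", "mollusc"),
    ("contigB", "subj3", "88.0", "500", "60", "0", "1", "500", "1", "500", "1e-150", "700", "mollusc"),
]
toy_tsv = "\n".join("\t".join(r) for r in rows)

groups = best_hit_group(toy_tsv)
flagged = {contig for contig, group in groups.items() if group == "crustacean"}
print(flagged)
```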

Let me know if I have to explain something in more detail, thank you

Pablo

ADD REPLY

I see. I hate to ask, but is doing RNA-seq on an entire organism's RNA content going to show much? What would be the interpretation of that? I understand that it may be difficult to extract certain tissues from larvae.

Which program have you used for assembly? If there is a pre-existing reference genome that you could use as a guide sequence, then that may help to alleviate the 'problem' (we cannot yet confirm if this is a genuine problem). For example, reads aligned to a reference genome with HISAT2 (an aligner, not an assembler in itself) can be assembled into transcripts with StringTie, and Trinity also has a genome-guided mode.

Let me know your thoughts.

I used to work with C. gigas, by the way, but we were searching for viral content in their stomachs as part of government regulation for water monitoring.

ADD REPLY

You have found my first issue. I'm in the middle of my PhD; I started without much idea about this technology and I trusted my supervisor. As you said, I can now understand how misleading it is to work with a complete organism...

The only explanation I received for that design is just this: the difficulty of extracting specific tissues. For me that is not an excuse; it's an argument for a different experimental design.

I have used Trinity (right now I'm trying to repeat the assembly including other assemblers with a flexible k-mer length). Yes, I have some cephalopod genomes to use as references. I have used HISAT2 to map my reads and I obtained less than 70-75% mapping. My first thought was "oh, it seems the reference is not very good", but now I suspect that maybe 5% or 10% of my reads are coming from the digestive content.
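A rough way to put a number on that suspicion, assuming a set of suspect contig IDs is available (the `flagged_ids` set and the counts below are hypothetical), is to sum the kallisto `est_counts` assigned to those contigs and divide by the total. This sketch reads the standard kallisto `abundance.tsv` columns (`target_id`, `length`, `eff_length`, `est_counts`, `tpm`). Note that it measures the share among pseudoaligned reads only, not among all sequenced reads.

```python
import csv
import io
from typing import Set


def flagged_count_fraction(abundance_tsv: str, flagged_ids: Set[str]) -> float:
    """Fraction of kallisto estimated counts assigned to flagged contigs."""
    total = 0.0
    flagged = 0.0
    for row in csv.DictReader(io.StringIO(abundance_tsv), delimiter="\t"):
        count = float(row["est_counts"])
        total += count
        if row["target_id"] in flagged_ids:
            flagged += count
    # Avoid dividing by zero on an empty table.
    return flagged / total if total else 0.0


# Toy abundance table (made-up numbers, same column layout as abundance.tsv).
toy_abundance = "\n".join([
    "target_id\tlength\teff_length\test_counts\ttpm",
    "contig1\t1200\t1050\t600\t500000",
    "contig2\t800\t650\t100\t150000",
    "contig3\t1500\t1350\t300\t350000",
])

fraction = flagged_count_fraction(toy_abundance, {"contig2"})
print(f"{fraction:.1%} of estimated counts fall on flagged contigs")
```

Running this per sample would also show whether the share is higher in the fed larvae than in the controls.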

Thank you for your time, I'll try to give you as many details as you need to give me some advice ;)

ADD REPLY

Yes, you should search for your species at https://www.ncbi.nlm.nih.gov/genome/, and then choose the best genome.

Buena suerte / good luck!

ADD REPLY
