Sorting reads from host-pathogen interaction
6.2 years ago

I am working on RNA-seq data for a host-pathogen interaction between a grass species and its fungal parasite. The ultimate goal is to do differential expression analysis and functional enrichment to see which genes and pathways are involved in parasitism.

I have:

  1. Draft genome of the fungus
  2. RNA-seq reads from non-infected grass
  3. RNA-seq reads from infected grass (contains grass and fungal transcripts)
  4. RNA-seq reads from the fungus growing in culture

I built the transcriptome of the fungus using just the reads from the culture-grown fungus, and I also built the grass transcriptome with only the non-infected reads. Now I'm thinking it would be useful to rebuild those transcriptomes to include reads from the infected tissue, to capture transcripts that are unique to the host-pathogen interaction.

Is there a way to filter the infected reads into grass and fungal groups using the resources I currently have?

Perhaps I could align the infected grass reads (#3) to the fungal transcriptome, and use only the unmapped reads to rebuild the grass transcriptome? Maybe I can use BLAST, BBDuk, or some other tool on the unmapped reads to further filter out fungal reads before using them to build the grass transcriptome.
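To make the k-mer filtering idea concrete, here is a toy sketch in Python. It is not how BBDuk is implemented, only an illustration of the principle: index every k-mer of the fungal reference (both strands), then call a read "fungal" if it shares enough k-mers with that index. The function names, the `min_hits` threshold, and the tiny value of `k` are all assumptions made for the example.

```python
from typing import Iterable, Set

# Translation table for complementing DNA; other characters (e.g. N) pass through.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of distinct k-mers in seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_reference_kmers(reference_seqs: Iterable[str], k: int) -> Set[str]:
    """Index the k-mers of every reference sequence on both strands."""
    ref: Set[str] = set()
    for seq in reference_seqs:
        ref |= kmers(seq, k)
        ref |= kmers(reverse_complement(seq), k)
    return ref


def looks_fungal(read: str, ref_kmers: Set[str], k: int, min_hits: int = 1) -> bool:
    """True if at least min_hits distinct k-mers of the read occur in the reference."""
    hits = sum(1 for km in kmers(read, k) if km in ref_kmers)
    return hits >= min_hits


# Hypothetical toy data: one short "fungal" sequence and k = 8.
fungal_reference = ["ACGTTGCAAGGCTTACCGATGCA"]
ref_index = build_reference_kmers(fungal_reference, k=8)
```

In practice a real tool with a much larger k would do this job; the sketch only shows why reads from fungal genes that are absent from the reference cannot be caught this way.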

RNA-Seq Assembly • 1.7k views

Valid approach indeed. You could also consider aligning them to the fungal genome in order to filter out the fungal ones.


Hey lieven.sterck,

Thanks for the response! I've considered using BBSplit to sort the reads further, but unfortunately I don't have a genomic sequence for the plant.

Does anyone know a tool that can sort RNA-seq data using the genome of one of the host-pathogen species?


Can't you just align them to the fungal genome and then use the ones that do not map (i.e. the ones likely to be from the plant)?
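As a rough sketch of that split, assuming the infected reads have already been aligned to the fungal genome with a splice-aware aligner and written out as SAM: the code below treats a read (or read pair) as fungal if any of its records is mapped, and as a putative plant read otherwise. It relies only on the SAM FLAG bit 0x4 ("segment unmapped"); the function name and the toy records are made up for illustration.

```python
from typing import Iterable, Set, Tuple

UNMAPPED = 0x4  # SAM FLAG bit: this segment is unmapped


def split_read_names(sam_lines: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Split read names into (fungal, putative_plant).

    A name is 'fungal' if at least one of its alignment records is mapped
    to the fungal reference; every other name seen is 'putative plant'.
    """
    fungal: Set[str] = set()
    seen: Set[str] = set()
    for line in sam_lines:
        # Skip blank lines and SAM header lines (which start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        seen.add(name)
        if not flag & UNMAPPED:
            fungal.add(name)
    return fungal, seen - fungal


# Hypothetical toy SAM records: r1 = both mates mapped, r2 = neither mapped,
# r3 = only the first mate mapped.
toy_sam = [
    "@SQ\tSN:contig1\tLN:1000",
    "r1\t99\tcontig1\t10\t60\t4M\t=\t200\t194\tACGT\tIIII",
    "r1\t147\tcontig1\t200\t60\t4M\t=\t10\t-194\tACGT\tIIII",
    "r2\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r2\t141\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r3\t73\tcontig1\t50\t60\t4M\t=\t50\t0\tACGT\tIIII",
    "r3\t133\tcontig1\t50\t0\t*\t=\t50\t0\tACGT\tIIII",
]
fungal_names, plant_names = split_read_names(toy_sam)
```

Dropping a whole pair when only one mate maps is the conservative choice for building the grass assembly; whether that is too strict depends on the data.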


That would be the way to go.

6.2 years ago

It's the novel transcripts that I'm concerned about. If reads from the infected tissue map to neither the fungal nor the plant transcriptome, then they correspond to a transcript that is specifically expressed during the host-pathogen interaction, from either the plant or the fungus. For example, if I map infected grass reads to the fungal transcriptome and use the unmapped reads to build the grass transcriptome, I would still have the novel fungal transcripts present in my grass assembly.

I don't know if it's possible to further sort unmapped reads using the fungal genome, or maybe it's not even worth troubling myself over.


Not worth troubling yourself over, I would say ;-)

You will likely always end up with more or less a mixture of sequence origins.

On the other hand, if you map to the fungal genome you should be able to remove all fungal-derived reads, regardless of the stage of infection at which they are expressed, since every fungal read must originate somewhere in the genome, and that includes the 'novel' ones in your de novo transcriptome. I understand that you only have a draft genome, so some might slip through at this stage, but that is nothing to make a big fuss about, I think.
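If you want a rough feel for how many fungal reads "slip through" because the genome is only a draft, one option (my assumption, not something established in this thread) is to use the culture-grown fungal reads as a control: whatever fraction of those fails to map to the draft genome is a crude estimate of the leak rate. The sketch below assumes that rate is the same in infected tissue, which may not hold for genes expressed only in planta; the function and parameter names are hypothetical.

```python
def estimate_leaked_fungal_reads(
    culture_total: int,
    culture_unmapped: int,
    infected_mapped: int,
) -> float:
    """Estimate how many fungal reads remain among the unmapped infected reads.

    culture_total    -- reads from the pure fungal culture that were aligned
    culture_unmapped -- how many of those failed to map to the draft genome
    infected_mapped  -- infected-tissue reads that did map to the fungal genome

    Assumes the unmapped fraction seen in culture also applies in planta.
    """
    if culture_total <= 0 or not 0 <= culture_unmapped < culture_total:
        raise ValueError("need 0 <= culture_unmapped < culture_total")
    leak_rate = culture_unmapped / culture_total
    # If F fungal reads exist in the infected sample, F * (1 - leak_rate) map,
    # so F = infected_mapped / (1 - leak_rate) and F * leak_rate leak through.
    return infected_mapped * leak_rate / (1.0 - leak_rate)


# Hypothetical numbers: 5% of culture reads fail to map.
leaked = estimate_leaked_fungal_reads(
    culture_total=1000, culture_unmapped=50, infected_mapped=1900
)
```

If the estimate is small relative to the number of unmapped infected reads, that supports not worrying about it.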


Fantastic! Thanks for all the help!
