Question

Sorting reads from host-pathogen interaction


6.2 years ago

cwbenson1993 • 0

I am working on rna-seq data for a host-pathogen interaction between a grass species and its fungal parasite. The ultimate goal is to do differential expression analysis and functional enrichment to see what genes and pathways are involved in parasitism.

I have:

1. Draft genome of the fungus
2. RNA-seq reads from non-infected grass
3. RNA-seq reads from infected grass (contains grass and fungal transcripts)
4. RNA-seq reads from the fungus growing in culture

I built the transcriptome of the fungus using just the reads from the culture-grown fungus, and I also built the grass transcriptome with only the non-infected reads. Now I'm thinking it would be useful to rebuild those transcriptomes to include reads from the infected tissue, to capture transcripts that are unique to the host-pathogen interaction.

Is there a way to filter the infected reads into grass and fungal groups using the resources I currently have?

Perhaps I could align the infected grass reads (#3) to the fungal transcriptome, and use only the unmapped reads to rebuild the grass transcriptome? Maybe I can use BLAST, BBDuk, or some other tool on the unmapped reads to further filter out fungal reads before using them to build the grass transcriptome.

RNA-Seq Assembly • 1.7k views



A valid approach indeed. I would consider aligning them to the fungal genome (as well?) in order to filter out the fungal ones.

ADD REPLY • link 6.2 years ago by lieven.sterck 15k


Hey lieven.sterck,

Thanks for the response! I've considered using BBSplit to sort the reads further, but unfortunately I don't have a genomic sequence for the plant.

Does anyone know a tool that can sort RNA-seq data using the genome of one of the host-pathogen species?

ADD REPLY • link 6.2 years ago by cwbenson1993 • 0


Can't you just align them to the fungal genome and then use the ones that do not map (i.e. the ones likely to be plant reads)?
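To make the idea concrete, here is a minimal sketch of the splitting step. It assumes you have already aligned the infected-tissue reads to the fungal genome with a splice-aware aligner of your choice and have the result as SAM text. The function name `split_read_names` and the toy records are hypothetical; the only thing taken from the SAM format itself is flag bit 0x4, which marks an unmapped segment. A read pair is treated as fungal if either mate maps, which is an assumption you may want to change.

```python
from typing import Iterable, Set, Tuple

UNMAPPED_FLAG = 0x4  # SAM flag bit: this segment is unmapped


def split_read_names(sam_lines: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (fungal_names, putative_grass_names) from SAM text lines.

    A read name counts as fungal if at least one of its records is mapped.
    """
    mapped: Set[str] = set()
    seen: Set[str] = set()
    for line in sam_lines:
        # Skip blank lines and header lines (headers start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        seen.add(name)
        if not flag & UNMAPPED_FLAG:
            mapped.add(name)
    return mapped, seen - mapped


# Toy SAM records (made up for illustration): r1 maps, r2 does not.
sam = [
    "@SQ\tSN:contig1\tLN:1000",
    "r1\t0\tcontig1\t10\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGA\tIIII",
]
fungal, putative_grass = split_read_names(sam)
print(sorted(fungal), sorted(putative_grass))
```

The names in `putative_grass` are the reads you would carry forward into the grass assembly.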

ADD REPLY • link 6.2 years ago by lieven.sterck 15k


That would be the way to go.

ADD REPLY • link 6.2 years ago by GenoMax 142k

score 0 · Answer 1 · 2018-03-09


6.2 years ago

cwbenson1993 • 0

It's the novel transcripts that I'm concerned about. If reads don't map to the fungus or the plant, then they correspond to a transcript that is specifically expressed during the host-pathogen interaction, from either the plant or the fungus. For example, if I map infected grass reads to the fungal transcriptome and use the unmapped reads to build the grass transcriptome, I would still have the novel fungal transcripts present in my grass assembly.

I don't know if it's possible to further sort the unmapped reads using the fungal genome, or maybe it's not even worth troubling myself over.
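The bookkeeping behind this concern can be sketched roughly as follows. This is only an illustration: `classify_reads` and the read names are hypothetical, and it assumes the sets of read names with a hit against each reference have already been collected from two separate alignments. The "unresolved" bin is where novel interaction-specific transcripts from either organism would end up.

```python
from typing import Dict, Set


def classify_reads(
    all_reads: Set[str], fungal_hits: Set[str], grass_hits: Set[str]
) -> Dict[str, Set[str]]:
    """Bin infected-tissue reads by which reference(s) they hit."""
    fungal_hits = fungal_hits & all_reads
    grass_hits = grass_hits & all_reads
    return {
        "fungal": fungal_hits - grass_hits,      # hit the fungal reference only
        "grass": grass_hits - fungal_hits,       # hit the grass reference only
        "ambiguous": fungal_hits & grass_hits,   # hit both references
        "unresolved": all_reads - fungal_hits - grass_hits,  # hit neither
    }


# Toy read names, made up for illustration.
bins = classify_reads(
    all_reads={"r1", "r2", "r3", "r4"},
    fungal_hits={"r1", "r3"},
    grass_hits={"r2", "r3"},
)
for label in ("fungal", "grass", "ambiguous", "unresolved"):
    print(label, sorted(bins[label]))
```

Counting how many reads land in "unresolved" would at least show how large the problem is before deciding whether it is worth more effort.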



Not worth troubling yourself over, I would say ;-)

You will likely always end up with more or less of a mixture of sequence origins.

On the other hand, if you map to the fungal genome you should be able to remove all fungal-derived reads (regardless of the stage of infection at which they are expressed), since all these reads should be derived from somewhere in the genome, even the 'novel' ones in your de novo transcriptome. I understand that you only have a draft genome, so some might slip through at this stage, but that is nothing to make a big fuss about, I think.
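As a rough sketch of the removal step, assuming you already have the set of read names that aligned to the fungal genome: the helper names `read_name` and `drop_fungal_reads` are hypothetical, and the code assumes plain 4-line FASTQ records (no wrapped sequence lines) with an optional /1 or /2 mate suffix on the read name.

```python
import io
from typing import Iterator, Set, TextIO


def read_name(header: str) -> str:
    """Strip the leading '@', any description, and a trailing /1 or /2."""
    name = header[1:].split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def drop_fungal_reads(fastq: TextIO, fungal_names: Set[str]) -> Iterator[str]:
    """Yield 4-line FASTQ records whose read name is not in fungal_names."""
    while True:
        record = [fastq.readline() for _ in range(4)]
        if not record[0]:  # end of file
            break
        if read_name(record[0]) not in fungal_names:
            yield "".join(record)


# Toy input (made up): r1 mapped to the fungal genome, r2 did not.
fastq = io.StringIO("@r1/1\nACGT\n+\nIIII\n@r2/1\nTTGA\n+\nIIII\n")
kept = list(drop_fungal_reads(fastq, {"r1"}))
print("".join(kept), end="")
```

For real data you would open the FASTQ file instead of using `io.StringIO` and write the kept records to a new file for the grass assembly.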