Question

Plant small RNA (sRNA) data analysis pipeline

5.1 years ago

K S • 0

I am interested in identifying the 'viromes' in wheat samples infected with viruses, but I am not sure which pipeline to use. I have sRNA data from Illumina sequencing and I am following these steps:

1. Quality check of the reads

   a. Raw reads -> adapter trimming and read filtering (FastQC, cutadapt and Trimmomatic)

2. Mapping to the host genome to find host-specific reads

   a. Building the indexes from the whole wheat genome (bowtie2, GMAP); this step fails with an error due to the size of the genome

   b. Mapping of reads to the reference genome (TopHat, SAMtools)

   *. Would it be better to align them to RNA sequences from wheat instead of the whole genome?

3. De novo assembly of the unmapped reads (Velvet, k-mer = 17)
4. Mapping of contigs to the reference genome from step 2 (bowtie2, TopHat, SAMtools)
5. BLASTN of unmapped contigs against virus databases in NCBI/GenBank
6. BLASTX against a virus protein database
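To make the steps above concrete, here is a minimal sketch that only assembles the command lines and does not run anything. Several things in it are assumptions rather than part of the plan above: the file names, the `ADAPTER_SEQ` placeholder, the 18-30 nt length window, the BLAST e-value cutoff, and the use of bowtie2 alone for the mapping step (TopHat and the contig re-mapping step are left out for brevity). The function name `build_pipeline_commands` is hypothetical.

```python
from typing import List


def build_pipeline_commands(
    raw_fastq: str,
    adapter: str,
    host_fasta: str,
    virus_db: str,
    min_len: int = 18,   # assumed lower bound for sRNA reads
    max_len: int = 30,   # assumed upper bound for sRNA reads
    kmer: int = 17,      # k-mer size mentioned for Velvet
) -> List[List[str]]:
    """Return one argument list per pipeline step, without executing them."""
    trimmed = "trimmed.fastq"
    unmapped = "unmapped.fastq"
    return [
        # Adapter trimming plus a read-length filter.
        ["cutadapt", "-a", adapter, "-m", str(min_len), "-M", str(max_len),
         "-o", trimmed, raw_fastq],
        # Index the host reference (whole genome or transcript sequences).
        ["bowtie2-build", host_fasta, "host_index"],
        # Map reads to the host; --un writes the reads that did not align.
        ["bowtie2", "-x", "host_index", "-U", trimmed,
         "-S", "host.sam", "--un", unmapped],
        # De novo assembly of the unmapped reads with Velvet.
        ["velveth", "assembly", str(kmer), "-short", "-fastq", unmapped],
        ["velvetg", "assembly"],
        # Search the contigs against a virus nucleotide database.
        ["blastn", "-query", "assembly/contigs.fa", "-db", virus_db,
         "-outfmt", "6", "-evalue", "1e-5", "-out", "virus_hits.tsv"],
    ]


if __name__ == "__main__":
    # Placeholder inputs; replace with real paths and the real adapter.
    for cmd in build_pipeline_commands(
        "reads.fastq", "ADAPTER_SEQ", "wheat_reference.fa", "viral_nt"
    ):
        print(" ".join(cmd))
```

Printing the commands first makes it easy to check each step by hand before passing the lists to something like `subprocess.run`.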

Thanks

sequencing next-gen virome • 1.3k views

updated 5.1 years ago by Fabio Marroni ★ 3.0k • written 5.1 years ago by K S • 0

score 0 · Answer 1 · 2019-03-06

Your pipeline is reasonable. I can't help with the indexing problem: as I understand it, genomes larger than the human genome are difficult for common index-building tools to handle. You might ask someone who works on wheat; I have no experience with it. As a shortcut, you could map to the transcriptome instead. Another option would be to use a metagenomic classifier (e.g. kraken2) to remove all reads that map to plants (you would have to use the nt database). I would also suggest taking a look at VirusDetect, although I'm not sure it can handle wheat!
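As a rough illustration of the classifier idea, the sketch below filters kraken2's per-sequence output and keeps the IDs of everything not assigned to a host taxon. It assumes the standard tab-separated output (classification status, sequence ID, taxonomy ID, length, LCA mapping) produced without `--use-names`. The function name `non_host_ids` and the `host_taxids` set are hypothetical; in practice the set would need to contain every plant taxon ID that kraken2 might assign, not just wheat. Also note that kraken2's default k-mer length is 35, so this may be more applicable to assembled contigs than to raw sRNA reads.

```python
from typing import Iterable, Iterator, Set


def non_host_ids(kraken_lines: Iterable[str], host_taxids: Set[str]) -> Iterator[str]:
    """Yield IDs of sequences that were not assigned to a host taxon.

    Unclassified sequences (status "U") are kept, since they may come
    from viruses that are absent from the database.
    """
    for line in kraken_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        status, seq_id, taxid = fields[0], fields[1], fields[2]
        if status == "U" or taxid not in host_taxids:
            yield seq_id


if __name__ == "__main__":
    # Made-up example lines; 4565 is Triticum aestivum, 10239 is Viruses.
    example = [
        "C\tcontig1\t4565\t120\t4565:86\n",
        "U\tcontig2\t0\t95\t0:61\n",
        "C\tcontig3\t10239\t100\t10239:66\n",
    ]
    print(list(non_host_ids(example, {"4565"})))
```

The resulting IDs can then be used to pull the matching sequences out of the FASTA or FASTQ file before the BLAST steps.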