Trimming of 16S reads
7.9 years ago
agata88 ▴ 870

Hi all,

I was wondering if you could help me solve a problem. Here are the analysis steps I've performed:

  1. Downloaded all FASTQ files from the SRA database for the stool microbiome of a healthy set of patients, around 402 runs (around 20 GB in total).
  2. Trimmed the files with Trimmomatic. The log for one run:

TrimmomaticSE: Started with arguments: -threads 12 -phred33 SRR040705.fastq /trimmed/SRR040705.fastq LEADING:30 TRAILING:30 SLIDINGWINDOW:4:30 MINLEN:30
Input Reads: 8397 Surviving: 5945 (70,80%) Dropped: 2452 (29,20%)
TrimmomaticSE: Completed successfully

After checking the file sizes, I discovered that after trimming the total size is only around 3 GB (all files together!). Is it possible that the data stored at NCBI is of such low quality?

Or did I do something wrong?

  3. Ran a metagenomic pipeline for taxonomic assignment.

Here, more than 50% of OTUs are not assigned to any taxon. Is this because of the very low quality of the reads?
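
A smaller total file size after trimming does not by itself mean that most reads were discarded: LEADING, TRAILING and SLIDINGWINDOW also shorten the reads that survive, and the log for SRR040705 reports 70,80% of reads surviving. A minimal sketch for separating the two effects by counting reads and bases before and after trimming, assuming uncompressed or gzip-compressed FASTQ with unwrapped 4-line records; `fastq_stats` and `compare` are hypothetical helpers, not part of any tool.

```python
import gzip


def fastq_stats(path):
    """Return (number_of_reads, total_bases) for a FASTQ file.

    Assumes standard 4-line records (header, sequence, '+', quality) with no
    line wrapping; files ending in '.gz' are opened as gzip.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    reads = bases = 0
    with opener(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:  # the sequence line of each record
                reads += 1
                bases += len(line.rstrip("\r\n"))
    return reads, bases


def compare(raw_path, trimmed_path):
    """Print the fraction of reads and of bases that survived trimming."""
    raw_reads, raw_bases = fastq_stats(raw_path)
    kept_reads, kept_bases = fastq_stats(trimmed_path)
    print(f"reads kept: {kept_reads}/{raw_reads} ({100 * kept_reads / raw_reads:.1f}%)")
    print(f"bases kept: {kept_bases}/{raw_bases} ({100 * kept_bases / raw_bases:.1f}%)")


# Example call, using the paths from the trimming log:
# compare("SRR040705.fastq", "/trimmed/SRR040705.fastq")
```

If the fraction of bases kept is much lower than the fraction of reads kept, the size drop comes mostly from shortened reads rather than dropped ones.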

I've checked in FastQC that these reads are encoded as Illumina 1.9, i.e. Phred+33. Maybe my trimming is too strict?
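
Whether Q30 is too strict can be explored by replaying the settings on decoded quality scores. The sketch below decodes Phred+33 characters (matching FastQC's Illumina 1.9 report) and applies a simplified approximation of the LEADING, TRAILING, SLIDINGWINDOW and MINLEN steps in that order. It is not Trimmomatic's own code: this version simply cuts at the start of the first failing window, and the real SLIDINGWINDOW step may handle that window differently. For comparison, Q30 corresponds to an expected error rate of 1 in 1,000 per base, while the example command in the Trimmomatic manual uses LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36. The example read is hypothetical.

```python
def phred33(quality_string):
    """Decode a Phred+33 (Illumina 1.8+/1.9) quality string into integer scores."""
    return [ord(ch) - 33 for ch in quality_string]


def trim(quals, leading=30, trailing=30, window=4, window_q=30, minlen=30):
    """Return (start, end) of the kept region, or None if the read is dropped.

    Simplified approximation of Trimmomatic's LEADING, TRAILING,
    SLIDINGWINDOW and MINLEN steps; defaults mirror the Q30 settings used here.
    """
    start, end = 0, len(quals)
    # LEADING: drop low-quality bases from the 5' end.
    while start < end and quals[start] < leading:
        start += 1
    # TRAILING: drop low-quality bases from the 3' end.
    while end > start and quals[end - 1] < trailing:
        end -= 1
    # SLIDINGWINDOW: scan 5' -> 3' and cut at the first window whose mean is too low.
    for i in range(start, end - window + 1):
        if sum(quals[i:i + window]) / window < window_q:
            end = i
            break
    # MINLEN: discard reads that are now too short.
    if end - start < minlen:
        return None
    return start, end


# Hypothetical read: 60 bases at Q38 ('G') followed by 40 bases at Q22 ('7').
example = phred33("G" * 60 + "7" * 40)
print("Q30 settings:", trim(example))
print("relaxed settings:", trim(example, leading=3, trailing=3, window_q=15, minlen=36))
```

On this made-up read the Q30 settings remove the whole Q22 tail, while the relaxed settings keep it; running both on a sample of real reads shows how much each setting costs.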

Any ideas or help?

Best,

Agata

sra trimming trimmomatic 16S metagenome • 2.8k views
You don't need a lot of reads per sample for taxonomic assignments. So you may be fine.
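
One way to check whether a sample has enough reads is a rarefaction curve: the expected number of distinct OTUs in a random subsample of n reads, E[S_n] = Σ_i [1 − C(N − N_i, n) / C(N, n)], where N_i is the read count of OTU i and N is the total. If the curve flattens well below the available depth, extra reads add few new OTUs. A minimal sketch; the OTU counts are hypothetical, `expected_otus` is an illustrative helper, and log-gamma is used so the binomial coefficients stay tractable for large read counts.

```python
from math import exp, lgamma


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k), for 0 <= k <= n."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def expected_otus(counts, depth):
    """Expected number of distinct OTUs in a random subsample of `depth` reads."""
    total = sum(counts)
    if not 0 <= depth <= total:
        raise ValueError("depth must be between 0 and the total read count")
    log_denominator = log_comb(total, depth)
    expected = 0.0
    for n_i in counts:
        if total - n_i < depth:
            # Every subsample of this size must contain at least one read of this OTU.
            expected += 1.0
        else:
            # 1 - P(no read of this OTU is drawn)
            expected += 1.0 - exp(log_comb(total - n_i, depth) - log_denominator)
    return expected


otu_counts = [500, 300, 120, 50, 20, 5, 3, 1, 1]  # hypothetical OTU table, 1000 reads
for depth in (10, 100, 500, 1000):
    print(depth, round(expected_otus(otu_counts, depth), 2))
```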

If so, how should I interpret the results below?

Unassigned: 44,91%
Bacteroides: 19,98%
Streptococcus: 7,09%
Fusobacterium: 2,79%

I would like to compare these results to my patients' results, where 99% of OTUs are assigned.

Do you think I can rescale them so that the assigned taxa add up to 100%?

For example:

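$$\frac{19{,}98\%}{55{,}09\%\ (\text{assigned})} = \frac{x}{100\%} \quad\Rightarrow\quad x = \frac{19{,}98 \times 100}{55{,}09}\% \approx 36{,}27\%$$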

And then compare them in box plots (case control vs. patient)?
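
The proportion amounts to dividing each taxon's share by the assigned share, taken as 100% minus the unassigned share. A minimal sketch using the values from this post (written with decimal points); `rescale_to_assigned` is a hypothetical helper. Rescaling implicitly assumes the unassigned reads would be distributed among taxa like the assigned ones, which may not hold if they are mostly low-quality or off-target reads.

```python
def rescale_to_assigned(percentages, unassigned_key="Unassigned"):
    """Re-express taxon percentages relative to the assigned reads only.

    `percentages` maps taxon -> percent of all reads. The assigned share is
    100% minus the unassigned share, so not every assigned taxon needs to be listed.
    """
    assigned_share = 100.0 - percentages[unassigned_key]
    if assigned_share <= 0:
        raise ValueError("no assigned reads to rescale")
    return {
        taxon: 100.0 * value / assigned_share
        for taxon, value in percentages.items()
        if taxon != unassigned_key
    }


control = {"Unassigned": 44.91, "Bacteroides": 19.98,
           "Streptococcus": 7.09, "Fusobacterium": 2.79}
for taxon, value in rescale_to_assigned(control).items():
    print(f"{taxon}: {value:.2f}%")
```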

Or should I use the same number of reads for the patient/control comparison?

It may be an easy question, but I just need confirmation that my thinking is right...

Thanks,

Agata

I am afraid someone else is going to have to help with this part.
Are you using exactly the same database/criteria to classify this sample as you use for your own?

Yes, that is why I downloaded the files from SRA and ran them through my pipeline, the same one I used for the patients. Actually, I figured out how to do that :) Thanks

Your patients' microbiomes must be comparatively simple (if you are able to assign 99% of OTUs). If the SRA samples have significantly more reads than your own samples, you may want to try downsampling the SRA sample data to match your own.
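
Downsampling here means drawing a random subset of reads, without replacement, so that the SRA data matches the read count of your own sample. A minimal sketch using reservoir sampling, which needs one pass over the file and keeps only the sampled records in memory; it assumes unwrapped 4-line FASTQ records, and `read_fastq`, `downsample` and `write_fastq` are hypothetical helpers. Dedicated tools such as `seqtk sample` do the same job on real data.

```python
import random


def read_fastq(path):
    """Yield FASTQ records as tuples of 4 lines (assumes unwrapped records)."""
    with open(path) as handle:
        while True:
            record = [handle.readline() for _ in range(4)]
            if not record[0]:  # end of file
                return
            yield tuple(record)


def downsample(records, depth, seed=0):
    """Uniformly sample `depth` records without replacement (reservoir sampling)."""
    rng = random.Random(seed)  # fixed seed makes the subset reproducible
    reservoir = []
    for i, record in enumerate(records):
        if i < depth:
            reservoir.append(record)
        else:
            j = rng.randint(0, i)  # inclusive: keeps record i with probability depth/(i+1)
            if j < depth:
                reservoir[j] = record
    return reservoir


def write_fastq(records, path):
    """Write 4-line records back out as FASTQ."""
    with open(path, "w") as handle:
        for record in records:
            handle.writelines(record)


# Hypothetical usage with placeholder file names and target depth:
# subset = downsample(read_fastq("sra_pooled.fastq"), 210746, seed=42)
# write_fastq(subset, "sra_pooled.downsampled.fastq")
```

If a file has fewer reads than the target depth, all of its reads are returned unchanged.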

I see that for 348 of the patients I have 210746 reads. The funny thing is that I have almost the same number of reads for my ONE sample (1451850).

To be clear, I merged all reads from the SRA database into one file, so I now have one result for a group of patients.

In that case, I think I can compare the results, am I right?
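
Merging all reads into one file is not the same as averaging the samples: the pooled result weights each run by its read depth and leaves a single value, whereas a box plot of controls needs one value per control sample. A minimal sketch contrasting the two on hypothetical counts for two control samples of very different depth; the helper names are illustrative.

```python
import statistics


def pooled_abundance(samples, taxon):
    """Percent of `taxon` after merging all samples' reads into one file."""
    total = sum(sum(counts.values()) for counts in samples)
    hits = sum(counts.get(taxon, 0) for counts in samples)
    return 100.0 * hits / total


def per_sample_abundance(samples, taxon):
    """Percent of `taxon` in each sample separately (one value per sample)."""
    return [100.0 * counts.get(taxon, 0) / sum(counts.values()) for counts in samples]


# Hypothetical read counts for two control samples.
controls = [
    {"Bacteroides": 900, "Streptococcus": 100},  # 1,000 reads
    {"Bacteroides": 10, "Streptococcus": 90},    # 100 reads
]
pooled = pooled_abundance(controls, "Bacteroides")
per_sample = per_sample_abundance(controls, "Bacteroides")
print(f"pooled: {pooled:.1f}%")
print(f"per sample: {per_sample}, mean {statistics.mean(per_sample):.1f}%")
```

Here the pooled Bacteroides share is about 82.7%, while the per-sample values are 90% and 10% (mean 50%), because the deeper sample dominates the pool. Keeping per-sample values lets the single patient value be placed against the spread of the controls.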

You may be trying to compare apples to oranges. One can do that, but the results may not mean much.
For the SRA accession in your original post (Human Microbiome Project, a pool of 85 runs), the following note should be taken into account:

16S rRNA genes amplified from multiple body sites across hundreds of human subjects. There are two time points represented for a subset of these subjects.

You could at least use only those samples that come from the same body site as your own samples, if you can figure out which ones they are.

Yes, I used stool runs from SRA and have stool results for the patients. Here is the project I used: SRP002395, BioProject: PRJNA48333. Sorry for not mentioning that :)

So I have one stool microbiome result for the case control samples from SRA and one stool microbiome sample for the patient, and that is what I would like to compare...
