I am attempting to run a whole-genome shotgun metagenomics pipeline on some raw metagenomic reads from a hypersaline soil microbial community study on NCBI, and I am stuck on the initial quality control step (I have very little experience in this field). I used FastQC and Prinseq to get a quick picture of the quality of my data. Per-base sequence quality drops into the poor region fairly quickly, at about position 300, while read lengths range from 52 to 1780 bp. There does not appear to be any adapter contamination or sequence duplication. The GC content check also failed, but there is evidence in the literature that different salt-tolerant microorganisms can have widely varying GC content, so I am hesitant to correct for that.
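For reference, the per-position quality profile behind the FastQC plot can be approximated with a short script like this. It is a minimal sketch assuming uncompressed FASTQ with Phred+33 quality encoding and unwrapped four-line records (data downloaded as SFF or FASTA+QUAL would need converting first). The function names are my own, not part of FastQC or Prinseq.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from four-line FASTQ records (no line wrapping assumed)."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # '+' separator line
        qual = next(it).rstrip("\n")
        yield header[1:], seq, qual


def phred33(qual: str) -> List[int]:
    """Convert a Phred+33 quality string to integer scores (assumed encoding)."""
    return [ord(c) - 33 for c in qual]


def mean_quality_by_position(quals: Iterable[List[int]]) -> List[float]:
    """Mean score at each position, averaged only over reads long enough to reach it."""
    sums: List[int] = []
    counts: List[int] = []
    for q in quals:
        for i, score in enumerate(q):
            if i == len(sums):  # first read to reach this position
                sums.append(0)
                counts.append(0)
            sums[i] += score
            counts[i] += 1
    return [s / c for s, c in zip(sums, counts)]


def first_position_below(means: List[float], threshold: float) -> Optional[int]:
    """1-based position (as in FastQC plots) where the mean first drops below threshold."""
    for i, m in enumerate(means):
        if m < threshold:
            return i + 1
    return None
```

With read lengths spread from 52 to 1780 bp, the means at later positions come from progressively fewer reads, so the drop-off point is best read alongside the read-length distribution.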
I am looking for advice on how to quality-filter my data, or for pointers to good resources on trimming and filtering reads based on quality assessment results. I am not sure how aggressively I should trim my reads and, if I do trim, how to choose the quality and length cutoffs, because I don't want to introduce biased coverage. I will be using Prinseq for this quality processing.
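To make the question concrete, this is the kind of rule I understand quality trimmers to apply: a generic sliding-window 3' trim followed by a minimum-length filter. It is a sketch, not Prinseq's implementation (Prinseq's own trimming options may differ in details such as scan direction and step size), and the window of 10 bases, Phred threshold of 20, and 50 bp minimum length are placeholder values, not recommendations.

```python
from typing import List, Optional, Tuple


def sliding_window_trim(qual: List[int], window: int = 10, threshold: float = 20.0) -> int:
    """Return how many bases to keep: scan 5'->3' and cut at the start of the
    first window whose mean Phred score falls below threshold."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(qual) < window:
        # Too short for a full window: keep the read only if its overall mean passes.
        return len(qual) if qual and sum(qual) / len(qual) >= threshold else 0
    total = sum(qual[:window])
    for start in range(len(qual) - window + 1):
        if start > 0:
            # Slide the window one base: add the new right base, drop the old left base.
            total += qual[start + window - 1] - qual[start - 1]
        if total / window < threshold:
            return start
    return len(qual)


def trim_and_filter(
    seq: str,
    qual: List[int],
    window: int = 10,
    threshold: float = 20.0,
    min_len: int = 50,
) -> Optional[Tuple[str, List[int]]]:
    """Trim the 3' end by quality, then discard the read if it is shorter than min_len."""
    keep = sliding_window_trim(qual, window, threshold)
    if keep < min_len:
        return None
    return seq[:keep], qual[:keep]
```

Cutting at the start of the first failing window is deliberately conservative. The interplay between the quality threshold and the minimum length decides how many of the long reads survive, which is where I worry coverage bias could creep in.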
Thanks much!