I am downloading samples from the MetaHIT project (metagenomes from faecal samples). The paper states that the 'raw illumina reads are deposited at ENA with accession number ERP003612', so there
However, when I download the files (submitted FASTQ), the names are of the form 'MetaHIT-MH0318_110425.clean.rmhost.1.fq.gz'.
The 'clean' and 'rmhost' make me wonder whether these reads have in fact already been filtered and had contaminant DNA (especially human contamination) removed. Are 'clean' and 'rmhost' common nomenclature, such that I can safely assume that these reads have already been filtered and that I can use them directly?
I am not a bioinformatician at all, so if possible I would like to avoid going through the whole filtering process.
Alternatively, can you recommend an all-in-one tool with which I could do this properly? I looked into BBMap but could not get it to work. I have also heard about MOCAT; does anyone have feedback on it?
Many thanks, Camille