Running 3 FASTQ files for WGBS
19 months ago
Batu • 150

I want to process some FASTQ files for WGBS, but there are 3 files for each sample: SRRxxx, SRRxxx_1 and SRRxxx_2. The _1 and _2 files look like the two mates of a paired-end run (~25GB each), but ENA lists the library layout as single (I've seen other paired-end-looking samples with a single library layout in ENA, so it's probably a mistake). The third file is much smaller than the other two (~2GB), and I couldn't figure out where this single smaller file should be used. I'd be glad for any help. Thanks...

7 months ago
ATpoint 17k
Germany

This seems to be a case where (for a reason I do not know) the run has a paired-end component, which holds the majority of the reads, and a single-end component. This is rare but I've definitely seen it before. I would run FastQC on each file separately to see if anything looks odd in any of them. If not, align the paired-end files as paired and the single-end file as single-end, and then merge the BAM files. If this causes downstream problems with the tools you intend to use for analysis, simply discard the single-end file. It probably doesn't add much coverage anyway.
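
As a rough illustration of the steps above, here is a minimal Python sketch that only builds the commands for one run and prints them, without executing anything. It rests on several assumptions that are not in the thread: Bismark as the bisulfite aligner (none was named), files called SRRxxx_1.fastq.gz, SRRxxx_2.fastq.gz and SRRxxx.fastq.gz, and Bismark's default output BAM names. The function name plan_commands and the directory names are hypothetical. Check the printed commands against your own setup before running them.

```python
from pathlib import Path
import shlex


def plan_commands(run, genome_dir, fastq_dir=".", out_dir="aligned"):
    """Return the commands (as argument lists) for one run accession.

    Nothing is executed here; the caller decides what to do with the lists.
    """
    fq = Path(fastq_dir)
    out = Path(out_dir)

    # Assumed file naming: <run>_1 / <run>_2 are the mates, <run> is single-end.
    r1 = fq / f"{run}_1.fastq.gz"
    r2 = fq / f"{run}_2.fastq.gz"
    single = fq / f"{run}.fastq.gz"

    # Assumed Bismark default output names; verify them against a real run.
    pe_bam = out / f"{run}_1_bismark_bt2_pe.bam"
    se_bam = out / f"{run}_bismark_bt2.bam"
    merged = out / f"{run}.merged.bam"

    return [
        # 1. Quality check on all three files separately.
        ["fastqc", str(r1), str(r2), str(single)],
        # 2. Align the two mate files as a paired-end library.
        ["bismark", "--genome", str(genome_dir), "-o", str(out),
         "-1", str(r1), "-2", str(r2)],
        # 3. Align the small leftover file as single-end.
        ["bismark", "--genome", str(genome_dir), "-o", str(out), str(single)],
        # 4. Merge both alignments into one BAM.
        ["samtools", "merge", str(merged), str(pe_bam), str(se_bam)],
    ]


if __name__ == "__main__":
    # "SRRxxx" and "genome/" are placeholders, as in the question.
    for cmd in plan_commands("SRRxxx", "genome/"):
        print(shlex.join(cmd))
```

If the merged BAM turns out to be a problem downstream (some methylation callers expect either a purely paired-end or a purely single-end file), dropping steps 3 and 4 corresponds to the "discard the single-end file" fallback.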