I've been provided with more than a billion reads of RNAseq data for a poorly annotated nematode species. They appear to be 2x100 paired-end Illumina reads. I currently know frustratingly little about the RNAseq protocol used, but I need to perform assemblies with Trinity.
Trinity requires me to specify whether or not the reads are strand-specific and, if they are, which strand is which, via the `--SS_lib_type` parameter, which for paired-end reads must be either `FR` or `RF`.
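To make the `FR`/`RF` distinction concrete: as I understand Trinity's documentation, `FR` means read 1 (`/1`) is sequenced from the sense strand of the transcript and read 2 (`/2`) from the antisense strand, while `RF` is the reverse (typical of dUTP-based stranded protocols). The toy sketch below uses a hypothetical helper, `simulate_pair`, to show what each read looks like for a fragment written in transcript (sense) orientation; it assumes exact, error-free reads taken from the two ends of the fragment.

```python
# Toy illustration of paired-end read orientation (hypothetical helpers).
# Assumes Trinity's convention: FR = read 1 sense, read 2 antisense;
# RF = read 1 antisense, read 2 sense.

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def simulate_pair(fragment: str, read_len: int, lib_type: str) -> tuple[str, str]:
    """Return (read1, read2) for a fragment given in transcript (sense) orientation."""
    left = fragment[:read_len]               # sense-strand sequence at the 5' end
    right = revcomp(fragment[-read_len:])    # antisense-strand sequence at the 3' end
    if lib_type == "FR":
        return left, right                   # read 1 sense, read 2 antisense
    if lib_type == "RF":
        return right, left                   # read 1 antisense, read 2 sense
    raise ValueError("lib_type must be 'FR' or 'RF'")
```

For example, `simulate_pair("ATGGCGTACGTTAGC", 5, "RF")` returns `("GCTAA", "ATGGC")`: read 1 is the reverse complement of the transcript's 3' end, and read 2 matches its 5' end as written.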
For each tissue sample, I have been given paired _fwd_ and _rev_ FASTQ files. How can I tell (i) whether the data are indeed strand-specific, and (ii) which strand is which, so that I know whether to use `FR` or `RF` with Trinity?
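One possible way to approach (i) and (ii), sketched here under assumptions: take read 1 from a sample of pairs, compare it with transcripts whose sense orientation you trust (for example annotated genes from a related nematode, or ORF-oriented contigs), and count how often read 1 matches the sense versus the antisense strand. Alignment-based tools such as RSeQC's `infer_experiment.py` apply the same idea to mapped reads. To stay self-contained, this sketch swaps alignment for exact k-mer matching; the function names, `k=25` and the 0.8 cut-off are illustrative assumptions, not established thresholds. A roughly 50/50 split suggests unstranded data, mostly-sense read 1 suggests `FR`, and mostly-antisense read 1 suggests `RF`.

```python
from collections import Counter
from typing import Iterable

# Hypothetical helpers for illustration; k=25 and threshold=0.8 are assumptions.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> list[str]:
    """All overlapping k-mers of seq (empty if seq is shorter than k)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def classify_read(read: str, sense_kmers: set[str], k: int) -> str:
    """Call a read 'sense', 'antisense' or 'unknown' from exact k-mer hits."""
    fwd_hits = sum(km in sense_kmers for km in kmers(read, k))
    rev_hits = sum(km in sense_kmers for km in kmers(revcomp(read), k))
    if fwd_hits > rev_hits:
        return "sense"
    if rev_hits > fwd_hits:
        return "antisense"
    return "unknown"


def infer_lib_type(read1s: Iterable[str], transcripts: Iterable[str],
                   k: int = 25, threshold: float = 0.8) -> str:
    """Guess 'FR', 'RF' or 'unstranded' from the orientation of read 1.

    transcripts must be given in sense orientation (e.g. annotated genes).
    """
    sense_kmers = {km for tx in transcripts for km in kmers(tx, k)}
    counts = Counter(classify_read(r, sense_kmers, k) for r in read1s)
    informative = counts["sense"] + counts["antisense"]
    if informative == 0:
        return "undetermined"
    frac_sense = counts["sense"] / informative
    if frac_sense >= threshold:
        return "FR"   # read 1 mostly matches the sense strand
    if frac_sense <= 1 - threshold:
        return "RF"   # read 1 mostly matches the antisense strand
    return "unstranded"
```

Only read 1 is needed, since in a stranded library read 2 simply carries the opposite orientation; a few hundred thousand reads from the _fwd_ file should be plenty. If the reference comes from a more distant species, exact 25-mers may rarely match, and a smaller `k` or a real aligner would be more appropriate.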
Any thoughts are much appreciated. Here are the first four lines (one record) from each of the two corresponding FASTQ files I've been given:
```
$ head -n 4 Tmuris_adult_R4*
==> Tmuris_adult_R4_fwd.fastq <==
@HS23_6814:1:1101:1592:2250#4/1
GCGGTATCAGTTGGTAAACCCTGCAGGCGCTCGCATAACGGTCGAAGGCTTTTTGCGGATCGTCGTCATTGTCGTTGACCTCAGCATCGCNCACCTCCTC
+
B3:64JGADLBACJHH3EACD@DJAHLJDIENFEKIJJ6LE-HFJH57H7L9=BAFI8@FK>,GBDH764,5,4A='+G+,+,*E++@+2!+:+1>1=+4

==> Tmuris_adult_R4_rev.fastq <==
@HS23_6814:1:1101:1592:2250#4/2
CGAACCCNGTATNTTTGCGCTACTNTGTCTCCTACGCCTTTGTCTGTCTTGCCTGCATGGCTAACACTGCCCTGTTGGTTCAAGTGTCGTCTGCCGGAAG
+
:ABEGGH!G8EJ!8EJE6IEFBIH!HF8EKDD66FFAMDCKE/5>D5LD?E=?AHG>=AE5@E5I@CGB<KK@GG<B2E:H@2I9ICI?C@HC2@2:0@2
```
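The headers above carry old-style Illumina mate suffixes (`/1` and `/2`), which suggests that the _fwd_ file holds read 1 and the _rev_ file holds read 2 of each pair, i.e. "fwd"/"rev" probably names the mates rather than the transcript strand. A minimal sketch for checking that across a whole pair of files, assuming unwrapped four-line records; `read_fastq`, `mate_info` and `check_pairs` are hypothetical helpers, not part of any library:

```python
import io
from itertools import zip_longest
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    seq: str
    qual: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield 4-line FASTQ records (assumes unwrapped sequences, no blank lines)."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        yield FastqRecord(header[1:], seq, qual)


def mate_info(name: str) -> tuple[str, str]:
    """Split an old-style Illumina name such as 'X#4/1' into ('X#4', '1')."""
    base, _, mate = name.rpartition("/")
    return base, mate


def check_pairs(r1: TextIO, r2: TextIO) -> int:
    """Verify both files list the same fragments, in order, as /1 and /2 mates."""
    n = 0
    for rec1, rec2 in zip_longest(read_fastq(r1), read_fastq(r2)):
        if rec1 is None or rec2 is None:
            raise ValueError("files contain different numbers of records")
        base1, mate1 = mate_info(rec1.name)
        base2, mate2 = mate_info(rec2.name)
        if base1 != base2 or (mate1, mate2) != ("1", "2"):
            raise ValueError(f"pair mismatch: {rec1.name} vs {rec2.name}")
        for rec in (rec1, rec2):
            if len(rec.seq) != len(rec.qual):
                raise ValueError(f"sequence/quality length mismatch in {rec.name}")
        n += 1
    return n


if __name__ == "__main__":
    # The first record of each file, as posted above.
    fwd = io.StringIO(
        "@HS23_6814:1:1101:1592:2250#4/1\n"
        "GCGGTATCAGTTGGTAAACCCTGCAGGCGCTCGCATAACGGTCGAAGGCTTTTTGCGGATCGTCGTCATTGTCGTTGACCTCAGCATCGCNCACCTCCTC\n"
        "+\n"
        "B3:64JGADLBACJHH3EACD@DJAHLJDIENFEKIJJ6LE-HFJH57H7L9=BAFI8@FK>,GBDH764,5,4A='+G+,+,*E++@+2!+:+1>1=+4\n"
    )
    rev = io.StringIO(
        "@HS23_6814:1:1101:1592:2250#4/2\n"
        "CGAACCCNGTATNTTTGCGCTACTNTGTCTCCTACGCCTTTGTCTGTCTTGCCTGCATGGCTAACACTGCCCTGTTGGTTCAAGTGTCGTCTGCCGGAAG\n"
        "+\n"
        ":ABEGGH!G8EJ!8EJE6IEFBIH!HF8EKDD66FFAMDCKE/5>D5LD?E=?AHG>=AE5@E5I@CGB<KK@GG<B2E:H@2I9ICI?C@HC2@2:0@2\n"
    )
    print(check_pairs(fwd, rev), "pair(s) OK")
```

On the real data this would be called with `open("Tmuris_adult_R4_fwd.fastq")` and `open("Tmuris_adult_R4_rev.fastq")` as the two handles; for a billion reads, pure Python is slow, so checking a subsample is probably enough. If the pairing holds, the _fwd_ file would be passed to Trinity's `--left` and the _rev_ file to `--right`.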