Hi, I have a problem I need help solving.
Below are two FASTQ files.
File 1 contains all the forward reads (more than 400,000,000) produced on an Illumina NGS platform.
File 2 contains all the reads (more than 4,800,000) from one specific barcode (TGACCTTG).
File 1 does not include the barcode sequence in its read identifier, as shown by (#0/1), whereas File 2 has the barcode at the start of the sequence line (the second line of each record). So File 1 cannot be split by barcode directly, but it could be split using File 2, because the read identifiers in File 2 match those in File 1 except for the trailing /1 or /2.
Does anybody have a script or tool to split File 1 using the corresponding read identifiers in File 2? I do not have a strong bioinformatics background.
File 1:
@IPAR1:2:1:4029:1196:1#0/1
ATTTTGCCACATACAAAAGAATCTACGTTCTTCTCAGCACCTCATGGAATCTTCTCTAAAATATATCATATAATAGGACACAAAAGAA
+
BHGHHHHHHHHGDDFHHHGGDGHFHFHHHHGD>GEEG>GFHHHHFHBBHFHHHHEHHHHHHBAFHHBBEHHHFEHGBECEHFHHFAHF
File 2:
@IPAR1:2:1:4029:1196:1#0/2
TGACCTTGATCTCGT
+
HIHIIGIIIH8CCDC
Do not add an answer to your own question!
That just makes it look like it has been answered. Edit your question to add the new information, then delete this answer!
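One possible approach, sketched in Python: read every identifier in File 2 into a set, then stream through File 1 and keep only the records whose identifier is in that set. This is a minimal sketch, not a tested pipeline. It assumes both files are uncompressed FASTQ with exactly four lines per record, that identifiers differ only by the trailing /1 or /2 as in the examples above, and that the roughly 4.8 million File 2 identifiers fit in memory. The function names and the file names in the usage note are made up for illustration.

```python
from itertools import islice


def read_key(header):
    """Return the read ID without the leading '@' and the trailing /1 or /2."""
    name = header.strip().split()[0]
    if name.startswith("@"):
        name = name[1:]
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name


def fastq_records(handle):
    """Yield records as lists of 4 lines (assumes unwrapped 4-line FASTQ)."""
    while True:
        record = list(islice(handle, 4))
        if len(record) < 4:
            break  # end of file, or an incomplete trailing record
        yield record


def load_ids(barcode_fastq):
    """Collect the read IDs found in the barcode file (File 2) into a set."""
    with open(barcode_fastq) as handle:
        return {read_key(rec[0]) for rec in fastq_records(handle)}


def filter_fastq(reads_fastq, wanted_ids, out_fastq):
    """Copy records from File 1 whose ID is in wanted_ids; return the count kept."""
    kept = 0
    with open(reads_fastq) as src, open(out_fastq, "w") as dst:
        for rec in fastq_records(src):
            if read_key(rec[0]) in wanted_ids:
                dst.writelines(rec)
                kept += 1
    return kept
```

It could be called as `filter_fastq("file1.fastq", load_ids("file2.fastq"), "file1_TGACCTTG.fastq")`. File 1 is read one record at a time, so only the set of File 2 identifiers is held in memory.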