Question

Changing FASTQ format from sense to antisense

5.2 years ago

miyagi • 0

Dear all,

I am trying to analyze RNA-seq results from a method called SLAM-seq. Long story short, the recommended library prep is QuantSeq, whereas we used KAPA poly-A. We figured out after sequencing that one difference between the two is the strand used for first-strand synthesis, so the downstream analysis software reads our data in the opposite orientation to what it expects: it counts T>C conversions, but we got a higher A>G rate, which is how we identified the problem. I converted the reads to their reverse complement using seqtk and got the following.

$ head original_file.fastq
@HISEQ:326:HVL2VBCX2:2:1101:1771:1973 1:N:0:ATTGGCTTC
NAAAAAAGAAAACCAAAGTGGTCCACAAAACATTCTCCTTTCCTTCTGAAGGTTTTACGATGCATTGTTATCATTA
+
#<<DDHHHIEHIIIIGHIIHHHHIFHHIIIHEHHHHIGIHICHHHCHHIIIIIIIHHHEHGHEFHHHHEHHHIFHE
@HISEQ:326:HVL2VBCX2:2:1101:2172:1980 1:N:0:ATTGGCTTC
NAGACACATCAGGGTGGGGCCCAGCCGGCTGCCAGGCACCAGGTCCTCCACCACGAGCGCCGGAAACAGGTCGATG
+
#<<DDHHIIIIIIIIIHIIIIIHIIIIIIIIIIIIIIIIIIHIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIH
@HISEQ:326:HVL2VBCX2:2:1101:2255:1977 1:N:0:ATTGGCTTC
NTCCTGCTCCATCTCCCACTTCCGCTCCCTCTCTTTTCCTCTGGTTCTCCAAGTCCAGGTCAGGCAAAGGGGCCAG

$ head reverse.fastq
@HISEQ:326:HVL2VBCX2:2:1101:1771:1973 1:N:0:ATTGGCTTC
TAATGATAACAATGCATCGTAAAACCTTCAGAAGGAAAGGAGAATGTTTTGTGGACCACTTTGGTTTTCTTTTTTN
+
EHFIHHHEHHHHFEHGHEHHHIIIIIIIHHCHHHCIHIGIHHHHEHIIIHHFIHHHHIIHGIIIIHEIHHHDD<<#
@HISEQ:326:HVL2VBCX2:2:1101:2172:1980 1:N:0:ATTGGCTTC
CATCGACCTGTTTCCGGCGCTCGTGGTGGAGGACCTGGTGCCTGGCAGCCGGCTGGGCCCCACCCTGATGTGTCTN
+
HIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIHIIIIIIIIIIIIIIIIIIHIIIIIHIIIIIIIIIHHDD<<#
@HISEQ:326:HVL2VBCX2:2:1101:2255:1977 1:N:0:ATTGGCTTC
CTGGCCCCTTTGCCTGACCTGGACTTGGAGAACCAGAGGAAAAGAGAGGGAGCGGAAGTGGGAGATGGAGCAGGAN

It looks fine, and running wc -l on both files suggests that nothing is truncated. Unfortunately the downstream Slam dunk software is resulting in a failed run without a very useful error. I'm trying to figure out whether there is any obvious reason for this. Is there some other way the file is being recognized that could cause the reverse-complemented file to be rejected as input? I have contacted Slam Dunk but haven't heard back yet, so I'm just trying to troubleshoot. Unfortunately we did not do paired-end sequencing, so using an R2 file is not an option.
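For anyone who wants to check more than the line count, here is a minimal Python sketch that compares the two files record by record. It assumes plain four-line FASTQ records, as in the excerpts above, and that the converted file keeps the same headers in the same order. The function names (`reverse_complement_record`, `read_fastq`, `first_problem`) are made up for illustration and are not part of seqtk or Slam dunk.

```python
from itertools import zip_longest

# Complement table for DNA bases; N has no complement and stays N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement_record(seq, qual):
    """Reverse-complement a read and reverse its quality string to match."""
    return seq.translate(_COMPLEMENT)[::-1], qual[::-1]


def read_fastq(lines):
    """Yield (header, sequence, quality) tuples from FASTQ lines.

    Assumes the simple four-lines-per-record layout (no wrapped sequences).
    """
    it = (line.rstrip("\r\n") for line in lines)
    for header in it:
        if not header:
            continue  # tolerate stray blank lines between records
        seq = next(it, None)
        plus = next(it, None)
        qual = next(it, None)
        if qual is None or not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed or truncated record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header, seq, qual


def first_problem(original_lines, reversed_lines):
    """Describe the first discrepancy between the two files, or return None."""
    pairs = zip_longest(read_fastq(original_lines), read_fastq(reversed_lines))
    for index, (orig, rev) in enumerate(pairs, start=1):
        if orig is None or rev is None:
            return f"record {index}: files have different numbers of records"
        if orig[0] != rev[0]:
            return f"record {index}: headers differ"
        if reverse_complement_record(orig[1], orig[2]) != (rev[1], rev[2]):
            return f"record {index}: not an exact reverse complement"
    return None
```

To run it on the files above, open both (for example `with open("original_file.fastq") as a, open("reverse.fastq") as b:`) and print `first_problem(a, b)`. A result of `None` would only mean the conversion itself is consistent; it says nothing about why the downstream run fails.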

Thanks in advance.

RNA-Seq • 1.3k views

updated 5.2 years ago by GenoMax 141k • written 5.2 years ago by miyagi • 0

Dear miyagi, welcome to Biostars. I admit I struggle with this formatting, too. Please use the code button (101010) on the FASTQ part to make it more readable. And please add the error from Slam dunk, in case people recognize it.

5.2 years ago by Carambakaracho ★ 3.2k

Slam dunk software is resulting in a failed run without a very useful error

Without that information we don't have anything to go on to figure out why you are not getting any output. Do you have to use slam dunk for the analysis or can you use any other standard RNAseq software?

Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.

Thank you!

5.2 years ago by GenoMax 141k

Thanks for the formatting tip. The issue is that I'm using their Bluebee Analysis Pipeline (https://www.lexogen.com/lexogen-and-bluebee-launch-slamdunk-data-analysis-pipeline/), so the error in the attached picture is totally useless, unfortunately. I realize it is probably useless to anyone on this forum as well. In the meantime, I've just started trying their GitHub version while I wait to see what the company has to say about the Bluebee version.

5.2 years ago by miyagi • 0