5.2 years ago
lkianmehr
Hello,
Do you have any idea how to get the proportion of a specific sequence (defined in a separate FASTQ file:
@1
TAACCCTAACCCTAACCCTAACCC
+
JJJJJJJJJJJJJJJJJJJJJJJJ
@2
TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCC
+
JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
) in several paired-end read (FASTQ) or BAM files from RNA-seq?
Thanks in advance.
Do you mean getting the first half of every read and saving it as a separate file? Did you try anything yet?
I just want to know how many reads in these paired-end files contain this specific sequence, so that I can get its proportion among all reads. I have tried
but I think this command just extracts all reads that contain the sequence. I have also used Salmon to build an index from a FASTA of this sequence and quantify all reads against it, but that did not work well.
If
TAACCCTAACCCTAACCCTAACCC
is the full-length sequence you are trying to identify, you should be able to get the number of reads that contain it by just running the command without providing an out= file name. This will still perform the operation and generate the relevant statistics. It sounds like that is all you are interested in. There should also be a number for total sequences processed.
I got the result below. What does it mean? Does "contaminants" refer to the number of reads that include that specific sequence, or to something else?
Is that the output from the command you included above? BBDuk by default works in filter mode. Take a look at this guide to get additional details (and also look at the in-line help).
At first glance
Total Removed: 1040136 reads (1.58%)
should be the reads that contain the sequence you specified in the literal directive. Since you are running this in PE mode, either or both reads of a pair may contain the sequence. If you want a more precise number, you could run the files individually.
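If you want to cross-check the per-mate numbers without running the files individually, here is a minimal Python sketch. It is not what BBDuk does internally: it uses plain exact substring matching (forward and reverse complement) rather than k-mer matching, so the counts may not agree exactly with the BBDuk statistics. It also assumes standard 4-line FASTQ records and that both files list the mates in the same order. The function names (`revcomp`, `read_seqs`, `count_pairs`) are made up for this example.

```python
import gzip

# Translation table for complementing DNA bases (N and others are left as-is).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_seqs(path):
    """Yield the sequence line of each record, assuming 4-line FASTQ records."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # second line of every record is the sequence
                yield line.strip().upper()


def count_pairs(r1_path, r2_path, motif):
    """Count pairs where R1, R2, or either mate contains the motif.

    Matching is exact (no mismatches) and checks both strands.
    """
    motif = motif.upper()
    motif_rc = revcomp(motif)
    counts = {"pairs": 0, "r1": 0, "r2": 0, "either": 0}
    for s1, s2 in zip(read_seqs(r1_path), read_seqs(r2_path)):
        hit1 = motif in s1 or motif_rc in s1
        hit2 = motif in s2 or motif_rc in s2
        counts["pairs"] += 1
        counts["r1"] += hit1
        counts["r2"] += hit2
        counts["either"] += hit1 or hit2
    return counts
```

The proportion of pairs with the sequence in at least one mate is then `counts["either"] / counts["pairs"]`, and `counts["r1"]` and `counts["r2"]` give the per-file numbers. Pure Python will be slow on tens of millions of reads, so this is more useful as a sanity check on a subsample than as a replacement for BBDuk.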