Removing reads without pairs from a FASTA file coming from FASTQ
4.1 years ago
antgomo ▴ 30

Hi all,

I converted paired-end FASTQ files to FASTA with the FASTX-Toolkit, using fastq_to_fasta.

After the conversion, some reads are missing their mates because the mate didn't comply with the quality standards. Here is a snippet of my FASTA file:

>A00323:108:H5W2TDSXX:4:1101:1090:32252_CAATAATAT/1
CCTGGCTAACACAGTGAAACCCTGTCTCTACTAAAAATATAAAAAATTAGCTGGGTGTGGTGGCGGGTGCCTGTAGTCCCAGCAGATCGGAAGAGCACACG
>A00323:108:H5W2TDSXX:4:1101:1090:32252_CAATAATAT/2
GCTGGGACTACAGGCACCCGCCACCACACCCAGCTAATTTTTTATATTTTTAGTAGAGACAGGGTTTCACTGTGTTAGCCAGGAGATCGGAAGAGCGTCGT
>A00323:108:H5W2TDSXX:4:1101:1145:26929_CCCCCCACA/1
TCTCTTGCTTCAGCCTGCTGAGTAGCTGGGACTACTGGCATGCACCACTACACTGGCTAATTTTTTTTTATTTTTAGTAGAAAAGATCGGAAGAGCACACG

What I want is to keep the reads that have both mates and get rid of the unpaired ones. For the example above, the desired output would be:

>A00323:108:H5W2TDSXX:4:1101:1090:32252_CAATAATAT/1
CCTGGCTAACACAGTGAAACCCTGTCTCTACTAAAAATATAAAAAATTAGCTGGGTGTGGTGGCGGGTGCCTGTAGTCCCAGCAGATCGGAAGAGCACACG
>A00323:108:H5W2TDSXX:4:1101:1090:32252_CAATAATAT/2
GCTGGGACTACAGGCACCCGCCACCACACCCAGCTAATTTTTTATATTTTTAGTAGAGACAGGGTTTCACTGTGTTAGCCAGGAGATCGGAAGAGCGTCGT

I am struggling with awk, but I am a newbie with it. Does anyone have suggestions?

Thanks in advance

RNA-Seq FASTQ FASTA • 934 views

If you have the fastq reads, please fix the missing-mate issue first with repair.sh from the BBMap suite (Guide here). Once that is done, convert the properly paired reads to fasta format using reformat.sh from the same suite:

reformat.sh in1=R1.fq.gz in2=R2.fq.gz out1=R1.fa out2=R2.fa


Hi genomx, yes, the problem is that FASTX gives me this kind of file because it is not pair-aware.

Do you think repair.sh can deal with FASTA instead of FASTQ?

Thanks


If you have the fastq files, please follow my advice and fix those. If plain conversion with fastx gave you these problematic files, then the problem exists in the original dataset and should be fixed there.

I don't know if repair.sh can fix fasta files, since I have never had to use it for that application. BBTools programs are smart, and repair.sh may work with fasta. If it does not, then go back to my original advice.
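
If only the FASTA file is at hand, the filtering asked for in the question can be sketched in a few lines of Python. This is only a minimal sketch, not a replacement for repairing the original FASTQ files. It assumes every header ends in /1 or /2 as in the snippet above, and that the whole file fits in memory. The function names read_fasta and keep_paired are made up for this example.

```python
import sys
from collections import defaultdict


def read_fasta(lines):
    """Yield (header, sequence) tuples; the header excludes the leading '>'."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif header is not None:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def split_mate(header):
    """Return (read_name, mate) for headers ending in /1 or /2, else None."""
    name, sep, mate = header.rpartition("/")
    if sep and mate in ("1", "2"):
        return name, mate
    return None


def keep_paired(records):
    """Keep only records whose read name occurs with both /1 and /2.

    Assumption: all records fit in memory; input order is preserved.
    """
    records = list(records)
    mates = defaultdict(set)
    for header, _ in records:
        parsed = split_mate(header)
        if parsed is not None:
            mates[parsed[0]].add(parsed[1])
    paired = {name for name, seen in mates.items() if seen == {"1", "2"}}
    kept = []
    for header, seq in records:
        parsed = split_mate(header)
        if parsed is not None and parsed[0] in paired:
            kept.append((header, seq))
    return kept


# Runs only when a FASTA path is given on the command line.
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as handle:
        for header, seq in keep_paired(read_fasta(handle)):
            print(">" + header)
            print(seq)
```

Saved under a name of your choosing, for example filter_pairs.py, it would be run as python filter_pairs.py reads.fa > paired.fa. Reads whose headers do not end in /1 or /2 are dropped, so check the header format before relying on the output.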
