Trinity reads are not identical
5.2 years ago

Hello All,

I am facing the following issue: I renamed the reads in my FASTQ files with an awk command, but Trinity still reports that the read names are not identical. How can I get it to run? Please help.

9.7G file_1_awk.fastq 
9.7G file_2_awk.fastq

8.3G FP_awk.fastq   
62M RU_2_awk.fastq   
253M FU_awk.fastq       
8.3G RP_2_awk.fastq  

[sneharl@compute-0-8 nep11]$ head FU_1_awk.fastq 
@1_2/2
AAAGGGCATGGATTTGTATTTCAAGAGGCATNATGGGCAAGCTGTAACTTGTGAAGATTTNTTTGCTGCCATGAGAGATGCAAATGANGCA
+
ADHIIEHIIJIIJGIJAFIIGGIIIFJJJJI#07DHHIDGIHGIHGEHGEGEHEEEEEFE#,;ACACDDDDDDCCCDBDCCCCDDDD#+8?
@1_3/2
CTCTCCGCGTTGAAACCCTAAACCTACCCCCTACCTCAGGAATCGCCATGAAAGGAGGCAAATCGAAGGCTGAGCCGAAAAAGGCCGAC
+
HFGGIJIJJHIIJEGIJIB>DDGIC>FHIJJGDHIGIEEHHECCDEF?AC@CDDBB@@BBBDCA?8??<ACA@BCD<3>BB?>?ABB>B
@1_4/2
GTTGACTTCTCAAAGAGCAGTAAGTGTGCCCTTCAATGGGCGATCGATAATCTGGCCAACAAGGGAGATACCACACTCTTCATCATCCATG


[sneharl@compute-0-8 nep11]$ head RU_2_awk.fastq 
@2_78/2
AAGCATCACGTCAAATGAACAGCCGTACAATACGCAGCGCACCTTATTCCAACGCCTTTTCTCGTCAACGATTTACGATTGCAAATTATCA
+
HHHJJJIJJJJJJJJJJIJJJJJJJIJIJIIJEHJJIJJIHIHHHHHHFFFFFDDDDDDDDDDDDDDDDDDDDDDEBDDD?CCDDDDDEDC
@2_682/2
AATACAAGAAAATTTCGTCTCATTCAAAAGTCCCTT
+
A?D<<<:AFF3<FEFI@+A8?4?E@ECFC3*1?**0
@2_735/2
ACTATGACAGATATCGATACCGATATTTTCATCCATCCACCGGACCCAAAATATACTACCAAAAAGGAAATGATTTCTCTTCACTTCGTTC








Error, pairs.K25.stats is empty.  Be sure to check your fastq reads and ensure that the read names are identical except for the /1 or /2 designation. at /share/apps/trinityrnaseq-Trinity-v2.8.3/util/insilico_read_normalization.pl line 921.
Error, cmd: /share/apps/trinityrnaseq-Trinity-v2.8.3/util/insilico_read_normalization.pl --seqType fq --JM 96G  --max_cov 200 --min_cov 1 --CPU 10 --output /state/partition1/sneha2/nep11/trinity_out_dir/insilico_read_normalization   --max_pct_stdev 10000  --left /state/partition1/sneha2/nep11/SRR2551776_1.renamed.fastq --right /state/partition1/sneha2/nep11/SRR2551776_2.renamed.fastq --pairs_together --PARALLEL_STATS   died with ret 512 at /share/apps/trinityrnaseq-Trinity-v2.8.3/Trinity line 2684.
    main::process_cmd("/share/apps/trinityrnaseq-Trinity-v2.8.3/util/insilico_read_n"...) called at /share/apps/trinityrnaseq-Trinity-v2.8.3/Trinity line 3230
    main::normalize("/state/partition1/sneha2/nep11/trinity_out_dir/insilico_read_"..., 200, ARRAY(0x7f334980a5f0), ARRAY(0x7f334980a5d8)) called at /share/apps/trinityrnaseq-Trinity-v2.8.3/Trinity line 3177
    main::run_normalization(200, ARRAY(0x7f334980a5f0), ARRAY(0x7f334980a5d8)) called at /share/apps/trinityrnaseq-Trinity-v2.8.3/Trinity line 1314
rna-seq trinity Assembly error trimmomatic • 984 views

Hi sneha.preha7, first of all, your question is lacking essential details. What is "awk cmd"? The exact command and its purpose are required in order to understand your question. Please also see Brief Reminder On How To Ask A Good Question. Second, your data are out of order, probably because the awk command caused massive deletion of reads in the reverse file. Also, why is trimmomatic a tag when you did not mention it in the post? Please edit your question and provide the details needed to reproduce the problem.
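
One way to test whether the two files really are out of step is to compare the header names record by record. The sketch below is only an illustration: `count_name_mismatches` is a hypothetical helper (not part of Trinity or Trimmomatic), and it assumes strict 4-line FASTQ records and names that end in /1 or /2.

```shell
# Hypothetical helper: count the records whose read names differ between two
# paired FASTQ files once a trailing /1 or /2 has been stripped.
# Assumption: both files use strict 4-line records (no wrapped sequences).
count_name_mismatches() {
    awk -v right="$2" '
        NR % 4 == 1 {
            left_name = $0
            # Read the header of the matching record from the right-hand file.
            if ((getline right_name < right) <= 0) { n++; next }
            # Skip the sequence, "+" and quality lines of that record.
            getline skipped < right
            getline skipped < right
            getline skipped < right
            sub(/\/[12]$/, "", left_name)
            sub(/\/[12]$/, "", right_name)
            if (left_name != right_name) n++
        }
        END { print n + 0 }
    ' "$1"
}

# Example call (substitute your own paired files):
# count_name_mismatches left.fastq right.fastq
```

A result of 0 means every pair of names agrees apart from the /1 or /2 suffix; anything else suggests the files are out of order or were renamed inconsistently.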


Sorry for the inconvenience.

Question: "Is there any way to make the reverse reads identical? Should I merge both files with fastqjoiner and split them again, and would that affect the quality of the data?" What can I do to make the reverse read names identical, and why has the size of my reverse read file decreased so drastically?

I am trying to assemble a few SRA FASTQ files. I ran awk to make the read names in both files (Read 1 and Read 2) identical.

Here is the command I used:

Before the awk command the file size was 12 GB, and afterwards it is 9.7 GB. (I checked the read counts with FastQC and they are the same; only the whitespace has been removed from the files.)

awk '{{print (NR%4 == 1) ? "@1_" ++i "/1": $0}}'  Read_1.fastq > Read_1_renamed.fastq
awk '{{print (NR%4 == 1) ? "@2_" ++i "/2": $0}}'  Read_2.fastq > Read_2_renamed.fastq

After this I ran Trimmomatic for trimming, but the output contained very few reverse reads. Here is the command:

 java -jar /share/apps/Trimmomatic-0.38/trimmomatic-0.38.jar PE -phred33 Read_1_renamed.fastq Read_2_renamed.fastq  FP_Read_1.fastq FU_Read_1.fastq RP_Read_2.fastq  RU_Raed_2.fastq ILLUMINACLIP:/share/apps/Trimmomatic-0.38/adapters/TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 HEADCROP:10 MINLEN:36

Output:

Using PrefixPair: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT' and 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
Using Long Clipping Sequence: 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
Using Long Clipping Sequence: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT'
ILLUMINACLIP: Using 1 prefix pairs, 4 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Input Read Pairs: 47190247 Both Surviving: 44940671 (95.23%) Forward Only Surviving: 1504722 (3.19%) Reverse Only Surviving: 365037 (0.77%) Dropped: 379817 (0.80%)
TrimmomaticPE: Completed successfully
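
As a quick sanity check on the summary line above, the four categories can be added up and compared with the number of input pairs. This is only arithmetic on the numbers Trimmomatic printed; the variable names are made up for the example.

```shell
# Counts copied from the Trimmomatic summary line in this post.
input_pairs=47190247
both=44940671
forward_only=1504722
reverse_only=365037
dropped=379817

# Every input pair should fall into exactly one of the four categories.
total=$((both + forward_only + reverse_only + dropped))
echo "sum of categories: $total (input pairs: $input_pairs)"
```

If the two numbers agree, every input pair is accounted for. The "Reverse Only Surviving" count covers only the reverse reads whose forward mate was discarded, so a small unpaired reverse file does not by itself mean reverse reads were lost.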

After this, the file sizes are:

Read_1_renamed.fastq  9.7 GB
Read_2_renamed.fastq  9.7 GB


FP_Read_1.fastq    8.3G
FU_Read_1.fastq   253M
RP_Read_2.fastq    8.3G
RU_Raed_2.fastq   62M

After this I tried to run Trinity, and it aborted with the following error:

/share/apps/trinityrnaseq-Trinity-v2.8.3/Trinity --seqType fq --left FP_Read_1.fastq,FU_Read_1.fastq --right RP_Read_2.fastq,RU_Raed_2.fastq --CPU 10 --max_memory 96G

Error, pairs.K25.stats is empty.  Be sure to check your fastq reads and ensure that the read names are identical except for the /1 or /2 designation. at /share/apps/trinityrnaseq-Trinity-v2.8.3/util/insilico_read_normalization.pl line 921.

Later I checked the heads of the files and found this:

[sneharl@compute-0-8 nep11]$ head RP_Read_2.fastq 
@2_13/2
ACATAATCTCACTCGACGTACCAGGCATGAGGAGGGAGGATGTCAAGATAGAGGTGGAGGAGAACAGGGTGCTGAGGGTGAGCGGAGAGAG
+
FHHJJIIJJGIJJIJ?HIFFHJCHIIGDHBEH<DGA@FHCGC=AAEHEFDFFFE6=AABBB@?ADA?AB5?@CCCDDD088AA<<<958<<
@2_15/2
TCAATTCATCAACAACCTTAGATCTCAACTCATCAGCAAGCTCTTTCTGGGCAGTATCACTTGGAGATTTCTTCCCTGTCCTAAGACAAGC
+
HHHJJJJJIJJIJJIJJJJJJIIIIIJJJIGJIJJJJIJHHIIJIJJHIJJJFIDHIJJJJIJIJJHHHHHHFFFFFF;AECCEDDDDDDD
@2_16/2
GCAGAAACAGAAGGGTAATTCAGTGCTACTGCATCAAAGACTGTCTGCCTGACACAGTTGGGAGTTTGAACACCAACAATAACCAAGCTAT
[sneharl@compute-0-8 nep11]$ head RU_Raed_2.fastq 
@2_78/2
AAGCATCACGTCAAATGAACAGCCGTACAATACGCAGCGCACCTTATTCCAACGCCTTTTCTCGTCAACGATTTACGATTGCAAATTATCA
+
HHHJJJIJJJJJJJJJJIJJJJJJJIJIJIIJEHJJIJJIHIHHHHHHFFFFFDDDDDDDDDDDDDDDDDDDDDDEBDDD?CCDDDDDEDC
@2_682/2
AATACAAGAAAATTTCGTCTCATTCAAAAGTCCCTT
+
A?D<<<:AFF3<FEFI@+A8?4?E@ECFC3*1?**0
@2_735/2
ACTATGACAGATATCGATACCGATATTTTCATCCATCCACCGGACCCAAAATATACTACCAAAAAGGAAATGATTTCTCTTCACTTCGTTC

sneha.preha7 : Please use ADD REPLY/ADD COMMENT when responding to existing posts to keep threads logically organized.


How did you download the SRA files? Did you use the -F option (to restore the original Illumina read headers) when you did? It looks like either your original fastq files had odd headers or your awk manipulation may have produced them.
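
For what it is worth, the awk commands posted above name the left reads `@1_N/1` and the right reads `@2_N/2`, so the part before the slash differs (`1_N` versus `2_N`), and that alone may be enough to trigger Trinity's "read names are identical except for the /1 or /2" check. A minimal sketch of a rename that uses the same prefix for both mates is below. `rename_reads` and the prefix `read` are hypothetical names chosen for the example, and the code assumes strict 4-line FASTQ records with both files in the same order.

```shell
# Hypothetical helper: rewrite every FASTQ header as "@<prefix>_<n>/<mate>".
# Arguments: prefix, mate number (1 or 2), input file, output file.
# Assumption: strict 4-line records, and both mates listed in the same order.
rename_reads() {
    awk -v p="$1" -v m="$2" '
        NR % 4 == 1 { print "@" p "_" (++i) "/" m; next }  # header line
        { print }                                          # other lines unchanged
    ' "$3" > "$4"
}

# Example usage with the file names from this thread:
# rename_reads read 1 Read_1.fastq Read_1_renamed.fastq
# rename_reads read 2 Read_2.fastq Read_2_renamed.fastq
```

With the same prefix on both sides, record N becomes `@read_N/1` in the left file and `@read_N/2` in the right file, which is the naming the Trinity error message asks for.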


No, I downloaded them with wget, using the links provided by ENA.
