Batch rename *.fastq.gz files using a regular expression
18 months ago
Brazil

I'm trying to get a regex to work with rename. I've tried the approaches from similar questions answered here but couldn't get the results I wanted.

The files are named as such:

SR1_S90_L001_R1_001.fastq.gz 
SR1_S90_L001_R2_001.fastq.gz
Rinc_S96_L001_R1_001.fastq.gz 
Rinc_S96_L001_R2_001.fastq.gz

And I would like to retain only the information prior to the first underscore and the _R1_ or _R2_ tags, like this:

SR1_R1_.fastq.gz
SR1_R2_.fastq.gz
Rinc_R1_.fastq.gz 
Rinc_R2_.fastq.gz

Thanks in advance!
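For reference, the renaming rule described above can be written as a single regular expression. This is a minimal Python 3 sketch, not code from the thread. The helper name target_name and the exact pattern are assumptions, and the sketch presumes that the sample name contains no underscore and that every file follows the <sample>_S<n>_L<lane>_R<1|2>_<chunk>.fastq.gz layout shown above:

```python
import re

# Hypothetical pattern for <sample>_S<n>_L<lane>_R<1|2>_<chunk>.fastq.gz;
# assumes the sample name itself contains no underscore.
PATTERN = re.compile(r'^([^_]+)_S\d+_L\d+_(R[12])_\d+\.fastq\.gz$')


def target_name(fname):
    """Return the shortened name, or None if fname does not fit the layout."""
    m = PATTERN.match(fname)
    if m is None:
        return None
    # Keep the sample name and the read tag, including the trailing underscore.
    return '{}_{}_.fastq.gz'.format(m.group(1), m.group(2))


for f in ['SR1_S90_L001_R1_001.fastq.gz', 'Rinc_S96_L001_R2_001.fastq.gz']:
    print(f, '->', target_name(f))
```

Returning None for names that don't fit makes it easy to skip unrelated files instead of renaming them blindly.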

18 months ago
China

Try the safe batch-rename tool brename (https://github.com/shenwei356/brename):

brename -p '^(\w+?)_.+_(R[12])_.+' -r '${1}_$2.fq.gz'    # updated

# original answer
# brename -p '^(\w+)_.+_(R[12])_.+' -r '${1}_$2.fq.gz'
# if you have already run this, you can run 'brename -u' to undo.
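Why the update matters: \w also matches the underscore, so the greedy (\w+) in the original pattern grows to SR1_S90 while still letting the rest of the pattern match, whereas the lazy (\w+?) stops at SR1. The Python 3 sketch below reproduces this with the re module. It assumes Python's regex engine behaves like brename's (presumably Go's standard regexp, since brename is written in Go) for these simple patterns, and it uses Python's \1 and \2 in place of brename's ${1} and $2:

```python
import re

name = 'SR1_S90_L001_R1_001.fastq.gz'

# Original pattern: greedy (\w+) takes as much as it can ('SR1_S90'),
# because \w matches '_' too and the rest of the pattern can still match.
greedy = re.sub(r'^(\w+)_.+_(R[12])_.+', r'\1_\2.fq.gz', name)

# Updated pattern: lazy (\w+?) takes as little as it can ('SR1').
lazy = re.sub(r'^(\w+?)_.+_(R[12])_.+', r'\1_\2.fq.gz', name)

print(greedy)  # SR1_S90_R1.fq.gz
print(lazy)    # SR1_R1.fq.gz
```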

Almost there!

  • The first group was also capturing the second tag in the filename (e.g. _S90_), hence the addition of a second _.+
  • Moved the underscores into the second group, (_R[12]_), so that the underscore after R1/R2 is kept in the new name

The command with the final changes:

brename -p '^(\w+)_.+_.+(_R[12]_).+' -r '${1}$2.fastq.gz' -d
  • Added -d for a dry run while testing ;)

Thanks a bunch and congratulations on your software, Wei Shen
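The final pattern can also be checked outside brename before touching any files. This is a Python 3 sketch that assumes Python's re treats this pattern the same way brename does; brename's ${1}$2 is written as \1\2 in Python, and the file list is simply the four names from the question:

```python
import re

# Pattern and replacement from the final brename command, in Python syntax.
PATTERN = r'^(\w+)_.+_.+(_R[12]_).+'
REPLACEMENT = r'\1\2.fastq.gz'

files = [
    'SR1_S90_L001_R1_001.fastq.gz',
    'SR1_S90_L001_R2_001.fastq.gz',
    'Rinc_S96_L001_R1_001.fastq.gz',
    'Rinc_S96_L001_R2_001.fastq.gz',
]

# Dry run: compute old -> new and print it without touching the file system.
renames = {f: re.sub(PATTERN, REPLACEMENT, f) for f in files}
for old, new in renames.items():
    print(old, '->', new)
```

Because (\w+) can only end where two more underscore-separated fields still precede _R1_ or _R2_, it is forced back to the sample name (SR1 or Rinc) for these inputs.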

Thanks for pointing that out. If you have already run the old command, you can run 'brename -u' to undo it.
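An undo only works if a record of what was renamed is kept. If you rename files with your own script rather than brename, a similar safety net can be sketched in Python 3 by logging every old/new pair. The function names rename_with_log and undo_from_log and the JSON log file are hypothetical and say nothing about how brename stores its own undo information:

```python
import json
import os


def rename_with_log(mapping, log_path='rename_log.json'):
    """Rename old -> new for each pair in mapping and log completed renames."""
    done = []
    try:
        for old, new in mapping.items():
            if os.path.exists(new):
                # Never overwrite an existing file (os.rename would on POSIX).
                raise FileExistsError(new)
            os.rename(old, new)
            done.append([old, new])
    finally:
        # Write the log even if a rename failed halfway through.
        with open(log_path, 'w') as fh:
            json.dump(done, fh)


def undo_from_log(log_path='rename_log.json'):
    """Reverse the renames recorded by rename_with_log, last one first."""
    with open(log_path) as fh:
        done = json.load(fh)
    for old, new in reversed(done):
        os.rename(new, old)
    os.remove(log_path)
```

For example, rename_with_log({'SR1_S90_L001_R1_001.fastq.gz': 'SR1_R1_.fastq.gz'}) followed by undo_from_log() should restore the original name.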

Yeah! I saw that parameter after running the command and was amazed to find that option (couldn't test it since I had already deleted the folder XD)

Thanks also for the seqkit software, Shen Wei!

20 months ago
st.ph.n ♦ 2.5k
Philadelphia, PA

Quick Python solution (Python 2.7 syntax).

#!/usr/bin/env python
import os, glob

for file in glob.glob("*.fastq.gz"):
    # test with print statement
    print file, '\t', file.split('_')[0] + '_' + file.split('_')[3] +  '_.fastq.gz'
    # uncomment to rename
    # os.rename(file, file.split('_')[0] + '_' + file.split('_')[3] +  '_.fastq.gz')

Save as rename_fastq.py and run it with python rename_fastq.py in the directory containing the fastq.gz files.

Not sure why you want to keep the '_' after R*, though.

Hello!

I want to keep the '_' after R* just to keep my sanity while running other scripts (that check for the pattern _R*_).

I get a syntax error when running your script:

    import os, glob for file in glob.glob("/*.fastq.gz"):
                      ^
SyntaxError: invalid syntax

I've tried replacing the double quotes with single ones, but to no avail.

The for statement should be on a new line after the import statement; it looks like it didn't copy/paste correctly. I commented out the actual renaming part so you can test first and review the printed lines.

When running it with:

python --version
Python 3.6.5 :: Anaconda, Inc.

I get:

  File "rename_fastq.py", line 6
    print file, '\t', file.split('_')[0] + '_' + file.split('_')[3] + '_.fastq.gz'
             ^
SyntaxError: invalid syntax

But in a Python 2.7.15 environment the script runs perfectly and as intended :D Thanks for your time!

Yes, I'm still writing Python 2.7 syntax: in Python 3, print is a function, so that line has to be written as print(...).
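For anyone on Python 3, here is a sketch of the same script with print() used as a function. Two additions are assumptions of this sketch rather than part of the original answer: it only previews the renames unless a hypothetical --rename argument is given, and it aborts if two files would get the same new name (for example, the same sample sequenced on several lanes), since os.rename silently replaces an existing file on Linux/macOS:

```python
#!/usr/bin/env python3
import glob
import os
import sys


def new_name(fname):
    """Same split logic as the 2.7 script: keep fields 0 and 3."""
    parts = fname.split('_')
    if len(parts) < 5:
        return None  # not in <sample>_S.._L..._R._001 form; skip it
    return parts[0] + '_' + parts[3] + '_.fastq.gz'


def main(do_rename=False):
    plan = {}
    for fname in sorted(glob.glob('*.fastq.gz')):
        target = new_name(fname)
        if target is not None:
            plan[fname] = target
    # Abort if two inputs would end up with the same name.
    targets = list(plan.values())
    if len(set(targets)) != len(targets):
        sys.exit('Two or more files map to the same new name; aborting.')
    for old, new in plan.items():
        print(old, '\t', new)  # print() is a function in Python 3
        if do_rename:
            os.rename(old, new)


if __name__ == '__main__':
    # Dry run by default; pass --rename (hypothetical flag) to rename.
    main(do_rename='--rename' in sys.argv[1:])
```

Run python3 rename_fastq.py to preview and python3 rename_fastq.py --rename to apply. Already-renamed files such as SR1_R1_.fastq.gz are skipped, so running it twice is harmless.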

18 months ago
India

rename -n 's/(\w_).*_(R[0-9])_.*(.fastq.gz)/$1$2$3/' *.fastq.gz

or

rename -n 's/(\w+_)\w+_\w+_(\w._)\w+(.\w+)/$1$2$3/' *.fastq.gz

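-n runs the command in dry-run mode: it only prints what would be renamed. rename is distro-specific: the s/.../.../ syntax above is that of the Perl-based rename, while other distros may ship a different rename (e.g. the util-linux one) with different syntax and options, so check what is available on yours. The -n option is available on Ubuntu 18.04. Remove -n for the final conversion.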
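Why the first expression works even though (\w_) captures only one character plus an underscore: the substitution is not anchored, so the match starts at the last character of the sample name ('1_' in SR1_...) and the unmatched prefix ('SR') stays in place. A Python 3 sketch of both substitutions, assuming Python's re behaves like Perl's regex engine for these patterns; count=1 mimics s/// without the /g modifier, and Perl's $1$2$3 becomes \1\2\3:

```python
import re

name = 'SR1_S90_L001_R1_001.fastq.gz'

first = r'(\w_).*_(R[0-9])_.*(.fastq.gz)'
second = r'(\w+_)\w+_\w+_(\w._)\w+(.\w+)'

# Where does the first pattern's match start, and what does it capture?
m = re.search(first, name)
print(m.start(), m.groups())  # 2 ('1_', 'R1', '.fastq.gz')

out1 = re.sub(first, r'\1\2\3', name, count=1)
# The second pattern's match ends before '.gz', which is simply kept.
out2 = re.sub(second, r'\1\2\3', name, count=1)
print(out1)  # SR1_R1.fastq.gz
print(out2)  # SR1_R1_.fastq.gz
```

The first form drops the underscore after R1, which is why it needs the small tweak shown in the reply below to match the requested names; the second form keeps it because (\w._) captures the trailing underscore.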

Thanks!

It works as intended! I just modified it to keep the underscore after the (R[0-9]) group (and changed the range to [1-2]):

rename -n 's/(\w_).*_(R[1-2]_).*(.fastq.gz)/$1$2$3/' *.fastq.gz