changing the name of files
18 months ago
Sam • 110

Dear All

I have about 200 libraries named in the format ALT1_1_clean.fq.gz, but I need to change the naming format so that my pipeline recognizes the files. Could you guide me on how to do this?

Thanks

    "ALT1_1_clean.fq.gz" change to "ALT_1.R1.fq.gz"
    "ALT1_2_clean.fq.gz" change to "ALT_1.R2.fq.gz"
    "ALT2_1_clean.fq.gz" change to "ALT_2.R1.fq.gz"
    "ALT2_2_clean.fq.gz" change to "ALT_2.R2.fq.gz"
    ...
bash awk • 785 views
19 months ago
France/Nantes/Institut du Thorax - INSE…
for F in *_clean.fq.gz; do mv -- "$F" "$(echo "$F" | sed 's/_\([12]\)_clean\.fq\.gz$/.R\1.fq.gz/;s/^ALT/ALT_/')"; done
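
With about 200 files it may be worth previewing the renames before running them. The following is a minimal sketch of the same sed-based idea with a preview step, assuming every file is named ALT<sample>_<read>_clean.fq.gz with read being 1 or 2; new_name and rename_all are hypothetical helper names, not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: print the new name for one file.
# Names that do not match the assumed pattern are printed unchanged.
new_name() {
    printf '%s\n' "$1" |
        sed -E 's/^ALT([0-9]+)_([12])_clean\.fq\.gz$/ALT_\1.R\2.fq.gz/'
}

# Hypothetical helper: print the planned renames for the current directory.
# Pass "go" as the first argument to actually perform them.
rename_all() {
    for f in ALT*_clean.fq.gz; do
        [ -e "$f" ] || continue             # the glob matched nothing
        target=$(new_name "$f")
        [ "$target" != "$f" ] || continue   # name did not match the pattern
        if [ "${1:-}" = go ]; then
            mv -- "$f" "$target"
        else
            printf '%s -> %s\n' "$f" "$target"
        fi
    done
}
```

Calling rename_all prints the planned renames; rename_all go performs them.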
2.2 years ago
igor 7.7k
United States

The easiest and most readable option (in my opinion):

rename ALT ALT_ *.fq.gz
rename _1_clean .R1 *.fq.gz
rename _2_clean .R2 *.fq.gz

Unfortunately, the rename utility may not be available on all systems.

18 months ago
Eric Lim ♦ 1.4k
Boston

There are countless ways to accomplish this kind of bash operation, but I always prefer to write simple rules in Snakemake.

# mvfq.py
# Default target: the renamed files the pipeline expects, e.g. ALT_1.R1.fq.gz
rule:
    input: expand('{samples}.{reads}.fq.gz', samples=['ALT_1', 'ALT_2'], reads=['R1', 'R2'])

rule move_fqs:
    output: mvto = '{sample}.{read}.fq.gz'
    run:
        # Rebuild the original name, e.g. ALT_1 + R1 -> ALT1_1_clean.fq.gz
        mvfrom = '_'.join([wildcards.sample.replace('_',''), wildcards.read.replace('R',''), 'clean.fq.gz'])
        shell('mv {mvfrom} {output.mvto}')

I can do a dry run

snakemake -s mvfq.py --dryrun

or run a specific target to make sure everything is working

snakemake -s mvfq.py ALT_1.R1.fq.gz

or run it all on my laptop

snakemake -s mvfq.py

or run it using 4 cores

snakemake -s mvfq.py -j4

or in a cluster via qsub with 100 independent jobs

snakemake -s mvfq.py -j100 --cluster "qsub"

or using remote files at S3 (or dropbox, google drive, etc) in a cluster

snakemake -s mvfq.py -j100 --cluster "qsub" --default-remote-provider S3 --default-remote-prefix s3/location/

or I can restart from the last failed checkpoint, and much more.

All without changing the underlying code.

19 months ago
h.mon 25k
Brazil

Honestly, change the source code of the pipeline. If this is not possible, here is a one-liner rename (which, as igor noted, may not be available or installed on some systems):

rename 's/(\d)_(\d)_clean.fq.gz/_$1.R$2.fq.gz/' *.gz

Note the single quotes ': if you use double quotes ", the captures will not work. As batch renaming can have catastrophic consequences, I suggest you first perform a dry run with -n, check that everything is good to go, and then proceed with the renaming by omitting -n.
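
The reason the quoting matters is that, inside double quotes, the shell itself expands $1 and $2 (its positional parameters) before rename ever sees the expression. A small illustration, assuming no positional parameters are set, as in a typical interactive shell:

```shell
#!/bin/sh
# Clear the positional parameters so that $1 and $2 are empty.
set --

# Single quotes: the expression reaches the program untouched.
single='s/(\d)_(\d)_clean.fq.gz/_$1.R$2.fq.gz/'

# Double quotes: the shell replaces $1 and $2 with empty strings.
double="s/(\d)_(\d)_clean.fq.gz/_$1.R$2.fq.gz/"

printf 'single: %s\n' "$single"
printf 'double: %s\n' "$double"
```

The second line printed shows the replacement collapsed to _.R.fq.gz, so every file would be renamed to nearly the same target.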


And to make things even more complicated, the rename tool linked by igor in another answer is not the same as the rename tool in this answer, which is available at https://metacpan.org/release/File-Rename, and in the rename package on Debian and related systems.


Indeed, good point, which I overlooked. There are several different renames around: this one is a Perl script, while the other one is a binary executable that is called rename.ul on Debian and its relatives.

That is a lot of answers for a "how to rename files" question...


I guess the code can be shortened further and made more general (matching multi-digit numbers) with:

$ rename -n 's/(\d+)_(\d+)_clean/_$1.R$2/' *.gz

To further complicate things, I don't think every rename has the -n flag. Mine (from util-linux-ng) does not.

17 months ago
India

Assuming that all files follow the same pattern (especially the digit_digit part); note that cp copies the files rather than renaming them in place:

$ parallel cp {} '{= s:([0-9]+)_([0-9]+)_clean:_$1\.R$2: =}' ::: *.gz
18 months ago
China

---- corrected answer----

Try brename, a practical cross-platform command-line tool for safely batch renaming files/directories via regular expression.

$ brename -p "(\d+)_(\d+)_clean" -r "_\$1.R\$2"
[INFO] checking: [ ok ] 'ALT1_1_clean.fq.gz' -> 'ALT_1.R1.fq.gz'
[INFO] checking: [ ok ] 'ALT1_2_clean.fq.gz' -> 'ALT_1.R2.fq.gz'
[INFO] checking: [ ok ] 'ALT2_1_clean.fq.gz' -> 'ALT_2.R1.fq.gz'
[INFO] checking: [ ok ] 'ALT2_2_clean.fq.gz' -> 'ALT_2.R2.fq.gz'
[INFO] 4 path(s) to be renamed
[INFO] renamed: 'ALT1_1_clean.fq.gz' -> 'ALT_1.R1.fq.gz'
[INFO] renamed: 'ALT1_2_clean.fq.gz' -> 'ALT_1.R2.fq.gz'
[INFO] renamed: 'ALT2_1_clean.fq.gz' -> 'ALT_2.R1.fq.gz'
[INFO] renamed: 'ALT2_2_clean.fq.gz' -> 'ALT_2.R2.fq.gz'
[INFO] 4 path(s) renamed

That is not quite what OP wanted.


Sorry for my carelessness, it's fixed.


No worries. Your software is always comprehensive. Nice that you have a sanity check built in before the changes are made. I assume the software will stop if a test fails?


Right, it detects potential conflicts (overwriting existing paths and overwriting newly renamed paths) and errors (blank targets).
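
For anyone renaming with plain shell instead, the same kinds of checks (a target that already exists, two files ending up with the same new name, a blank target) can be approximated before any mv is run. This is only a rough sketch and not how brename implements them; check_plan is a hypothetical helper, and it assumes file names contain no tabs or newlines.

```shell
#!/bin/sh
# Hypothetical checker: reads "source<TAB>target" pairs on stdin, prints one
# line per problem found, and returns non-zero if there was any problem.
check_plan() {
    plan=$(cat)
    problems=$(
        # Blank targets and targets that are used more than once.
        printf '%s\n' "$plan" | awk -F '\t' '
            $0 == ""   { next }
            $2 == ""   { print "blank target for " $1; next }
            seen[$2]++ { print "target used twice: " $2 }
        '
        # Targets that would overwrite a path that already exists.
        printf '%s\n' "$plan" | while IFS="$(printf '\t')" read -r src dst; do
            if [ -n "$dst" ] && [ -e "$dst" ]; then
                printf 'target already exists: %s\n' "$dst"
            fi
        done
    )
    if [ -n "$problems" ]; then
        printf '%s\n' "$problems"
        return 1
    fi
}
```

Feed it one tab-separated source/target pair per line, and run the actual renames only if it returns 0.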
