How to merge multiple FASTQ files using a table?
5.0 years ago
rimgubaev ▴ 330

I have the following table of sample names and their corresponding replicates:

Sample Replicate
S1     r12
S1     r25
S1     r68
S2     r58
S2     r34
S4     r13
etc.

The folder contains the corresponding FASTQ files (for example, r12.fastq). There are around 300 replicates in total, so typing a command like this by hand for every sample:

cat r12.fastq r25.fastq r68.fastq > S1.fastq

would be really time-consuming and exhausting.

I wonder if someone has already faced this problem and could share a solution. I understand that it probably needs some kind of bash script with a for loop, but I have no idea how to organize it, especially since the number of replicates is not the same for each sample.

fastq cat bash • 2.3k views
5.0 years ago
Asaf 10k

I didn't test this, but it should work:

awk '{print "touch "$1".fastq && cat "$2".fastq >> "$1".fastq"}' table.txt > runscript.sh
source runscript.sh

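This first generates a script of cat commands (look at runscript.sh to check that it's valid!) and then runs all the cats. If table.txt still contains the header row, remove that line first, and delete any existing output files before rerunning, because >> appends to them.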
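
The same idea can also be sketched as a plain loop that checks the inputs first and is safe to rerun. This is only a sketch under assumptions that are not stated in the thread: the table is whitespace-delimited with a header row, the replicate files are named <replicate>.fastq in the current directory, and the function name merge_by_table is made up for illustration.

```shell
# Hypothetical helper (name and layout are assumptions, not from the thread).
# Assumes a whitespace-delimited table with a header row and files named <replicate>.fastq.
merge_by_table() {
  table=$1
  # Keep data rows only: skip the header and any blank or incomplete lines.
  rows=$(awk 'NR > 1 && NF >= 2 {print $1, $2}' "$table")
  [ -n "$rows" ] || return 0

  # Fail early if any replicate file is missing, before writing anything.
  while read -r sample rep; do
    [ -f "${rep}.fastq" ] || { echo "missing: ${rep}.fastq" >&2; return 1; }
  done <<EOF
$rows
EOF

  # Empty each output first so that rerunning does not append the reads twice.
  while read -r sample rep; do
    : > "${sample}.fastq"
  done <<EOF
$rows
EOF

  # Append every replicate to its sample file, in table order.
  while read -r sample rep; do
    cat "${rep}.fastq" >> "${sample}.fastq"
  done <<EOF
$rows
EOF
}
```

It would be called as merge_by_table table.txt from the folder that holds the replicate files. Sample names must not collide with replicate names, since each sample file is emptied before the merge.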

Elegant :)

This is a nice one! I generated exactly the same kind of script, with many rows of cat commands, in R, since I'm not a good bash user.

5.0 years ago

Using Nextflow.

Usage:

nextflow run --input config.tsv --basedir ${PWD} biostar375624.nf
5.0 years ago
ATpoint 81k

Assuming the table is saved as foo.txt, you can use:

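# Assumes foo.txt has no header row; works for tab- or space-delimited columns.
awk 'NF {print $1}' foo.txt | \
  sort -u | \
  while read -r p; do
    # Match column 1 exactly so that S1 does not also select S10.
    awk -v s="${p}" '$1 == s {print $2".fastq"}' foo.txt | \
    xargs cat > "${p}.fastq"
  done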

It first extracts the unique sample names, then, for each sample, collects the file names of the replicates that belong to it and passes them to cat via xargs to concatenate them.
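
The grouping step described above can also be done in a single pass over the table, which makes it easy to print the plan before merging anything. This is a rough sketch, not the method from the answer: it assumes foo.txt is whitespace-delimited with a header row and that file names contain no spaces; group_replicates and merge_grouped are made-up names.

```shell
# Hypothetical helpers; assumes a header row and file names without whitespace.

# Print one line per sample: the sample name followed by its replicate files,
# with samples listed in order of first appearance in the table.
group_replicates() {
  awk 'NR > 1 && NF >= 2 {
         if (!($1 in files)) order[++n] = $1
         files[$1] = files[$1] " " $2 ".fastq"
       }
       END { for (i = 1; i <= n; i++) print order[i] files[order[i]] }' "$1"
}

# Concatenate each group into <sample>.fastq, overwriting any previous output.
merge_grouped() {
  group_replicates "$1" | while read -r sample files; do
    # $files is left unquoted on purpose so it splits into separate file names.
    cat $files > "${sample}.fastq"
  done
}
```

Running group_replicates foo.txt on its own shows which files would go into which sample; merge_grouped foo.txt then does the concatenation.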

Nested pipes :0

:-D
