Collecting columns from multiple files into one file
9 months ago

Dear all,

I hope you are all doing well. I'm new to bioinformatics and would be grateful if you could help me with the issue below.

I have 156 files named sample_1_TEcounts.tsv, sample_2_TEcounts.tsv, ..., which contain information as shown in the photo below (I displayed it in Excel for clarity, but the files are saved as tab-delimited text).

[Image: screenshot of an example file; not available]

I'm only interested in the fifth column (the fpkm column). Is it possible to extract that column from every file and collect the results in one .tsv text file? I have tried copy-pasting, but with 156 files it is a huge waste of time.

Many thanks for your help
Surar

Thank you very much for sharing these links. I just want to combine the columns, with no need to track which file each one came from, so I tried the suggestion by Andrés Ribone and it worked fine.

9 months ago

Hi,

You want the fifth column of each file as a separate column, forming a matrix, correct?

If so, do all files have the same number of lines, in the same order?

For example, does the third line of every file correspond to the position chr1:632555-632703?

If so, you can do it in bash like this:

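# Extract the fifth column (fpkm) of each file into a temporary file
for i in sample_*_TEcounts.tsv; do
    cut -f 5 "$i" > "delete.$i"
done
# Paste the temporary files side by side; the glob expands in
# lexicographic order, so sample_10 comes before sample_2
paste delete.sample_*_TEcounts.tsv > result.tsv
# Remove the temporary files
rm delete.sample_*_TEcounts.tsv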

If the number or order of lines differs between files, you will probably need to do it in R or Python. Let me know and I'll help you with that.
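
For the case where the lines do not match up, here is a minimal Python sketch of a key-based merge. It is only an illustration under assumptions the thread does not confirm: that each file starts with a header line, that the first column holds an identifier that is unique within a file, and that the value of interest is the fifth column. The function names `merge_column` and `write_tsv`, the `id` header label, and the `NA` placeholder for missing values are all made up for this example.

```python
from pathlib import Path


def merge_column(paths, key_col=0, value_col=4, missing="NA"):
    """Merge one column from many TSV files, matching rows by a key column.

    Assumptions (not confirmed by the thread): every file begins with a
    header line, and column `key_col` is a unique identifier within a file.
    """
    merged = {}  # key -> {file name: value}; keeps first-seen key order
    names = []   # one output column per input file, labelled by file name
    for path in paths:
        name = Path(path).name
        names.append(name)
        with open(path) as handle:
            next(handle, None)  # skip the assumed header line
            for line in handle:
                fields = line.rstrip("\r\n").split("\t")
                # Ignore lines too short to hold both columns
                if len(fields) > max(key_col, value_col):
                    merged.setdefault(fields[key_col], {})[name] = fields[value_col]
    header = ["id"] + names
    rows = [[key] + [values.get(n, missing) for n in names]
            for key, values in merged.items()]
    return header, rows


def write_tsv(out_path, header, rows):
    """Write the merged table as tab-delimited text."""
    with open(out_path, "w") as handle:
        handle.write("\t".join(header) + "\n")
        for row in rows:
            handle.write("\t".join(row) + "\n")


if __name__ == "__main__":
    files = sorted(Path(".").glob("sample_*_TEcounts.tsv"))
    if files:
        header, rows = merge_column(files)
        write_tsv("result.tsv", header, rows)
```

Run from the directory holding the files, this would write result.tsv with one row per identifier and one column per sample; identifiers absent from a file get NA in that file's column. If the identifier lives in a different column, pass a different `key_col`.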


Thanks, dear Andrés. The code worked fine and I have the file now. Many thanks for your help.


You can accept this answer to provide closure to this thread (green checkmark).
