Biostar Beta. Not for public use.
How to extract filename and change text in the same file
14 months ago

Hello,

I have about 30 VCF files named like ID_001.new.vcf. I want to extract only the "ID_001" part of each file name and use it to replace "Sample1" in the header line of that VCF file:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  Sample1

So that the result looks like this:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  ID_001

How can I do this? I tried using echo in bash to extract the IDs from the file names, but I could not work out how to loop over the files and make the change inside each one. Thanks for your help.

  1. Extract the current sample names from the VCF with bcftools (query -l).
  2. Prepare a file of new sample names, one per line, in the same order as the names from step 1.
  3. Use bcftools reheader to replace the sample names with those from step 2.

Back up the original files before proceeding.
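The three steps above could be sketched roughly as follows. This is only an outline under some assumptions: each file holds a single sample, bcftools is on the PATH, and the helper name write_names_file and the .names.txt and .renamed.vcf output names are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive the new sample name from the file name
# and write it to a names file (one name per line).
write_names_file() {
    local vcf=$1
    basename "$vcf" .new.vcf > "${vcf%.vcf}.names.txt"
}

for vcf in *.new.vcf; do
    [ -e "$vcf" ] || continue                      # the glob matched nothing
    command -v bcftools >/dev/null 2>&1 || break   # bcftools is not installed
    cp "$vcf" "$vcf.bak"                           # back up the original
    bcftools query -l "$vcf"                       # step 1: list current sample names
    write_names_file "$vcf"                        # step 2: new names, one per line
    # step 3: write a renamed copy next to the original
    bcftools reheader -s "${vcf%.vcf}.names.txt" -o "${vcf%.new.vcf}.renamed.vcf" "$vcf"
done
```

Writing to a separate output file keeps the original untouched, so the result can be checked with bcftools query -l before anything is overwritten.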

14 months ago
Jeffin Rockey ♦ 1.1k
Karimannoor

In bash, this should do it:

for i in *.new.vcf
do
        ID_NAME=$(basename "$i" .new.vcf)
        sed -i "1s|Sample1|$ID_NAME|g" "$i"
done

Caution: I have used -i with sed, so the actual files will be edited in place.

Edit: I have now added 1s to limit the replacement to the first line only.
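Since -i overwrites the files, one cautious variant is to give -i a suffix so that sed keeps a copy of each original. A minimal sketch, assuming a sed that accepts -iSUFFIX (GNU and BSD sed both do); the function name rename_with_backup is made up for illustration, and the replacement here is applied on every line rather than only the first:

```shell
#!/usr/bin/env bash
# Replace Sample1 with the ID taken from the file name, keeping a backup.
rename_with_backup() {
    local vcf=$1
    local id
    id=$(basename "$vcf" .new.vcf)
    # -i.bak edits in place and saves the untouched original as "$vcf.bak".
    sed -i.bak "s|Sample1|$id|g" "$vcf"
}

for vcf in *.new.vcf; do
    [ -e "$vcf" ] || continue   # the glob matched nothing
    rename_with_backup "$vcf"
done
```

If the result looks wrong, the original can be restored by moving the .bak file back over the edited one.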


I think it would be better to use 'bcftools view --samples-file' than sed.


Hi Pierre, I did not understand. Would bcftools view do any replacement?


The --samples-file option can be used to rename the samples. See https://samtools.github.io/bcftools/bcftools.html, which says:

This file can also be used to rename samples by giving the new sample name as a second white-space-separated column, like this: "old_name new_name".


This works when all files have Sample1 as the sample name in the header line. Will that be the case?


Yes, all files have Sample1.


@Jeffin, thanks for your response. The header line is not the first line in the file. How can I change the sed command so that it finds the line containing Sample1 and replaces it with $ID_NAME?


Changing 1s| to simply s| will apply the replacement on every line, so all occurrences of Sample1 are replaced.
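If replacing every occurrence is too broad (for example, if Sample1 could also appear in a data line), the substitution can be limited to the column-header line with a sed address. A minimal sketch, assuming the header line starts with #CHROM and ends with Sample1, and that the ID contains no "|" or "&"; the function name rename_header_sample is made up for illustration, and a temporary file is used instead of -i:

```shell
#!/usr/bin/env bash
# Rename the sample only on the #CHROM column-header line of one VCF.
rename_header_sample() {
    local vcf=$1
    local id
    id=$(basename "$vcf" .new.vcf)
    # /^#CHROM/ restricts the substitution to the column-header line, and
    # Sample1$ only matches when it is the last field on that line.
    sed "/^#CHROM/s|Sample1\$|$id|" "$vcf" > "$vcf.tmp" && mv "$vcf.tmp" "$vcf"
}

for vcf in *.new.vcf; do
    [ -e "$vcf" ] || continue   # the glob matched nothing
    rename_header_sample "$vcf"
done
```

Data lines and the ## meta lines are passed through unchanged, whatever they contain.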


Thanks a lot. This worked!

15 months ago
Malcolm.Cook ♦ 1.0k
kansas, usa

If you have GNU parallel installed, you can use it instead of a bash for loop:

parallel 'sed -i "s|Sample1$|{=s/.new.vcf$//=}|"' {} ::: *.new.vcf

Hi Malcolm, the suggested command appears to be very efficient, though I did not understand several parts of it. Could you please explain {=s/.new.vc$f//=}, {}, ::: etc.?


Sure.

In general, in your command line:

  • {} gets replaced with the file being processed.
  • {=perl expression=} gets replaced with the value of the perl variable $_ after the perl expression has been evaluated; $_ starts out set to the name of the file being processed.

So, in my example, we are using sed to replace the word "Sample1" appearing at the end of a line with the result of removing the trailing .new.vcf from each filename.

Documentation for this can be found in parallel's manpage by searching for "{=perl expression=}", where you can also read:

::: arguments
Use arguments from the command line as input source instead of stdin (standard input). 

Fix: vc$f -> vcf\$.

Also try: parallel --plus 'sed -i "s|Sample1$|{%.new.vcf}|"' {} ::: *.new.vcf
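For comparison, the {%.new.vcf} form reads much like the shell's own suffix removal, which can be tried without parallel. A minimal sketch using standard POSIX parameter expansion; the variable names f and id are just for illustration:

```shell
#!/usr/bin/env bash
f=ID_001.new.vcf
id=${f%.new.vcf}   # remove the suffix .new.vcf from the end of the value
echo "$id"         # prints ID_001
```

The same expansion can replace the basename call in a for loop when the files are in the current directory.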


Hi Ole,

Could you please point me to a link or other resource that would help me understand {}, ::: etc.?


It is covered in chapter 5 of GNU Parallel 2018 (online: https://doi.org/10.5281/zenodo.1146014; printed: www.lulu.com/shop/ole-tange/gnu-parallel-2018/paperback/product-23558902.html).


Thanks for the fix and the alternative!
