rename fasta headers in FastaAlternateReferenceMaker output
7.7 years ago
mosquitoes • 0

Hi,

I would like to create a new fasta file from the original genome fasta and a vcf file. The fasta file will only have full gene sequences included.

I can use the gatk FastaAlternateReferenceMaker to accomplish this:

java -jar -Xmx16g ~/bin/GenomeAnalysisTK-3.6/GenomeAnalysisTK.jar -T FastaAlternateReferenceMaker -R ref_genome.fasta -o sample_SNV.fasta -V sample_SNV_selected.vcf -L ref_gene.bed

But I would like the output fasta to have the gene names as the header. For instance the current fasta output from gatk is:

 >1 chr01:2350
AGAAAGGACAGAAAAAAAGATGGTGAAGTAGAAAGAGGGCGAAATGAAAAAAGGGAAAGC
AAAAGAGATGATGAAAGTCATAGAGAGAGAGATGAAAAAAGGGAAAGCAAAAGAGATGAT

I would like the output headers to 1) not use a sequential numeric ID and 2) contain the gene name from column 4 of the .bed file.

Is there a way to either modify 1) the input bed file or 2) the output fasta file by giving 'some tool' the fasta and the bed file?

Thanks!

bed fasta gatk fastaalternatereferencemaker header • 3.3k views

There are many threads on Biostars about renaming fasta file headers. Here are a couple, but search for others as well:
renaming all fasta headers in a file
replace fasta headers with another name in a text file

7.7 years ago
mosquitoes • 0

Thanks!

I used this python script and it worked great:

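# Replace each fasta header with the next line of list.txt, in order.
# Each line of list.txt must be a complete header, including the leading '>'.
with open('file.fasta') as fasta, \
     open('list.txt') as newnames, \
     open('file_annot.fasta', 'w') as newfasta:
    for line in fasta:
        if line.startswith('>'):
            # Header line: write the next replacement name instead.
            newname = newnames.readline()
            newfasta.write(newname)
        else:
            # Sequence line: copy unchanged.
            newfasta.write(line)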
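
For anyone who wants to skip the separate list.txt step, here is a rough sketch that takes the names straight from column 4 of the bed file. The function names `read_bed_names` and `rename_headers` are made up for this example. It assumes the fasta has exactly one record per bed line, in the same order; GATK may sort or merge intervals, so the script stops with an error if the counts do not match rather than guessing.

```python
def read_bed_names(bed_path):
    """Return the name column (column 4) of each BED interval, in file order."""
    names = []
    with open(bed_path) as bed:
        for line in bed:
            # Skip blank lines and the optional comment/track/browser lines.
            if not line.strip() or line.startswith(('#', 'track', 'browser')):
                continue
            fields = line.rstrip('\n').split('\t')
            if len(fields) < 4:
                raise ValueError(f'BED line has no name column: {line!r}')
            names.append(fields[3])
    return names


def rename_headers(fasta_path, names, out_path):
    """Copy fasta_path to out_path, replacing the i-th header with names[i].

    Assumption: the i-th FASTA record corresponds to the i-th BED line.
    Returns the number of headers rewritten.
    """
    n_headers = 0
    with open(fasta_path) as fasta, open(out_path, 'w') as out:
        for line in fasta:
            if line.startswith('>'):
                if n_headers >= len(names):
                    raise ValueError('more FASTA records than BED names')
                out.write('>' + names[n_headers] + '\n')
                n_headers += 1
            else:
                # Sequence lines are copied unchanged.
                out.write(line)
    if n_headers != len(names):
        raise ValueError(
            f'{n_headers} FASTA records but {len(names)} BED names')
    return n_headers
```

With the file names from the question, the call would be something like `rename_headers('sample_SNV.fasta', read_bed_names('ref_gene.bed'), 'sample_SNV_renamed.fasta')`, where the output name is just a placeholder.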