Extracting part of a GTF file
8.5 years ago
zizigolu ★ 4.3k

I have a GTF file with 7 columns, and I want to extract only the gene name part from the 7th column.

The 7th column contains several pieces of information, including gene_id, gene_name and so on. Here is the 7th column from one row:

gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "1"; gene_name "NAC001"; oId "AT1G01010.1"; nearest_ref "AT1G01010.1"; class_code "="; tss_id

I only need the gene_name value from this column, for example "NAC001". How can I extract it?
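A minimal sketch of a position-independent approach, offered as an alternative to the field-number one-liners in the answers: POSIX awk's match() sets RSTART and RLENGTH, so the value can be cut out wherever gene_name sits on the line. The function name extract_gene_name is hypothetical, and the sketch assumes every value is wrapped in double quotes as in the example row.

```shell
# Hypothetical helper: print the gene_name value from each line that has one.
# It searches the whole line for: gene_name "VALUE"
# so it does not depend on gene_name being a particular attribute number.
extract_gene_name() {
    awk 'match($0, /gene_name "[^"]+"/) {
        # The match covers 11 characters before the value (the 9-letter key,
        # a space and the opening quote) plus 1 closing quote after it.
        print substr($0, RSTART + 11, RLENGTH - 12)
    }' "$@"
}

# Example row from the question (the helper also accepts file names):
printf '%s\n' 'gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "1"; gene_name "NAC001"; oId "AT1G01010.1"' |
    extract_gene_name
```

Lines without a gene_name attribute produce no output, so the result can be redirected straight into a file.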

gtf bed • 6.7k views
8.5 years ago
GenoMax 142k

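Sorry, I missed that you only posted column 7 of your file. Either use @dschika's solution below, or extract the 7th column with cut and then use the one-liner below. Note that it assumes gene_name is always the 4th ";"-separated attribute, as in your example row.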

$ awk -F ";" '{sub(/gene_name/,""); print $4}' your_file

If you don't need the quotes (or the leading spaces) around the gene name:

$ awk -F ";" '{sub(/gene_name/,""); print $4}' your_file | sed 's/[" ]//g'
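The answer suggests cutting out the 7th column first and then running the one-liner. A minimal sketch of that combination, assuming tab-separated columns and gene_name as the 4th ";"-separated attribute; the function name gene_names_from_gtf is hypothetical, the c1..c6 values in the example row are placeholders, and gsub() stands in for the separate sed step.

```shell
# Hypothetical helper combining "cut the 7th column" with the sub() one-liner.
# Assumes tab-separated columns and gene_name as the 4th ";"-separated attribute.
gene_names_from_gtf() {
    cut -f7 "$@" |
        awk -F ";" '{
            sub(/gene_name/, "")   # remove the key from the line; fields are re-split
            gsub(/[" ]/, "", $4)   # strip the quotes and spaces left around the value
            print $4
        }'
}

# Example: one 7-column, tab-separated row; c1..c6 are placeholder values.
printf 'c1\tc2\tc3\tc4\tc5\tc6\tgene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "1"; gene_name "NAC001"\n' |
    gene_names_from_gtf
```

On a real file this would be called as gene_names_from_gtf your_file > gene_names.txt.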

Thank you.

Your code extracted what I need on the command line, but how can I write the result to a .txt file?


Redirect the output to a new file with >, for example by adding > gene_names.txt at the end of the command.
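A minimal, self-contained sketch of the redirection, run on a throwaway copy of the example attribute line; the temporary directory and the file names are placeholders. The sed expression also removes spaces, since the one-liner leaves blanks in front of the name. The sort -u step is optional: the example row is an exon line (exon_number "1"), so a gene with several exons would otherwise be listed several times.

```shell
# Placeholder input: the attribute column from the question, saved in a temp dir.
workdir=$(mktemp -d)
printf '%s\n' 'gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "1"; gene_name "NAC001"; oId "AT1G01010.1"' > "$workdir/your_file"

# ">" writes what would have gone to the terminal into gene_names.txt
# (creating or overwriting it; ">>" would append instead).
awk -F ";" '{sub(/gene_name/,""); print $4}' "$workdir/your_file" | sed 's/[" ]//g' > "$workdir/gene_names.txt"

# Optional: keep each gene name only once.
sort -u "$workdir/gene_names.txt" > "$workdir/gene_names_unique.txt"

cat "$workdir/gene_names_unique.txt"
```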


Thank you very much for your quick and helpful answer.

8.5 years ago
dschika ▴ 320

Assuming the fields in your GTF file are tab-separated, you could try something like this:

awk 'BEGIN{FS="\t"}{print $7}' YOURFILE | awk 'BEGIN{FS="gene_name"}{print $2}' | awk 'BEGIN{FS=";"}{print $1}' > OUTFILE
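The three chained awk calls can also be folded into a single awk program. A sketch under the same assumption (tab-separated fields, attributes in column 7); the function name gene_name_col7 is hypothetical and the c1..c6 values are placeholders. It differs slightly from the pipeline: rows without gene_name are skipped rather than printed as blank lines, and the quotes and spaces around the value are removed.

```shell
# Hypothetical one-awk version of the three-step pipeline:
# take column 7, keep the text after "gene_name", cut it at the next ";".
gene_name_col7() {
    awk -F '\t' '{
        if (split($7, after, "gene_name") > 1) {   # gene_name present in column 7
            split(after[2], val, ";")              # value ends at the next ";"
            gsub(/[" ]/, "", val[1])               # drop the quotes and spaces
            print val[1]
        }
    }' "$@"
}

# Example row; c1..c6 are placeholder values for the first six columns.
printf 'c1\tc2\tc3\tc4\tc5\tc6\tgene_id "XLOC_000001"; gene_name "NAC001"; oId "AT1G01010.1"\n' |
    gene_name_col7
```

On a real file: gene_name_col7 YOURFILE > OUTFILE.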

genomax2's solution using sub is nicer.
