I have a test.bed file that contains the following:
chr12 85430091 85657002 ENSG00000133640.14 . + HAVANA gene . ID=ENSG00000133640.14;gene_id=ENSG00000133640.14;transcript_id=ENSG00000133640.14;gene_type=protein_coding;gene_status=KNOWN;gene_name=LRRIQ1;transcript_type=protein_coding;transcript_status=KNOWN;transcript_name=LRRIQ1;level=1;havana_gene=OTTHUMG00000166185.4
chr12 85657321 85658235 ENSG00000269916.1 . + HAVANA gene . ID=ENSG00000269916.1;gene_id=ENSG00000269916.1;transcript_id=ENSG00000269916.1;gene_type=lincRNA;gene_status=NOVEL;gene_name=RP11-193M21.1;transcript_type=lincRNA;transcript_status=NOVEL;transcript_name=RP11-193M21.1;level=2;havana_gene=OTTHUMG00000184141.1
chr12 85673884 85695562 ENSG00000180318.3 . + HAVANA gene . ID=ENSG00000180318.3;gene_id=ENSG00000180318.3;transcript_id=ENSG00000180318.3;gene_type=protein_coding;gene_status=KNOWN;gene_name=ALX1;transcript_type=protein_coding;transcript_status=KNOWN;transcript_name=ALX1;level=2;havana_gene=OTTHUMG00000169820.1
chr12 85711837 85736690 ENSG00000258815.1 . + HAVANA gene . ID=ENSG00000258815.1;gene_id=ENSG00000258815.1;transcript_id=ENSG00000258815.1;gene_type=lincRNA;gene_status=NOVEL;gene_name=RP11-408B11.2;transcript_type=lincRNA;transcript_status=NOVEL;transcript_name=RP11-408B11.2;level=2;havana_gene=OTTHUMG00000170606.1
I would like to extract the gene names from the "gene_name=" field and sort them into a new bed file. How would I go about doing that in shell/bash, with or without bedops? I have multiple .bed files from which I need to extract gene_name and sort into a new file. Thank you very much.
What do you want the output to look like? Can you provide a line or two of where you want the name to go etc.?
The new file should contain a list of all gene names (from the gene_name field), nothing else. Sorry, I just realized that copy-pasting cut off some fields. Here are the complete details for one gene, from which I need just the gene name:
chr12 85065376 85065805 ENSG00000257296.1 . + HAVANA gene . ID=ENSG00000257296.1;gene_id=ENSG00000257296.1;transcript_id=ENSG00000257296.1;gene_type=pseudogene;gene_status=KNOWN;gene_name=RP11-701B6.1;transcript_type=pseudogene;transcript_status=KNOWN;transcript_name=RP11-701B6.1;level=1;tag=pseudo_consens;havana_gene=OTTHUMG00000169741.1
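One possible approach, sketched with plain sed and sort rather than bedops. It assumes each line carries exactly one `gene_name=` attribute terminated by a semicolon, as in the sample lines above. The function name `extract_gene_names` and the output file names are made up for illustration.

```shell
# Hypothetical helper: print the gene_name values found in the given
# file(s), one per line, sorted.
# Assumption: each line has a single "gene_name=VALUE;" attribute.
extract_gene_names() {
    # Keep only the text between "gene_name=" and the next ";".
    # -n with the trailing p prints only lines where the substitution matched.
    sed -n 's/.*gene_name=\([^;]*\).*/\1/p' "$@" | sort
}
```

For a single file this could be run as `extract_gene_names test.bed > test.gene_names.txt`. For several files, a loop such as `for f in *.bed; do extract_gene_names "$f" > "${f%.bed}.gene_names.txt"; done` would write one list per input. Replacing `sort` with `sort -u` would also drop duplicate names, if that is wanted.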