I would like to extract rows from an Excel sheet (saved as CSV) based on keywords (strings). I have used the following commands in R:
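x <- read.csv("t3.csv")  # read.csv() already returns a data frame
y <- grep("avrbs2|avrbs3|xop", x$Gene_Name.ID)  # indices of rows whose ID matches any keyword
z <- x[y, ]  # keep every matching row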
This extracts every row that matches one of the keywords. However, I need to remove the redundancy. For example, avrbs2 and avrbs3 are each repeated 50 to 60 times in the Excel/CSV sheet, and I want to extract each of them only once.
For example, my sheet contains data like this:
avrbs3_AAM39226_1_avirulence_protein_plasmid_Xanthomonas_citri_pv_citri_str_306_avrbs3 198 100 66 0 0 822 887 2 199 2.45E-37 131
avrbs3_AAM39226_1_avirulence_protein_plasmid_Xanthomonas_citri_pv_citri_str_306_avrbs3 250 100 21 0 0 867 887 2 64 3.44E-05 45.4
avrbs3_AAM39226_1_avirulence_protein_plasmid_Xanthomonas_citri_pv_citri_str_306_avrbs3 113 100 15 0 0 1 15 46 2 0.005 34.7
avrbs3_AAM39243_1_avirulence_protein_plasmid_Xanthomonas_citri_pv_citri_str_306_avrbs3 265 100 260 0 0 837 1096 2 781 8.91E-158 467
avrbs3_AAM39243_1_avirulence_protein_plasmid_Xanthomonas_citri_pv_citri_str_306_avrbs3 198 100 66 0 0 792 857 2 199 2.38E-37 131
avrbs3_AAM39243_1_avirulence_protein_plasmid_Xanthomonas_citri_pv_citri_str_306_avrbs3 250 100 21 0 0 837 857 2 64 3.42E-05 45.4
avrbs3_AAM39243_1_avirulence_protein_plasmid_Xanthomonas_citri_pv_citri_str_306_avrbs3 113 100 15 0 0 1 15 46 2 0.004 35
avrbs3_AAM39261_1_avirulence_protein_plasmid_Xanthomonas_citri_pv_citri_str_306_avrbs3 198 100 66 0 0 792 857 2 199 2.67E-37 131
From this, I need to extract only the first row that contains the keyword avrbs3, and likewise the first row for each of the other keywords. Please help me do this in R, a shell script, or similar.
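
To make the desired result concrete, here is a minimal sketch in R of "first matching row per keyword". It is only an illustration under assumptions: the data frame is a toy stand-in for t3.csv, the IDs and the `score` column are hypothetical, and the keywords are treated as plain substrings rather than regular expressions.

```r
# Toy stand-in for read.csv("t3.csv"); the IDs and the score column are hypothetical.
x <- data.frame(
  Gene_Name.ID = c("avrbs3_toy_A", "avrbs3_toy_A", "avrbs2_toy_B",
                   "xop_toy_C", "avrbs3_toy_D", "other_toy_E", "avrbs2_toy_B"),
  score = c(131, 45.4, 50, 60, 467, 10, 20),
  stringsAsFactors = FALSE
)

keywords <- c("avrbs2", "avrbs3", "xop")

# For each keyword, the index of the first row whose ID contains it
# (NA when the keyword never occurs).
first_hit <- sapply(keywords, function(k) {
  grep(k, x$Gene_Name.ID, fixed = TRUE)[1]
})

# unique() guards against two keywords hitting the same row;
# sort() drops the NAs and restores the original file order.
z <- x[sort(unique(first_hit)), ]
print(z)
```

If the goal is instead one row per distinct Gene_Name.ID (so that AAM39226, AAM39243 and AAM39261 would each be kept once), then `z[!duplicated(z$Gene_Name.ID), ]` applied to the original `z` would be the closer fit.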