I have GFF files and files containing intergenic regions for different bacterial genomes. I want to extract the genes adjacent to those intergenic regions, along with their coordinates, for all of these genomes. Are there any scripts or tools available for this?
Example:
For Bacillus anthracis, I have two files:
- a GFF file
- a file containing the coordinates of the intergenic regions:
intergenic_region.txt
185457-185562
320958-321064
1146951-1147049
1285399-1285500
3894344-3894451
4075706-4075815
I want to extract the coordinates of the adjacent genes for each of these intergenic regions from the GFF file.
I have to do this for 300+ bacterial genomes.
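It can be done, but you'll have to do some pre-processing of the data first. Convert both your GFF and your text file into BED format, then use bedtools closest to find the nearest gene and its coordinates. Two things to watch for: BED needs a sequence name in the first column, which your text file does not have, so you will need to add it from the GFF; and bedtools closest expects both inputs to be sorted by sequence name and then start position. By default closest reports only the single nearest gene, so to get the gene on each side of a region you can run it twice with -D ref, once with -id (ignore downstream genes) and once with -iu (ignore upstream genes).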
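
If you would rather skip the BED conversion, the same lookup can be sketched in plain Python. This is only a minimal sketch under some assumptions, not a tested pipeline: it assumes the GFF has "gene" features in column 3, that the intergenic file has one "start-end" pair per line with no sequence name, and that a gene counts as flanking only if it lies entirely outside the region. The function names (read_genes, read_regions, flanking_genes) and the toy gene coordinates in the demo are made up for illustration. It does not handle wrap-around on circular chromosomes.

```python
import re

def read_genes(gff_lines, feature="gene"):
    """Collect features of the given type from GFF lines (9 tab-separated columns)."""
    genes = []
    for line in gff_lines:
        if line.startswith("#"):
            continue  # header and directive lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature:
            continue  # skips other feature types and any trailing FASTA section
        genes.append({
            "seqid": cols[0],
            "start": int(cols[3]),
            "end": int(cols[4]),
            "strand": cols[6],
            "attributes": cols[8],
        })
    return genes

REGION = re.compile(r"^(\d+)-(\d+)$")

def read_regions(lines):
    """Parse 'start-end' lines; anything else (e.g. a file name line) is ignored."""
    regions = []
    for line in lines:
        match = REGION.match(line.strip())
        if match:
            regions.append((int(match.group(1)), int(match.group(2))))
    return regions

def flanking_genes(region, genes, seqid=None):
    """Return (left, right): the closest gene ending before the region and the
    closest gene starting after it. Either may be None.
    Assumption: the region file has no sequence names, so pass seqid to
    restrict the search to one replicon (e.g. the chromosome, not a plasmid)."""
    start, end = region
    if seqid is not None:
        genes = [g for g in genes if g["seqid"] == seqid]
    left = max((g for g in genes if g["end"] < start),
               key=lambda g: g["end"], default=None)
    right = min((g for g in genes if g["start"] > end),
                key=lambda g: g["start"], default=None)
    return left, right

if __name__ == "__main__":
    # Toy GFF lines with invented gene coordinates, for illustration only.
    gff = [
        "##gff-version 3",
        "chrom\tsrc\tgene\t184000\t185456\t.\t+\t.\tID=geneA",
        "chrom\tsrc\tgene\t185563\t186900\t.\t-\t.\tID=geneB",
    ]
    genes = read_genes(gff)
    for region in read_regions(["intergenic_region.txt", "185457-185562"]):
        left, right = flanking_genes(region, genes)
        print(region, left, right)
```

For 300+ genomes you would loop over the GFF/region file pairs, open each with open(path), and pass the file objects straight to read_genes and read_regions. A linear scan per region is slow in theory but should be fine for bacterial genomes with a few thousand genes.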