Why is the same gene located on different chromosomes?
7.8 years ago
winter_li ▴ 60

Hi, I downloaded a file of all human gene regions from the UCSC Table Browser (http://genome.ucsc.edu/cgi-bin/hgTables). I find that the same gene is located on different chromosomes, for example:

   585     NR_106918       **chr1**    -       17368   17436   17436   17436   1       17368,  17436,  0       **MIR6859-1**       unk     unk     -1,

   1367    NR_106918       **chr15**   +       102513726       102513794       102513794       102513794       1       102513726,      102513794,      0       **MIR6859-1**       unk     unk     -1,

The MIR6859-1 gene is on both chr1 and chr15. Why? What happened?

gene rna-seq next-gen genome • 3.4k views

Also on chr16:

at chr16:67052-67119 - (NR_106918)
at chr15:102513727-102513794 - (NR_106918)
at chr1:17369-17436 - (NR_106918)

Remarkable


I have exactly the same question. Take the OR4F3 gene, which encodes the olfactory receptor 4F3/4F16/4F29 protein. With the NCBI hg38.gtf it is located only on chr5, but with ucsc.genes.gtf or encode.hg38.gtf it is located on both chr1 and chr5.

grep -w "OR4F3" UCSC/hg38/Annotation/Genes/genes.gtf
chr1    unknown exon    450740  451678  .   -   .   gene_id "OR4F3"; gene_name "OR4F3"; p_id "P26249"; transcript_id "NM_001005224"; tss_id "TSS18830";
chr1    unknown stop_codon  450740  450742  .   -   .   gene_id "OR4F3"; gene_name "OR4F3"; p_id "P26249"; transcript_id "NM_001005224"; tss_id "TSS18830";
chr1    unknown CDS 450743  451678  .   -   0   gene_id "OR4F3"; gene_name "OR4F3"; p_id "P26249"; transcript_id "NM_001005224"; tss_id "TSS18830";
chr1    unknown start_codon 451676  451678  .   -   .   gene_id "OR4F3"; gene_name "OR4F3"; p_id "P26249"; transcript_id "NM_001005224"; tss_id "TSS18830";
chr1    unknown exon    685716  686654  .   -   .   gene_id "OR4F3"; gene_name "OR4F3"; p_id "P6827"; transcript_id "NM_001005224_1"; tss_id "TSS33796";
chr1    unknown stop_codon  685716  685718  .   -   .   gene_id "OR4F3"; gene_name "OR4F3"; p_id "P6827"; transcript_id "NM_001005224_1"; tss_id "TSS33796";
chr1    unknown CDS 685719  686654  .   -   0   gene_id "OR4F3"; gene_name "OR4F3"; p_id "P6827"; transcript_id "NM_001005224_1"; tss_id "TSS33796";
chr1    unknown start_codon 686652  686654  .   -   .   gene_id "OR4F3"; gene_name "OR4F3"; p_id "P6827"; transcript_id "NM_001005224_1"; tss_id "TSS33796";
chr5    unknown CDS 181367287   181368222   .   +   0   gene_id "OR4F3"; gene_name "OR4F3"; p_id "P25445"; transcript_id "NM_001005224_2"; tss_id "TSS8523";
chr5    unknown exon    181367287   181368225   .   +   .   gene_id "OR4F3"; gene_name "OR4F3"; p_id "P25445"; transcript_id "NM_001005224_2"; tss_id "TSS8523";
chr5    unknown start_codon 181367287   181367289   .   +   .   gene_id "OR4F3"; gene_name "OR4F3"; p_id "P25445"; transcript_id "NM_001005224_2"; tss_id "TSS8523";
chr5    unknown stop_codon  181368223   181368225   .   +   .   gene_id "OR4F3"; gene_name "OR4F3"; p_id "P25445"; transcript_id "NM_001005224_2"; tss_id "TSS8523";
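The same check can be run across a whole GTF file rather than one gene at a time. The following is only a minimal sketch, assuming a standard tab-separated GTF with a `gene_id` attribute in the ninth column; the function names `chromosomes_per_gene` and `multi_chromosome_genes` are hypothetical helpers made up for illustration, and the sample records are three of the exon lines above with their attributes shortened.

```python
import re
from collections import defaultdict

# Matches the gene_id attribute in the ninth GTF column.
GENE_ID_PATTERN = re.compile(r'gene_id "([^"]+)"')


def chromosomes_per_gene(gtf_lines):
    """Map each gene_id to the set of chromosomes it is annotated on."""
    gene_chroms = defaultdict(set)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        match = GENE_ID_PATTERN.search(fields[8])
        if match:
            gene_chroms[match.group(1)].add(fields[0])
    return gene_chroms


def multi_chromosome_genes(gtf_lines):
    """Return only the genes annotated on more than one chromosome."""
    return {
        gene: sorted(chroms)
        for gene, chroms in chromosomes_per_gene(gtf_lines).items()
        if len(chroms) > 1
    }


# Three exon records from the grep output above (attributes shortened).
SAMPLE_GTF = [
    'chr1\tunknown\texon\t450740\t451678\t.\t-\t.\tgene_id "OR4F3"; transcript_id "NM_001005224";',
    'chr1\tunknown\texon\t685716\t686654\t.\t-\t.\tgene_id "OR4F3"; transcript_id "NM_001005224_1";',
    'chr5\tunknown\texon\t181367287\t181368225\t.\t+\t.\tgene_id "OR4F3"; transcript_id "NM_001005224_2";',
]

print(multi_chromosome_genes(SAMPLE_GTF))
```

Because the functions accept any iterable of lines, an open file handle for a GTF on disk can be passed in place of `SAMPLE_GTF`.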

Hello hudiejie and welcome to biostars,

please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.

This reply is better suited as a comment on the original question. Answers in biostars are meant only for (full) solutions to the problem of the OP. This is why I moved your answer to a comment.

Thank you!

fin swimmer

7.8 years ago
Denise CS ★ 5.2k

Entries such as NRs are not genes (loci). They are RNA sequences for a non-coding locus. If I search for MIR6859-1 in UCSC, I get only one entry under known genes.

7.8 years ago
Satyajeet Khare ★ 1.6k

It looks like the sequence is a perfect match (I just tried a BLAT search) on Chr1, Chr15, and Chr16. But the gene IDs on Entrez are different [Mir6859-1 (Chr1), -2 (Chr1), -3 (Chr15) and -4 (Chr16)]. The NR IDs also appear to be different on NCBI (NR_106918, NR_107062, NR_107063, NR_128720). If you are getting only one NR ID, it could be an annotation issue.

5.8 years ago
tdmurphy ▴ 190

UCSC has two types of RefSeq tracks. The old "RefSeq Genes" or refgene track is based on alignments generated by UCSC, and can't distinguish between different locations with the same sequence. The newer "NCBI RefSeq" tracks are based on annotation imported from NCBI's RefSeq project, which uses additional information to distinguish ambiguous locations. They also differ in some other ways and include additional features and genes not available in the refgene track. UCSC posted a blog about it: http://genome.ucsc.edu/blog/the-new-ncbi-refseq-tracks-and-you/

For the microRNAs, the four identical locations are assigned separate identifiers by miRBase, HGNC, and NCBI Gene, and each location has a separate RefSeq NR transcript. The same is true for some protein-coding genes, such as CALM1, CALM2, and CALM3.
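To see how many accessions in a downloaded table are affected, one could group the rows by accession and keep those placed at more than one location. This is a rough sketch rather than a definitive tool: it assumes the rows follow the refGene column layout seen in the question (a leading bin column followed by genePred fields) and are whitespace- or tab-separated. The helper names `placements_by_accession` and `ambiguous_accessions` are hypothetical, and the two sample rows are the ones posted in the question.

```python
from collections import defaultdict

# Assumed column order: UCSC refGene table (genePred with a leading bin column).
REFGENE_COLUMNS = [
    "bin", "name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd",
    "exonCount", "exonStarts", "exonEnds", "score", "name2",
    "cdsStartStat", "cdsEndStat", "exonFrames",
]


def placements_by_accession(rows):
    """Collect (chrom, txStart, txEnd, strand) for every row of each accession."""
    placements = defaultdict(list)
    for row in rows:
        if row.startswith("#"):
            continue  # skip the header line of a table dump
        fields = row.split()
        if len(fields) != len(REFGENE_COLUMNS):
            continue  # skip blank or malformed rows
        record = dict(zip(REFGENE_COLUMNS, fields))
        placements[record["name"]].append(
            (record["chrom"], int(record["txStart"]), int(record["txEnd"]), record["strand"])
        )
    return placements


def ambiguous_accessions(rows):
    """Return only the accessions that are placed at more than one location."""
    return {
        accession: locations
        for accession, locations in placements_by_accession(rows).items()
        if len(locations) > 1
    }


# The two rows from the question, with the bold markers removed.
SAMPLE_ROWS = [
    "585 NR_106918 chr1 - 17368 17436 17436 17436 1 17368, 17436, 0 MIR6859-1 unk unk -1,",
    "1367 NR_106918 chr15 + 102513726 102513794 102513794 102513794 1 102513726, 102513794, 0 MIR6859-1 unk unk -1,",
]

for accession, locations in ambiguous_accessions(SAMPLE_ROWS).items():
    print(accession, locations)
```

An accession reported here is a candidate for the ambiguity described above; comparing it against the "NCBI RefSeq" track would show whether NCBI assigns each location its own identifier.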


Thank you for your answer! So it is better to use ncbi.hg38.gtf to avoid ambiguous locations for the same gene. What about miRNAs? I am also interested in which database I should use (I guess I should use the NCBI one as well if using ncbi.hg38.gtf).
