Question

Genomic Coordinate Annotation

0

5.0 years ago

ari_sh70 • 0

Hi everyone!

I have a question and would really appreciate your help.

I have a list of genomic segments and would like to find the names of the genes located in them. The dataset, a text file, has the following format:

Segment  Count  First        End
0        258    1_1960674    1_2013259
1        85     1_3057480    1_3257840
2        185    1_3340901    1_3783903
3        215    1_209363247  1_209995470

In this dataset, the first column is the segment number, the second column is the number of SNPs per segment, and the third and fourth columns are the smallest and largest SNP positions in each segment. Note that the values in the third and fourth columns combine the chromosome number and the position, joined by an underscore. Now, how can I understand the gene names?

Thank you very much

SNP genome sequencing gene • 1.6k views

ADD COMMENT • link updated 4.9 years ago by lieven.sterck 15k • written 5.0 years ago by ari_sh70 • 0

0

Hi ari_sh70 , I've changed the 'tag' of your post to Question as the 'Tutorial' one is reserved for tutorials where people explain or showcase the use of a tool or pipeline.

ADD REPLY • link 5.0 years ago by lieven.sterck 15k

0

Dear ari_sh70,

there are a few shortcomings in your question:

Your question is about gene names and/or annotation, but you don't show any
administrative: use the code button for pasting a few lines of your dataset

ADD REPLY • link 5.0 years ago by Carambakaracho ★ 3.2k

0

Ok sure, thanks for the tips

ADD REPLY • link 5.0 years ago by ari_sh70 • 0

0

My apologies, I just realized you do show a few "lines" within that one line, but it's really hard to read...

ADD REPLY • link 5.0 years ago by Carambakaracho ★ 3.2k

0

You are totally right! This is my first time writing a post here. Thanks again for telling me how to do that!

ADD REPLY • link 5.0 years ago by ari_sh70 • 0

0

Now, how can I understand the gene names?

What exactly do you mean by that? Are you looking for the genes that are located in those regions?

ADD REPLY • link 5.0 years ago by lieven.sterck 15k

0

Yes, exactly I am looking for that...

ADD REPLY • link 5.0 years ago by ari_sh70 • 0

0

BEDtools (more specifically bedtools intersect) will be your friend.

With a little reformatting of those columns, and given that you have a GFF (or BED) file of the annotation, this should be pretty straightforward.
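
As a rough sketch of what the annotation side could look like: bedtools can read GFF directly, but reducing the file to one row per gene first tends to keep the overlap output easier to read. The file names `annotation.gff3` and `genes.bed` and the two features below are hypothetical placeholders, not data from this thread.

```shell
# Hypothetical two-line GFF3 excerpt (made-up features, for illustration only).
printf '1\tsrc\tgene\t2000000\t2005000\t.\t+\t.\tID=geneA;Name=GENEA\n' > annotation.gff3
printf '1\tsrc\tmRNA\t2000000\t2005000\t.\t+\t.\tID=mrnaA;Parent=geneA\n' >> annotation.gff3

# Keep only "gene" features, skip comment lines, and write
# chrom, 0-based start, end, attributes as tab-separated BED.
awk 'BEGIN { FS = OFS = "\t" }
     !/^#/ && $3 == "gene" { print $1, $4 - 1, $5, $9 }' annotation.gff3 > genes.bed
```

GFF coordinates are 1-based and inclusive while BED starts are 0-based, which is why 1 is subtracted from the start column.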

ADD REPLY • link 5.0 years ago by lieven.sterck 15k

0

Thank you very much for your answer. Could you tell me a bit more about it, please?

ADD REPLY • link 5.0 years ago by ari_sh70 • 0

0

Sure, but can you first confirm that you have an annotation of the genes in GFF or BED format?

ADD REPLY • link 5.0 years ago by lieven.sterck 15k

0

Thank you. First, I would like to apologize for my delayed response: I am a new user and the system did not let me reply any more yesterday. I want to annotate the SNPs based on their location, using the data I showed in my post.

ADD REPLY • link 4.9 years ago by ari_sh70 • 0

0

No worries.

see my answer below

ADD REPLY • link 4.9 years ago by lieven.sterck 15k

score 0 · Answer 1 · 2019-05-07

0

4.9 years ago

lieven.sterck 15k

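OK, first you will need to convert your list to BED format, e.g. by using the following Linux command line (replace your_file.txt with the name of your file; the output is tab-separated, as bedtools expects):

sed 's/_/\t/g' your_file.txt | awk 'BEGIN { OFS = "\t" } { print $3, $4, $6 }' | sed '1d' > your_file.bed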

Then you run bedtools intersect on the file created above and the BED (or GFF) file of your annotation. Depending on your settings, this will give you the list of genes that overlap with your SNP interval regions.
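
The two steps can be sketched end to end on the four sample rows from the question. This is only an illustration under stated assumptions: the file names (`segments.txt`, `segments.bed`, `genes.bed`, `segments_with_genes.tsv`) and the two genes are made up, the positions are assumed to be 1-based, and the chromosome name is assumed to contain no underscore of its own. The split is done inside awk here rather than with sed.

```shell
# Sample rows from the question, recreated as a whitespace-separated text file.
cat > segments.txt <<'EOF'
Segment Count First End
0 258 1_1960674 1_2013259
1 85 1_3057480 1_3257840
2 185 1_3340901 1_3783903
3 215 1_209363247 1_209995470
EOF

# Split "chrom_pos" on the underscore, skip the header line, and write tab-separated BED.
# BED starts are 0-based, so 1 is subtracted from the first SNP position
# (assumption: the positions in the question are 1-based).
awk 'BEGIN { OFS = "\t" }
     NR > 1 {
         split($3, first, "_"); split($4, last, "_")
         print first[1], first[2] - 1, last[2], "segment_" $1
     }' segments.txt > segments.bed

# Hypothetical annotation: two made-up genes, for illustration only.
printf '1\t1999999\t2005000\tgeneA\n1\t5000000\t5001000\tgeneB\n' > genes.bed

# Report each segment together with every gene it overlaps (requires bedtools to be installed).
if command -v bedtools >/dev/null 2>&1; then
    bedtools intersect -a segments.bed -b genes.bed -wa -wb > segments_with_genes.tsv
fi
```

With `-wa -wb`, each output row carries the segment columns followed by the columns of the overlapping gene, so the gene name ends up next to the segment it falls in.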

Word of caution: it's critical that you use the same sequence name IDs in both files, so you may need to modify them slightly so that they correspond to each other.
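
One way to check the sequence names before intersecting, as a minimal sketch: the file names and the single gene are hypothetical, and the "chr" prefix is just one common kind of mismatch, not something confirmed for this dataset.

```shell
# Two tiny stand-in files; the gene is made up and only illustrates a naming mismatch.
printf '1\t1960673\t2013259\n' > segments.bed
printf 'chr1\t1999999\t2005000\tgeneA\n' > genes.bed

# List the distinct sequence names used by each file.
cut -f 1 segments.bed | sort -u > segment_names.txt
cut -f 1 genes.bed | sort -u > gene_names.txt

# Names present in the segments but absent from the annotation
# (empty output would mean every segment name has a match).
comm -23 segment_names.txt gene_names.txt

# If the only difference is a "chr" prefix, add it to the segment file.
awk 'BEGIN { OFS = "\t" } { $1 = "chr" $1; print }' segments.bed > segments.chr.bed
```

If the `comm` step prints anything, those sequences will silently produce no overlaps, so it is worth fixing the names first.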

ADD COMMENT • link 4.9 years ago by lieven.sterck 15k

0

Thank you very much for your answer. My problem here is the annotation. I don't have any annotation ....

ADD REPLY • link 4.9 years ago by ari_sh70 • 0

0

Well... this is when things get more complicated. This is why lieven.sterck asked you specifically for annotation.

Based on the limited information you disclosed, I see the following options:

Find annotation in a public database
Get annotation from a colleague
Annotate yourself

ADD REPLY • link 4.9 years ago by Carambakaracho ★ 3.2k

0

Thank you for your reply. Can you please tell me what kind of information you are looking for? If you read my previous comments precisely, I emphasised that I am looking for annotation and the gene names. That's what I am here and I put this post to know that otherwise I'm not looking for wasting my time here....

ADD REPLY • link 4.9 years ago by ari_sh70 • 0

1

Let's take a step backward: which organism are you working on? We need to know what is available for this organism. If nobody has generated annotation then there is not much we can do. Annotation would tell us which genes are where.

That's what I am here and I put this post to know that otherwise I'm not looking for wasting my time here....

Excuse me, wasting your time? Right now you are using the time of a bunch of volunteers who are trying to help you. If you don't feel like wasting your time, then you don't have to post here and are free to solve your problems on your own. Please be as complete as possible when asking questions.

ADD REPLY • link 4.9 years ago by WouterDeCoster 47k

0

Well, I've read all your posts precisely, but this was not clear at all. Moreover, there is 'annotation' and 'annotation': the term can be interpreted in a very broad sense.

So basically your question is how to annotate a genome (in the first place)? If so, we would need much more info on your data at hand to advise you. What data do you have, what kind of genome, ...

ADD REPLY • link 4.9 years ago by lieven.sterck 15k

0

I acknowledge your politeness (and I believe so does everybody who replied).

When you don't have gene model annotation and you don't disclose the organism you're working on, how do you suggest anyone can help you?

This morning I processed 48 Propionibacterium freudenreichii genomes, yesterday I annotated a dozen Acinetobacter baumannii genomes, and this afternoon I will continue to work on a data analysis pipeline based on the Cricetulus griseus genome. It's unlikely you're interested in any of these, and I cannot guess your genome.

ADD REPLY • link 4.9 years ago by Carambakaracho ★ 3.2k

0

exactly!

Perhaps (though not really the advised way), to reduce your workload a little, you could restrict your annotation (in case you have to do it yourself) to the regions you are really interested in.

What kind of setting are we talking about here? small genome <-> large genome?

ADD REPLY • link 4.9 years ago by lieven.sterck 15k